What are the best models for code autocomplete (like Cursor's autocomplete)?
that's it. i decided to use my small GPU to host a good autocomplete rather than a full coding assistant, and to put the money i'd have spent on a huge GPU toward paying for APIs instead.
but then which model to choose? i'm currently trying qwen 1.5B, and i've heard some good things about starcoder 3B. what is your experience? are there any really good autocomplete-specialized models out there? like many here, i'm looking for that cursor experience but in a cheaper way. i think the largest my GPU could handle is something around 5B unquantized, or maybe 14B with reasonable quantization.
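for reference, here's the rough math behind those size guesses, as a minimal sketch. it only counts the model weights (no KV cache, activations, or runtime overhead, so real usage will be higher), and it assumes "unquantized" means fp16 (16 bits per weight) and "reasonable quantization" means roughly 4 to 8 bits per weight. `weight_memory_gb` is just a helper name i made up for this.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights only, in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and framework overhead, so treat the
    result as a lower bound on actual VRAM use.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed configurations: fp16 for "unquantized", 4-bit and 8-bit for quantized.
configs = [
    ("5B at fp16", 5, 16),
    ("14B at 4-bit", 14, 4),
    ("14B at 8-bit", 14, 8),
]

for name, params_billion, bits in configs:
    print(f"{name}: ~{weight_memory_gb(params_billion, bits):.1f} GB of weights")
```

so under these assumptions 5B at fp16 and 14B at 4-bit land in the same ballpark for weights alone, which is roughly where my estimate comes from.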
also, are there benchmarks for this particular task? i've seen a few benchmarks mentioned but haven't been able to find their actual results.