From 8GB laptops to 192GB workstations — a hardware-tiered map of coding models. The wizard does this for you via llmfit; this page shows the math.
Pick a row that matches your hardware. Each tier lists models that perform well at that memory ceiling — assuming Q4–Q5 quantization and a typical 8k–16k context.
| Tier · Hardware | Recommended coding models |
|---|---|
| 24 GB · M3 Pro (18GB) · RTX 4090 (24GB) | Qwen2.5-Coder-7B: fast, efficient for most coding tasks<br>DeepSeek-Coder-6.7B: strong code completion<br>CodeLlama-13B: balanced performance<br>Mistral-7B-Instruct: general purpose with coding ability |
| 32 GB · M3 Max (36GB) · RTX 6000 Ada (48GB) | Qwen2.5-Coder-14B: enhanced reasoning for refactoring<br>DeepSeek-Coder-33B: advanced code generation<br>CodeLlama-34B: large context window (16K)<br>Phind-CodeLlama-34B: optimized for code search |
| 64 GB · M2 Ultra (64GB) · H100 (80GB) | Qwen2.5-Coder-32B: top-tier coding performance<br>DeepSeek-Coder-V2-Lite: 16B with 128K context<br>CodeLlama-70B: architecture design ceiling<br>Mixtral-8x7B-Instruct: MoE, effective 47B |
| 128 GB+ · Mac Studio (192GB) · H100 NVL (188GB) | DeepSeek-Coder-V2-236B: state-of-the-art<br>Qwen2.5-Coder-72B: multi-file reasoning<br>Mixtral-8x22B: 141B params, expert routing<br>CodeLlama-70B + long ctx: 100K+ tokens |
llmfit automates this. If you prefer to drive yourself, here's the procedure.
Determine the memory budget your model has to work with:
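As a rough sketch of this step, the budget can be treated as total memory, scaled by the share the GPU is allowed to use, minus a reserve for the OS and other apps. The function name, the 4 GB default reserve, and the `usable_fraction` values below are illustrative assumptions, not numbers taken from llmfit.

```python
def memory_budget_gb(total_gb: float,
                     reserved_gb: float = 4.0,
                     usable_fraction: float = 1.0) -> float:
    """Return a rough memory budget (GB) for model weights plus context.

    total_gb        -- VRAM on a discrete GPU, or total unified memory
    reserved_gb     -- assumed headroom for the OS and other apps
    usable_fraction -- assumed share of memory the GPU may address
                       (1.0 for dedicated VRAM; lower on unified memory)
    """
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    budget = total_gb * usable_fraction - reserved_gb
    return max(budget, 0.0)  # never report a negative budget


# Dedicated 24 GB GPU, small reserve for the display and driver.
print(memory_budget_gb(24.0, reserved_gb=1.0))

# 36 GB unified memory, assuming 75% is usable by the GPU.
print(memory_budget_gb(36.0, reserved_gb=4.0, usable_fraction=0.75))
```

Adjust the reserve upward if an IDE, browser, and containers share the machine with the model.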
Estimate memory based on parameter count and quantization:
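A back-of-the-envelope version of this estimate: weights take roughly parameters × bits per weight ÷ 8 bytes, and the KV cache and runtime buffers add some overhead on top. The sketch below assumes a flat 1.5 GB overhead for an 8k–16k context, which is a placeholder rather than a measured figure; real overhead depends on the model architecture and runtime.

```python
def estimate_model_gb(params_billions: float,
                      bits_per_weight: float,
                      context_overhead_gb: float = 1.5) -> float:
    """Rough memory estimate in GB (decimal) for a quantized model.

    params_billions     -- parameter count in billions (e.g. 32 for a 32B model)
    bits_per_weight     -- effective bits per weight for the quantization
    context_overhead_gb -- assumed flat allowance for KV cache and buffers
    """
    # 1e9 params * (bits / 8) bytes per param = that many GB of weights.
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb + context_overhead_gb


# Compare a 7B and a 32B model at about 4.5 bits per weight.
for size in (7, 32):
    print(f"{size}B @ 4.5 bpw: ~{estimate_model_gb(size, 4.5):.1f} GB")
```

Treat the result as a floor: longer contexts grow the KV cache well past the flat allowance used here.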
Balance quality vs memory:
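One way to think about this trade-off is to try quantization levels from highest quality to lowest and keep the first one that fits the budget. The bits-per-weight values below are approximate figures for common GGUF quantization types, and the 1.5 GB overhead is an assumption; `pick_quant` is a hypothetical helper, not part of llmfit.

```python
from typing import Optional

# Approximate effective bits per weight; exact values vary by model.
QUANT_BITS = {
    "Q8_0": 8.5,
    "Q6_K": 6.56,
    "Q5_K_M": 5.69,
    "Q4_K_M": 4.85,
}


def pick_quant(params_billions: float,
               budget_gb: float,
               overhead_gb: float = 1.5) -> Optional[str]:
    """Return the highest-bit quantization that fits, or None if none do."""
    by_quality = sorted(QUANT_BITS.items(), key=lambda kv: kv[1], reverse=True)
    for name, bits in by_quality:
        needed_gb = params_billions * bits / 8 + overhead_gb
        if needed_gb <= budget_gb:
            return name
    return None  # model is too large for this budget at any listed level


print(pick_quant(7, 23))   # small model: room for a high-bit quantization
print(pick_quant(32, 23))  # larger model: only a lower-bit level fits
print(pick_quant(70, 23))  # too large: step down to a smaller model
```

When the helper returns `None`, a smaller model at a higher quantization is usually a better choice than pushing below Q4.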
Run a real session and watch for these signals:
`ccl chat --model <name> --benchmark`

Watch for the bad signs during a real workload:
If performance is poor, walk down one of these axes:
Running local? llmfit reads your hardware and picks a model that fits. Routing through 9router or OpenRouter? Skip this page and head straight to setup.