One alias. Claude Code or Codex on a local model. Skills, agents, MCP servers — all intact.
Developer time is expensive. Cloud limits are arbitrary. You deserve a local fallback that actually works.
You're deep in a refactor, context loaded, momentum built. Then boom. Rate limit hit. Everything's gone. Back to square one.
Consulting contracts. IP clauses. Air-gapped servers. You need local AI that's operationally viable — not a toy demo.
New tool. New configs. New muscle memory. You didn't want to learn something new. You just wanted a backend swap.
Keep your tools. Swap the brain. Ship locally.
From zero to working local session in under 10 minutes. Auto-detects Ollama, LM Studio, and llama.cpp. Resumes on failure. No manual file surgery.
llmfit analyzes your RAM and GPU. It recommends the model that actually fits your machine — no more guessing or OOM crashes.
Your real ~/.claude and ~/.codex are never touched. All config stays isolated. Rollback in seconds — remove one folder.
After setup, type cc or cx. That's it. All your skills, agents, MCP servers, and statusline work exactly as before.
Code never leaves your machine. No telemetry. No phone-home. After model download, zero internet required. Perfect for air-gapped projects.
Claude Code + Ollama + gemma4:26b verified end-to-end. llama.cpp with GGUF models from Hugging Face also supported. Real working combos, not hypothetical support.
Run one curl command. The wizard auto-detects your installed runtimes, checks your hardware, and flags anything missing.
Answer a few prompts. Pick your runtime, pick your model (or let llmfit pick for you). The wizard wires everything up safely.
Type cc or cx. Your full Claude Code or Codex experience — locally. Every skill, every agent, every MCP server intact.
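For the curious, here is a minimal sketch of what the generated aliases might look like. The environment variables, paths, and endpoint are illustrative assumptions, not the wizard's actual output:

```sh
# Hypothetical shape of the wizard-generated aliases (illustrative only;
# the real block written by ccl setup may use different variables).
alias cc='CLAUDE_CONFIG_DIR="$HOME/.claude-codex-local/claude" ANTHROPIC_BASE_URL="http://localhost:11434" claude'
alias cx='CODEX_HOME="$HOME/.claude-codex-local/codex" codex'
```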
Available on PyPI — install with pip or uv, or use the one-line shell installer.
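A likely install path, assuming the PyPI distribution is named claude-codex-local (ccl is the wizard's CLI entry point):

```sh
# Install from PyPI (package name assumed), then run the setup wizard.
pip install claude-codex-local     # or: uv tool install claude-codex-local
ccl setup                          # interactive wizard: runtime, model, aliases
```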
Ollama (with ollama launch support), LM Studio, and llama.cpp — all with auto-detection during setup. The wizard will find what you have installed. For llama.cpp, GGUF models can be downloaded directly from Hugging Face via the built-in huggingface-cli integration.
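As an illustration of the llama.cpp path, a GGUF can be fetched with the Hugging Face CLI and served locally. The repository and file names below are placeholders, and the wizard normally handles this step for you:

```sh
# Download a GGUF from Hugging Face (repo and file names are examples only)
huggingface-cli download bartowski/some-model-GGUF some-model-Q4_K_M.gguf --local-dir ~/models
# Serve it with llama.cpp's OpenAI-compatible server on port 8080
llama-server -m ~/models/some-model-Q4_K_M.gguf -c 4096 --port 8080
```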
Your real ~/.claude and ~/.codex are untouched. All local config lives in .claude-codex-local/. To fully revert: remove that folder and the alias block added to your shell rc. Done.
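A full rollback might look like this, assuming the config folder lives in your home directory:

```sh
# Remove the isolated local config (location assumed to be $HOME)
rm -rf ~/.claude-codex-local
# Then delete the cc/cx alias block from ~/.zshrc or ~/.bashrc and reload
source ~/.zshrc
```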
The wizard uses llmfit to analyze your RAM and GPU, then recommends models that actually fit. Smaller machines get smaller, faster models. Larger machines unlock more capable ones.
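You can also ask for a recommendation outside the wizard. ccl find-model is the standalone subcommand from the changelog, and llmfit system --json is the hardware probe it builds on:

```sh
ccl find-model            # standalone hardware-aware model recommendation
llmfit system --json      # raw RAM/GPU profile the recommendation is based on
```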
Your existing claude or codex commands still work directly — they are completely unmodified. The cc / cx aliases are only for local sessions. Cloud and local coexist.
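In practice, both sets of commands sit side by side in the same shell:

```sh
claude    # cloud Claude Code, exactly as before
cc        # local Claude Code via the wizard-generated alias
codex     # cloud Codex, untouched
cx        # local Codex
```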
Release history following Keep a Changelog and Semantic Versioning.
- Fixed: llmfit system --json fields were read from the top level, but they are wrapped under a system key (#46)
- Detects a running llama-server and offers its active model as a pick, so you can keep your warm process instead of re-pulling a GGUF
- ollama pull, lms get, and Hugging Face CLI downloads, with a post-download summary and clean Ctrl-C abort (#39)
- mypy and ruff
- Tests covering the ccl CLI commands: setup, doctor, find-model, and their flags — 26 tests total (#29, #32)
- --resume and --non-interactive flags are now available at the top-level ccl parser, so ccl --resume works without specifying the setup subcommand explicitly (#28, #30)
- Support for both the hf (modern) and huggingface-cli (legacy) binary names (#21, #22)
- llmfit check is now optional — environment discovery no longer requires it; the wizard prompts to install it only when model selection is requested (#24, #26)
- The CLI is now ccl, replacing claude-codex-local and ccl-bridge. Command tree unchanged — ccl setup, ccl doctor, ccl find-model (#20)
- Renamed claude_codex_local.bridge → claude_codex_local.core. Anyone importing the module directly must switch to the new path
- Removed the ccl-bridge debug binary — its subcommands (profile, recommend, doctor, adapters) remain reachable via python -m claude_codex_local.core <cmd>
- Removed the bin/claude-codex-local bash wrapper and the top-level wizard.py duplicate — both predated the installable package
- Added ccl --version and new global flags --no-color (honors NO_COLOR), --verbose, --quiet
- install.sh now runs pip install -e . so the ccl entry point lands in the virtualenv automatically
- cc and cx can now coexist in the same shell rc file, each with its own idempotent fenced block (#16)
- llama-server integration and GGUF model support from Hugging Face
- pip install .[dev] optional extras group (pytest, ruff, mypy, bandit, detect-secrets, pre-commit)
- run_e2e_docker.sh under set -u
- One-line shell installer (install.sh) — no clone required
- ollama launch integration as primary engine path
- Alias blocks written to ~/.zshrc / ~/.bashrc
- guide.md generation after wizard completes
- --resume flag to pick up after a failed wizard step
- --non-interactive flag for CI-friendly setup
- find-model subcommand for standalone llmfit recommendations
- ollama launch instead of isolated HOME and variant builder
- source ~/.zshrc before first cc/cx run
- llmfit integration for hardware-aware model selection
- @pytest.mark.local marker for integration tests

Free, open-source, MIT licensed. Takes under 10 minutes.