v0.8.3 · Open Source

Hit your limit? Need privacy?
Just swap the model.

One alias. Claude Code or Codex on a local model. Skills, agents, MCP servers — all intact.

MIT License · Python 3.10+ · Ollama · LM Studio · llama.cpp
bash — terminal
cc
Launching Claude Code with gemma4:26b...
Claude Code v2.1.100
gemma4:26b with medium effort · API Usage Billing
~/myproject/my-app
my-app | main [1] | gemma4:26b | 39e7a1ed-7086-41ab
-- INSERT -- ⏵⏵ accept edits on (shift+tab to cycle)

You shouldn't have to stop
because the cloud did.

Developer time is expensive. Cloud limits are arbitrary. You deserve a local fallback that actually works.

Your quota ran out — mid-session

You're deep in a refactor, context loaded, momentum built. Then boom. Rate limit hit. Everything's gone. Back to square one.

🔒

Your code can't leave your machine

Consulting contracts. IP clauses. Air-gapped servers. You need local AI that's operationally viable — not a toy demo.

🔧

Every local setup guide breaks your workflow

New tool. New configs. New muscle memory. You didn't want to learn something new. You just wanted a backend swap.


Meet claude-codex-local

Keep your tools. Swap the brain. Ship locally.

🧙

9-Step Guided Wizard

From zero to working local session in under 10 minutes. Auto-detects Ollama, LM Studio, and llama.cpp. Resumes on failure. No manual file surgery.
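
A minimal run, assuming the ccl entry point is on your PATH (flags as confirmed in the changelog below):

ccl setup                       # start the guided wizard
ccl --resume                    # pick up after a failed step
ccl setup --non-interactive     # prompt-free run for CI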

🧠

Hardware-Aware Model Selection

llmfit analyzes your RAM and GPU. It recommends the model that actually fits your machine — no more guessing or OOM crashes.
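
The recommendation step also runs standalone; a sketch using the commands referenced elsewhere on this page:

llmfit system --json   # machine report: CPU, RAM, GPU
ccl find-model         # standalone hardware-aware model recommendation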

🛡️

Zero Config Breakage

Your real ~/.claude and ~/.codex are never touched. All config stays isolated. Rollback in seconds — remove one folder.

One Alias to Rule Them All

After setup, type cc or cx. That's it. All your skills, agents, MCP servers, and statusline work exactly as before.
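
Under the hood, cc and cx are plain shell aliases in an idempotent fenced block in your rc file. A hypothetical sketch; the real block is generated by the wizard, and the wiring shown here (pointing Claude Code's ANTHROPIC_BASE_URL at a local server) is an assumption, not the actual implementation:

# >>> claude-codex-local >>>   (fence markers let re-runs replace the block instead of duplicating it)
alias cc='ANTHROPIC_BASE_URL=http://localhost:11434 claude'   # hypothetical wiring
# <<< claude-codex-local <<<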

🔒

Offline & Private

Code never leaves your machine. No telemetry. No phone-home. After model download, zero internet required. Perfect for air-gapped projects.
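
An air-gap workflow, assuming Ollama as the runtime and the model from the demo above:

ollama pull gemma4:26b   # one-time download while still online
cc                       # afterwards, fully offline against local weights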

Proven Paths

Claude Code + Ollama + gemma4:26b verified end-to-end. llama.cpp with GGUF models from Hugging Face also supported. Real working combos, not hypothetical support.
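
The llama.cpp path, sketched with a placeholder repo and filename (the wizard automates both steps via its huggingface-cli integration):

huggingface-cli download some-org/some-model-GGUF some-model.Q4_K_M.gguf --local-dir models
llama-server -m models/some-model.Q4_K_M.gguf --port 8080   # OpenAI-compatible local server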


Three steps. Ten minutes.
Fully local.

01

Install

Run one curl command. The wizard auto-detects your installed runtimes, checks your hardware, and flags anything missing.

02

Configure

Answer a few prompts. Pick your runtime, pick your model (or let llmfit pick for you). The wizard wires everything up safely.

03

Run

Type cc or cx. Your full Claude Code or Codex experience — locally. Every skill, every agent, every MCP server intact.


One command to start

Available on PyPI — install with pip or uv, or use the one-line shell installer.

pip · recommended
$ pip install claude-codex-local
uv · alternative
$ uv tool install claude-codex-local
Or use the shell installer (no clone required)
bash <(curl -sSL https://raw.githubusercontent.com/luongnv89/claude-codex-local/main/install.sh)
Or install from source
git clone https://github.com/luongnv89/claude-codex-local.git
cd claude-codex-local
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
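
Whichever route you choose, sanity-check the install before running the wizard:

ccl --version   # confirms the entry point is on PATH
ccl doctor      # checks runtimes and hardware, flags anything missing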

Then launch the setup wizard:
ccl

Verified combinations:
  • Claude Code + Ollama + gemma4:26b
  • Codex CLI + Ollama + gemma4:26b
  • Claude Code + LM Studio + Qwen3
  • Any harness + llama.cpp + GGUF models (from Hugging Face)

Common questions

Does this replace Claude Code or Codex?
No. It keeps both tools exactly as-is and adds a local backend bridge. Your harness, your skills, your muscle memory — all intact. One alias, zero disruption.

Which local runtimes are supported?
Ollama (primary, with native ollama launch support), LM Studio, and llama.cpp — all with auto-detection during setup. The wizard will find what you have installed. A vLLM backend adapter also shipped in v0.8.0 (see the changelog). For llama.cpp, GGUF models can be downloaded directly from Hugging Face via the built-in huggingface-cli integration.

Will it touch my existing ~/.claude or ~/.codex config?
Never. Your real ~/.claude and ~/.codex are untouched. All local config lives in .claude-codex-local/. To fully revert: remove that folder and the alias block added to your shell rc. Done.
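
In shell terms, a full revert is roughly this, assuming the config folder sits in your home directory:

rm -rf ~/.claude-codex-local   # remove the isolated config
# then delete the claude-codex-local alias block from ~/.zshrc or ~/.bashrc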

What hardware do I need?
It depends on the model. The wizard uses llmfit to analyze your RAM and GPU, then recommends models that actually fit. Smaller machines get smaller, faster models. Larger machines unlock more capable ones.

Does it work offline?
Yes. After the initial model download (handled by your runtime, such as Ollama), no internet connection is required. There is no telemetry, no phone-home, and no license server. Fully air-gap capable.

How do I switch back to the cloud?
Instantly. Just run the official claude or codex commands directly — they are completely unmodified. The cc / cx aliases are only for local sessions. Cloud and local coexist.
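
Side by side in the same shell:

claude   # official cloud session, completely unmodified
cc       # local session through the bridge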

Changelog

Release history following Keep a Changelog and Semantic Versioning.

v0.8.1 2026-04-17 Latest
Fixed
  • Machine specifications table now shows real CPU, RAM, and GPU values — wizard was reading llmfit system --json fields from the top level, but they are wrapped under a system key (#46)
  • llmfit ranking now uses available RAM instead of total — Speed/Balanced/Quality picks match what will actually fit on the host right now (#46)
  • Embedding and reranker models are hidden from the installed-models picker for Ollama and LM Studio — they cannot serve as chat coding models (#46)
  • Step 4 (formerly 2.4) model picker is grouped with visual separators: Running server / Suggested by llmfit / Installed on this machine / Other (#46)
v0.8.0 2026-04-17
Added
  • + vLLM backend adapter with unit and e2e test coverage — high-throughput inference engine now joins Ollama, LM Studio, and llama.cpp as a first-class engine option
  • + Wizard detects an already-running llama-server and offers its active model as a pick, so you can keep your warm process instead of re-pulling a GGUF
  • + Wizard pre-populates the model picker with models discovered on-host and recommendation profile picks (#35, #36)
  • + Wizard welcome banner now shows the installed version and repository URL (#37)
  • + Live progress for ollama pull, lms get, and Hugging Face CLI downloads, with a post-download summary and clean Ctrl-C abort (#39)
  • + Fuzzy-search fallback for missing Hugging Face GGUF repos — wizard suggests up to 3 closest matches and re-prompts if none are found (#38)
Fixed
  • Post-review polish for the fuzzy fallback and KI wizard flow (#45)
  • vLLM adapter type annotations and lint warnings cleared under mypy and ruff
  • Removed a stray agent worktree gitlink that broke CI on fresh clones
v0.7.0 2026-04-12
Added
  • + Machine specifications table (CPU cores/name, RAM total/available, GPU details) displayed during environment discovery step (#31)
  • + Comprehensive e2e test suite covering all ccl CLI commands: setup, doctor, find-model, and their flags — 26 tests total (#29, #32)
Fixed
  • --resume and --non-interactive flags are now available at the top-level ccl parser, so ccl --resume works without specifying the setup subcommand explicitly (#28, #30)
v0.6.0 2026-04-11
Added
  • + ASCII 3D welcome banner with project tagline displayed at wizard startup (#23, #25)
Fixed
  • HuggingFace CLI detection now checks both hf (modern) and huggingface-cli (legacy) binary names (#21, #22)
  • llmfit check is now optional — environment discovery no longer requires it; the wizard prompts to install it only when model selection is requested (#24, #26)
v0.5.0 2026-04-11
Changed · Breaking
  • ~ Single canonical CLI binary. The package now installs one entry point, ccl, replacing claude-codex-local and ccl-bridge. Command tree unchanged — ccl setup, ccl doctor, ccl find-model (#20)
  • ~ Internal rename: claude_codex_local.bridge → claude_codex_local.core. Anyone importing the module directly must switch to the new path
Removed
  • ccl-bridge debug binary — its subcommands (profile, recommend, doctor, adapters) remain reachable via python -m claude_codex_local.core <cmd>
  • Legacy bin/claude-codex-local bash wrapper and the top-level wizard.py duplicate — both predated the installable package
Added
  • + Top-level ccl --version and new global flags --no-color (honors NO_COLOR), --verbose, --quiet
  • + install.sh now runs pip install -e . so the ccl entry point lands in the virtualenv automatically
v0.4.0 2026-04-11
Added
  • + Smoke test now reports model speed in tokens/second so you can gauge throughput before you commit to a model (#17)
  • + Per-harness alias fences — cc and cx can now coexist in the same shell rc file, each with its own idempotent fenced block (#16)
v0.3.0 2026-04-11
Added
  • + llama.cpp backend adapter with llama-server integration and GGUF model support from Hugging Face
  • + Docker-based e2e test suite covering pip, uv, source, and extras install scenarios
  • + pip install .[dev] optional extras group (pytest, ruff, mypy, bandit, detect-secrets, pre-commit)
  • + GitHub Pages landing page with brand refresh and two-column hero layout
Fixed
  • Empty array expansion in run_e2e_docker.sh under set -u
v0.2.0 2026-04-10
Added
  • + One-command remote installer (install.sh) — no clone required
  • + ollama launch integration as primary engine path
  • + Shell alias installer with idempotent fenced block in ~/.zshrc / ~/.bashrc
  • + Personalized guide.md generation after wizard completes
  • + --resume flag to pick up after a failed wizard step
  • + --non-interactive flag for CI-friendly setup
  • + find-model subcommand for standalone llmfit recommendations
  • + Installable Python package structure for PyPI distribution
Changed
  • ~ Wizard now uses ollama launch instead of isolated HOME and variant builder
  • ~ LM Studio support moved to secondary/fallback path
Fixed
  • Shell alias block replaced idempotently on re-run (no more duplicates)
  • Users reminded to source ~/.zshrc before first cc/cx run
v0.1.0 2026-04-01 Initial release
Added
  • + Initial proof-of-concept: interactive wizard (8 steps)
  • + Harness support: Claude Code, Codex CLI
  • + Engine support: Ollama, LM Studio, llama.cpp
  • + llmfit integration for hardware-aware model selection
  • + Pre-commit hooks: ruff, mypy, bandit, detect-secrets
  • + pytest test suite with @pytest.mark.local marker for integration tests

Ready to run locally?

Free, open-source, MIT licensed. Takes under 10 minutes.