One alias. Claude Code or Codex on local models or cloud routing. Skills, agents, MCP servers — all intact.
Developer time is expensive. Cloud limits are arbitrary. You deserve a backend swap that actually works — local, alternative cloud, or hosted SaaS.
You're deep in a refactor, context loaded, momentum built. Then — rate limit. Everything gone. Back to square one.
Consulting contracts. IP clauses. Air-gapped servers. You need a backend that respects your constraints.
New tool. New configs. New muscle memory. You didn't want to learn something new. You just wanted a backend swap.
Keep your tools. Swap the backend. Ship locally or route to cloud. Every skill, agent, MCP server intact.
From zero to working local session in under 10 minutes. Auto-detects Ollama, LM Studio, and llama.cpp. Resumes on failure. No manual file surgery.
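As a rough illustration of runtime auto-detection, the sketch below looks for each runtime's command-line binary on PATH. The binary names (`ollama`, `lms`, `llama-server`) and the `detect_runtimes` helper are assumptions made for this example; the wizard's actual probe may work differently, for instance by checking running servers.

```python
import shutil
from typing import Callable, Optional

# Assumed CLI names for each runtime; the wizard's real probe may differ.
RUNTIME_BINARIES = {
    "ollama": "ollama",
    "lmstudio": "lms",
    "llamacpp": "llama-server",
}


def detect_runtimes(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict[str, str]:
    """Return {runtime: resolved path} for every runtime binary found on PATH."""
    found = {}
    for runtime, binary in RUNTIME_BINARIES.items():
        path = which(binary)
        if path:
            found[runtime] = path
    return found


if __name__ == "__main__":
    for name, path in detect_runtimes().items():
        print(f"{name}: {path}")
```

The `which` parameter is injectable so the lookup can be exercised without any runtime installed.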
llmfit analyzes RAM and GPU. Recommends models that actually fit — no more OOM crashes or guessing.
Your real ~/.claude and ~/.codex are never touched. All config isolated. Rollback in seconds.
Type cc/cx for local, cc9/cx9 for 9router, cco/cxo for OpenRouter. Skills, agents, MCP — intact.
Code never leaves your machine. No telemetry. No phone-home. After model download, zero internet. Air-gap ready.
Need cloud power without Anthropic limits? 9router routes to alternative providers — OpenAI, DeepSeek, more — while keeping your harness intact.
Want a hosted alternative? OpenRouter forwards calls to dozens of cloud models. Use cco/cxo/cpo aliases; local cc/cx/ccp untouched.
Drop in a Multi-Token Prediction (MTP) GGUF (Qwen, GLM, or DeepSeek MTP variants) and llama-server auto-launches with --spec-type draft-mtp for free decode speedups. Detection probes the GGUF metadata and falls back to the filename; set LLAMACPP_MTP_ENABLED or LLAMACPP_SPEC_DRAFT_N_MAX to override.
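The detection order described above (explicit override, then metadata probe, then filename fallback) can be sketched roughly as follows. This is not the project's implementation: the `mtp_launch_args` helper, the accepted override values, and the "mtp" filename pattern are assumptions for illustration, and the metadata probe itself is represented by a boolean passed in from elsewhere. LLAMACPP_SPEC_DRAFT_N_MAX is left out because the source does not say which server flag it maps to.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def mtp_launch_args(
    model_path: str,
    metadata_has_mtp: Optional[bool] = None,
    env: Mapping[str, str] = os.environ,
) -> list[str]:
    """Return extra llama-server args when the model looks like an MTP GGUF.

    metadata_has_mtp is the result of a GGUF metadata probe done elsewhere
    (None means the probe was inconclusive). The accepted override values
    and the "mtp" filename pattern are assumptions of this sketch.
    """
    override = env.get("LLAMACPP_MTP_ENABLED", "").strip().lower()
    if override in {"0", "false", "no", "off"}:
        return []  # explicit opt-out beats any detection
    if override in {"1", "true", "yes", "on"}:
        enabled = True  # explicit opt-in beats any detection
    elif metadata_has_mtp is not None:
        enabled = metadata_has_mtp  # a conclusive metadata probe wins
    else:
        enabled = "mtp" in Path(model_path).name.lower()  # filename fallback
    return ["--spec-type", "draft-mtp"] if enabled else []
```

For example, a hypothetical file named `Qwen3-MTP-Q4.gguf` with an inconclusive probe would get the speculative-decoding flag, while setting `LLAMACPP_MTP_ENABLED=0` would suppress it.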
A Claude Code skill that benchmarks coding-agent workloads against your llama.cpp setup and recommends server flag profiles. Ship a faster local backend without hand-tuning.
Local or cloud routing — the wizard wires either path safely.
Run one curl command. The wizard auto-detects your runtimes, checks your hardware, and flags anything missing.
Answer a few prompts. Pick your runtime, pick your model — or let llmfit pick for you. The wizard wires everything up safely.
Type cc or cx for local. cc9/cx9 for 9router. cco/cxo for OpenRouter. Every skill, every agent, every MCP server intact.
From 8GB to 128GB+ — a hardware-tiered map of Qwen2.5-Coder, DeepSeek-Coder, CodeLlama, and more.
Available on PyPI. Pip, uv, or a one-line shell installer.
The questions that actually show up in issues, ordered by frequency.
Ollama (ollama launch), LM Studio, llama.cpp (with MTP auto-detect for Qwen/GLM/DeepSeek MTP variants, which auto-applies --spec-type draft-mtp for faster decode), vLLM: all auto-detected during setup. For llama.cpp, GGUF models download directly from Hugging Face via the built-in huggingface-cli integration.
cc9/cx9 aliases alongside your existing local ones.
openrouter.ai/api/v1. No daemon, just an API key. Adds cco/cxo/cpo aliases.
~/.claude and ~/.codex are untouched. All local config lives in .claude-codex-local/. To fully revert: remove that folder and the alias block added to your shell rc. Done.
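For readers who want to script the revert, here is a minimal sketch of the two steps: delete the config folder and strip the alias block from the shell rc. The marker lines are hypothetical (check your rc file for the actual delimiters the installer wrote), and the folder location is passed in by the caller because its parent directory depends on your setup.

```python
import shutil
from pathlib import Path

# Hypothetical marker lines; check your shell rc for the real ones.
BEGIN_MARK = "# >>> claude-codex-local >>>"
END_MARK = "# <<< claude-codex-local <<<"


def strip_alias_block(rc_text: str, begin: str = BEGIN_MARK, end: str = END_MARK) -> str:
    """Return rc_text without the lines from the begin marker through the end marker."""
    kept, inside = [], False
    for line in rc_text.splitlines(keepends=True):
        if not inside and line.strip() == begin:
            inside = True
            continue
        if inside:
            if line.strip() == end:
                inside = False
            continue
        kept.append(line)
    if inside:
        return rc_text  # unterminated block: leave the text unchanged
    return "".join(kept)


def revert(config_dir: Path, rc_file: Path) -> None:
    """Delete the isolated config folder and remove the alias block from rc_file."""
    shutil.rmtree(config_dir, ignore_errors=True)
    if rc_file.exists():
        rc_file.write_text(strip_alias_block(rc_file.read_text()))
```

If the end marker is missing, the sketch leaves the rc text untouched rather than risk deleting unrelated lines.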
The wizard uses llmfit to analyze your RAM and GPU, then recommends models that actually fit. Smaller machines get smaller, faster models. Larger machines unlock more capable ones.
Your claude and codex commands are completely unmodified. The cc/cx aliases are only for local sessions. Cloud and local coexist.
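To make the "models that fit" idea concrete, the sketch below picks the largest candidate whose estimated memory need fits the machine with some headroom. It is not llmfit's algorithm: the `Candidate` type, the `pick_model` helper, the 2 GB headroom, and the catalog entries are all placeholders invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    required_gb: float  # estimated memory needed to load and run the model


def pick_model(
    candidates: list[Candidate],
    available_gb: float,
    headroom_gb: float = 2.0,
) -> Optional[Candidate]:
    """Return the largest candidate that fits in available memory minus headroom."""
    budget = available_gb - headroom_gb
    fitting = [c for c in candidates if c.required_gb <= budget]
    return max(fitting, key=lambda c: c.required_gb, default=None)


# Placeholder catalog: names and sizes are illustrative, not measured.
CATALOG = [
    Candidate("small-coder", 5.0),
    Candidate("medium-coder", 18.0),
    Candidate("large-coder", 40.0),
]
```

With this placeholder catalog, an 8 GB machine would land on the smallest entry and a 128 GB machine on the largest; a machine with too little memory gets `None` rather than a model that would run out of memory.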
Following Keep a Changelog and SemVer.
http:// and default port auto-appended.
cp → ccp (#120): cp alias no longer shadows the POSIX copy command.
Local model, alternative cloud, or hosted SaaS: same harness, one alias. Free, open-source, MIT licensed. Under 10 minutes.