Changelog — claude-codex-local

v0.16.0

2026-05-22

Latest

Added

Auto-fetch available models during remote engine model selection (#134): when the wizard's step 4 model picker runs against a remote engine endpoint (Ollama, llama.cpp, or vLLM), it now calls the remote API to list available models instead of showing a local-only picker. This removes the "model not found" guesswork for remote setups — you see exactly what the remote server has installed.
Smart remote endpoint URL scheme detection (#134): if the user enters a bare IP or hostname (e.g. 192.168.1.100:11434) during the local-vs-remote wizard prompt, the step now prepends http:// and appends the engine's default port when either is missing, so bare inputs like gpu-box.local or 192.168.1.100:8000 produce a valid URL instead of a confusing connection error.
Test coverage for remote model fetching and URL normalization (#134): new unit tests cover the auto-fetch path (probe_remote_models), the URL-scheme normalizer (_normalize_url), the VLLM_BASE_URL env-key extraction for remote vLLM, and the error-handling boundaries (connection refused, 404, JSON parse failure).
Interactive local-vs-remote prompt during engine selection (#122): the wizard now asks whether the chosen engine is local on this machine or a remote endpoint. Selecting remote prompts for the base URL (and, for vllm, an API key), stores the value in the engine's *_BASE_URL env var inside the helper script, and skips the local install/launch path entirely.
Test coverage for the interactive remote-engine wizard path (#125): new unit and integration tests exercise the local-vs-remote prompt, env-keyfile materialization with chmod 0600, and the remote branching in healthcheck, info, and start_server for llamacpp.
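
The URL normalization described in #134 can be pictured with a minimal sketch. This is not the project's _normalize_url; the function name normalize_endpoint, the DEFAULT_PORTS table, and the choice to leave explicit URLs untouched are assumptions made for illustration. The port numbers are the upstream defaults of Ollama, llama-server, and vLLM, which may differ from what ccl uses.

```python
from urllib.parse import urlsplit

# Upstream default HTTP ports; assumed here to be "the engine's default port".
DEFAULT_PORTS = {"ollama": 11434, "llamacpp": 8080, "vllm": 8000}


def normalize_endpoint(raw: str, engine: str) -> str:
    """Turn a bare host or host:port into a full http:// URL (hypothetical helper)."""
    text = raw.strip().rstrip("/")
    if "://" in text:
        # Assumption: a URL that already carries a scheme is left as entered.
        return text
    parts = urlsplit("http://" + text)
    if parts.port is None:
        # No port given: append the engine's default one.
        return f"http://{parts.netloc}:{DEFAULT_PORTS[engine]}{parts.path}"
    return "http://" + text


print(normalize_endpoint("gpu-box.local", "ollama"))
print(normalize_endpoint("192.168.1.100:8000", "vllm"))
```

Bare IPv6 literals and malformed ports are not handled in this sketch; a real normalizer would need to reject or bracket them.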

Fixed

llamacpp remote-mode branching (#123): the llama.cpp helper script, healthcheck, info, and start_server no longer assume a local llama-server binary when LLAMACPP_BASE_URL points at a remote endpoint. Remote endpoints now skip binary discovery, model-file checks, and the spawn path; healthcheck targets the remote URL directly.
Rename Pi local shortcut cp → ccp (#120): the wizard-installed cp alias shadowed the standard POSIX copy command. The Pi local helper script and short alias are now ccp (long alias pi-local is unchanged). Re-running ccl setup migrates existing installs automatically.

Changed

llamacpp-tuner skill no-ops cleanly when llamacpp is remote (#124): the tuner detects a remote LLAMACPP_BASE_URL and exits with a friendly message rather than attempting to introspect a non-existent local binary.
README and wizard walkthrough lead with the interactive remote-engine flow (#126): the quickstart and wizard documentation now show the local-vs-remote prompt and remote-endpoint setup as the primary path.

Full compare diff →

v0.14.0

2026-05-20

Added

MTP (Multi-Token Prediction) support for llama.cpp (#102, #103): llama-server auto-launches with --spec-type draft-mtp --spec-draft-n-max 5 for MTP variants. Detection runs two passes: a GGUF metadata probe (architecture-specific *.mtp.* keys, the cross-architecture *.nextn_predict_layers convention, or MTP in general.name/general.architecture) and a filename fallback (*mtp*, word-bounded, case-insensitive). The env vars LLAMACPP_MTP_ENABLED=0/1 and LLAMACPP_SPEC_DRAFT_N_MAX=N (range 1–16) override detection. A conflict guard recognizes --mmproj and -np/--parallel > 1 and disables MTP with a warning; out-of-range values surface as notes entries on the MTP result.
llamacpp-tuner skill: new Claude Code skill that helps optimize llama.cpp server configuration for coding-agent workloads. Includes a benchmark agent and configuration profiles.
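
The filename fallback, env overrides, and conflict guard from the MTP entry above could look roughly like the sketch below. It skips the GGUF metadata pass entirely. The function names, the reading of "word-bounded" as "not adjacent to letters or digits", and the fallback to 5 on an out-of-range value are all assumptions; the changelog only states that out-of-range values produce notes entries.

```python
import re

# Assumption: "word-bounded" means "mtp" is not glued to other letters or digits.
_MTP_NAME = re.compile(r"(?<![A-Za-z0-9])mtp(?![A-Za-z0-9])", re.IGNORECASE)


def filename_suggests_mtp(filename: str) -> bool:
    """Filename fallback only; the GGUF metadata probe is not modeled here."""
    return _MTP_NAME.search(filename) is not None


def resolve_mtp(filename, env, server_args):
    """Return (enabled, draft_n_max, notes) for a hypothetical launch plan."""
    notes = []
    enabled = filename_suggests_mtp(filename)

    # LLAMACPP_MTP_ENABLED=0/1 overrides detection in either direction.
    flag = env.get("LLAMACPP_MTP_ENABLED")
    if flag in ("0", "1"):
        enabled = flag == "1"

    draft_n_max = 5  # default taken from the launch flags quoted in the entry
    raw = env.get("LLAMACPP_SPEC_DRAFT_N_MAX")
    if raw is not None:
        try:
            value = int(raw)
        except ValueError:
            value = None
        if value is not None and 1 <= value <= 16:
            draft_n_max = value
        else:
            # Assumption: keep the default and record a note.
            notes.append(f"LLAMACPP_SPEC_DRAFT_N_MAX={raw!r} is outside 1-16")

    # Conflict guard: --mmproj, or -np/--parallel greater than 1, disables MTP.
    parallel = 1
    for i, arg in enumerate(server_args[:-1]):
        if arg in ("-np", "--parallel") and server_args[i + 1].isdecimal():
            parallel = int(server_args[i + 1])
    if enabled and ("--mmproj" in server_args or parallel > 1):
        enabled = False
        notes.append("MTP disabled: conflicts with --mmproj or --parallel > 1")

    return enabled, draft_n_max, notes
```

For example, resolve_mtp("example-mtp-q4.gguf", {}, ["-np", "4"]) reports MTP as disabled with one note, while the same file with no extra arguments enables it with a draft length of 5.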

Fixed

Resolve mypy and ruff failures from v0.13.1 CI.
Apply ruff format and rename ambiguous l identifier in bench_agent.py.

Chore

Level up llamacpp-tuner skill to A grade.

Full compare diff →

v0.13.1

2026-05-19

Added

ccl status command (#98, #101): new top-level subcommand that prints the current ccl setup and shortcut availability. Lists all 9 shortcuts (3 harnesses × 3 engine types) with aliases, selected model, engine name and live status, and an availability column (available / unavailable / unconfigured); follows with an overall setup summary. Engine health is checked via each adapter's healthcheck.

Fixed

ccl status consistency: the engine is now inferred per harness, so cx no longer inherits cc's engine; local availability requires both the helper script and an installed engine; router shortcuts (cc9 / cx9 / cp9, cco / cxo / cpo) need a wired-up alias before they are reported as available.
Default harness/engine/model now show only wizard-state values; they are no longer fabricated from a single detected script.
Replace the buggy fence_tag.replace("9","").replace("o","") (which turned codex into cdex) with explicit lookup tables; fix _infer_engine_from_script's return annotation to admit None.
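
The fence_tag bug above is easy to reproduce: chained str.replace calls strip every "9" and "o" in the string, not just a trailing router marker. The sketch below shows the failure and one possible explicit table keyed on the shortcut names listed in this changelog. The table layout and the names strip_router_suffix_buggy and SHORTCUTS are hypothetical; the project's real lookup tables are not shown in the release notes.

```python
def strip_router_suffix_buggy(fence_tag: str) -> str:
    # Removes ALL occurrences of "9" and "o", including the "o" inside "codex".
    return fence_tag.replace("9", "").replace("o", "")


# Hypothetical explicit table: shortcut -> (harness, engine type).
SHORTCUTS = {
    "cc": ("claude", "local"),
    "cc9": ("claude", "9router"),
    "cco": ("claude", "openrouter"),
    "cx": ("codex", "local"),
    "cx9": ("codex", "9router"),
    "cxo": ("codex", "openrouter"),
    "cp": ("pi", "local"),
    "cp9": ("pi", "9router"),
    "cpo": ("pi", "openrouter"),
}

print(strip_router_suffix_buggy("codex"))  # cdex
print(SHORTCUTS["cxo"])  # ('codex', 'openrouter')
```

A table costs a few more lines than string surgery, but every entry is spelled out, so a harness name that happens to contain a marker character cannot be mangled.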

Tests · Chore

Add 18 unit tests pinning each cross-row / cross-summary inconsistency and the helper-inference regexes (llamacpp via --model / ANTHROPIC_CUSTOM_MODEL_OPTION, ollama via pi --provider ccl-ollama).
Ignore .gstack/ workspace directory.

Full compare diff →

v0.13.0

2026-05-19

Added

Cross-harness session bridge: ccl run auto-captures and auto-injects conversation context across Claude Code, Codex, and Pi (#62, #93)
Post-run capture from native session files into ~/.claude-codex-local/sessions/<harness>.jsonl
Pre-run injection (one-shot -p only): freshest other harness's transcript prepended as a [prior context, agent=…] block
Cwd-scoped, 7-day staleness cap, macOS symlink-aware path resolution
Opt out per-call with ccl run --no-context or globally via CCL_SESSION_BRIDGE=0
ccl session command group (list / show / sync / truncate / clear)
Best-effort token redaction (OpenAI, AWS, GitHub, Slack, GitLab, Google API) on every import
ccl run --native-params flag (#97, #99): forward everything after --native-params -- verbatim to the launched harness — escape hatch for options ccl does not wrap first-class (e.g. --dangerously-skip-permissions)
Wizard llmfit fallback (#95, #100): when step 1 hardware scan is deferred, opportunistically run llmfit and persist the result to the machine-profile cache
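
The scope guards on pre-run injection (other harness only, same working directory, at most 7 days old, freshest wins) can be sketched as a small selection function. The Transcript record, the pick_context name, and the use of os.path.realpath for symlink-aware comparison are assumptions for illustration, not the bridge's actual data model.

```python
import os
import time
from dataclasses import dataclass

MAX_AGE_SECONDS = 7 * 24 * 3600  # the 7-day staleness cap


@dataclass
class Transcript:
    """Hypothetical record for one captured session."""
    harness: str
    cwd: str
    updated_at: float  # Unix timestamp of the last captured turn


def pick_context(transcripts, current_harness, cwd, now=None):
    """Return the freshest eligible transcript from another harness, or None."""
    now = time.time() if now is None else now
    here = os.path.realpath(cwd)  # resolve symlinks before comparing paths
    candidates = [
        t for t in transcripts
        if t.harness != current_harness
        and os.path.realpath(t.cwd) == here
        and now - t.updated_at <= MAX_AGE_SECONDS
    ]
    return max(candidates, key=lambda t: t.updated_at, default=None)
```

Passing now explicitly keeps the function deterministic in tests; in normal use it falls back to the current time.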

Documentation

README: rewrite Sharing Context Between Agents for the auto-bridge, scope guards, and interactive-capture / one-shot-inject asymmetry
README: theme-aware logo for dark/light mode rendering on GitHub
README: add PyPI download badges and experiment tag
docs.html: refresh ccl run and ccl session cards for the auto-bridge

Full compare diff →

v0.12.0

2026-05-16

Added

OpenRouter Integration: Add OpenRouter as a hosted-SaaS cloud-routing backend alongside 9router (#83)
New openrouter engine with OpenRouterAdapter mirroring the 9router shape
Helper scripts cco / cxo / cpo (Claude / Codex / Pi via OpenRouter)
Default model anthropic/claude-sonnet-4.6; override via CCL_OPENROUTER_MODEL
Deferred-secret API key storage (chmod 0600) reused from the 9router pattern
Smoke test sends a minimal request to the selected OpenRouter model
Doctor checks for OpenRouter key file mode, content, and model name validity
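
The chmod 0600 key storage and the doctor checks on file mode and content can be illustrated with a short POSIX-only sketch. The helper names write_secret and key_file_problems are hypothetical, and the model-name validity check is left out because the changelog does not describe its rules.

```python
import os
import stat
from pathlib import Path


def write_secret(path: Path, secret: str) -> None:
    """Write a key file that only the owner can read or write."""
    # Create with 0600 from the start so the secret is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(secret + "\n")
    # os.open's mode is ignored for a pre-existing file, so tighten it explicitly.
    os.chmod(path, 0o600)


def key_file_problems(path: Path) -> list:
    """Return human-readable problems; an empty list means the file looks fine."""
    if not path.exists():
        return ["key file is missing"]
    problems = []
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode != 0o600:
        problems.append(f"mode is {mode:o}, expected 600")
    if not path.read_text(encoding="utf-8").strip():
        problems.append("key file is empty")
    return problems
```

Creating the file through os.open with an explicit mode avoids the short window in which a file created with default permissions would be readable by other users.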

Fixed

OpenRouter smoke test: Now targets the selected OpenRouter model (#85)

Documentation

Update landing page release history for recent versions

Full compare diff →

v0.11.0

2026-05-16

Added

Pi Harness Support: Add Pi as a supported harness alongside Claude Code and Codex CLI, enabling model-agnostic terminal coding workflows (#59, #82)
Wire Pi into the wizard setup flow with dedicated configuration
Add cp alias for Pi + local model sessions
Support Pi-specific models.json configuration
Update documentation and guide generation for Pi workflows

Full compare diff →

v0.10.0

2026-05-10

Added

Non-interactive CLI: Add ccl run subcommand with -p/--prompt flag for scripted workflows (#70, #71)
vLLM Wizard Integration: Wire vLLM into the setup wizard as a selectable backend (#66)
9router Auto-install: Wizard offers to install 9router via npm when selected (#67)
Machine Profile Caching: Cache machine specs to avoid re-scanning hardware, speeding up wizard startup (#58, #75)
llama.cpp Enhancements: 128k context support, reasoning model smoke test, ccl serve command with auto-restart (#60)

Fixed

Wizard Component Recheck: Recheck selected setup components after user modifications (#79, #81)
vLLM Detection: Now checks CLI installation rather than server reachability, preventing false positives (#78)
llama.cpp Model Matching: Fix HuggingFace tag matching using existing _llamacpp_models_match helper (#64)
Machine Profile Cache: Write in-process cache to the correct symbol (#77)

Performance

Lazy llmfit Loading: Lazy init + cache-aware model picker, reducing unnecessary hardware scans (#79, #80)

Tests · Docs

Add comprehensive end-to-end test against a live vLLM server (#63)
Refresh documentation and brand assets for clarity and visual consistency (#76)

Full compare diff →

v0.9.0

2026-05-05

Added

9router Integration: Add 9router as a cloud-routing backend (#51, #52)
Router9Adapter with smoke test support
Extend wizard with 9router setup flow and API key management
Support cc9/cx9 aliases alongside existing cc/cx
Fence-tag derivation and doctor checks for 9router

Fixed

Wizard now honors forced setup preferences (#51)
Update DeepSeek model hub paths
Fix step 2 install-hint loop to show 9router URL (#51)

Refactor

Refactor wizard _alias_block and _write_helper_script to use 4-way dispatch (#51)
Extend WireResult with raw_env field for deferred shell expressions (#51)

Full compare diff →

v0.8.3

2026-04-24

Fixed

Retire qwen2.5-coder 0.5b verified path; remove related claims from README, docs, model mapping, and static site (#49)
Restore bootstrap docs to point users to ccl find-model instead of a hardcoded tiny model path (#49)

v0.8.2

2026-04-20

Fixed

Wizard step IDs renumbered from 2.x (2.1–2.8) to sequential integers (1–8) for consistent progress indicators (#47)
Documentation updated to reflect new sequential step numbering (1–11)
E2E and unit tests updated to reference the new step IDs

v0.8.1

2026-04-17

Fixed

Machine specs table now shows real CPU/RAM/GPU values — wizard was reading llmfit system --json from the wrong nesting level (#46)
llmfit ranking now uses available RAM instead of total; Speed/Balanced/Quality picks match what fits on the host right now (#46)
Embedding and reranker models hidden from installed-models picker — they cannot serve as chat coding models (#46)
Step 4 model picker grouped with visual separators (Running / Suggested / Installed / Other) (#46)

v0.8.0

2026-04-17

Added

vLLM backend adapter with unit and e2e test coverage — joins Ollama, LM Studio, and llama.cpp as a first-class engine
Wizard detects an already-running llama-server and offers its active model as a pick
Wizard pre-populates the model picker with discovered + recommended models (#35, #36)
Wizard welcome banner shows installed version and repository URL (#37)
Live progress for model downloads (Ollama, LMS, HF CLI) with bytes/speed/ETA and a post-download summary; clean Ctrl-C abort (#39)
Fuzzy-search fallback for Hugging Face GGUF downloads via the Hub's search API (#38)

Fixed

Post-review polish for the fuzzy fallback and wizard flow (#45)
vLLM adapter type annotations and lint warnings cleared (mypy, ruff)
Removed a stray agent worktree gitlink that broke CI on fresh clones

earlier

pre-0.8.0

For versions prior to v0.8.0, see the complete changelog on GitHub.
