config/models.json is the authoritative catalog. Hot path (T1/T2) stays
local; cloud is consulted only for overview (T3), strategic (T4), and
gatekeeper (T5) calls. J specifically asked for qwen3.5 and newer models
(minimax-m2.7, glm-5, qwen3-next); all are mapped to reachable model IDs
verified against ollama.com/api/tags.
Tier shape:
- t1_hot:        mistral + qwen2.5, local; 50-200 calls/scenario
- t2_review:     qwen2.5 + qwen3, local; 5-14 calls/event
- t3_overview:   gpt-oss:120b, cloud; 1-3 calls/scenario
- t4_strategic:  qwen3.5:397b + glm-4.7; 1-10 calls/day
- t5_gatekeeper: kimi-k2-thinking; 1-5 calls/day, audit-logged
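For reviewers, the tier shape can be read as a typed catalog. This is a minimal sketch only: the `TierSpec` field names (`models`, `placement`, `expectedCalls`, `auditLogged`) and the `cloudTiers` helper are hypothetical and may not match the actual config/models.json schema. T4 and T5 are marked cloud because the summary above says cloud is consulted for T3-T5.

```typescript
// Hypothetical shape; the real config/models.json schema may differ.
type Placement = "local" | "cloud";

interface TierSpec {
  models: string[];
  placement: Placement;
  // Expected call volume from the tier list, not an enforced cap.
  expectedCalls: { min: number; max: number; per: "scenario" | "event" | "day" };
  auditLogged?: boolean;
}

const tiers: Record<string, TierSpec> = {
  t1_hot: { models: ["mistral", "qwen2.5"], placement: "local", expectedCalls: { min: 50, max: 200, per: "scenario" } },
  t2_review: { models: ["qwen2.5", "qwen3"], placement: "local", expectedCalls: { min: 5, max: 14, per: "event" } },
  t3_overview: { models: ["gpt-oss:120b"], placement: "cloud", expectedCalls: { min: 1, max: 3, per: "scenario" } },
  t4_strategic: { models: ["qwen3.5:397b", "glm-4.7"], placement: "cloud", expectedCalls: { min: 1, max: 10, per: "day" } },
  t5_gatekeeper: { models: ["kimi-k2-thinking"], placement: "cloud", expectedCalls: { min: 1, max: 5, per: "day" }, auditLogged: true },
};

// Lists the tiers that leave the machine, in declaration order.
function cloudTiers(catalog: Record<string, TierSpec>): string[] {
  return Object.keys(catalog).filter((name) => catalog[name].placement === "cloud");
}
```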
Rate budgets are declared in the config. The Ollama Cloud paid tier is
generous, but overview/strategic/gatekeeper calls are capped so that no
single rogue scenario can exhaust the day's quota.
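One way the per-tier cap could be enforced is a simple daily counter, sketched below. The `DailyBudget` class and the cap values in the usage note are illustrative assumptions; the real budget keys and numbers live in config/models.json and are not reproduced here.

```typescript
// Hypothetical daily budget guard; names and cap values are illustrative.
class DailyBudget {
  private used = new Map<string, number>();
  private caps: Record<string, number>;

  constructor(caps: Record<string, number>) {
    this.caps = caps;
  }

  // Returns true and records the call if the tier still has budget today.
  tryConsume(tier: string): boolean {
    const cap = this.caps[tier];
    if (cap === undefined) return true; // tiers without a declared cap are not limited
    const count = this.used.get(tier) ?? 0;
    if (count >= cap) return false;
    this.used.set(tier, count + 1);
    return true;
  }

  // Intended to be called once per day by whatever scheduler owns the quota.
  reset(): void {
    this.used.clear();
  }
}
```

A caller would construct it with something like `new DailyBudget({ t4_strategic: 10 })` and skip or defer the cloud call whenever `tryConsume` returns false.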
Experimental rotation list wired but disabled by default. When enabled,
T4 randomly routes 10% of calls to a rotating minimax/GLM/qwen-next/
deepseek/nemotron/cogito/mistral-large candidate, logs comparisons, and
auto-promotes after 3 rotations of wins.
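The rotation behaviour described above might look roughly like the sketch below. `RotationState`, `pickT4Model`, and `recordRotation` are hypothetical names, and "3 rotations of wins" is read here as three consecutive winning rotations, which is an assumption about the intended rule. Comparison logging is omitted.

```typescript
// Hypothetical rotation state; the real config keys may differ.
interface RotationState {
  enabled: boolean;
  candidates: string[];          // experimental model IDs, rotated in order
  cursor: number;                // index of the next candidate to try
  wins: Record<string, number>;  // consecutive winning rotations per candidate
}

// Routes roughly 10% of T4 calls to the next candidate when rotation is enabled.
function pickT4Model(
  state: RotationState,
  primary: string,
  rand: () => number = Math.random,
): string {
  if (!state.enabled || state.candidates.length === 0) return primary;
  if (rand() >= 0.1) return primary;
  const model = state.candidates[state.cursor % state.candidates.length];
  state.cursor += 1;
  return model;
}

// Records one rotation result; returns true once the candidate qualifies
// for promotion (assumed: three consecutive wins, a loss resets the streak).
function recordRotation(state: RotationState, candidate: string, won: boolean): boolean {
  state.wins[candidate] = won ? (state.wins[candidate] ?? 0) + 1 : 0;
  return state.wins[candidate] >= 3;
}
```

Passing `rand` as a parameter keeps the 10% split deterministic under test.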
Playbook versioning SPEC embedded under `playbook_versioning` key: every
seed gets version + parent_id + retired_at + architecture_snapshot, so
when a schema migration breaks a playbook we can pinpoint which change
retired it. Implementation flagged for next sprint (touches gateway +
catalogd + mcp-server) — not wired here.
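To illustrate how the four spec fields could answer "which change retired this playbook", here is a rough sketch. The `id` field, the field types (ISO string for `retired_at`, an opaque string for `architecture_snapshot`), and both helpers are assumptions; the spec under `playbook_versioning` is the source of truth.

```typescript
// Hypothetical record shape built from the spec's field names.
interface PlaybookSeed {
  id: string;                    // assumed; implied by parent_id
  version: number;
  parent_id: string | null;
  retired_at: string | null;     // assumed ISO timestamp, null while active
  architecture_snapshot: string; // assumed opaque label or hash
}

// Walks parent_id links from a seed back to its root, newest first.
function lineage(seeds: PlaybookSeed[], id: string): PlaybookSeed[] {
  const byId = new Map<string, PlaybookSeed>();
  for (const seed of seeds) byId.set(seed.id, seed);
  const chain: PlaybookSeed[] = [];
  const seen = new Set<string>(); // guards against accidental cycles
  let current = byId.get(id);
  while (current !== undefined && !seen.has(current.id)) {
    chain.push(current);
    seen.add(current.id);
    current = current.parent_id !== null ? byId.get(current.parent_id) : undefined;
  }
  return chain;
}

// For a retired seed, returns the snapshot it ran under and the snapshot
// of the first seed that replaced it, or null if either is missing.
function snapshotChange(
  seeds: PlaybookSeed[],
  retiredId: string,
): { from: string; to: string } | null {
  const retired = seeds.find((s) => s.id === retiredId && s.retired_at !== null);
  if (retired === undefined) return null;
  const successor = seeds.find((s) => s.parent_id === retiredId);
  if (successor === undefined) return null;
  return { from: retired.architecture_snapshot, to: successor.architecture_snapshot };
}
```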
- scenario.ts now loads config/models.json at init; env vars still override
- mcp-server exposes /models/matrix read-only so UI can render it
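The override order in scenario.ts (catalog first, env vars on top) can be sketched as below. The `MODEL_<TIER>` variable naming and the comma-separated value format are hypothetical; the actual env var names used by scenario.ts are not stated in this message.

```typescript
// Env is passed in explicitly so the function stays pure and testable;
// a caller in Node would typically pass process.env.
type Env = Record<string, string | undefined>;

// Returns the per-tier model lists, with any env override replacing the
// catalog entry for that tier. Variable naming (MODEL_T1_HOT etc.) is assumed.
function resolveModels(
  catalog: Record<string, string[]>,
  env: Env,
): Record<string, string[]> {
  const resolved: Record<string, string[]> = {};
  for (const [tier, models] of Object.entries(catalog)) {
    const raw = env[`MODEL_${tier.toUpperCase()}`];
    const override = (raw ?? "")
      .split(",")
      .map((m) => m.trim())
      .filter((m) => m.length > 0);
    // An unset or blank variable falls back to the catalog entry.
    resolved[tier] = override.length > 0 ? override : [...models];
  }
  return resolved;
}
```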