1 Commits

Author SHA1 Message Date
root
03d723e7e6 Model matrix — 5 tiers, local hard workers + cloud overseers
config/models.json is the authoritative catalog. Hot path (T1/T2) stays
local; cloud is consulted only for overview (T3), strategic (T4), and
gatekeeper (T5) calls. J named qwen3.5 + newer models (minimax-m2.7,
glm-5, qwen3-next) specifically — all mapped with real reachable IDs
verified against ollama.com/api/tags.

Tier shape:
- t1_hot         mistral + qwen2.5 local   50-200 calls/scenario
- t2_review      qwen2.5 + qwen3 local     5-14 calls/event
- t3_overview    gpt-oss:120b cloud        1-3 calls/scenario
- t4_strategic   qwen3.5:397b + glm-4.7    1-10 calls/day
- t5_gatekeeper  kimi-k2-thinking          1-5 calls/day, audit-logged

Rate budgets are declared in config. The Ollama Cloud paid tier is
generous, but overview/strategic/gatekeeper calls are capped so that no
single rogue scenario can blow the day's quota.
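
A minimal sketch of how a per-tier daily cap could be enforced. The
`DailyBudget` class, the `CappedTier` type, and the idea of resetting on
the UTC date are all assumptions for illustration; the actual budget keys
in config/models.json are not shown in this commit.

```typescript
// Hypothetical sketch: names and structure are assumptions, not the
// actual config/models.json schema.
type CappedTier = "t3_overview" | "t4_strategic" | "t5_gatekeeper";

class DailyBudget {
  private readonly caps: Record<CappedTier, number>;
  private readonly now: () => Date;
  private used = new Map<CappedTier, number>();
  private day: string;

  // `now` is injectable so the day rollover can be exercised in tests.
  constructor(caps: Record<CappedTier, number>, now: () => Date = () => new Date()) {
    this.caps = caps;
    this.now = now;
    this.day = this.dayKey();
  }

  // UTC calendar date, e.g. "2026-04-20".
  private dayKey(): string {
    return this.now().toISOString().slice(0, 10);
  }

  /** Returns true and records the call if the tier still has budget today. */
  tryConsume(tier: CappedTier): boolean {
    const today = this.dayKey();
    if (today !== this.day) {
      // New day: forget yesterday's counts.
      this.day = today;
      this.used.clear();
    }
    const count = this.used.get(tier) ?? 0;
    if (count >= this.caps[tier]) return false;
    this.used.set(tier, count + 1);
    return true;
  }
}
```

A caller would check `tryConsume(tier)` before each cloud call and fall
back or skip when it returns false, so a runaway scenario stops at the
cap instead of draining the quota.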

The experimental rotation list is wired but disabled by default. When
enabled, T4 randomly routes 10% of calls to a rotating candidate
(minimax, GLM, qwen-next, deepseek, nemotron, cogito, or mistral-large),
logs the comparisons, and auto-promotes a candidate after 3 rotations of
wins.
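
The rotation behaviour described above could look roughly like the
following. This is a sketch under stated assumptions: `RotationState`,
`pickT4Model`, and `recordRotation` are hypothetical names, and the
message does not say whether the 3 wins must be consecutive, so this
version simply counts cumulative wins per candidate.

```typescript
// Hypothetical sketch of the T4 rotation; all names are assumptions.
interface RotationState {
  enabled: boolean;              // disabled by default, per the commit message
  candidates: string[];          // rotating candidate model IDs
  index: number;                 // which candidate is currently in rotation
  wins: Record<string, number>;  // cumulative wins per candidate
}

const EXPERIMENT_RATE = 0.1; // 10% of T4 calls go to the candidate
const WINS_TO_PROMOTE = 3;   // "3 rotations of wins"

/** Picks the model for one T4 call; `rand` is injectable for testing. */
function pickT4Model(
  state: RotationState,
  primary: string,
  rand: () => number = Math.random,
): string {
  if (!state.enabled || state.candidates.length === 0) return primary;
  const candidate = state.candidates[state.index % state.candidates.length];
  return rand() < EXPERIMENT_RATE ? candidate : primary;
}

/** Records one finished rotation; returns the model to promote, or null. */
function recordRotation(state: RotationState, candidateWon: boolean): string | null {
  if (state.candidates.length === 0) return null;
  const candidate = state.candidates[state.index % state.candidates.length];
  if (candidateWon) {
    state.wins[candidate] = (state.wins[candidate] ?? 0) + 1;
  }
  // Advance to the next candidate for the following rotation.
  state.index = (state.index + 1) % state.candidates.length;
  return (state.wins[candidate] ?? 0) >= WINS_TO_PROMOTE ? candidate : null;
}
```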

The playbook versioning spec is embedded under the `playbook_versioning`
key: every seed gets version, parent_id, retired_at, and
architecture_snapshot, so when a schema migration breaks a playbook we
can pinpoint which change retired it. Implementation is flagged for next
sprint (it touches gateway, catalogd, and mcp-server) and is not wired
here.
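
To illustrate the versioning fields, here is a sketch of what a seed
record and a lineage walk might look like. The four field names come from
this message; the `id` field, the field types, and the `lineage` and
`retiredVersions` helpers are assumptions, since the spec itself is not
reproduced here and nothing is implemented yet.

```typescript
// Field names are from the commit message; the types are assumptions.
interface PlaybookSeed {
  id: string;                     // hypothetical identifier, not named in the spec
  version: number;
  parent_id: string | null;       // null for the first version in a lineage
  retired_at: string | null;      // ISO timestamp, null while still active
  architecture_snapshot: string;  // e.g. a schema or commit identifier
}

/** Walks parent_id links from a seed back to its root, newest first. */
function lineage(seeds: Map<string, PlaybookSeed>, id: string): PlaybookSeed[] {
  const chain: PlaybookSeed[] = [];
  const seen = new Set<string>(); // guards against accidental cycles
  let current = seeds.get(id);
  while (current !== undefined && !seen.has(current.id)) {
    chain.push(current);
    seen.add(current.id);
    current = current.parent_id !== null ? seeds.get(current.parent_id) : undefined;
  }
  return chain;
}

/** Lists the retired versions in a lineage with their recorded snapshots. */
function retiredVersions(seeds: Map<string, PlaybookSeed>, id: string): PlaybookSeed[] {
  return lineage(seeds, id).filter((seed) => seed.retired_at !== null);
}
```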

- scenario.ts now loads config/models.json at init; env vars still override it
- mcp-server exposes /models/matrix read-only so UI can render it
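
A rough sketch of the load-then-override order described in the first
bullet. The `ModelMatrix` shape, the `loadMatrix` helper, and the
`MODEL_<TIER>` variable naming are all hypothetical; the real schema and
env var names are not shown in this commit.

```typescript
// Hypothetical shapes: the real schema of config/models.json is not shown here.
type TierName = "t1_hot" | "t2_review" | "t3_overview" | "t4_strategic" | "t5_gatekeeper";
type ModelMatrix = Partial<Record<TierName, { models: string[] }>>;
type Env = Record<string, string | undefined>;

/**
 * Parses the catalog text, then applies per-tier env overrides.
 * The env var pattern (e.g. MODEL_T1_HOT="a,b") is an assumption.
 */
function loadMatrix(jsonText: string, env: Env): ModelMatrix {
  const matrix = JSON.parse(jsonText) as ModelMatrix;
  for (const tier of Object.keys(matrix) as TierName[]) {
    const override = env[`MODEL_${tier.toUpperCase()}`];
    if (override) {
      // The env value wins over the file for this tier only.
      matrix[tier] = {
        ...matrix[tier],
        models: override.split(",").map((m) => m.trim()),
      };
    }
  }
  return matrix;
}
```

In a Node process the caller would pass the file contents and
`process.env`; keeping both as parameters makes the override order easy
to test without touching the filesystem.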
2026-04-20 19:24:41 -05:00