The Ollama Pro plan went live today (a 39-model fleet behind the same
OLLAMA_CLOUD_KEY), and OpenCode Zen was already wired into the gateway
but not consumed. This change routes every gpt-oss call site to a
faster or stronger replacement:
| Site | gpt-oss → replacement | Why |
|---|---|---|
| ollama_cloud default | gpt-oss:120b → deepseek-v3.2 | newest DeepSeek revision; live-probed `pong` |
| openrouter default | openai/gpt-oss-120b:free → x-ai/grok-4.1-fast | already the scrum LADDER's PRIMARY |
| modes.toml staffing_inference | openai/gpt-oss-120b:free → kimi-k2.6 | coding-specialized, on Ollama Pro |
| modes.toml doc_drift_check | gpt-oss:120b → gemini-3-flash-preview | speed leader for factual checks |
| scrum_master_pipeline tree-split MAP+REDUCE | gpt-oss:120b → gemini-3-flash-preview | latency-dominated path (runs 5-20× per file) |
| bot/propose.ts CLOUD_MODEL | gpt-oss:120b → deepseek-v3.2 | same Ollama key, faster |
| mcp-server/observer.ts overseer label fallback | gpt-oss:120b → claude-opus-4-7 | matches new overseer model |
| crates/gateway/src/execution_loop overseer escalation | ollama_cloud/gpt-oss:120b → opencode/claude-opus-4-7 | frontier reasoning matters here — fires only after local self-correct fails twice; Zen pay-per-token cost is bounded |
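The gating in that last row can be sketched as a small counter check (illustrative only; the function name and signature here are hypothetical, not the actual execution_loop API):

```rust
/// Hypothetical sketch of the overseer escalation gate: stay on the
/// local self-correct loop until it has failed twice, and only then
/// escalate to the frontier model on OpenCode Zen, which keeps the
/// pay-per-token spend bounded.
fn overseer_escalation(self_correct_failures: u32) -> Option<&'static str> {
    (self_correct_failures >= 2).then_some("opencode/claude-opus-4-7")
}
```

`None` means "stay on the local loop"; the real gate lives in crates/gateway/src/execution_loop.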
Verification:
- `cargo check -p gateway --tests` — clean
- Live probes through localhost:3100/v1/chat:
- `opencode/claude-opus-4-7` → "pong"
- `gemini-3-flash-preview` (ollama_cloud) → "pong"
- `kimi-k2.6` (ollama_cloud) → "pong"
- `deepseek-v3.2` (ollama_cloud) → "Pong! 🏓"
Notes:
- kimi-k2:1t is still broken upstream (HTTP 500 on today's Ollama Pro
  probe, matching yesterday's memory). The replacement table never picks it.
- The Rust changes need a `systemctl restart lakehouse.service` to
  take effect on the running gateway. TS callers reload on their next run.
- aibridge/src/context.rs still has gpt-oss:{20b,120b} in its window-
size lookup table; harmless and kept for callers that pass it
explicitly as an override.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
92 lines
3.9 KiB
TOML
# Mode router config — task_class → mode mapping
#
# `preferred_mode` is the first choice for a task class; `fallback_modes`
# get tried in order if the preferred one isn't available (LLM Team can
# return Unknown mode for some, OR the matrix has stronger signal for a
# fallback). `default_model` seeds the mode runner's model field if the
# caller doesn't override.
#
# Modes are dispatched against LLM Team UI (localhost:5000/api/run) for
# now; future Rust-native runners will short-circuit before the proxy.
# See crates/gateway/src/v1/mode.rs for the dispatch path.

[[task_class]]
name = "scrum_review"
# 2026-04-26 pass5 variance test (5 reps × 4 conditions, grok-4.1-fast,
# pathway_memory.rs): composed corpus LOST 5/5 vs isolation (Δ −1.8
# grounded findings, p=0.031). See docs/MODE_RUNNER_TUNING_PLAN.md.
# Default is now isolation — bug fingerprints + adversarial framing +
# file content carries strong models without matrix noise. The
# `codereview_lakehouse` matrix path remains available via force_mode
# (auto-downgrades to isolation on strong models — see the
# is_strong_model gate in crates/gateway/src/v1/mode.rs).
preferred_mode = "codereview_isolation"
fallback_modes = ["codereview_lakehouse", "codereview", "consensus", "ladder"]
default_model = "qwen3-coder:480b"
# Corpora kept defined so experimental modes (codereview_matrix_only,
# pass2/pass5 sweeps) and weak-model rescue rungs can still pull them.
# scrum_findings_v1 is built but EXCLUDED — bake-off showed 24% OOB
# line citations from cross-file drift, only safe with same-file gating.
matrix_corpus = ["lakehouse_arch_v1", "lakehouse_symbols_v1"]

[[task_class]]
name = "contract_analysis"
preferred_mode = "deep_analysis"
fallback_modes = ["research", "extract"]
default_model = "kimi-k2:1t"
matrix_corpus = "chicago_permits_v1"

[[task_class]]
name = "staffing_inference"
# Staffing-domain native enrichment runner — Pass 4 (2026-04-26).
# Same composer architecture as codereview_lakehouse but with staffing
# framing + workers corpus. Validates that the modes-as-prompt-molders
# pattern generalizes beyond code review.
preferred_mode = "staffing_inference_lakehouse"
fallback_modes = ["ladder", "consensus", "pipeline"]
# 2026-04-28: gpt-oss-120b:free → kimi-k2.6 via Ollama Pro. Coding-
# specialized, faster than gpt-oss, on the same OLLAMA_CLOUD_KEY so
# no extra provider hop.
default_model = "kimi-k2.6"
matrix_corpus = "workers_500k_v8"

[[task_class]]
name = "fact_extract"
preferred_mode = "extract"
fallback_modes = ["distill"]
default_model = "qwen2.5"
matrix_corpus = "kb_team_runs_v1"

[[task_class]]
name = "doc_drift_check"
preferred_mode = "drift"
fallback_modes = ["validator"]
# 2026-04-28: gpt-oss:120b → gemini-3-flash-preview via Ollama Pro.
# Speed leader on factual checking, same OLLAMA_CLOUD_KEY.
default_model = "gemini-3-flash-preview"
matrix_corpus = "distilled_factual_v20260423095819"

[[task_class]]
name = "pr_audit"
# Auditor's claim-vs-diff verification mode (2026-04-26 rebuild).
# Replaces the auditor's hand-rolled inference check with the mode-runner
# composer: pathway memory (PR-level patterns) + lakehouse_answers_v1
# corpus (prior accepted reviews + observer escalations) + adversarial
# JSON-shaped framing. Default model is paid Ollama Cloud kimi-k2:1t for
# strong claim-grounding; tie-breaker via auditor-side env override.
preferred_mode = "pr_audit"
fallback_modes = ["consensus", "ladder"]
# kimi-k2:1t broken upstream 2026-04-27 (Ollama Cloud 500 ISE, multi-hour
# sustained outage verified by repeated probes). deepseek-v3.1:671b is
# the drop-in substitute — proven working end-to-end through pr_audit
# during Phase 5 distillation acceptance testing.
default_model = "deepseek-v3.1:671b"
matrix_corpus = "lakehouse_answers_v1"

# Fallback when task_class isn't in the table — useful for ad-hoc calls
# during development that don't yet have a mapped mode.
[default]
preferred_mode = "pipeline"
fallback_modes = ["consensus", "ladder"]
default_model = "qwen3.5:latest"
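The resolution order the config's header comments describe (preferred mode first, then fallbacks in declared order, with `default_model` used only when the caller doesn't override, and `[default]` catching unmapped task classes) can be sketched as follows. The struct and function names here are hypothetical; the real dispatch path is crates/gateway/src/v1/mode.rs.

```rust
use std::collections::HashMap;

// Hypothetical mirror of one [[task_class]] entry; the real structs
// live in the gateway, not here.
struct TaskClass {
    preferred_mode: &'static str,
    fallback_modes: Vec<&'static str>,
    default_model: &'static str,
}

/// Resolve (mode, model) for a task class: try the preferred mode, then
/// each fallback in declared order; a caller-supplied model override
/// beats `default_model`; an unmapped class falls through to `[default]`.
fn resolve<'a>(
    table: &'a HashMap<&'a str, TaskClass>,
    default_entry: &'a TaskClass,
    task_class: &str,
    available: impl Fn(&str) -> bool,
    model_override: Option<&'a str>,
) -> (&'a str, &'a str) {
    let tc = table.get(task_class).unwrap_or(default_entry);
    let mode = std::iter::once(tc.preferred_mode)
        .chain(tc.fallback_modes.iter().copied())
        .find(|m| available(m))
        .unwrap_or(tc.preferred_mode); // nothing available: keep preferred
    (mode, model_override.unwrap_or(tc.default_model))
}
```

A caller asking for `fact_extract` with no override would get `("extract", "qwen2.5")`; an unknown class lands on the `[default]` pipeline entry.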