lakehouse/config/modes.toml
root d475fc7fff infra: replace gpt-oss with Ollama Pro + OpenCode Zen across hot paths
The Ollama Pro plan went live today (a 39-model fleet on the same
OLLAMA_CLOUD_KEY), and OpenCode Zen was already wired into the gateway
but never consumed. This routes every gpt-oss call site to a faster or
stronger replacement:

| Site | gpt-oss → replacement | Why |
|---|---|---|
| ollama_cloud default | gpt-oss:120b → deepseek-v3.2 | newest DeepSeek revision; live-probed `pong` |
| openrouter default | openai/gpt-oss-120b:free → x-ai/grok-4.1-fast | already the scrum LADDER's PRIMARY |
| modes.toml staffing_inference | openai/gpt-oss-120b:free → kimi-k2.6 | coding-specialized, on Ollama Pro |
| modes.toml doc_drift_check | gpt-oss:120b → gemini-3-flash-preview | speed leader for factual checks |
| scrum_master_pipeline tree-split MAP+REDUCE | gpt-oss:120b → gemini-3-flash-preview | latency-dominated path (5-20× per file) |
| bot/propose.ts CLOUD_MODEL | gpt-oss:120b → deepseek-v3.2 | same Ollama key, faster |
| mcp-server/observer.ts overseer label fallback | gpt-oss:120b → claude-opus-4-7 | matches new overseer model |
| crates/gateway/src/execution_loop overseer escalation | ollama_cloud/gpt-oss:120b → opencode/claude-opus-4-7 | frontier reasoning matters here — fires only after local self-correct fails twice; Zen pay-per-token cost is bounded |

Verification:
- `cargo check -p gateway --tests` — clean
- Live probes through localhost:3100/v1/chat:
  - `opencode/claude-opus-4-7` → "pong"
  - `gemini-3-flash-preview` (ollama_cloud) → "pong"
  - `kimi-k2.6` (ollama_cloud) → "pong"
  - `deepseek-v3.2` (ollama_cloud) → "Pong! 🏓"
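
The probes above can be reproduced with a minimal script along these lines (a sketch only: the endpoint path matches the verification list, but the exact request shape and the use of OLLAMA_CLOUD_KEY as the bearer token are assumptions, not confirmed by this config):

```shell
# Sketch of a live probe against the gateway. Assumes an OpenAI-style
# chat endpoint at localhost:3100/v1/chat, as used in the verification
# list above; the auth header is an assumption.
payload='{"model":"deepseek-v3.2","messages":[{"role":"user","content":"ping"}]}'

# Sanity-check the payload locally before firing the probe.
echo "$payload" | python3 -m json.tool >/dev/null && echo "payload ok"

# Actual probe (requires the gateway running on :3100 and
# OLLAMA_CLOUD_KEY exported):
# curl -s localhost:3100/v1/chat \
#   -H "Authorization: Bearer $OLLAMA_CLOUD_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$payload"
```

Swapping the `model` field is enough to probe the other three replacements.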

Notes:
- kimi-k2:1t is still broken upstream (HTTP 500 on today's Ollama Pro
  probe, matching yesterday's memory). The replacement table never
  picks it.
- The Rust changes need a `systemctl restart lakehouse.service` to
  take effect on the running gateway. TS callers reload on next run.
- aibridge/src/context.rs still lists gpt-oss:{20b,120b} in its
  window-size lookup table; this is harmless and kept for callers that
  pass the model explicitly as an override.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:13:30 -05:00

# Mode router config — task_class → mode mapping
#
# `preferred_mode` is the first choice for a task class; `fallback_modes`
# are tried in order if the preferred one isn't available (the LLM Team
# can return Unknown mode for some classes, or the matrix may have a
# stronger signal for a fallback). `default_model` seeds the mode
# runner's model field unless the caller overrides it.
#
# Modes are dispatched against LLM Team UI (localhost:5000/api/run) for
# now; future Rust-native runners will short-circuit before the proxy.
# See crates/gateway/src/v1/mode.rs for the dispatch path.
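# Worked example of the resolution order: a caller submitting
# task_class = "fact_extract" with no override gets preferred_mode
# "extract" first, then the fallback "distill" if "extract" is
# unavailable, with default_model "qwen2.5" seeding the runner's
# model field.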

[[task_class]]
name = "scrum_review"
# 2026-04-26 pass5 variance test (5 reps × 4 conditions, grok-4.1-fast,
# pathway_memory.rs): composed corpus LOST 5/5 vs isolation (Δ 1.8
# grounded findings, p=0.031). See docs/MODE_RUNNER_TUNING_PLAN.md.
# Default is now isolation — bug fingerprints + adversarial framing +
# file content carries strong models without matrix noise. The
# `codereview_lakehouse` matrix path remains available via force_mode
# (auto-downgrades to isolation on strong models — see the
# is_strong_model gate in crates/gateway/src/v1/mode.rs).
preferred_mode = "codereview_isolation"
fallback_modes = ["codereview_lakehouse", "codereview", "consensus", "ladder"]
default_model = "qwen3-coder:480b"
# Corpora kept defined so experimental modes (codereview_matrix_only,
# pass2/pass5 sweeps) and weak-model rescue rungs can still pull them.
# scrum_findings_v1 is built but EXCLUDED — bake-off showed 24% OOB
# line citations from cross-file drift, only safe with same-file gating.
matrix_corpus = ["lakehouse_arch_v1", "lakehouse_symbols_v1"]

[[task_class]]
name = "contract_analysis"
preferred_mode = "deep_analysis"
fallback_modes = ["research", "extract"]
default_model = "kimi-k2:1t"
matrix_corpus = "chicago_permits_v1"

[[task_class]]
name = "staffing_inference"
# Staffing-domain native enrichment runner — Pass 4 (2026-04-26).
# Same composer architecture as codereview_lakehouse but with staffing
# framing + workers corpus. Validates that the modes-as-prompt-molders
# pattern generalizes beyond code review.
preferred_mode = "staffing_inference_lakehouse"
fallback_modes = ["ladder", "consensus", "pipeline"]
# 2026-04-28: gpt-oss-120b:free → kimi-k2.6 via Ollama Pro. Coding-
# specialized, faster than gpt-oss, on the same OLLAMA_CLOUD_KEY so
# no extra provider hop.
default_model = "kimi-k2.6"
matrix_corpus = "workers_500k_v8"

[[task_class]]
name = "fact_extract"
preferred_mode = "extract"
fallback_modes = ["distill"]
default_model = "qwen2.5"
matrix_corpus = "kb_team_runs_v1"

[[task_class]]
name = "doc_drift_check"
preferred_mode = "drift"
fallback_modes = ["validator"]
# 2026-04-28: gpt-oss:120b → gemini-3-flash-preview via Ollama Pro.
# Speed leader on factual checking, same OLLAMA_CLOUD_KEY.
default_model = "gemini-3-flash-preview"
matrix_corpus = "distilled_factual_v20260423095819"

[[task_class]]
name = "pr_audit"
# Auditor's claim-vs-diff verification mode (2026-04-26 rebuild).
# Replaces the auditor's hand-rolled inference check with the mode-runner
# composer: pathway memory (PR-level patterns) + lakehouse_answers_v1
# corpus (prior accepted reviews + observer escalations) + adversarial
# JSON-shaped framing. Default model is paid Ollama Cloud kimi-k2:1t for
# strong claim-grounding; tie-breaker via auditor-side env override.
preferred_mode = "pr_audit"
fallback_modes = ["consensus", "ladder"]
# kimi-k2:1t broken upstream 2026-04-27 (Ollama Cloud 500 ISE, multi-hour
# sustained outage verified by repeated probes). deepseek-v3.1:671b is
# the drop-in substitute — proven working end-to-end through pr_audit
# during Phase 5 distillation acceptance testing.
default_model = "deepseek-v3.1:671b"
matrix_corpus = "lakehouse_answers_v1"

# Fallback when task_class isn't in the table — useful for ad-hoc calls
# during development that don't yet have a mapped mode.
[default]
preferred_mode = "pipeline"
fallback_modes = ["consensus", "ladder"]
default_model = "qwen3.5:latest"