lakehouse/config/modes.toml
root 2dbc8dbc83
Some checks failed
lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts
v1/mode: model-aware enrichment downgrade + 3 corpora + variance harness
Pass 5 (5 reps × 4 conditions × 1 file on grok-4.1-fast) showed composing
matrix corpora is anti-additive on strong models — composed lakehouse_arch
+ symbols LOST 5/5 head-to-head vs codereview_isolation (Δ −1.8 grounded
findings, p=0.031). Default flips to isolation; matrix path now auto-
downgrades when the resolved model is strong.

Mode runner:
- matrix_corpus is Vec<String> (string OR array via deserialize_string_or_vec)
- top_k=6 from each corpus, merge by score, take top 8 globally
- chunk tag prefers doc_id over source so reviewer sees [adr:009] vs [lakehouse_arch]
- is_weak_model() gate auto-downgrades codereview_lakehouse → codereview_isolation
  for strong models (default-strong; weak = :free suffix or local last-resort)
- LH_FORCE_FULL_ENRICHMENT=1 bypasses for diagnostic runs
- EnrichmentSources.downgraded_from records when the gate fires

Three corpora indexed via /vectors/index (5849 chunks total):
- lakehouse_arch_v1 — ADRs + phases + PRD + scrum spec (93 docs, 2119 chunks)
- scrum_findings_v1 — past scrum_reviews.jsonl (168 docs, 1260 chunks; EXCLUDED
  from defaults — 24% out-of-bounds line citations from cross-file drift)
- lakehouse_symbols_v1 — regex-extracted pub items + /// docs (656 docs, 2470 chunks)

Experiment infra:
- scripts/build_*_corpus.ts — re-runnable when source content changes
- scripts/mode_pass5_variance_paid.ts — N reps × M conditions on one file
- scripts/mode_pass5_summarize.ts — mean ± σ + head-to-head, parser handles
  numbered + path-with-line + path-with-symbol finding tables
- scripts/mode_compare.ts — groups by mode|corpus when sweeps span corpora
- scripts/mode_experiment.ts — default model bumped to x-ai/grok-4.1-fast,
  --corpus flag for per-call override

Decisions + open follow-ups: docs/MODE_RUNNER_TUNING_PLAN.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 17:29:17 -05:00

70 lines
2.8 KiB
TOML
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Mode router config — task_class → mode mapping
#
# `preferred_mode` is the first choice for a task class; `fallback_modes`
# get tried in order if the preferred one isn't available (LLM Team can
# return Unknown mode for some, OR the matrix has stronger signal for a
# fallback). `default_model` seeds the mode runner's model field if the
# caller doesn't override.
#
# Modes are dispatched against LLM Team UI (localhost:5000/api/run) for
# now; future Rust-native runners will short-circuit before the proxy.
# See crates/gateway/src/v1/mode.rs for the dispatch path.
[[task_class]]
name = "scrum_review"
# 2026-04-26 pass5 variance test (5 reps × 4 conditions, grok-4.1-fast,
# pathway_memory.rs): composed corpus LOST 5/5 vs isolation (Δ 1.8
# grounded findings, p=0.031). See docs/MODE_RUNNER_TUNING_PLAN.md.
# Default is now isolation — bug fingerprints + adversarial framing +
# file content carries strong models without matrix noise. The
# `codereview_lakehouse` matrix path remains available via force_mode
# (auto-downgrades to isolation on strong models — see the
# is_strong_model gate in crates/gateway/src/v1/mode.rs).
preferred_mode = "codereview_isolation"
fallback_modes = ["codereview_lakehouse", "codereview", "consensus", "ladder"]
default_model = "qwen3-coder:480b"
# Corpora kept defined so experimental modes (codereview_matrix_only,
# pass2/pass5 sweeps) and weak-model rescue rungs can still pull them.
# scrum_findings_v1 is built but EXCLUDED — bake-off showed 24% OOB
# line citations from cross-file drift, only safe with same-file gating.
matrix_corpus = ["lakehouse_arch_v1", "lakehouse_symbols_v1"]
[[task_class]]
name = "contract_analysis"
preferred_mode = "deep_analysis"
fallback_modes = ["research", "extract"]
default_model = "kimi-k2:1t"
matrix_corpus = "chicago_permits_v1"
[[task_class]]
name = "staffing_inference"
# Staffing-domain native enrichment runner — Pass 4 (2026-04-26).
# Same composer architecture as codereview_lakehouse but with staffing
# framing + workers corpus. Validates that the modes-as-prompt-molders
# pattern generalizes beyond code review.
preferred_mode = "staffing_inference_lakehouse"
fallback_modes = ["ladder", "consensus", "pipeline"]
default_model = "openai/gpt-oss-120b:free"
matrix_corpus = "workers_500k_v8"
[[task_class]]
name = "fact_extract"
preferred_mode = "extract"
fallback_modes = ["distill"]
default_model = "qwen2.5"
matrix_corpus = "kb_team_runs_v1"
[[task_class]]
name = "doc_drift_check"
preferred_mode = "drift"
fallback_modes = ["validator"]
default_model = "gpt-oss:120b"
matrix_corpus = "distilled_factual_v20260423095819"
# Fallback when task_class isn't in the table — useful for ad-hoc calls
# during development that don't yet have a mapped mode.
[default]
preferred_mode = "pipeline"
fallback_modes = ["consensus", "ladder"]
default_model = "qwen3.5:latest"