Some checks failed
lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts
Pass 5 (5 reps × 4 conditions × 1 file on grok-4.1-fast) showed composing matrix corpora is anti-additive on strong models — composed lakehouse_arch + symbols LOST 5/5 head-to-head vs codereview_isolation (Δ −1.8 grounded findings, p=0.031). Default flips to isolation; matrix path now auto- downgrades when the resolved model is strong. Mode runner: - matrix_corpus is Vec<String> (string OR array via deserialize_string_or_vec) - top_k=6 from each corpus, merge by score, take top 8 globally - chunk tag prefers doc_id over source so reviewer sees [adr:009] vs [lakehouse_arch] - is_weak_model() gate auto-downgrades codereview_lakehouse → codereview_isolation for strong models (default-strong; weak = :free suffix or local last-resort) - LH_FORCE_FULL_ENRICHMENT=1 bypasses for diagnostic runs - EnrichmentSources.downgraded_from records when the gate fires Three corpora indexed via /vectors/index (5849 chunks total): - lakehouse_arch_v1 — ADRs + phases + PRD + scrum spec (93 docs, 2119 chunks) - scrum_findings_v1 — past scrum_reviews.jsonl (168 docs, 1260 chunks; EXCLUDED from defaults — 24% out-of-bounds line citations from cross-file drift) - lakehouse_symbols_v1 — regex-extracted pub items + /// docs (656 docs, 2470 chunks) Experiment infra: - scripts/build_*_corpus.ts — re-runnable when source content changes - scripts/mode_pass5_variance_paid.ts — N reps × M conditions on one file - scripts/mode_pass5_summarize.ts — mean ± σ + head-to-head, parser handles numbered + path-with-line + path-with-symbol finding tables - scripts/mode_compare.ts — groups by mode|corpus when sweeps span corpora - scripts/mode_experiment.ts — default model bumped to x-ai/grok-4.1-fast, --corpus flag for per-call override Decisions + open follow-ups: docs/MODE_RUNNER_TUNING_PLAN.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
70 lines
2.8 KiB
TOML
70 lines
2.8 KiB
TOML
# Mode router config — task_class → mode mapping
|
||
#
|
||
# `preferred_mode` is the first choice for a task class; `fallback_modes`
|
||
# get tried in order if the preferred one isn't available (LLM Team can
|
||
# return Unknown mode for some, OR the matrix has stronger signal for a
|
||
# fallback). `default_model` seeds the mode runner's model field if the
|
||
# caller doesn't override.
|
||
#
|
||
# Modes are dispatched against LLM Team UI (localhost:5000/api/run) for
|
||
# now; future Rust-native runners will short-circuit before the proxy.
|
||
# See crates/gateway/src/v1/mode.rs for the dispatch path.
|
||
|
||
[[task_class]]
|
||
name = "scrum_review"
|
||
# 2026-04-26 pass5 variance test (5 reps × 4 conditions, grok-4.1-fast,
|
||
# pathway_memory.rs): composed corpus LOST 5/5 vs isolation (Δ −1.8
|
||
# grounded findings, p=0.031). See docs/MODE_RUNNER_TUNING_PLAN.md.
|
||
# Default is now isolation — bug fingerprints + adversarial framing +
|
||
# file content carries strong models without matrix noise. The
|
||
# `codereview_lakehouse` matrix path remains available via force_mode
|
||
# (auto-downgrades to isolation on strong models — see the
|
||
# is_strong_model gate in crates/gateway/src/v1/mode.rs).
|
||
preferred_mode = "codereview_isolation"
|
||
fallback_modes = ["codereview_lakehouse", "codereview", "consensus", "ladder"]
|
||
default_model = "qwen3-coder:480b"
|
||
# Corpora kept defined so experimental modes (codereview_matrix_only,
|
||
# pass2/pass5 sweeps) and weak-model rescue rungs can still pull them.
|
||
# scrum_findings_v1 is built but EXCLUDED — bake-off showed 24% OOB
|
||
# line citations from cross-file drift, only safe with same-file gating.
|
||
matrix_corpus = ["lakehouse_arch_v1", "lakehouse_symbols_v1"]
|
||
|
||
[[task_class]]
|
||
name = "contract_analysis"
|
||
preferred_mode = "deep_analysis"
|
||
fallback_modes = ["research", "extract"]
|
||
default_model = "kimi-k2:1t"
|
||
matrix_corpus = "chicago_permits_v1"
|
||
|
||
[[task_class]]
|
||
name = "staffing_inference"
|
||
# Staffing-domain native enrichment runner — Pass 4 (2026-04-26).
|
||
# Same composer architecture as codereview_lakehouse but with staffing
|
||
# framing + workers corpus. Validates that the modes-as-prompt-molders
|
||
# pattern generalizes beyond code review.
|
||
preferred_mode = "staffing_inference_lakehouse"
|
||
fallback_modes = ["ladder", "consensus", "pipeline"]
|
||
default_model = "openai/gpt-oss-120b:free"
|
||
matrix_corpus = "workers_500k_v8"
|
||
|
||
[[task_class]]
|
||
name = "fact_extract"
|
||
preferred_mode = "extract"
|
||
fallback_modes = ["distill"]
|
||
default_model = "qwen2.5"
|
||
matrix_corpus = "kb_team_runs_v1"
|
||
|
||
[[task_class]]
|
||
name = "doc_drift_check"
|
||
preferred_mode = "drift"
|
||
fallback_modes = ["validator"]
|
||
default_model = "gpt-oss:120b"
|
||
matrix_corpus = "distilled_factual_v20260423095819"
|
||
|
||
# Fallback when task_class isn't in the table — useful for ad-hoc calls
|
||
# during development that don't yet have a mapped mode.
|
||
[default]
|
||
preferred_mode = "pipeline"
|
||
fallback_modes = ["consensus", "ladder"]
|
||
default_model = "qwen3.5:latest"
|