3 Commits

Author SHA1 Message Date
root
56bf30cfd8 v1/mode: override knobs + staffing native runner + pass 2/3/4 harnesses
Some checks failed
lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts
Setup for the corpus-tightening experiment sweep (J 2026-04-26 — "now
is the only cheap window before the corpus gets large and refactoring
costs go up").

Override params on /v1/mode/execute (additive — old callers unaffected):
  force_matrix_corpus      — Pass 2: try alternate corpora per call
  force_relevance_threshold — Pass 2: sweep filter strictness
  force_temperature         — Pass 3: variance test

New native mode `staffing_inference_lakehouse` (Pass 4):
  - Same composer architecture as codereview_lakehouse
  - Staffing framing: coordinator producing fillable|contingent|
    unfillable verdict + ranked candidate list with playbook citations
  - matrix_corpus = workers_500k_v8
  - Validates that modes-as-prompt-molders generalizes beyond code
  - Framing explicitly says "do NOT fabricate workers" — the staffing
    analog of the lakehouse mode's symbol-grounding requirement

Three sweep harnesses:
  scripts/mode_pass2_corpus_sweep.ts — 4 corpora × 4 thresholds × 5 files
  scripts/mode_pass3_variance.ts     — 3 files × 3 temps × 5 reps
  scripts/mode_pass4_staffing.ts     — 5 fill requests through staffing mode

Each appends per-call rows to data/_kb/mode_experiments.jsonl which
mode_compare.ts already aggregates with grounding column.

Pass 1 (10 files × 5 modes broad sweep) currently running via the
existing scripts/mode_experiment.ts — gateway restart deferred until
it completes so the new override knobs aren't enabled mid-experiment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 01:55:12 -05:00
root
86f63a083d v1/mode: codereview_lakehouse native runner — modes are prompt-molders
Some checks failed
lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts
J's framing (2026-04-26): "Modes are how you ask ONCE and get BETTER
information — they mold the data, hyperfocus the prompt on this
codebase's needs, so the model gets it right the first time without
the cascading retry ladder."

Built the first concrete native enrichment runner (codereview_lakehouse)
that composes every context primitive the gateway exposes:

  1. Focus file content (read from disk OR caller-supplied)
  2. Pathway memory bug_fingerprints for this file area (ADR-021
     preamble — "📚 BUGS PREVIOUSLY FOUND IN THIS FILE AREA")
  3. Matrix corpus search via the task_class's matrix_corpus
  4. Relevance filter (observer /relevance) drops adjacency pollution
  5. Assembles ONE precise prompt with system framing
  6. Single call to /v1/chat with the recommended model

POST /v1/mode/execute dispatches. Native mode → runs the composer.
Non-native mode → 501 NOT_IMPLEMENTED with hint (proxy to LLM Team
/api/run is queued).

Provider hint logic auto-routes by model name shape:
  - vendor/model[:tag] → openrouter
  - kimi-*/qwen3-coder*/deepseek-v*/mistral-large* → ollama_cloud
  - everything else → local ollama

Live test against crates/queryd/src/delta.rs (10593 bytes, 10
historical bug fingerprints, 2 matrix chunks dropped by relevance):
  - enriched_chars: 12876
  - response_chars: 16346 (14 findings with confidence percentages)
  - Model literally cited the pathway memory preamble in finding #7
  - One call to free-tier gpt-oss:120b produced what previously
    required the 9-rung escalation ladder

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 00:28:46 -05:00
root
d277efbfd2 v1/mode: task_class → mode/model router (decision-only, phase 1)
Some checks failed
lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts
HANDOVER §queued (2026-04-25): "Mode router — port LLM Team multi-model
patterns. Pick the right TOOL/MODE for each task class via the matrix,
not cascade through models."

Two-stage architecture:
  1. Decision (POST /v1/mode) — pure recommendation, no execution.
     Returns {mode, model, decision: {source, fallbacks, matrix_corpus,
     notes}} so callers see WHY this mode was picked.
  2. Execution (future POST /v1/mode/execute) — proxy to LLM Team
     /api/run for modes not yet ported to native Rust runners. Not
     wired in this phase.

Splitting decision from execution lets us A/B-test the routing logic
without committing to running every recommendation. The decision
function is pure enough for exhaustive unit tests (3 added).

config/modes.toml — initial map for 5 task_classes (scrum_review,
contract_analysis, staffing_inference, fact_extract, doc_drift_check)
+ a default. matrix_corpus per task is reserved for the future
matrix-informed routing pass.

VALID_MODES list (24 modes) is kept in sync manually with LLM Team's
/api/run handler at /root/llm_team_ui.py:10581. Adding a mode here
without adding it upstream returns 400 from a future proxy.

GET /v1/mode/list — operator introspection so a UI can render the
registry table without re-parsing TOML.

Live-tested: 5 task classes match, unknown classes fall through to
default, force_mode override works + validates, bogus modes return
400 with the valid_modes list.

Updates reference_llm_team_modes.md memory — earlier note claiming
"only extract is registered" was wrong (all 25 are registered).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 00:16:32 -05:00