lakehouse

Author	SHA1	Message	Date
root	2dbc8dbc83	v1/mode: model-aware enrichment downgrade + 3 corpora + variance harness Some checks failed lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts Pass 5 (5 reps × 4 conditions × 1 file on grok-4.1-fast) showed composing matrix corpora is anti-additive on strong models — composed lakehouse_arch + symbols LOST 5/5 head-to-head vs codereview_isolation (Δ −1.8 grounded findings, p=0.031). Default flips to isolation; matrix path now auto- downgrades when the resolved model is strong. Mode runner: - matrix_corpus is Vec<String> (string OR array via deserialize_string_or_vec) - top_k=6 from each corpus, merge by score, take top 8 globally - chunk tag prefers doc_id over source so reviewer sees [adr:009] vs [lakehouse_arch] - is_weak_model() gate auto-downgrades codereview_lakehouse → codereview_isolation for strong models (default-strong; weak = :free suffix or local last-resort) - LH_FORCE_FULL_ENRICHMENT=1 bypasses for diagnostic runs - EnrichmentSources.downgraded_from records when the gate fires Three corpora indexed via /vectors/index (5849 chunks total): - lakehouse_arch_v1 — ADRs + phases + PRD + scrum spec (93 docs, 2119 chunks) - scrum_findings_v1 — past scrum_reviews.jsonl (168 docs, 1260 chunks; EXCLUDED from defaults — 24% out-of-bounds line citations from cross-file drift) - lakehouse_symbols_v1 — regex-extracted pub items + /// docs (656 docs, 2470 chunks) Experiment infra: - scripts/build_*_corpus.ts — re-runnable when source content changes - scripts/mode_pass5_variance_paid.ts — N reps × M conditions on one file - scripts/mode_pass5_summarize.ts — mean ± σ + head-to-head, parser handles numbered + path-with-line + path-with-symbol finding tables - scripts/mode_compare.ts — groups by mode\|corpus when sweeps span corpora - scripts/mode_experiment.ts — default model bumped to x-ai/grok-4.1-fast, --corpus flag for per-call override Decisions + open follow-ups: docs/MODE_RUNNER_TUNING_PLAN.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 17:29:17 -05:00
root	56bf30cfd8	v1/mode: override knobs + staffing native runner + pass 2/3/4 harnesses Some checks failed lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts Setup for the corpus-tightening experiment sweep (J 2026-04-26 — "now is the only cheap window before the corpus gets large and refactoring costs go up"). Override params on /v1/mode/execute (additive — old callers unaffected): force_matrix_corpus — Pass 2: try alternate corpora per call force_relevance_threshold — Pass 2: sweep filter strictness force_temperature — Pass 3: variance test New native mode `staffing_inference_lakehouse` (Pass 4): - Same composer architecture as codereview_lakehouse - Staffing framing: coordinator producing fillable\|contingent\| unfillable verdict + ranked candidate list with playbook citations - matrix_corpus = workers_500k_v8 - Validates that modes-as-prompt-molders generalizes beyond code - Framing explicitly says "do NOT fabricate workers" — the staffing analog of the lakehouse mode's symbol-grounding requirement Three sweep harnesses: scripts/mode_pass2_corpus_sweep.ts — 4 corpora × 4 thresholds × 5 files scripts/mode_pass3_variance.ts — 3 files × 3 temps × 5 reps scripts/mode_pass4_staffing.ts — 5 fill requests through staffing mode Each appends per-call rows to data/_kb/mode_experiments.jsonl which mode_compare.ts already aggregates with grounding column. Pass 1 (10 files × 5 modes broad sweep) currently running via the existing scripts/mode_experiment.ts — gateway restart deferred until it completes so the new override knobs aren't enabled mid-experiment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 01:55:12 -05:00
root	7c47734287	v1/mode: parameterized runner + 5 enrichment-experiment modes Some checks failed lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts J's directive (2026-04-26): "Create different modes so we can really dial in the architecture before it gets further along — pinpoint the failures and strengths equally so I know what direction to go in. Loop theater happens when we don't pinpoint the most accurate path." Refactored execute() to switch on mode name → EnrichmentFlags preset. Five native modes designed as deliberate experiments — each isolates one architectural axis so the comparison matrix reads off what's doing work vs what's adding latency for nothing: codereview_lakehouse — all enrichment on (ceiling) codereview_null — raw file + generic prompt (baseline) codereview_isolation — file + pathway only (no matrix) codereview_matrix_only — file + matrix only (no pathway) codereview_playbook_only — pathway only, NO file content (lossy ceiling) Each call appends a row to data/_kb/mode_experiments.jsonl with full sources + response. LH_MODE_LOG_OFF=1 to suppress. scripts/mode_experiment.ts — sweeps files × modes serially, prints live progress with per-call enrichment stats. Defaults to OpenRouter free model so cloud quota doesn't gate experiments. scripts/mode_compare.ts — reads the JSONL, outputs per-file matrix + per-mode aggregate + mode-vs-baseline win/loss with avg finding delta. Heuristic finding-count from markdown table rows; pathway citation count from preamble references. scrum_master_pipeline.ts gets a mode-runner fast path gated by LH_USE_MODE_RUNNER=1: try /v1/mode/execute first, fall through to the existing ladder if response < LH_MODE_MIN_CHARS (default 2000) or anything errors. Off by default until A/B-validated. First experiment results (2 files × 5 modes via gpt-oss-120b:free): - codereview_null produces 12.6KB response with ZERO findings (proves adversarial framing is load-bearing) - codereview_playbook_only produces MORE findings than lakehouse on average (12 vs 9) at 73% the latency — pathway memory is the dominant signal driver - codereview_matrix_only underperforms isolation by ~0.5 findings while costing the same latency — matrix corpus likely underperforming for scrum_review task class Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 01:36:42 -05:00
root	86f63a083d	v1/mode: codereview_lakehouse native runner — modes are prompt-molders Some checks failed lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts J's framing (2026-04-26): "Modes are how you ask ONCE and get BETTER information — they mold the data, hyperfocus the prompt on this codebase's needs, so the model gets it right the first time without the cascading retry ladder." Built the first concrete native enrichment runner (codereview_lakehouse) that composes every context primitive the gateway exposes: 1. Focus file content (read from disk OR caller-supplied) 2. Pathway memory bug_fingerprints for this file area (ADR-021 preamble — "📚 BUGS PREVIOUSLY FOUND IN THIS FILE AREA") 3. Matrix corpus search via the task_class's matrix_corpus 4. Relevance filter (observer /relevance) drops adjacency pollution 5. Assembles ONE precise prompt with system framing 6. Single call to /v1/chat with the recommended model POST /v1/mode/execute dispatches. Native mode → runs the composer. Non-native mode → 501 NOT_IMPLEMENTED with hint (proxy to LLM Team /api/run is queued). Provider hint logic auto-routes by model name shape: - vendor/model[:tag] → openrouter - kimi-/qwen3-coder/deepseek-v/mistral-large → ollama_cloud - everything else → local ollama Live test against crates/queryd/src/delta.rs (10593 bytes, 10 historical bug fingerprints, 2 matrix chunks dropped by relevance): - enriched_chars: 12876 - response_chars: 16346 (14 findings with confidence percentages) - Model literally cited the pathway memory preamble in finding #7 - One call to free-tier gpt-oss:120b produced what previously required the 9-rung escalation ladder Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:28:46 -05:00
root	d277efbfd2	v1/mode: task_class → mode/model router (decision-only, phase 1) Some checks failed lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts HANDOVER §queued (2026-04-25): "Mode router — port LLM Team multi-model patterns. Pick the right TOOL/MODE for each task class via the matrix, not cascade through models." Two-stage architecture: 1. Decision (POST /v1/mode) — pure recommendation, no execution. Returns {mode, model, decision: {source, fallbacks, matrix_corpus, notes}} so callers see WHY this mode was picked. 2. Execution (future POST /v1/mode/execute) — proxy to LLM Team /api/run for modes not yet ported to native Rust runners. Not wired in this phase. Splitting decision from execution lets us A/B-test the routing logic without committing to running every recommendation. The decision function is pure enough for exhaustive unit tests (3 added). config/modes.toml — initial map for 5 task_classes (scrum_review, contract_analysis, staffing_inference, fact_extract, doc_drift_check) + a default. matrix_corpus per task is reserved for the future matrix-informed routing pass. VALID_MODES list (24 modes) is kept in sync manually with LLM Team's /api/run handler at /root/llm_team_ui.py:10581. Adding a mode here without adding it upstream returns 400 from a future proxy. GET /v1/mode/list — operator introspection so a UI can render the registry table without re-parsing TOML. Live-tested: 5 task classes match, unknown classes fall through to default, force_mode override works + validates, bogus modes return 400 with the valid_modes list. Updates reference_llm_team_modes.md memory — earlier note claiming "only extract is registered" was wrong (all 25 are registered). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:16:32 -05:00

5 Commits