Apply the highest-confidence findings from the Phase 0→42 forensic sweep
after four scrum-master iterations under the adversarial prompt. Each fix
is independently validated by a later scrum iteration scoring the same
file higher under the same bar.
Code changes
────────────
P5-001 — crates/gateway/src/auth.rs + main.rs
api_key_auth was marked #[allow(dead_code)] and never wrapped around
the router, so `[auth] enabled=true` logged a green message and
enforced nothing. Now wired via from_fn_with_state, with constant-time
header compare and /health exempted for LB probes.
P42-001 — crates/truth/src/lib.rs
TruthStore::check() ignored RuleCondition entirely — signature looked
like enforcement, body returned every action unconditionally. Added
evaluate(task_class, ctx) that actually walks FieldEquals / FieldEmpty /
FieldGreater / Always against a serde_json::Value via dot-path lookup.
check() kept for back-compat. Tests 14 → 24 (10 new exercising real
pass/fail semantics). serde_json moved to [dependencies].
P9-001 (partial) — crates/ingestd/src/service.rs
Added Option<Journal> to IngestState + a journal.record_ingest() call
on /ingest/file success. Gateway wires it with `journal.clone()` before
the /journal nest consumes the original. First-ever internal mutation
journal event verified live (total_events_created 0→1 after probe).
Iter-4 scrum scored these files higher under same prompt:
ingestd/src/service.rs 3 → 6 (P9-001 visible)
truth/src/lib.rs 3 → 4 (P42-001 visible)
gateway/src/auth.rs 3 → 4 (P5-001 visible)
gateway/src/execution_loop 4 → 6 (indirect)
storaged/src/federation 3 → 4 (indirect)
Infrastructure additions
────────────────────────
* tests/real-world/scrum_master_pipeline.ts
- cloud-first ladder: kimi-k2:1t → deepseek-v3.1:671b → mistral-large-3:675b
→ gpt-oss:120b → devstral-2:123b → qwen3.5:397b (deep final thinker)
- LH_SCRUM_FORENSIC env: injects SCRUM_FORENSIC_PROMPT.md as adversarial preamble
- LH_SCRUM_PROPOSAL env: per-iter fix-wave doc override
- Confidence extraction (markdown + JSON), schema v4 KB rows with:
verdict, critical_failures_count, verified_components_count,
missing_components_count, output_format, gradient_tier
- Model trust profile written per file-accept to data/_kb/model_trust.jsonl
- Fire-and-forget POST to observer /event so by_source.scrum appears in /stats
* mcp-server/observer.ts — unchanged in shape, confirmed receiving scrum events
* ui/ — new Visual Control Plane on :3950
- Bun.serve with /data/{services,reviews,metrics,trust,overrides,findings,file,refactor_signals,search,logs/:svc,scrum_log}
- Views: MAP (D3 graph, 5 overlays) / TRACE (per-file iter timeline) /
TRAJECTORY (refactor signals + reverse index search) / METRICS (explainers
with SOURCE + GOOD lines) / KB (card grid with tooltips) / CONSOLE (per-service
journalctl tail, tabs for gateway/sidecar/observer/mcp/ctx7/auditor/langfuse)
- tryFetch always attempts JSON.parse (fix for observer returning JSON without content-type)
- renderNodeContext primitive-vs-object guard (fix for gateway /health string)
* docs/SCRUM_FIX_WAVE.md — iter-specific scope directing the scrum
* docs/SCRUM_FORENSIC_PROMPT.md — adversarial audit prompt (verdict/critical/verified schema)
* docs/SCRUM_LOOP_NOTES.md — iteration observations + fix-next-loop queue
* docs/SYSTEM_EVOLUTION_LAYERS.md — Layers 1-10 roadmap (trust profiling, execution DNA, drift sentinel, etc)
Measurements across iterations
──────────────────────────────
iter 1 (soft prompt, gpt-oss:120b): mean score 5.00/10
iter 3 (forensic, kimi-k2:1t): mean score 3.56/10 (−1.44 — bar raised)
iter 4 (same bar, post fixes): mean score 4.00/10 (+0.44 — fixes landed)
Score movement iter3→iter4: ↑5 ↓1 =12
21/21 first-attempt accept by kimi-k2:1t in iter 4
20/21 emitted forensic JSON (richer signal than markdown)
16 verified_components captured (proof-of-life, new metric)
Permission Gradient distribution: 0 auto · 16 dry_run · 4 sim · 1 block
Observer loop: by_source {scrum: 21, langfuse: 1985, phase24_audit: 1}
v1/usage: 224 requests, 477K tokens, all tracked
Signal classes per file (iter 3 → iter 4):
CONVERGING: 1 (ingestd/service.rs — fix clearly landed)
LOOPING: 4 (catalogd/registry, main, queryd/service, vectord/index_registry)
ORBITING: 1 (truth — novel findings surfacing as surface ones fix)
PLATEAU: 9 (scores flat with high confidence — diminishing returns)
MIXED: 6
Loop thesis status
──────────────────
A file's score rises only when the scrum confirms a real fix landed.
No false positives yet across 3 iterations. Fixes applied to 3 files all
raised their independent scores under the same adversarial prompt. Loop
is measurable, not hand-wavy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Scrum Fix Wave — Phase-Sweep 2026-04-23
Purpose: Direct the scrum-master pipeline at concrete findings from the Phase 0→42 audit sweep, not at the high-level vision alone. Findings live in data/_kb/phase_sweep_findings.jsonl (19 items).
What the auditor expects you to produce per file
For each file the scrum sees: concrete code-level suggestions that close the listed findings. Not rewrites for style. Not vision drift. Land the invariant or admit the checkbox was premature.
Meta-pattern to fix (read this first)
The sweep surfaced one root cause repeated across 5 phases: primitives exist, cross-cutting enforcement doesn't. Auth, journal, access control, truth rules — the machinery is built, nothing calls it from the actual request path.
One PR-cluster retires the pattern:
- Identity. Auth middleware wires `X-API-Key` → extension `AgentIdentity { name, role, api_key_hash }`. (P5-001)
- Request pipeline. `/query/sql` and `/tools/*/call` read `AgentIdentity`, pass into queryd + tools handlers.
- Access enforcement. Handlers call `access.can_access()` / `masked_columns()` before returning data; log via `access.log_query()`. (P13-001)
- Mutation journaling. Every ingest / delta-write / tombstone-add / catalog-register calls the corresponding `journal.record_*`. (P9-001)
- Truth enforcement. `TruthStore::check()` rewritten to `evaluate(task_class, ctx) -> Vec<RuleOutcome>`, actually walking `RuleCondition` against context. (P42-001, P42-002)
After this cluster lands, Phases 5, 9, 13, 42 become "truly shipped" rather than "machinery shipped."
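As a rough illustration of the truth-enforcement item, the sketch below walks the four condition variants against a context via dot-path lookup. It is a minimal sketch under stated assumptions, not the crate's code: the `Value` enum is a dependency-free stand-in for `serde_json::Value`, and the field layouts of `RuleCondition`, `Rule`, and `RuleOutcome` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Stand-in for serde_json::Value (assumption: keeps the sketch std-only).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Num(f64),
    Str(String),
    Obj(BTreeMap<String, Value>),
}

/// Variant names come from the findings; the fields are assumed.
#[derive(Debug, Clone)]
pub enum RuleCondition {
    Always,
    FieldEquals { path: String, expected: Value },
    FieldEmpty { path: String },
    FieldGreater { path: String, threshold: f64 },
}

/// Hypothetical rule shape: one action guarded by one condition.
#[derive(Debug, Clone)]
pub struct Rule {
    pub task_class: String,
    pub action: String,
    pub condition: RuleCondition,
}

#[derive(Debug, Clone, PartialEq)]
pub struct RuleOutcome {
    pub action: String,
    pub passed: bool,
}

/// Resolve a dot-separated path such as "candidate.score" through nested objects.
fn lookup<'a>(ctx: &'a Value, path: &str) -> Option<&'a Value> {
    path.split('.').try_fold(ctx, |node, key| match node {
        Value::Obj(map) => map.get(key),
        _ => None,
    })
}

/// Decide whether a single condition holds for the given context.
fn holds(cond: &RuleCondition, ctx: &Value) -> bool {
    match cond {
        RuleCondition::Always => true,
        RuleCondition::FieldEquals { path, expected } => lookup(ctx, path) == Some(expected),
        // Missing, null, and empty-string fields all count as empty here (assumption).
        RuleCondition::FieldEmpty { path } => match lookup(ctx, path) {
            None | Some(Value::Null) => true,
            Some(Value::Str(s)) => s.is_empty(),
            Some(_) => false,
        },
        RuleCondition::FieldGreater { path, threshold } => {
            matches!(lookup(ctx, path), Some(Value::Num(n)) if n > threshold)
        }
    }
}

/// Evaluate every rule registered for `task_class` and report pass/fail per action.
pub fn evaluate(rules: &[Rule], task_class: &str, ctx: &Value) -> Vec<RuleOutcome> {
    rules
        .iter()
        .filter(|r| r.task_class == task_class)
        .map(|r| RuleOutcome {
            action: r.action.clone(),
            passed: holds(&r.condition, ctx),
        })
        .collect()
}
```

Returning one outcome per matching rule, rather than a filtered list of actions, lets a caller distinguish "rule did not apply" from "rule applied and failed", which is the distinction the old `check()` erased.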
Findings by severity
🔴 High
- P9-001 `journald/src/journal.rs`, `crates/gateway/src/main.rs`, all mutation sites. Journal has zero internal callers. Every `ingestd::service::upload_file` success, every `queryd::delta::write_delta`, every `catalogd::tombstones::add`, every `catalogd::registry::register` should emit a journal event. Plus fix: `event_counter` resets on process restart; seed from max existing `event_id` on rebuild or switch to UUID v7.
- P13-001 `crates/gateway/src/main.rs` + `crates/queryd/src/service.rs` + `crates/gateway/src/tools/service.rs`. `AccessControl.can_access` / `masked_columns` / `log_query` have zero callers. Query path ignores role. Phase 17's `profile_scoped_search` (`service.rs:1641`) is the template; copy that shape.
- P42-001 `crates/truth/src/lib.rs:56`. `TruthStore::check(task_class)` ignores `RuleCondition` entirely. Signature needs `evaluate(task_class, ctx: &serde_json::Value) -> Vec<RuleOutcome>` that actually walks conditions. Update all 14 tests to exercise fail/pass semantics, not just storage.
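The `event_counter` part of P9-001 can be illustrated with a small sketch of the "seed from max existing `event_id`" option. The `Journal` and `JournalEvent` shapes below are hypothetical stand-ins, not the real journald types; only the seeding logic is the point.

```rust
/// Hypothetical minimal journal event; only `event_id` matters for this sketch.
#[derive(Debug, Clone)]
pub struct JournalEvent {
    pub event_id: u64,
    pub kind: String,
}

/// Hypothetical in-memory journal standing in for the real one.
pub struct Journal {
    pub events: Vec<JournalEvent>,
    event_counter: u64,
}

impl Journal {
    /// Rebuild from persisted events, seeding the counter from the largest id
    /// seen so that ids issued after a restart never collide with old ones.
    pub fn rebuild(persisted: Vec<JournalEvent>) -> Self {
        let event_counter = persisted.iter().map(|e| e.event_id).max().unwrap_or(0);
        Journal { events: persisted, event_counter }
    }

    /// Append an event and return its freshly allocated id.
    pub fn record(&mut self, kind: &str) -> u64 {
        self.event_counter += 1;
        self.events.push(JournalEvent {
            event_id: self.event_counter,
            kind: kind.to_string(),
        });
        self.event_counter
    }
}
```

Taking the maximum rather than the count of persisted events keeps ids monotonic even if the stored log has gaps or is out of order.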
🟡 Medium
- P5-001 `crates/gateway/src/auth.rs` + `crates/gateway/src/main.rs:222-233`. `api_key_auth` is `#[allow(dead_code)]`. Wire with `axum::middleware::from_fn_with_state(api_key, auth::api_key_auth)` on protected routes. `/health` stays public.
- P10-001 Legacy datasets (including `candidates`, 2.47M rows) have no PII flags. Add `POST /catalog/resync-metadata` mirroring `/catalog/resync-missing`.
- P14-001 `crates/ingestd/src/schema_evolution.rs`. Module has 5 passing tests and zero callers. Add `POST /catalog/datasets/by-name/{name}/schema-diff`. When ADR-020 `register()` returns 409, include a `migration_rules[]` body.
- P20-001 `config/models.json` is spec-only, never loaded by Rust. Load into `shared::model_matrix::ModelMatrix` at startup; delegate `aibridge::context::context_window_for` to matrix.
- P21-001 `generate_continuable` has one prod caller (`rag.rs:171`). Audit every `ai_client.generate()` site. Convert the truncation-prone + thinking-empty-prone sites (auditor paths, reranker, autotune) to `generate_continuable`.
- P39-001 `ProviderAdapter` trait + adapters ship, zero callers. `/v1/chat` in `v1/mod.rs:152` uses hardcoded `match req.provider`. Replace with adapter dispatch.
- P40-001 `config/routing.toml` is spec-only. `RoutingEngine::new` has no callers. Add `RoutingEngine::from_toml(path)`; `V1State` carries `routing: Arc<RoutingEngine>`; `/v1/chat` consults it before provider match.
- P42-002 Truth has no enforcement site. `/v1/chat` or execution_loop should call `truth.evaluate(task_class, ctx)` post-response.
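For P39-001, "adapter dispatch" could look roughly like the following: a registry keyed by provider name replaces the hardcoded `match`. This is a sketch under assumptions. The real `ProviderAdapter` trait's methods are not shown in this doc, so `name`, `chat`, `AdapterRegistry`, and `EchoAdapter` are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical shape of the adapter trait; the real method set may differ.
pub trait ProviderAdapter {
    fn name(&self) -> &str;
    fn chat(&self, prompt: &str) -> Result<String, String>;
}

/// Toy adapter used only to exercise the dispatch path.
pub struct EchoAdapter {
    pub label: String,
}

impl ProviderAdapter for EchoAdapter {
    fn name(&self) -> &str {
        &self.label
    }

    fn chat(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("{}: {}", self.label, prompt))
    }
}

/// Hypothetical registry: provider name -> boxed adapter.
pub struct AdapterRegistry {
    adapters: HashMap<String, Box<dyn ProviderAdapter>>,
}

impl AdapterRegistry {
    pub fn new() -> Self {
        AdapterRegistry { adapters: HashMap::new() }
    }

    /// Register an adapter under the name it reports for itself.
    pub fn register(&mut self, adapter: Box<dyn ProviderAdapter>) {
        let key = adapter.name().to_string();
        self.adapters.insert(key, adapter);
    }

    /// Look the provider up instead of matching on it; unknown names become errors.
    pub fn dispatch(&self, provider: &str, prompt: &str) -> Result<String, String> {
        match self.adapters.get(provider) {
            Some(adapter) => adapter.chat(prompt),
            None => Err(format!("unknown provider: {provider}")),
        }
    }
}
```

With this shape, adding a provider means registering one more adapter at startup; the `/v1/chat` handler itself does not change.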
🟢 Low
- P1-001 `crates/storaged/src/federation_service.rs:34-35`. Bucket-qualified routes wire only PUT + GET. Add DELETE + LIST on `/buckets/{bucket}/objects/{*key}`.
- P4-001 UI deploy stale. Either rebuild (`just ui-build` + restart `lakehouse-ui.service`) or amend `PHASES.md` to note pre-Phase-9 drift.
- P7-001 `vectord::index_registry`. Orphan index registrations (parquet deleted) still list in `/vectors/indexes`. Add a startup sweep + `POST /vectors/resync-missing`.
- P12-001 `crates/gateway/src/tools/service.rs`. Audit row has `row_count=null`. Propagate `QueryResponse.row_count` + add `latency_ms`.
- P20-002 `crates/gateway/src/v1/mod.rs`. No model-prefix auto-routing. Caller must set `provider` explicitly. Tie to P39-001 + P40-001 fix.
- P21-002 `crates/aibridge/src/context.rs`. `context_window_for` hardcoded HashMap duplicates `config/models.json`. Delegate once P20-001 lands.
- P38-001 `crates/gateway/src/execution_loop/mod.rs:1523`. Test `executor_prompt_includes_surfaced_candidates` fails on "W-1 Alice Smith" assertion. Either update prompt formatter or update test.
- P40-002 Cost gating absent. Add `cost_ceiling_usd_per_hour` to `RoutingEngine` rule, pre-request check against `Usage.by_provider`.
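The pre-request check in P40-002 might be sketched as below. Everything beyond the two names in the finding is an assumption: `RoutingRule`, the `estimated_cost_usd` parameter, and the reading of `Usage.by_provider` as USD already spent per provider in the current hour are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical routing rule carrying the proposed ceiling; `None` means ungated.
pub struct RoutingRule {
    pub provider: String,
    pub cost_ceiling_usd_per_hour: Option<f64>,
}

/// Assumed meaning: USD already spent per provider within the current hour.
pub struct Usage {
    pub by_provider: HashMap<String, f64>,
}

/// Pre-request gate: allow the call only if the estimated cost still fits
/// under the rule's hourly ceiling for that provider.
pub fn within_cost_ceiling(rule: &RoutingRule, usage: &Usage, estimated_cost_usd: f64) -> bool {
    match rule.cost_ceiling_usd_per_hour {
        None => true,
        Some(ceiling) => {
            let spent = usage.by_provider.get(&rule.provider).copied().unwrap_or(0.0);
            spent + estimated_cost_usd <= ceiling
        }
    }
}
```

Checking before the request, rather than after the response, means a provider that has hit its ceiling is never called; the router can fall through to the next rule instead.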
What "done" looks like for each file the scrum touches
- Name the specific finding(s) the file participates in.
- Show code-level diff (minimum: function signature + first 5 lines of body) for the fix.
- Call out any test that needs updating + one new test that would catch the bug on reintroduction.
- Flag if the fix is too big for one PR and should be split (most of the cross-cutting cluster wants a shared identity/middleware PR first, per-service PRs after).
Out of scope for this wave
- New features beyond what the findings describe.
- UI work (Phase 4 stale is known).
- DevOps / long-horizon domain work (Terraform/Ansible — Phase 43+).