Architectural snapshot of the lakehouse codebase at the point where the
full matrix-driven agent loop with Mem0 versioning + deletion was
validated end-to-end.
WHAT THIS REPO IS
A clean single-commit snapshot of the lakehouse code. Heavy test data
(.parquet datasets, vector indexes) excluded — see REPLICATION.md for
regen path. Full lakehouse history at git.agentview.dev/profit/lakehouse.
WHAT WAS PROVEN
- Vector retrieval across multi-corpora matrix (chicago_permits + entity
briefs + sec_tickers + distilled procedural + llm_team runs)
- Observer hand-review (cloud + heuristic fallback) gating each candidate
- Local-model agent loop (qwen3.5:latest) with tool use + scratchpad
- Playbook seal on success → next-iter retrieval surfaces it as preamble
- Mem0 versioning + deletion in pathway_memory:
* UPSERT: ADD on new workflow, UPDATE bumps replay_count on identical
* REVISE: chains versions, parent.superseded_at + superseded_by stamped
* RETIRE: marks specific trace retired with reason, excluded from retrieval
* HISTORY: walks chain root→tip, cycle-safe
KEY DIRECTORIES
- crates/vectord/src/pathway_memory.rs — Mem0 ops live here
- crates/vectord/src/playbook_memory.rs — original Mem0 reference
- tests/agent_test/ — local-model agent harness + PRD + session archives
- scripts/dump_raw_corpus.sh — MinIO bucket dump (raw test corpus)
- scripts/vectorize_raw_corpus.ts — corpus → vector indexes
- scripts/analyze_chicago_contracts.ts — real inference pipeline
- scripts/seal_agent_playbook.ts — Mem0 upsert from agent traces
Replication: see REPLICATION.md for Debian 13 clean install + cloud-only
adaptation (no local Ollama).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6.1 KiB
Scrum Fix Wave — Phase-Sweep 2026-04-23
Purpose: Direct the scrum-master pipeline at concrete findings from the Phase 0→42 audit sweep, not at the high-level vision alone. Findings live in data/_kb/phase_sweep_findings.jsonl (19 items).
What the auditor expects you to produce per file
For each file the scrum sees: concrete code-level suggestions that close the listed findings. Not rewrites for style. Not vision drift. Land the invariant or admit the checkbox was premature.
Meta-pattern to fix (read this first)
The sweep surfaced one root cause repeated across 5 phases: primitives exist, cross-cutting enforcement doesn't. Auth, journal, access control, truth rules — the machinery is built, nothing calls it from the actual request path.
One PR-cluster retires the pattern:
- Identity. Auth middleware wires X-API-Key → extension
AgentIdentity { name, role, api_key_hash }. (P5-001) - Request pipeline.
/query/sqland/tools/*/callreadAgentIdentity, pass into queryd + tools handlers. - Access enforcement. Handlers call
access.can_access()/masked_columns()before returning data; log viaaccess.log_query(). (P13-001) - Mutation journaling. Every ingest / delta-write / tombstone-add / catalog-register calls the corresponding
journal.record_*. (P9-001) - Truth enforcement.
TruthStore::check()rewritten toevaluate(task_class, ctx) -> Vec<RuleOutcome>, actually walkingRuleConditionagainst context. (P42-001, P42-002)
After this cluster lands, Phases 5, 9, 13, 42 become "truly shipped" rather than "machinery shipped."
Findings by severity
🔴 High
- P9-001
journald/src/journal.rs,crates/gateway/src/main.rs, all mutation sites. Journal has zero internal callers. Everyingestd::service::upload_filesuccess, everyqueryd::delta::write_delta, everycatalogd::tombstones::add, everycatalogd::registry::registershould emit a journal event. Plus fix:event_counterresets on process restart — seed from max existingevent_idon rebuild or switch to UUID v7. - P13-001
crates/gateway/src/main.rs+crates/queryd/src/service.rs+crates/gateway/src/tools/service.rs.AccessControl.can_access/masked_columns/log_queryhave zero callers. Query path ignores role. Phase 17'sprofile_scoped_search(service.rs:1641) is the template — copy that shape. - P42-001
crates/truth/src/lib.rs:56.TruthStore::check(task_class)ignoresRuleConditionentirely. Signature needsevaluate(task_class, ctx: &serde_json::Value) -> Vec<RuleOutcome>that actually walks conditions. Update all 14 tests to exercise fail/pass semantics, not just storage.
🟡 Medium
- P5-001
crates/gateway/src/auth.rs+crates/gateway/src/main.rs:222-233.api_key_authis#[allow(dead_code)]. Wire withaxum::middleware::from_fn_with_state(api_key, auth::api_key_auth)on protected routes./healthstays public. - P10-001 Legacy datasets (including
candidates, 2.47M rows) have no PII flags. AddPOST /catalog/resync-metadatamirroring/catalog/resync-missing. - P14-001
crates/ingestd/src/schema_evolution.rs. Module has 5 passing tests and zero callers. AddPOST /catalog/datasets/by-name/{name}/schema-diff. When ADR-020register()returns 409, include amigration_rules[]body. - P20-001
config/models.jsonis spec-only — never loaded by Rust. Load intoshared::model_matrix::ModelMatrixat startup; delegateaibridge::context::context_window_forto matrix. - P21-001
generate_continuablehas one prod caller (rag.rs:171). Audit everyai_client.generate()site. Convert the truncation-prone + thinking-empty-prone sites (auditor paths, reranker, autotune) togenerate_continuable. - P39-001
ProviderAdaptertrait + adapters ship, zero callers./v1/chatinv1/mod.rs:152uses hardcodedmatch req.provider. Replace with adapter dispatch. - P40-001
config/routing.tomlis spec-only.RoutingEngine::newhas no callers. AddRoutingEngine::from_toml(path);V1Statecarriesrouting: Arc<RoutingEngine>;/v1/chatconsults it before provider match. - P42-002 Truth has no enforcement site.
/v1/chator execution_loop should calltruth.evaluate(task_class, ctx)post-response.
🟢 Low
- P1-001
crates/storaged/src/federation_service.rs:34-35. Bucket-qualified routes wire only PUT + GET. Add DELETE + LIST on/buckets/{bucket}/objects/{*key}. - P4-001 UI deploy stale. Either rebuild (
just ui-build+ restartlakehouse-ui.service) or amendPHASES.mdto note pre-Phase-9 drift. - P7-001
vectord::index_registry. Orphan index registrations (parquet deleted) still list in/vectors/indexes. Add a startup sweep +POST /vectors/resync-missing. - P12-001
crates/gateway/src/tools/service.rs. Audit row hasrow_count=null. PropagateQueryResponse.row_count+ addlatency_ms. - P20-002
crates/gateway/src/v1/mod.rs. No model-prefix auto-routing. Caller must setproviderexplicitly. Tie to P39-001 + P40-001 fix. - P21-002
crates/aibridge/src/context.rs.context_window_forhardcoded HashMap duplicatesconfig/models.json. Delegate once P20-001 lands. - P38-001
crates/gateway/src/execution_loop/mod.rs:1523. Testexecutor_prompt_includes_surfaced_candidatesfails on "W-1 Alice Smith" assertion. Either update prompt formatter or update test. - P40-002 Cost gating absent. Add
cost_ceiling_usd_per_hourtoRoutingEnginerule, pre-request check againstUsage.by_provider.
What "done" looks like for each file the scrum touches
- Name the specific finding(s) the file participates in.
- Show code-level diff (minimum: function signature + first 5 lines of body) for the fix.
- Call out any test that needs updating + one new test that would catch the bug on reintroduction.
- Flag if the fix is too big for one PR and should be split (most of the cross-cutting cluster wants a shared identity/middleware PR first, per-service PRs after).
Out of scope for this wave
- New features beyond what the findings describe.
- UI work (Phase 4 stale is known).
- DevOps / long-horizon domain work (Terraform/Ansible — Phase 43+).