Architectural snapshot of the lakehouse codebase at the point where the
full matrix-driven agent loop with Mem0 versioning + deletion was
validated end-to-end.
WHAT THIS REPO IS
A clean single-commit snapshot of the lakehouse code. Heavy test data
(.parquet datasets, vector indexes) excluded — see REPLICATION.md for
regen path. Full lakehouse history at git.agentview.dev/profit/lakehouse.
WHAT WAS PROVEN
- Vector retrieval across multi-corpora matrix (chicago_permits + entity
briefs + sec_tickers + distilled procedural + llm_team runs)
- Observer hand-review (cloud + heuristic fallback) gating each candidate
- Local-model agent loop (qwen3.5:latest) with tool use + scratchpad
- Playbook seal on success → next-iter retrieval surfaces it as preamble
- Mem0 versioning + deletion in pathway_memory:
* UPSERT: ADD on new workflow, UPDATE bumps replay_count on identical
* REVISE: chains versions, parent.superseded_at + superseded_by stamped
* RETIRE: marks specific trace retired with reason, excluded from retrieval
* HISTORY: walks chain root→tip, cycle-safe
KEY DIRECTORIES
- crates/vectord/src/pathway_memory.rs — Mem0 ops live here
- crates/vectord/src/playbook_memory.rs — original Mem0 reference
- tests/agent_test/ — local-model agent harness + PRD + session archives
- scripts/dump_raw_corpus.sh — MinIO bucket dump (raw test corpus)
- scripts/vectorize_raw_corpus.ts — corpus → vector indexes
- scripts/analyze_chicago_contracts.ts — real inference pipeline
- scripts/seal_agent_playbook.ts — Mem0 upsert from agent traces
Replication: see REPLICATION.md for Debian 13 clean install + cloud-only
adaptation (no local Ollama).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Scrum Master Pipeline — Spec + Current State
Status: Active iteration on branch scrum/auto-apply-19814 → PR #11 at git.agentview.dev/profit/lakehouse
Last iter: 9 (2026-04-24)
Branch commit head: f4cff66 (ADR-021 Phase D fix)
This doc is the single handoff artifact for the scrum-master + auto-apply + pathway-memory loop built during 2026-04-24 sessions. A fresh Claude Code session reading this + docs/DECISIONS.md (ADR-020 and ADR-021 specifically) + MEMORY.md should have the same context as the session that wrote it.
1. What the loop is
An autonomous review-and-commit pipeline with five components:
- Scrum master (tests/real-world/scrum_master_pipeline.ts): walks a target-file list, asks a 9-rung escalation ladder of cloud models to produce a forensic audit against the PRD plus a change proposal doc, retries with learning context until acceptance, and emits a structured review row.
- Pathway memory (crates/vectord/src/pathway_memory.rs): stores the full backtrack context of each review (attempts, KB chunks, flags, bug fingerprints) indexed by a narrow fingerprint (task_class + file_prefix + signal_class). On every new review, it prepends historical bug patterns as a preamble so the reviewer preempts recurrences. Retired pathways auto-exclude themselves from hot-swap eligibility.
- Auto-applier (tests/real-world/scrum_applier.ts): filters schema_v4 review rows by gradient_tier + confidence, asks qwen3-coder:480b for concrete old_string/new_string patches, runs cargo check --workspace, then commits on green or reverts on red, warning-count-up, or rationale-mismatch.
- Observer (mcp-server/observer.ts): receives per-file /event emissions and escalates failure clusters to LLM Team via /v1/chat with qwen3-coder:480b.
- Auditor (auditor/audit.ts): external N=3 consensus re-check of scrum findings; writes to data/_kb/audit_facts.jsonl.
The guiding principle: every KB write has a reader, every PR claim is diff-verifiable.
2. The 9-rung ladder (cloud-first, strongest-model-first)
Defined in tests/real-world/scrum_master_pipeline.ts at const LADDER:
| # | Provider | Model | Role |
|---|---|---|---|
| 1 | ollama_cloud | kimi-k2:1t | flagship, 1T params |
| 2 | ollama_cloud | qwen3-coder:480b | coding specialist, 480B |
| 3 | ollama_cloud | deepseek-v3.1:671b | reasoning, 671B |
| 4 | ollama_cloud | mistral-large-3:675b | deep analysis, 675B |
| 5 | ollama_cloud | gpt-oss:120b | reliable workhorse |
| 6 | ollama_cloud | qwen3.5:397b | dense 397B, final thinker |
| 7 | openrouter | openai/gpt-oss-120b:free | free-tier rescue |
| 8 | openrouter | google/gemma-3-27b-it:free | fastest rescue |
| 9 | ollama | qwen3.5:latest | last-resort local |
Each attempt is evaluated by isAcceptable() (chars ≥ 3800 AND not a malformed JSON-only dump). On reject, the next rung sees a learning preamble with the prior rejection reason.
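The acceptance check and the rung-to-rung learning preamble can be sketched as follows. This is a minimal illustration, not the code in scrum_master_pipeline.ts: only the 3800-character floor and the rejection of malformed JSON-only dumps come from the description above. The Verdict and Rung shapes, the runLadder helper, the call callback, and the heuristic used to detect a JSON-only dump are all assumptions.

```typescript
const MIN_REVIEW_CHARS = 3800;

interface Verdict {
  ok: boolean;
  reason?: string;
}

function isAcceptable(output: string): Verdict {
  const text = output.trim();
  if (text.length < MIN_REVIEW_CHARS) {
    return { ok: false, reason: `too short: ${text.length} < ${MIN_REVIEW_CHARS} chars` };
  }
  // Assumed heuristic: a reply that opens like JSON but does not parse
  // counts as a malformed JSON-only dump.
  if (/^[\[{]/.test(text)) {
    try {
      JSON.parse(text);
    } catch {
      return { ok: false, reason: "malformed JSON-only dump" };
    }
  }
  return { ok: true };
}

interface Rung {
  provider: string;
  model: string;
}

// Hypothetical model-call signature; the real client is not shown in this doc.
type CallModel = (rung: Rung, prompt: string) => Promise<string>;

interface LadderResult {
  output: string;
  rung: Rung;
  attempt: number;
}

async function runLadder(ladder: Rung[], basePrompt: string, call: CallModel): Promise<LadderResult | null> {
  let preamble = "";
  let attempt = 0;
  for (const rung of ladder) {
    attempt += 1;
    const output = await call(rung, preamble + basePrompt);
    const verdict = isAcceptable(output);
    if (verdict.ok) return { output, rung, attempt };
    // The next rung sees why the previous attempt was rejected.
    preamble += `Previous attempt by ${rung.model} was rejected: ${verdict.reason}\n`;
  }
  return null; // every rung was rejected
}
```

In this sketch the preamble accumulates across rungs, so rung 3 sees both earlier rejection reasons; whether the real pipeline keeps all of them or only the latest is not stated here.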
3. Tree-split reducer
Files larger than FILE_TREE_SPLIT_THRESHOLD = 6000 bytes get chunked into FILE_SHARD_SIZE = 3500-byte shards. Each shard gets summarized via a fast rung, summaries are concatenated with internal §N§ markers, then fed as a SCRATCHPAD to the reviewer. The §N§ markers are stripped before the reviewer sees the merged context so it cannot claim "(shard 3)" in titles.
Bug regime this fixed: pre-tree-split iters had reviewers claim fields were "missing" because the field was past the 6KB context cutoff, not actually absent.
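A rough sketch of the sharding and marker handling, assuming plain fixed-size byte windows. The two constants and the §N§ marker format come from the section above; the function names and the choice to split on raw byte offsets (rather than on line boundaries) are illustrative assumptions, and the real treeSplitFile may differ.

```typescript
const FILE_TREE_SPLIT_THRESHOLD = 6000; // bytes
const FILE_SHARD_SIZE = 3500; // bytes

// Split a file into byte-sized shards. Files at or under the threshold
// are returned whole.
function shardFile(source: string): string[] {
  const bytes = new TextEncoder().encode(source);
  if (bytes.length <= FILE_TREE_SPLIT_THRESHOLD) return [source];
  // stream: true carries a multi-byte character that straddles a shard
  // boundary into the next shard instead of corrupting it.
  const decoder = new TextDecoder("utf-8");
  const shards: string[] = [];
  for (let start = 0; start < bytes.length; start += FILE_SHARD_SIZE) {
    shards.push(decoder.decode(bytes.subarray(start, start + FILE_SHARD_SIZE), { stream: true }));
  }
  return shards;
}

// Concatenate per-shard summaries with internal markers.
function mergeSummaries(summaries: string[]): string {
  return summaries.map((s, i) => `§${i + 1}§ ${s.trim()}`).join("\n");
}

// Remove the markers before the merged scratchpad reaches the reviewer.
function stripShardMarkers(merged: string): string {
  return merged.replace(/§\d+§ ?/g, "");
}
```

For example, a 7001-byte ASCII file yields three shards (3500, 3500, and 1 byte), and stripping the markers from two merged summaries leaves just the two summary lines.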
4. Schema v4 KB rows
data/_kb/scrum_reviews.jsonl — one row per accepted review. Fields:
{
"file": "crates/queryd/src/service.rs",
"reviewed_at": "2026-04-24T11:06:56Z",
"accepted_model": "ollama_cloud/kimi-k2:1t",
"accepted_on_attempt": 1,
"attempts_made": 1,
"tree_split_fired": true,
"suggestions_preview": "<truncated-2000-char>",
"confidences_per_finding": [92, 90, 88, 85, 75],
"confidence_avg": 86,
"confidence_min": 75,
"findings_count": 5,
"gradient_tier": "dry_run", // auto ≥90 / dry_run ≥70 / simulation ≥50 / block <50
"gradient_tier_avg": "dry_run",
"alignment_score": 3, // 1-10 self-rated
"output_format": "forensic_json",
"verdict": "fail", // pass | needs_patch | fail
"critical_failures_count": 3,
"pseudocode_flags_count": 0,
"prd_mismatches_count": 4,
"missing_components_count": 6,
"verified_components_count": 2,
"risk_points_count": 3,
"schema_version": 4,
"scrum_master_reviewed": true,
// ADR-021 fields on pathway trace (NOT this row, see pathway_memory state.json)
"pathway_hot_swap_hit": false,
"pathway_id": null,
"pathway_similarity": null,
"pathway_success_rate": null,
"rungs_saved": 0
}
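The tier thresholds in the gradient_tier comment map onto a small function. The sketch below assumes that gradient_tier is derived from confidence_min and gradient_tier_avg from confidence_avg, and that the average is rounded to the nearest integer. The example row is consistent with both assumptions but does not prove them, and the function names are hypothetical.

```typescript
type GradientTier = "auto" | "dry_run" | "simulation" | "block";

// Thresholds from the row comment: auto >= 90, dry_run >= 70, simulation >= 50, else block.
function tierFor(confidence: number): GradientTier {
  if (confidence >= 90) return "auto";
  if (confidence >= 70) return "dry_run";
  if (confidence >= 50) return "simulation";
  return "block";
}

interface ConfidenceSummary {
  confidence_avg: number;
  confidence_min: number;
  findings_count: number;
  gradient_tier: GradientTier;
  gradient_tier_avg: GradientTier;
}

function summarizeConfidences(confidences: number[]): ConfidenceSummary {
  if (confidences.length === 0) throw new Error("no findings to summarize");
  const min = Math.min(...confidences);
  const avg = Math.round(confidences.reduce((sum, c) => sum + c, 0) / confidences.length);
  return {
    confidence_avg: avg,
    confidence_min: min,
    findings_count: confidences.length,
    gradient_tier: tierFor(min), // assumption: the weakest finding sets the tier
    gradient_tier_avg: tierFor(avg),
  };
}
```

Feeding in the example row's confidences_per_finding, [92, 90, 88, 85, 75], gives an average of 86, a minimum of 75, and dry_run for both tiers, which matches the row above.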
5. Applier hardened gates (landed 5e8d87b)
tests/real-world/scrum_applier.ts has 5 gates between emitter output and commit. A patch must pass ALL:
- Confidence gate: emitter's self-reported confidence >= MIN_CONF (default 90; 85 with relaxed env). Rejected patches log reason confidence NN < MM.
- Size gate: max 6 lines changed per patch. Prevents cascading cross-file refactors.
- Cargo-green gate: cargo check --workspace must pass. Red build → git checkout -- file.
- Warning-count gate: workspace baseline warning count is measured at start; after the patch, the new count must be ≤ baseline. Catches unused-import additions and dead_code-after-allow-removed patterns. This gate caught 96b46cd's HashSet noise.
- Rationale-diff token alignment: rationale text must share at least one non-stopword token with the patch's new_string. Catches "Add destructive SQL filter" claims on use tracing; diffs.
Plus:
- Deny-list: config/, ops/, auditor/, docs/, data/, mcp-server/, ui/, sidecar/, scripts/ can't be auto-applied (human review required).
- Branch guard: refuses to run on main.
- Dry-run workspace revert: in COMMIT=0 mode, the file is reverted after the check regardless of outcome, so no state pollutes later runs.
Every decision logs to data/_kb/auto_apply.jsonl (action: committed / build_red_reverted / warnings_increased_reverted / rationale_mismatch_reverted / all_rejected / no_patches / dry_run_would_commit).
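The five gates can be read as an ordered chain in which the first failure decides the outcome. The sketch below is one possible shape, not the logic in scrum_applier.ts. The stopword list, the line-counting rule, the BuildResult input (standing in for an actual cargo check run), and the evaluatePatch name are assumptions; the thresholds and the action strings come from this section.

```typescript
interface Patch {
  old_string: string;
  new_string: string;
  rationale: string;
  confidence: number;
}

interface BuildResult {
  green: boolean; // did cargo check --workspace pass?
  warnings: number; // workspace warning count after the patch
}

// Assumed stopword list; the real one is not given in this doc.
const STOPWORDS = new Set(["a", "an", "the", "to", "of", "in", "on", "for", "and", "or", "is", "with"]);

function tokens(text: string): Set<string> {
  const words = text.toLowerCase().match(/[a-z_][a-z0-9_]*/g) ?? [];
  return new Set(words.filter((w) => !STOPWORDS.has(w)));
}

// Rationale must share at least one non-stopword token with new_string.
function rationaleMatchesDiff(patch: Patch): boolean {
  const diffTokens = tokens(patch.new_string);
  return Array.from(tokens(patch.rationale)).some((t) => diffTokens.has(t));
}

// Assumed counting rule: the larger of the old and new line counts.
function linesChanged(patch: Patch): number {
  return Math.max(patch.old_string.split("\n").length, patch.new_string.split("\n").length);
}

function evaluatePatch(patch: Patch, build: BuildResult, baselineWarnings: number, minConf = 90): string {
  if (patch.confidence < minConf) return `confidence ${patch.confidence} < ${minConf}`;
  if (linesChanged(patch) > 6) return "patch exceeds 6 changed lines";
  if (!build.green) return "build_red_reverted";
  if (build.warnings > baselineWarnings) return "warnings_increased_reverted";
  if (!rationaleMatchesDiff(patch)) return "rationale_mismatch_reverted";
  return "committed";
}
```

With this token rule, a rationale of "Add destructive SQL filter" against a new_string of use tracing; shares no tokens and is rejected, which is the case the fifth gate is meant to catch.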
6. Pathway memory (ADR-021)
Full spec: docs/DECISIONS.md ADR-021. Code: crates/vectord/src/pathway_memory.rs.
Three-layer matrix index for compounding semantic-correctness signal:
Fingerprint (narrow)
pathway_id = SHA256(task_class + "|" + file_prefix + "|" + signal_class), where file_prefix is the first 2 path segments of the file (e.g. crates/queryd), so related files in the same crate share pathways.
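As a minimal sketch of the fingerprint, assuming the digest is hex-encoded and paths are relative and slash-separated (neither detail is confirmed here); the function names are illustrative:

```typescript
import { createHash } from "node:crypto";

// First two path segments, e.g. "crates/queryd/src/delta.rs" -> "crates/queryd".
function filePrefix(filePath: string): string {
  return filePath.split("/").filter(Boolean).slice(0, 2).join("/");
}

function pathwayId(taskClass: string, filePath: string, signalClass: string): string {
  const key = [taskClass, filePrefix(filePath), signalClass].join("|");
  // Assumption: hex output; the stored encoding may differ.
  return createHash("sha256").update(key, "utf8").digest("hex");
}
```

Under this scheme crates/queryd/src/delta.rs and crates/queryd/src/service.rs resolve to the same pathway_id for a given task_class and signal_class, while any file under crates/vectord resolves to a different one.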
Embedding (similarity vector)
32-bucket L2-normalized token hash. Tokens include: task_class, file_path, signal_class, per-attempt model+rung+accepted flag, KB chunk source_docs, observer class, bridge libraries, sub-pipeline calls, semantic_flags, and bug_fingerprints (flag+pattern_key).
TS and Rust implementations byte-match — verified by smoke test showing cosine=1.0 on same input tokens. This is load-bearing for the TS-written traces to be searchable against the Rust-indexed space.
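A small sketch of a 32-bucket, L2-normalized token-hash embedding and the cosine check. The bucket hash used by the real implementations is not named in this doc, so FNV-1a below is a stand-in assumption; byte-matching against the Rust side would require using whatever hash it actually uses.

```typescript
const BUCKETS = 32;

// 32-bit FNV-1a over UTF-8 bytes (assumed hash, chosen only for illustration).
function fnv1a(token: string): number {
  let h = 0x811c9dc5;
  new TextEncoder().encode(token).forEach((byte) => {
    h ^= byte;
    h = Math.imul(h, 0x01000193) >>> 0;
  });
  return h;
}

// Count tokens into buckets, then scale the vector to unit L2 length.
function embed(tokens: string[]): number[] {
  const vec = new Array<number>(BUCKETS).fill(0);
  tokens.forEach((t) => {
    const bucket = fnv1a(t) % BUCKETS;
    vec[bucket] = (vec[bucket] ?? 0) + 1;
  });
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vec : vec.map((v) => v / norm);
}

// For unit-length vectors the cosine is just the dot product.
function cosine(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * (b[i] ?? 0), 0);
}
```

Because the vector is a bag of bucket counts, token order does not affect it, and embedding the same tokens twice gives a cosine of 1.0 up to floating-point error, which is the property the smoke test relies on.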
Hot-swap gate (5-factor AND)
narrow_fingerprint_matches
AND audit_consensus.pass != false (null OK during bootstrap)
AND replay_count >= 3 (probation)
AND success_rate >= 0.80
AND NOT retired
AND similarity(query_vec, stored.pathway_vec) >= 0.90
Replay bookkeeping: on hot-swap, replay_count++; if the recommended model succeeded, replays_succeeded++; if replay_count >= 3 AND success_rate < 0.80 → retired = true (sticky — prevents oscillation on noise).
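The gate and the replay bookkeeping can be sketched as two small functions. The thresholds (3 replays, 0.80 success rate, 0.90 similarity) come from the text above; the Pathway field names, the handling of a zero replay_count, and passing similarity in as a precomputed number are assumptions made to keep the example self-contained.

```typescript
interface Pathway {
  pathway_id: string;
  audit_pass: boolean | null; // null is allowed during bootstrap
  replay_count: number;
  replays_succeeded: number;
  retired: boolean;
}

const PROBATION_REPLAYS = 3;
const MIN_SUCCESS_RATE = 0.8;
const MIN_SIMILARITY = 0.9;

function successRate(p: Pathway): number {
  // Assumption: a pathway with no replays has a success rate of 0.
  return p.replay_count === 0 ? 0 : p.replays_succeeded / p.replay_count;
}

function hotSwapEligible(p: Pathway, queryPathwayId: string, similarity: number): boolean {
  return (
    p.pathway_id === queryPathwayId &&
    p.audit_pass !== false &&
    p.replay_count >= PROBATION_REPLAYS &&
    successRate(p) >= MIN_SUCCESS_RATE &&
    !p.retired &&
    similarity >= MIN_SIMILARITY
  );
}

// Returns an updated copy; retirement is sticky, so it is never cleared here.
function recordReplay(p: Pathway, succeeded: boolean): Pathway {
  const next: Pathway = {
    ...p,
    replay_count: p.replay_count + 1,
    replays_succeeded: p.replays_succeeded + (succeeded ? 1 : 0),
  };
  if (next.replay_count >= PROBATION_REPLAYS && successRate(next) < MIN_SUCCESS_RATE) {
    next.retired = true;
  }
  return next;
}
```

A pathway at 2 replays with 1 success that then fails its third replay lands at 1/3 and is retired; later successes would raise the rate but not reverse the retirement.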
Semantic-correctness layer (ADR-021)
Each PathwayTrace carries:
- semantic_flags: Vec<SemanticFlag>, each one of 9 variants: UnitMismatch, TypeConfusion, NullableConfusion, OffByOne, StaleReference, PseudoImpl, DeadCode, WarningNoise, BoundaryViolation
- bug_fingerprints: Vec<BugFingerprint>, each {flag, pattern_key, example, occurrences} where pattern_key = "{Flag}:{sorted-top-3-identifiers-joined-by-hyphen}". Stable across prose variation.
- type_hints_used: Vec<TypeHint>, each {source, symbol, type_repr}. Phase E (not yet populated).
Pre-review enrichment: scrum calls POST /vectors/pathway/bug_fingerprints with {task_class, file_path, signal_class, limit} — returns aggregated fingerprints sorted by occurrences descending. If any, a 📚 PATHWAY MEMORY preamble is prepended to the reviewer prompt with "this file area had these patterns before — check for recurrences."
Post-review extractor (Phase D, scrum_master_pipeline.ts): walks reviewer markdown line-by-line, finds lines containing a SemanticFlag variant, extracts identifier-shaped backtick-quoted tokens, filters out flag names + Rust keywords (self/mut/async/etc), sorts and takes top 3, builds pattern_key = "{Flag}:{tokens}".
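The post-review extraction steps can be sketched like this. The flag names come from the list above; the keyword set is a partial, assumed list (the doc only names self/mut/async), "sorts and takes top 3" is read here as alphabetical order, and skipping lines that yield no identifiers is also an assumption.

```typescript
const FLAGS = [
  "UnitMismatch", "TypeConfusion", "NullableConfusion", "OffByOne", "StaleReference",
  "PseudoImpl", "DeadCode", "WarningNoise", "BoundaryViolation",
];
const FLAG_SET = new Set<string>(FLAGS);

// Partial keyword list for illustration only.
const RUST_KEYWORDS = new Set([
  "self", "mut", "async", "await", "fn", "let", "pub", "impl", "use",
  "struct", "enum", "match", "mod", "crate", "ref", "const", "static",
]);

function extractPatternKeys(markdown: string): string[] {
  const keys = new Set<string>();
  for (const line of markdown.split("\n")) {
    const flag = FLAGS.find((f) => line.includes(f));
    if (!flag) continue;
    // Identifier-shaped tokens wrapped in backticks.
    const quoted = line.match(/`[A-Za-z_][A-Za-z0-9_]*`/g) ?? [];
    const idents = Array.from(new Set(quoted.map((q) => q.slice(1, -1))))
      .filter((id) => !FLAG_SET.has(id) && !RUST_KEYWORDS.has(id))
      .sort()
      .slice(0, 3);
    if (idents.length === 0) continue; // nothing stable to key on
    keys.add(`${flag}:${idents.join("-")}`);
  }
  return Array.from(keys);
}
```

Filtering flag names out of the identifier list is what keeps a line such as DeadCode: `DeadCode` from producing a key like DeadCode:DeadCode.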
HTTP surface (on gateway port 3100)
| Endpoint | Purpose |
|---|---|
| POST /vectors/pathway/insert | write a full PathwayTrace |
| POST /vectors/pathway/query | hot-swap candidate check (returns {candidate: null} or {candidate: {...}}) |
| POST /vectors/pathway/record_replay | update replay_count + success_rate after hot-swap |
| GET /vectors/pathway/stats | totals + reuse_rate + replay_success_rate |
| POST /vectors/pathway/bug_fingerprints | aggregated fingerprints by narrow fingerprint (for pre-review preamble) |
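A hedged sketch of how a client might call the bug_fingerprints endpoint and turn the result into the pre-review preamble. The request fields ({task_class, file_path, signal_class, limit}) and the BugFingerprint fields are taken from this document; the response envelope (a fingerprints array), the default limit, the function names, and the exact preamble wording are assumptions.

```typescript
interface BugFingerprint {
  flag: string;
  pattern_key: string;
  example: string;
  occurrences: number;
}

const GATEWAY = "http://localhost:3100";

async function fetchBugFingerprints(
  taskClass: string,
  filePath: string,
  signalClass: string,
  limit = 10, // hypothetical default
): Promise<BugFingerprint[]> {
  const res = await fetch(`${GATEWAY}/vectors/pathway/bug_fingerprints`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ task_class: taskClass, file_path: filePath, signal_class: signalClass, limit }),
  });
  if (!res.ok) throw new Error(`bug_fingerprints failed: HTTP ${res.status}`);
  // Assumed envelope; check the handler in crates/vectord/src/service.rs for the real shape.
  const body = (await res.json()) as { fingerprints?: BugFingerprint[] };
  return body.fingerprints ?? [];
}

// Build the text prepended to the reviewer prompt; empty when there is no history.
function buildPreamble(fingerprints: BugFingerprint[]): string {
  if (fingerprints.length === 0) return "";
  const lines = fingerprints
    .slice()
    .sort((a, b) => b.occurrences - a.occurrences)
    .map((fp) => `- ${fp.pattern_key} (seen ${fp.occurrences}x)`);
  return [
    "📚 PATHWAY MEMORY",
    "This file area had these patterns before: check for recurrences.",
    ...lines,
    "",
  ].join("\n");
}
```

The POST must be exercised with a real body when validating the route, since a 405 on GET says nothing about whether the handler works.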
State persistence
data/_pathway_memory/state.json — JSON dump of all buckets. Loaded at gateway boot (crates/gateway/src/main.rs has pwm.load_from_storage().await).
7. Current state (2026-04-24 end of session)
Commits on branch scrum/auto-apply-19814 since iter-5 baseline
| # | SHA | Subject |
|---|---|---|
| 1 | 25ea3de | observer fix: route LLM Team escalation to /v1/chat qwen3-coder |
| 2 | 8b77d67 | OpenRouter rescue ladder + tree-split reducer + first auto-applier |
| 3 | 96b46cd | first auto-applied commit (later found misleading) |
| 4 | 5e8d87b | cleanup + applier hardening (warning + rationale + dry-run gates) |
| 5 | 9cc0ceb | P42-002: truth gate into queryd /sql + /paged paths |
| 6 | 2f8b347 | pathway_memory base (PathwayTrace, hot-swap, 18 tests) |
| 7 | 86901f8 | queryd/delta.rs 6-line unit-mismatch fix |
| 8 | 92df0e9 | ADR-021 spec |
| 9 | 0a0843b | ADR-021 Phases A+B+C (semantic_flags, prompt tags, preamble endpoint) |
| 10 | ee31424 | ADR-021 Phase D (fingerprint extractor) |
| 11 | f4cff66 | Phase D fix: strip flag names + Rust keywords from pattern_keys |
Matrix index state
- 12 pathway traces in data/_pathway_memory/state.json
- 11 distinct bug fingerprints across 4 Flag categories on the crates/queryd narrow fingerprint (1 manually seeded + 10 extracted)
- 0 hot-swaps fired (probation requires ≥3 replays per pathway; none reached yet)
Active in-flight
- Iter 9 complete; next iter 10+ will use the cleaner fingerprint extractor (f4cff66)
- 4 "noisy" pattern_keys from the iter-9-file-1 pre-fix run (e.g., DeadCode:DeadCode): dormant, won't match future output, acceptable dead entries
Queued (not yet implemented)
- Phase E: type_hints_used population from catalogd column types, Arrow RecordBatch.schema(), Rust struct field types. Feeds typed context to reviewer prompt.
- Auditor → pathway audit_consensus wire: activates the strict-audit gate (currently lenient: null bootstraps, only explicit false blocks).
- VCP UI cards for "top bug fingerprints in last N iters" + "new patterns learned this iter"
8. How to run a new iteration
# Default 3 files (playbook_memory.rs, doc_drift.rs, auditor/audit.ts)
LH_SCRUM_FORENSIC=/home/profit/lakehouse/docs/SCRUM_FORENSIC_PROMPT.md \
LH_SCRUM_PROPOSAL=/home/profit/lakehouse/docs/SCRUM_FIX_WAVE.md \
bun run tests/real-world/scrum_master_pipeline.ts
# Targeted files:
LH_SCRUM_FILES="/home/profit/lakehouse/crates/queryd/src/delta.rs,/home/profit/lakehouse/crates/queryd/src/service.rs" \
LH_SCRUM_FORENSIC=... LH_SCRUM_PROPOSAL=... \
bun run tests/real-world/scrum_master_pipeline.ts
# Dry-run auto-applier against the latest scrum output:
LH_APPLIER_MIN_CONF=85 LH_APPLIER_MAX_FILES=10 \
LH_APPLIER_MODEL=qwen3-coder:480b \
LH_APPLIER_BRANCH=scrum/auto-apply-19814 \
bun run tests/real-world/scrum_applier.ts
# Actually commit (ONLY after dry-run looks clean):
LH_APPLIER_COMMIT=1 LH_APPLIER_MIN_CONF=85 LH_APPLIER_MAX_FILES=10 \
LH_APPLIER_MODEL=qwen3-coder:480b \
LH_APPLIER_BRANCH=scrum/auto-apply-19814 \
bun run tests/real-world/scrum_applier.ts
9. Verify services before running
# Gateway (port 3100) — must be up; pathway endpoints are here
curl -s http://localhost:3100/health # "lakehouse ok"
curl -s http://localhost:3100/vectors/pathway/stats # pathway memory totals
# UI (port 3950) — VCP dashboard + /data/pathway_stats aggregation
curl -s http://localhost:3950/data/pathway_stats
# Observer (port 3800) — event receiver + LLM Team escalation
curl -s http://localhost:3800/health 2>/dev/null || true
# Sidecar (port 3200) — Python embed
curl -s http://localhost:3200/health 2>/dev/null || true
# LLM Team (port 5000) — /api/run?mode=extract ONLY registered mode
# (others like code_review/patch/refactor return "Unknown mode")
curl -s http://localhost:5000/health 2>/dev/null || true
If gateway missing new routes after code change: cargo build --release -p gateway && sudo systemctl restart lakehouse.service.
If UI missing new routes: kill old bun run ui/server.ts and restart (not a systemd service right now).
10. Where things live (code pointers)
| Concern | File |
|---|---|
| Scrum orchestrator | tests/real-world/scrum_master_pipeline.ts |
| Scrum ladder constant | same file, const LADDER line ~92 |
| Tree-split reducer | same file, async function treeSplitFile |
| Forensic prompt preamble (loaded via env) | docs/SCRUM_FORENSIC_PROMPT.md |
| Fix-wave proposal preamble | docs/SCRUM_FIX_WAVE.md |
| Scrum iter notes | docs/SCRUM_LOOP_NOTES.md |
| Auto-applier | tests/real-world/scrum_applier.ts |
| Applier audit trail | data/_kb/auto_apply.jsonl |
| Scrum reviews KB | data/_kb/scrum_reviews.jsonl |
| Model trust journal | data/_kb/model_trust.jsonl |
| Pathway memory module | crates/vectord/src/pathway_memory.rs |
| Pathway HTTP handlers | crates/vectord/src/service.rs (bottom) |
| Pathway state on disk | data/_pathway_memory/state.json |
| VCP UI server | ui/server.ts |
| VCP UI client | ui/ui.js + ui/ui.css + ui/index.html |
| Observer | mcp-server/observer.ts |
| Auditor | auditor/audit.ts |
| LLM Team extract client | auditor/fact_extractor.ts |
| ADR-021 spec | docs/DECISIONS.md ADR-021 |
11. Key memory files a fresh session should read
From /root/.claude/projects/-home-profit/memory/:
- project_scrum_pipeline.md: updated state of the scrum iterations
- project_first_auto_apply.md: 96b46cd story + cleanup + hardening evidence from iter 7
- feedback_semantic_correctness_via_matrix.md: J's insight on compounding, the ADR-021 rule
- feedback_endpoint_probe_discipline.md: GET 405 is not endpoint validation
- reference_llm_team_modes.md: only extract is registered on port 5000
- feedback_scrum_cloud_first.md: scrum/audit/enrich pipelines use cloud first
- feedback_cloud_determinism.md: cloud N=3 consensus + qwen3-coder tie-breaker
12. Known gotchas
- Gateway restart needed after Rust route additions: sudo systemctl restart lakehouse.service (the service is systemd-managed).
- UI server needs a manual restart after ui/server.ts changes (no systemd unit). Kill the old bun pid, restart with bun run ui/server.ts &.
- LLM Team mode code_review doesn't exist; only extract is registered in /root/llm_team_ui.py. Don't wire new features to "Unknown mode" endpoints. See reference_llm_team_modes.md.
- OpenRouter free-tier 429s during consensus probes are normal (rate-limited upstream). In the production ladder the free-tier rungs are hit only as last-resort rescue, with seconds-to-minutes gaps between calls; that is a different traffic pattern from rapid-fire consensus runs.
- OpenRouter minimax-m2.5:free has a 45s timeout; it is not in the ladder and is used only for one-off probes.
- Probation period is 3 replays before hot-swap can fire. On a fresh install, no hot-swap fires until a pathway has been re-visited ≥3 times.