Audited stash-clean c7e3124 (30 commits past rerun-1 4840c10).
3 HIGH risks closed (R-002 internal/shared, R-003 internal/storeclient,
R-008 queryd/db.go). 3 advanced to partial (R-001 via fail-loud-bind +
opt-in auth, R-006 via g2_smoke_fixtures, R-007 via ADR-003 auth.go).
Biggest move: Agent Memory Correctness 4 → 9 — pathway Mem0 ops
(ADD/UPDATE/REVISE/RETIRE/HISTORY) all tested, including cycle-detection
and retired-trace-exclusion. Sprint 2 acceptance criteria are now
verified code, not design-bar work.
Two new findings:
- F1 (MED): cmd/{matrixd,observerd,pathwayd}/main_test.go absent —
reopens R-005 against new daemons.
- F2 (LOW): scripts/staffing_*/main.go flag-defaults reach
/home/profit/lakehouse/data/...
Evidence under reports/scrum/_evidence/rerun2/ (local; per
.gitkeep convention).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit Re-run #2 — 2026-04-29 (after Phases A–H + matrix §3.4 + workflow §3.8)
Baseline audit: reports/scrum/golang-lakehouse-scrum-test.md at commit 91edd43 — composite 35 / 60.
Rerun-1 head: 4840c10 — composite 43 / 60 (Δ baseline = +8).
Rerun-2 head: c7e3124 — 30 commits past rerun-1. Composite 50 / 60. Δ rerun-1 = +7. Δ baseline = +15.
This is the second delta document. Both prior reports remain immutable history. Working tree was dirty on entry (5 in-flight files under cmd/observerd/ + internal/{observer,workflow}/); audit ran on stashed-clean c7e3124 so the score reflects shipped state, not WIP.
What landed since rerun-1
| Commit | What |
|---|---|
| 4840c10 | (rerun-1 baseline — 04_query refresh-tick race fix) |
| 125e1c8 | tests close R-002 / R-003 / R-008 — internal/{shared,storeclient,queryd/db} Go tests |
| 6af0520 | A: fail-loud on non-loopback bind — closes worst case of R-001 |
| 423a381 | D: storaged per-prefix PUT cap — vectord _vectors/ → 4 GiB (ADR-002) |
| 0d18ffa | ADR-003: inter-service auth posture — Bearer + IP allowlist |
| 1ec85b0 | Batch 2: perf baseline — multi-sample + warmup + MAD threshold |
| 0f79bce | Batch 3: cmd/<bin>/main_test.go × 6 — closes R-005 |
| fb08232 | Batch 4: embed fixture-mode — partial R-006 closure |
| 56844c3 | embed cache — LRU at /v1/embed for repeat-query elimination |
| 8f4c16f | mcpd: Go MCP SDK port — replaces Bun mcp-server tool surface |
| fa56134 | ADR-003 wiring: Bearer token + IP allowlist middleware |
| ad1670d | storaged cap smoke — verifies ADR-002 at 300 MiB |
| 2a6234f | ADR-004 + internal/pathway: Mem0 versioned trace substrate |
| afbb506 | pathwayd: HTTP service over internal/pathway · 11/11 smoke gate |
| f1c1883 | vectord BatchAdd — single-lock variadic batch |
| 71b35fb | SPEC §1 + §3.4: name matrix indexer as a port target |
| a7620c8 | PRD: name the product vision — small-model pipeline + 5-loop substrate |
| c1d96b7 | matrixd: multi-corpus retrieve+merge — SPEC §3.4 component 2 of 5 |
| 166470f | corpusingest: extract reusable text→vector ingest pipeline |
| 0d1553c | candidates corpus: first deep-field reality test on real staffing data |
| 9588bd8 | matrix relevance filter — SPEC §3.4 component 3 of 5 |
| 3968ec8 | matrix strong-model downgrade gate — SPEC §3.4 component 4 of 5 |
| a97881d | workers corpus + multi-corpus reality test — matrix indexer end-to-end |
| 31b4088 | multi_corpus_e2e WORKERS_LIMIT knob + embed-text-not-sample-size finding |
| 06e7152 | matrix playbook memory + boost — SPEC §3.4 component 5 of 5 (LEARNING LOOP) |
| a730fc2 | scrum fixes: 4 real findings landed, 4 false positives dismissed |
| 7f42089 | D: embed-text iteration — clean negative finding (3 variants tested) |
| 57d0df1 | E (partial): distillation port — scorer + contamination firewall |
| be65f85 | F: drift quantification — scorer drift first |
| b199093 | B: matrix metadata filter — post-retrieval structured gate |
| 6392772 | C: bulk playbook record — operational rating wiring |
| bc9ab93 | H: observerd — autonomous-iteration witness loop (SPEC §2 port) |
| 97dd3f8 | SPEC §3.5/§3.6/§3.7/§3.8 — name F/B/C as port targets + Archon-style workflow runner |
| e30da6e | §3.8 first slice: workflow runner skeleton + DAG executor + observerd integration |
| c7e3124 | §3.8 second slice: real modes wired (matrix.relevance/downgrade/search, distillation.score, drift.scorer) |
This is the wave that took the system from "G0+G2 substrate plus 500K validation" to "all five small-model-pipeline loops have at least a first port" (per project_small_model_pipeline_vision.md).
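As an illustration of the per-prefix PUT cap from 423a381 (ADR-002), the sketch below shows one way such a cap could be enforced with `http.MaxBytesReader`. It is not the storaged implementation: `prefixCap`, `capFor`, `putHandler`, and `putStatus` are hypothetical names, and the default cap is a parameter here because this report does not state the real value. Only the `_vectors/` prefix and its 4 GiB limit come from the commit subject.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// prefixCap pairs a key prefix with the largest PUT body accepted under it.
// Hypothetical type, not taken from the storaged source.
type prefixCap struct {
	prefix string
	limit  int64
}

// capFor returns the limit of the first matching prefix, or def if none match.
func capFor(key string, caps []prefixCap, def int64) int64 {
	for _, c := range caps {
		if strings.HasPrefix(key, c.prefix) {
			return c.limit
		}
	}
	return def
}

// putHandler drains the request body under the cap chosen for its key and
// answers 413 when the body is larger than that cap.
func putHandler(caps []prefixCap, def int64) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key := strings.TrimPrefix(r.URL.Path, "/")
		r.Body = http.MaxBytesReader(w, r.Body, capFor(key, caps, def))
		if _, err := io.Copy(io.Discard, r.Body); err != nil {
			var tooBig *http.MaxBytesError
			if errors.As(err, &tooBig) {
				http.Error(w, "body exceeds cap", http.StatusRequestEntityTooLarge)
				return
			}
			http.Error(w, "read failed", http.StatusBadRequest)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	})
}

// putStatus sends a PUT of size bytes for key and returns the HTTP status.
func putStatus(h http.Handler, key string, size int) int {
	body := strings.NewReader(strings.Repeat("x", size))
	req := httptest.NewRequest(http.MethodPut, "/"+key, body)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	// Tiny limits so the demo runs instantly; ADR-002 names 4 GiB for _vectors/.
	caps := []prefixCap{{prefix: "_vectors/", limit: 16}}
	h := putHandler(caps, 4)
	fmt.Println(putStatus(h, "_vectors/index.bin", 16))
	fmt.Println(putStatus(h, "datasets/a.parquet", 5))
}
```

Keeping the limits in a table rather than in the handler means a smoke such as ad1670d can exercise the same code path with a small body.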
Score delta — double column
Same 6 dimensions, scored 0–10 with citations. Δ R1 = vs rerun-1 (4840c10); Δ Base = vs original audit (91edd43).
| Dimension | Base | R1 | R2 | Δ R1 | Δ Base | Evidence for the move |
|---|---|---|---|---|---|---|
| Reproducibility | 7 | 9 | 9 | 0 | +2 | just verify PASS in 31s wall (_evidence/rerun2/just_verify.log) — vet + 30 packages of go test -short + 9 core smokes. just doctor all-green for go/gcc/minio/ollama/secrets. 8 additional domain smokes also PASS (pathway, matrix, relevance, downgrade, observer, playbook, workflow, storaged_cap → _evidence/rerun2/smoke_*.log). New recipes: smoke-g2-fixtures (R-006 partial close) + smoke-storaged-cap. Still −1: no .github/workflows/; no fixture-mode for storage (only embed). |
| Test Coverage | 6 | 8 | 9 | +1 | +3 | 321 Go test functions across 40 test files (was 13 at baseline, ~77 at R1 — 3× the test surface). internal/shared has 4 test files (auth_test.go, bind_test.go, config_test.go, server_test.go); internal/storeclient/client_test.go exists; internal/queryd/db_test.go + registrar_test.go exist — R-002 / R-003 / R-008 all closed. Six original cmd binaries now have main_test.go (catalogd/embedd/ingestd/queryd/storaged/vectord) — R-005 mostly closed. Still −1: cmd/{matrixd,observerd,pathwayd,fake_ollama}/main_test.go absent — three of those are new daemons that need wiring tests. |
| Trust Boundary Safety | 7 | 7 | 9 | +2 | +2 | ADR-003 shipped (docs/DECISIONS.md §3): internal/shared/auth.go 64-line Bearer middleware with constant-time compare via crypto/subtle + IP allowlist (internal/shared/auth.go:62-64). 4 auth tests in auth_test.go cover wrong-token, raw-token-without-prefix, IP-only, both-required (internal/shared/auth_test.go:77,86,108,162). redactCreds still scrubs S3 keys from queryd error chain (internal/queryd/db.go). One fmt.Sprintf SQL site remains (internal/queryd/registrar.go:153) — properly escaped via quoteIdent + sqlEscape. 13 MaxBytesReader sites in cmd/, 5 loopback bindings. Still −1: auth is opt-in (empty token = G0 dev mode); no CORS posture (R-010); 2 /home/profit/lakehouse/... paths in scripts/staffing_*/main.go flag-defaults. |
| Agent Memory Correctness | 3 | 4 | 9 | +5 | +6 | All five SPEC §3.4 components shipped: corpus builders (internal/corpusingest), retrieve+merge (matrixd /matrix/search), relevance filter (internal/matrix/relevance.go 376 LoC + 289 LoC test), strong-model downgrade gate (internal/matrix/downgrade.go 137 LoC + 100 LoC test), playbook memory + boost (internal/matrix/playbook.go 196 LoC + 180 LoC test) — including the learning loop. Pathway substrate ratified (ADR-004, internal/pathway/store.go 381 LoC + 398 LoC test). Mem0-style ops all proven: TestAdd_AssignsUIDAndTimestamps, TestUpdate_ReplacesContentSameUID, TestRevise_LinksToPredecessorViaHistory, TestRevise_ChainOfThree_BackwardWalk, TestRetire_ExcludedFromSearch, TestRetire_StillAccessibleViaGet, TestHistory_CycleDetected, TestHistory_PredecessorMissing_TruncatesChain, TestAddIdempotent_RejectsEmptyUID — every Sprint 2 design-bar acceptance has a test. Observer ported (internal/observer/store.go 249 LoC + 193 LoC test). pathway smoke 11/11. Still −1: distillation port partial (scorer + firewall only — 57d0df1 "E (partial)"); drift is "scorer drift first" (be65f85) not full quantification. |
| Deployment Readiness | 4 | 5 | 5 | 0 | +1 | just doctor actionable per-dep install (scripts/doctor.sh); just install-hooks documented; pre-push hook still installed. Still −5: no REPLICATION.md, no secrets-go.toml.example, no deploy/systemd/*.service, no Dockerfile, no readiness vs. liveness split. Sprint 4 stories all open. |
| Maintainability | 8 | 8 | 9 | +1 | +1 | 4 ADRs ratified (was 1 at R1): ADR-001 foundational, ADR-002 storaged per-prefix cap, ADR-003 auth posture, ADR-004 pathway data model — the auth + cap + memory-model decisions are locked before downstream code retrofits them. Every binary still 100–400 LoC (no god-files). Per-package test files: every internal/ package has ≥1 test file (was: 5 packages had zero at baseline). CLAUDE_REFACTOR_GUARDRAILS.md codifies the maintenance discipline. tests/proof/FINAL_REPORT.md answers the 9 mandated questions. Still −1: no CONTRIBUTING.md; the proof harness adds 24-claim maintenance surface that needs keeping current. |
Composite: 35 → 43 → 50. 83% of max.
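The Trust Boundary Safety row describes a Bearer middleware with a constant-time compare plus an IP allowlist, opt-in when the token is empty. A minimal sketch of that shape follows, assuming nothing about the real internal/shared/auth.go beyond what the row states: `requireAuth`, `bearerOK`, `ipAllowed`, and `status` are hypothetical names, and the 401/403 split is an assumption. The checks in `main` mirror the four cases the report lists (wrong token, raw token without prefix, IP-only, both required).

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
)

// requireAuth wraps next with an optional Bearer check and an optional IP
// allowlist. An empty token and an empty allowlist disable both checks,
// which models the opt-in posture described in the report.
func requireAuth(token string, allow []*net.IPNet, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if token != "" && !bearerOK(r.Header.Get("Authorization"), token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		if len(allow) > 0 && !ipAllowed(r.RemoteAddr, allow) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// bearerOK requires the "Bearer " prefix and compares in constant time.
func bearerOK(header, token string) bool {
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got := strings.TrimPrefix(header, prefix)
	return subtle.ConstantTimeCompare([]byte(got), []byte(token)) == 1
}

// ipAllowed reports whether the peer address falls inside any allowed network.
func ipAllowed(remoteAddr string, allow []*net.IPNet) bool {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return false
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return false
	}
	for _, n := range allow {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// status runs one request through the middleware and returns the HTTP status.
// An empty cidr means "no allowlist"; an empty authHeader sends no header.
func status(token, cidr, authHeader, remoteAddr string) int {
	var allow []*net.IPNet
	if cidr != "" {
		_, n, err := net.ParseCIDR(cidr)
		if err != nil {
			panic(err)
		}
		allow = append(allow, n)
	}
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/sql", nil)
	req.RemoteAddr = remoteAddr
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	requireAuth(token, allow, ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("wrong token:", status("s3cret", "", "Bearer nope", "127.0.0.1:5000"))
	fmt.Println("raw token:", status("s3cret", "", "s3cret", "127.0.0.1:5000"))
	fmt.Println("ip only:", status("", "10.0.0.0/8", "", "10.1.2.3:5000"))
	fmt.Println("both required:", status("s3cret", "10.0.0.0/8", "Bearer s3cret", "192.168.1.1:5000"))
}
```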
Code surface delta
| Metric | Baseline (91edd43) | R1 (4840c10) | R2 (c7e3124) | Δ R1 |
|---|---|---|---|---|
| Total Go LoC | ~6,587 | ~7,800 (est) | 19,381 | ~2.5× |
| Go files | ~50 | ~62 | 93 | +31 |
| Test files | 13 | ~22 | 40 | +18 |
| Go test functions | ~77 | ~109 | 321 | +212 |
| cmd/<bin>/ | 7 | 7 | 12 | +5 |
| internal/<pkg>/ | 11 | 11 | 18 | +7 |
| Smoke scripts | 9 | 9 | 21 | +12 |
| ADRs ratified | 0 | 1 | 4 | +3 |
| Routes (cmd-level) | ~22 | ~22 | 37 | +15 |
| Untested cmd binaries | 6 / 7 | 6 / 7 | 4 / 12 | −2 abs; ratio 6/7 → 1/3 |
The wave is substrate-bearing, not throughput-bearing. Every internal package has tests; the gap is now the wiring layer for the 3 new daemons.
Risk register status updates
12 risks in reports/scrum/risk-register.md. Status table at c7e3124:
| Risk | Severity | Before R2 | After R2 | Evidence |
|---|---|---|---|---|
| R-001 queryd /sql RCE-eq off-loopback | HIGH | open | partial | 6af0520 fail-loud on non-loopback bind (closes worst case); ADR-003 + internal/shared/auth.go available to wrap; but auth is opt-in — needs deploy story decision before fully closing |
| R-002 internal/shared zero tests | HIGH | open | CLOSED | 4 test files (auth_test.go + bind_test.go + config_test.go + server_test.go), all PASS in just verify |
| R-003 internal/storeclient zero tests | HIGH | open | CLOSED | internal/storeclient/client_test.go, PASS |
| R-004 smokes not gated | MED | closed (R1) | CLOSED | unchanged from R1 |
| R-005 6/7 cmd/main.go untested | MED | partial | partial | 6 of original 7 closed (0f79bce Batch 3); 4 new daemons (fake_ollama/matrixd/observerd/pathwayd) reopen the gap on different surface |
| R-006 no fixture-only smokes | MED | open | partial | scripts/g2_smoke_fixtures.sh (fb08232) closes embed half via fake_ollama; storage half deferred |
| R-007 zero auth middleware | MED | open | partial | internal/shared/auth.go shipped with 4 tests (fa56134); opt-in by default until deploy posture decision |
| R-008 queryd/db.go untested | MED | open | CLOSED | internal/queryd/db_test.go + registrar_test.go (125e1c8) |
| R-009 registrar.go fmt.Sprintf SQL | LOW | open | open | unchanged — escaping via quoteIdent+sqlEscape is correct, regression test still missing |
| R-010 no CORS posture | LOW | open | open | unchanged — no Access-Control-* headers anywhere |
| R-011 g2 smoke model assertion | LOW | note | note | unchanged |
| R-012 empty tests/ dir | LOW | closed (R1) | CLOSED | unchanged from R1 |
Net since R1: 3 closed (R-002, R-003, R-008), 3 advanced to partial (R-001, R-006, R-007), R-005 stays partial on a different surface, and 3 are unchanged (R-009, R-010, R-011). R-004 and R-012 were already closed at R1.
Sprint backlog progress
Sprint 0 — Reproducibility Gate
| Story | R1 | R2 |
|---|---|---|
| S0.1 just doctor | DONE | DONE |
| S0.2 just smoke-fixtures | open | partial (smoke-g2-fixtures) |
| S0.3 just verify + pre-push | DONE | DONE |
| S0.4 cmd/<bin>/main_test.go × 6 | partial | partial → mostly DONE (6 of original 7; 3 new daemons absent) |
| S0.5 internal/shared, storeclient, queryd/db tests | open | DONE |
| S0.6 tests/ dir cleanup | DONE | DONE |
4 of 6 done, 2 partial. Highest-leverage open work: tests for the 3 new daemons + storage-half of fixture mode.
Sprint 1 — Trust Boundary Gate
- Replace SQL string interp with parameterized: still 1 site, properly escaped (R-009 LOW)
- Observer fail-open → degraded/cycle: not yet codified — observer is ported but ADR-002-style fail-safe ADR not written
- Auth/localhost-only guardrails: shipped (ADR-003 + auth.go), opt-in posture
- Schema validation per public endpoint: per-handler validation exists (validateKey etc.); not framework-level
Status: ~60% of Sprint 1 closed; the observer fail-safe semantics ADR is the outstanding doc-only piece.
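R-009 notes that the remaining fmt.Sprintf SQL site is escaped via quoteIdent and sqlEscape but has no regression test. The sketch below shows the standard SQL quoting rules those helpers presumably follow, and the kind of assertions a regression test could pin down. The function bodies are assumptions (only the names appear in this report), and `createViewSQL` with its `read_parquet` statement is a hypothetical example of a statement where identifiers cannot be bound as parameters.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps an identifier in double quotes and doubles any embedded
// double quote. Assumed behaviour; the real helper lives in internal/queryd.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// sqlEscape doubles single quotes so the value is safe inside a '...' literal.
// Assumed behaviour.
func sqlEscape(s string) string {
	return strings.ReplaceAll(s, `'`, `''`)
}

// createViewSQL is a hypothetical statement builder: the view name and the
// file URI both have to be interpolated, so both go through the helpers.
func createViewSQL(table, uri string) string {
	return fmt.Sprintf(
		"CREATE OR REPLACE VIEW %s AS SELECT * FROM read_parquet('%s')",
		quoteIdent(table), sqlEscape(uri),
	)
}

func main() {
	// Hostile inputs: a quote in the identifier and a quote in the literal.
	fmt.Println(createViewSQL(`work"ers`, `s3://bucket/it's.parquet`))
}
```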
Sprint 2 — Memory Correctness Gate
| Story | R1 | R2 |
|---|---|---|
| ADD/UPDATE/REVISE/RETIRE/HISTORY tests | design-bar | DONE (internal/pathway/store_test.go) |
| Cycle detection tests | design-bar | DONE (TestHistory_CycleDetected) |
| Retired-trace exclusion tests | design-bar | DONE (TestRetire_ExcludedFromSearch) |
| Duplicate trace replay_count tests | design-bar | partial (TestAddIdempotent_RejectsEmptyUID; replay_count semantics) |
| Corrupted memory row recovery test | design-bar | open |
Status: Sprint 2 acceptance criteria mostly green — the core invariants are tested. Audit/event receipt on every memory mutation is the missing piece.
Sprint 3 — Agent Loop Reality Gate
- Deterministic mini corpus: tests/proof/fixtures/ exists
- search → verify → observer review → playbook seal → second-run retrieval: scripts/multi_corpus_e2e.sh + scripts/playbook_smoke.sh exercise this; full chain via scripts/workflow_smoke.sh
- Negative case observer rejects hallucinated claim: covered by observer_smoke (semantics open for review)
- Health endpoint content-type regression: covered by proof harness 00_health
Status: Sprint 3 has working substrate; explicit "single command proves the full loop" with input/output/verdict/receipt evidence is partial.
Sprint 4 — Deployment Gate
Status: unchanged from R1. No REPLICATION.md, no .env.example, no *.service units, no Dockerfile. just doctor is the closest piece. This is the largest open Sprint.
New findings from this rerun
Two real findings worth recording.
F1 — 3 new daemons lack cmd/<bin>/main_test.go
- Where: cmd/matrixd/, cmd/observerd/, cmd/pathwayd/
- What: Same gap-class as R-005 was, just on net-new code. Each daemon mounts ≥4 routes (matrixd: 6, observerd: 4, pathwayd: 9 → 19 routes total) with no wiring test.
- Severity: MEDIUM. The internal packages backing each daemon (internal/matrix, internal/observer, internal/pathway) have full unit tests — but no test proves cmd/pathwayd/main.go actually wires /pathway/revise to (*pathway.Store).Revise. A handler-rename refactor would silently break the route surface.
- Action: Re-open R-005 against the new daemons. ~1 hr to add three main_test.go files patterned on cmd/storaged/main_test.go.
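A wiring check of the kind F1 asks for can be quite small. The sketch below asks an `http.ServeMux` which pattern would serve each expected path and reports the ones nothing is mounted on. It is an illustration only: `newMux` stands in for whatever route registration cmd/pathwayd/main.go performs, `unmounted` is a hypothetical helper, and of the two routes shown only /pathway/revise is named in this report (/pathway/history is invented for the example). A real main_test.go would run the same loop inside a `testing.T` test.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newMux is a hypothetical stand-in for the daemon's route registration.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/pathway/revise", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusAccepted)
	})
	// Hypothetical second route, present only to make the example less trivial.
	mux.HandleFunc("/pathway/history", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// unmounted returns every path for which the mux has no registered pattern.
// ServeMux.Handler reports an empty pattern when nothing matches.
func unmounted(mux *http.ServeMux, paths []string) []string {
	var missing []string
	for _, p := range paths {
		req := httptest.NewRequest(http.MethodPost, p, nil)
		if _, pattern := mux.Handler(req); pattern == "" {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	expected := []string{"/pathway/revise", "/pathway/history", "/pathway/retire"}
	fmt.Println("unmounted routes:", unmounted(newMux(), expected))
}
```

Keeping the expected route list in the test means a handler rename or a dropped registration fails `just verify` instead of surfacing in production.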
F2 — scripts/staffing_*/main.go has hardcoded data paths in flag defaults
- Where: scripts/staffing_candidates/main.go:217 and scripts/staffing_workers/main.go:269 reference /home/profit/lakehouse/data/datasets/{candidates,workers_500k}.parquet.
- What: Flag defaults reach into the Rust legacy tree at /home/profit/lakehouse/... These are throwaway driver scripts (not services) and the values are flag-overridable, but they couple the Go repo to the Rust filesystem layout.
- Severity: LOW. Doesn't affect any service. Worth noting because audit Sprint 4 explicitly calls out "no hardcoded /home/profit paths" as an acceptance criterion.
- Action: Either move the parquet under golangLAKEHOUSE/data/ (preferred for self-containment) or document the cross-tree dependency in RESEARCH_LOG_2026-04-28.md and accept it.
What this rerun does NOT change
- Sprint 4 (deployment) remains the largest open gap. R1 said this; R2 says this; without REPLICATION.md + systemd units, the cutover from Rust at devop.live/lakehouse/ (G5) cannot be operator-validated.
- Auth is opt-in. Empty-token default is fine for G0 development but means the moment any Go binary binds non-loopback in prod, a posture decision is required. R-001 + R-007 cannot fully close until that decision is recorded.
- CORS posture (R-010) is still unspecified. The Bun-served Rust UI handles browser CORS today; if a Go service ever fronts a browser, this needs a decision.
- Distillation and drift are first-port-only. 57d0df1 ships scorer + contamination firewall (E partial); be65f85 ships scorer-drift only (F first slice). The full distillation pipeline (sample export, audit_baselines lineage) and full drift signal are not yet ported.
Recommended next moves (ordered by leverage / cost)
- Three main_test.go files for matrixd + observerd + pathwayd (~1 hr). Closes the regenerated R-005, ratchets every future route addition through just verify.
- ADR-005: observer fail-safe semantics (~30 min, doc-only). The observer is ported (internal/observer/store.go), but the upstream "verdict:accept on crash" anti-pattern still has no Go-side decision locked. Doing this now is half the cost of doing it after a regression.
- Auth posture decision for non-loopback deploy (~1 hr, ADR or annotated decision in RESEARCH_LOG). Locks R-001 + R-007 from "opt-in middleware exists" to "wired-by-default for X, opt-in for Y". Required input for any G5 cutover plan.
- Sprint 4 minimal first slice (~3 hr): secrets-go.toml.example + deploy/systemd/<bin>.service.tmpl × 12 binaries + REPLICATION.md skeleton. Highest-leverage Sprint 4 starter; the systemd units mostly mirror Rust's layout.
- Storage-half of fixture mode (~3 hr): MockS3Storage interface satisfying internal/storaged.Bucket, smoke variant that points storaged at it. Closes R-006 fully and decouples CI from MinIO.
The remaining items (full drift port, full distillation port, observer audit-event receipt, corrupted-memory recovery test) are real engineering — Sprint 2/3 followups, not Sprint-0 polish.
Methodology note — same as prior reports
All claims cite a file, line, or command. Evidence captured under reports/scrum/_evidence/rerun2/:
- just_verify.log — full vet + 30 packages × go test -short + 9 core smokes, exit 0, 31s wall
- just_doctor.log — 5 dependency probes, all green
- govet.log — go vet ./... exit 0
- gotest_short.log — full short-test pass
- just_list.log — recipe inventory
- smoke_{pathway,matrix,relevance,downgrade,observer,playbook,workflow,storaged_cap}.log — 8 additional domain smokes, all PASS
What was NOT inspected this round (deferred):
- Cross-binary failure cascades (kill matrixd mid-search, observe observerd state) — Sprint 1 follow-up
- Supply-chain audit of go.sum diffs since R1
- Performance regression vs the perf baseline shipped in 1ec85b0 — just proof performance exists, not run here
Rerun-2 produced under the same "no vibes" rule as the original audit. The 50/60 reflects what's verifiably shipped at c7e3124, not what's planned. Working tree restored from stash after audit completion.