golangLAKEHOUSE/reports/scrum/rerun-2-2026-04-29.md
root c41698acae scrum rerun-2 — 50/60 (Δ R1 +7, Δ baseline +15) at c7e3124
Audited stash-clean c7e3124 (30 commits past rerun-1 4840c10).
3 HIGH risks closed (R-002 internal/shared, R-003 internal/storeclient,
R-008 queryd/db.go). 3 advanced to partial (R-001 via fail-loud-bind +
opt-in auth, R-006 via g2_smoke_fixtures, R-007 via ADR-003 auth.go).

Biggest move: Agent Memory Correctness 4 → 9 — pathway Mem0 ops
(ADD/UPDATE/REVISE/RETIRE/HISTORY) all tested, including cycle-detection
and retired-trace-exclusion. Sprint 2 acceptance criteria are now
verified code, not design-bar work.

Two new findings:
- F1 (MED): cmd/{matrixd,observerd,pathwayd}/main_test.go absent —
  reopens R-005 against new daemons.
- F2 (LOW): scripts/staffing_*/main.go flag-defaults reach
  /home/profit/lakehouse/data/...

Evidence under reports/scrum/_evidence/rerun2/ (local; per
.gitkeep convention).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 23:13:01 -05:00

# Audit Re-run #2: 2026-04-29 (after Phases A–H + matrix §3.4 + workflow §3.8)
**Baseline audit:** `reports/scrum/golang-lakehouse-scrum-test.md` at commit `91edd43` — composite **35 / 60**.
**Rerun-1 head:** `4840c10` — composite **43 / 60** (Δ baseline = +8).
**Rerun-2 head:** `c7e3124`, **30 commits past rerun-1**. Composite **50 / 60** (Δ rerun-1 = +7, Δ baseline = +15).
This is the second delta document. Both prior reports remain immutable history. Working tree was dirty on entry (5 in-flight files under `cmd/observerd/` + `internal/{observer,workflow}/`); audit ran on stashed-clean `c7e3124` so the score reflects shipped state, not WIP.
---
## What landed since rerun-1
| Commit | What |
|---|---|
| `4840c10` | (rerun-1 baseline — 04_query refresh-tick race fix) |
| `125e1c8` | tests close R-002 / R-003 / R-008 — `internal/{shared,storeclient,queryd/db}` Go tests |
| `6af0520` | A: fail-loud on non-loopback bind — closes worst case of R-001 |
| `423a381` | D: storaged per-prefix PUT cap — vectord `_vectors/` → 4 GiB (ADR-002) |
| `0d18ffa` | ADR-003: inter-service auth posture — Bearer + IP allowlist |
| `1ec85b0` | Batch 2: perf baseline — multi-sample + warmup + MAD threshold |
| `0f79bce` | Batch 3: `cmd/<bin>/main_test.go × 6` — closes R-005 |
| `fb08232` | Batch 4: embed fixture-mode — partial R-006 closure |
| `56844c3` | embed cache — LRU at `/v1/embed` for repeat-query elimination |
| `8f4c16f` | mcpd: Go MCP SDK port — replaces Bun mcp-server tool surface |
| `fa56134` | ADR-003 wiring: Bearer token + IP allowlist middleware |
| `ad1670d` | storaged cap smoke — verifies ADR-002 at 300 MiB |
| `2a6234f` | ADR-004 + `internal/pathway`: Mem0 versioned trace substrate |
| `afbb506` | pathwayd: HTTP service over `internal/pathway` · 11/11 smoke gate |
| `f1c1883` | vectord BatchAdd — single-lock variadic batch |
| `71b35fb` | SPEC §1 + §3.4: name matrix indexer as a port target |
| `a7620c8` | PRD: name the product vision — small-model pipeline + 5-loop substrate |
| `c1d96b7` | matrixd: multi-corpus retrieve+merge — SPEC §3.4 component 2 of 5 |
| `166470f` | corpusingest: extract reusable text→vector ingest pipeline |
| `0d1553c` | candidates corpus: first deep-field reality test on real staffing data |
| `9588bd8` | matrix relevance filter — SPEC §3.4 component 3 of 5 |
| `3968ec8` | matrix strong-model downgrade gate — SPEC §3.4 component 4 of 5 |
| `a97881d` | workers corpus + multi-corpus reality test — matrix indexer end-to-end |
| `31b4088` | multi_corpus_e2e WORKERS_LIMIT knob + embed-text-not-sample-size finding |
| `06e7152` | matrix playbook memory + boost — SPEC §3.4 component 5 of 5 (LEARNING LOOP) |
| `a730fc2` | scrum fixes: 4 real findings landed, 4 false positives dismissed |
| `7f42089` | D: embed-text iteration — clean negative finding (3 variants tested) |
| `57d0df1` | E (partial): distillation port — scorer + contamination firewall |
| `be65f85` | F: drift quantification — scorer drift first |
| `b199093` | B: matrix metadata filter — post-retrieval structured gate |
| `6392772` | C: bulk playbook record — operational rating wiring |
| `bc9ab93` | H: observerd — autonomous-iteration witness loop (SPEC §2 port) |
| `97dd3f8` | SPEC §3.5/§3.6/§3.7/§3.8 — name F/B/C as port targets + Archon-style workflow runner |
| `e30da6e` | §3.8 first slice: workflow runner skeleton + DAG executor + observerd integration |
| `c7e3124` | §3.8 second slice: real modes wired (matrix.relevance/downgrade/search, distillation.score, drift.scorer) |
This is the wave that took the system from "G0+G2 substrate plus 500K validation" to **"all five small-model-pipeline loops have at least a first port"** (per `project_small_model_pipeline_vision.md`).
---
## Score delta — double column
Same 6 dimensions, scored 0–10 with citations. `Δ R1` = vs rerun-1 (`4840c10`); `Δ Base` = vs original audit (`91edd43`).
| Dimension | Base | R1 | **R2** | Δ R1 | Δ Base | Evidence for the move |
|---|---:|---:|---:|---:|---:|---|
| **Reproducibility** | 7 | 9 | **9** | 0 | +2 | `just verify` PASS in 31s wall (`_evidence/rerun2/just_verify.log`) — vet + 30 packages of `go test -short` + 9 core smokes. `just doctor` all-green for go/gcc/minio/ollama/secrets. **8 additional domain smokes also PASS** (pathway, matrix, relevance, downgrade, observer, playbook, workflow, storaged_cap → `_evidence/rerun2/smoke_*.log`). New recipes: `smoke-g2-fixtures` (R-006 partial close) + `smoke-storaged-cap`. **Still 1**: no `.github/workflows/`; no fixture-mode for storage (only embed). |
| **Test Coverage** | 6 | 8 | **9** | +1 | +3 | **321 Go test functions** across 40 test files (functions were ~77 at baseline and ~109 at R1, so roughly **3× the R1 test surface**; test files were 13 at baseline). `internal/shared` has 4 test files (`auth_test.go`, `bind_test.go`, `config_test.go`, `server_test.go`); `internal/storeclient/client_test.go` exists; `internal/queryd/db_test.go` + `registrar_test.go` exist, so **R-002 / R-003 / R-008 are all closed**. Six original cmd binaries now have `main_test.go` (catalogd/embedd/ingestd/queryd/storaged/vectord), so **R-005 is mostly closed**. **Still 1**: `cmd/{matrixd,observerd,pathwayd,fake_ollama}/main_test.go` absent; three of those are new daemons that need wiring tests. |
| **Trust Boundary Safety** | 7 | 7 | **9** | +2 | +2 | **ADR-003 shipped** (`docs/DECISIONS.md` §3): `internal/shared/auth.go` 64-line Bearer middleware with constant-time compare via `crypto/subtle` + IP allowlist (`internal/shared/auth.go:62-64`). 4 auth tests in `auth_test.go` cover wrong-token, raw-token-without-prefix, IP-only, both-required (`internal/shared/auth_test.go:77,86,108,162`). `redactCreds` still scrubs S3 keys from queryd error chain (`internal/queryd/db.go`). One `fmt.Sprintf` SQL site remains (`internal/queryd/registrar.go:153`) — properly escaped via `quoteIdent` + `sqlEscape`. 13 `MaxBytesReader` sites in cmd/, 5 loopback bindings. **Still 1**: auth is opt-in (empty token = G0 dev mode); no CORS posture (R-010); 2 `/home/profit/lakehouse/...` paths in `scripts/staffing_*/main.go` flag-defaults. |
| **Agent Memory Correctness** | 3 | 4 | **9** | +5 | +6 | **All five SPEC §3.4 components shipped**: corpus builders (`internal/corpusingest`), retrieve+merge (`matrixd /matrix/search`), relevance filter (`internal/matrix/relevance.go` 376 LoC + 289 LoC test), strong-model downgrade gate (`internal/matrix/downgrade.go` 137 LoC + 100 LoC test), playbook memory + boost (`internal/matrix/playbook.go` 196 LoC + 180 LoC test), including the **learning loop**. Pathway substrate ratified (ADR-004, `internal/pathway/store.go` 381 LoC + 398 LoC test). **Mem0-style ops all proven**: `TestAdd_AssignsUIDAndTimestamps`, `TestUpdate_ReplacesContentSameUID`, `TestRevise_LinksToPredecessorViaHistory`, `TestRevise_ChainOfThree_BackwardWalk`, `TestRetire_ExcludedFromSearch`, `TestRetire_StillAccessibleViaGet`, `TestHistory_CycleDetected`, `TestHistory_PredecessorMissing_TruncatesChain`, `TestAddIdempotent_RejectsEmptyUID`. **Every Sprint 2 design-bar acceptance has a test**. Observer ported (`internal/observer/store.go` 249 LoC + 193 LoC test). pathway smoke 11/11. **Still 1**: distillation port partial (scorer + firewall only, per `57d0df1` "E (partial)"); drift is "scorer drift first" (`be65f85`), not full quantification. |
| **Deployment Readiness** | 4 | 5 | **5** | 0 | +1 | `just doctor` actionable per-dep install (`scripts/doctor.sh`); `just install-hooks` documented; pre-push hook still installed. **Still 5**: no `REPLICATION.md`, no `secrets-go.toml.example`, no `deploy/systemd/*.service`, no `Dockerfile`, no readiness vs. liveness split. Sprint 4 stories all open. |
| **Maintainability** | 8 | 8 | **9** | +1 | +1 | **4 ADRs ratified** (was 1 at R1): ADR-001 foundational, ADR-002 storaged per-prefix cap, ADR-003 auth posture, ADR-004 pathway data model, so **the auth + cap + memory-model decisions are locked before downstream code retrofits them**. Every binary still 100–400 LoC (no god-files). Per-package test files: every `internal/` package has ≥1 test file (was: 5 packages had zero at baseline). `CLAUDE_REFACTOR_GUARDRAILS.md` codifies the maintenance discipline. `tests/proof/FINAL_REPORT.md` answers the 9 mandated questions. **Still 1**: no `CONTRIBUTING.md`; the proof harness adds 24-claim maintenance surface that needs keeping current. |
**Composite: 35 → 43 → 50. 83% of max.**
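
The Trust Boundary row describes an opt-in Bearer check with a constant-time compare plus an IP allowlist. The sketch below illustrates that shape only; it is not the contents of `internal/shared/auth.go`. The names `requireAuth`, `bearerMatches`, `ipAllowed`, and `statusFor` are hypothetical, and treating "empty token and empty allowlist" as a pass-through is an assumption drawn from the "empty token = G0 dev mode" note.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
)

// requireAuth is a hypothetical stand-in for an ADR-003 style middleware.
// With no token and no allowlist configured it returns next unwrapped (opt-in dev mode).
func requireAuth(token string, allow []*net.IPNet, next http.Handler) http.Handler {
	if token == "" && len(allow) == 0 {
		return next
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if len(allow) > 0 && !ipAllowed(r.RemoteAddr, allow) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		if token != "" && !bearerMatches(r.Header.Get("Authorization"), token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// bearerMatches requires the "Bearer " prefix and compares the token in constant time.
func bearerMatches(header, want string) bool {
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got := header[len(prefix):]
	return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

// ipAllowed reports whether the caller's IP falls inside any allowed network.
func ipAllowed(remoteAddr string, allow []*net.IPNet) bool {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return false
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return false
	}
	for _, n := range allow {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// statusFor sends one request through the middleware and returns the HTTP status.
func statusFor(token, authHeader, remoteAddr string, cidrs ...string) int {
	var allow []*net.IPNet
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		allow = append(allow, n)
	}
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.RemoteAddr = remoteAddr
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	requireAuth(token, allow, ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("valid token:", statusFor("s3cret", "Bearer s3cret", "127.0.0.1:4000"))
	fmt.Println("raw token without prefix:", statusFor("s3cret", "s3cret", "127.0.0.1:4000"))
	fmt.Println("IP outside allowlist:", statusFor("", "", "10.0.0.5:4000", "127.0.0.0/8"))
}
```

The four cases named in the row (wrong token, raw token without prefix, IP-only, both required) map directly onto calls to `statusFor` with different arguments.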
---
## Code surface delta
| Metric | Baseline (`91edd43`) | R1 (`4840c10`) | **R2 (`c7e3124`)** | Δ R1 |
|---|---:|---:|---:|---:|
| Total Go LoC | ~6,587 | ~7,800 (est) | **19,381** | ~2.5× |
| Go files | ~50 | ~62 | **93** | +31 |
| Test files | 13 | ~22 | **40** | +18 |
| Go test functions | ~77 | ~109 | **321** | +212 |
| `cmd/<bin>/` | 7 | 7 | **12** | +5 |
| `internal/<pkg>/` | 11 | 11 | **18** | +7 |
| Smoke scripts | 9 | 9 | **21** | +12 |
| ADRs ratified | 0 | 1 | **4** | +3 |
| Routes (cmd-level) | ~22 | ~22 | **37** | +15 |
| Untested cmd binaries | 6 / 7 | 6 / 7 | **4 / 12** | -2 abs, 1/3 ratio |
The wave is **substrate-bearing**, not throughput-bearing. Every internal package has tests; the gap is now the **wiring layer** for the 3 new daemons.
---
## Risk register status updates
12 risks in `reports/scrum/risk-register.md`. Status table at `c7e3124`:
| Risk | Severity | Before R2 | After R2 | Evidence |
|---|---|---|---|---|
| R-001 queryd /sql RCE-eq off-loopback | HIGH | open | **partial** | `6af0520` fail-loud on non-loopback bind (closes worst case); ADR-003 + `internal/shared/auth.go` available to wrap; **but auth is opt-in** — needs deploy story decision before fully closing |
| R-002 internal/shared zero tests | HIGH | open | **CLOSED** | 4 test files (`auth_test.go` + `bind_test.go` + `config_test.go` + `server_test.go`), all PASS in `just verify` |
| R-003 internal/storeclient zero tests | HIGH | open | **CLOSED** | `internal/storeclient/client_test.go`, PASS |
| R-004 smokes not gated | MED | closed (R1) | **CLOSED** | unchanged from R1 |
| R-005 6/7 cmd/main.go untested | MED | partial | **partial** | 6 of original 7 closed (`0f79bce` Batch 3); 4 new binaries (`fake_ollama`/`matrixd`/`observerd`/`pathwayd`, three of them daemons) reopen the gap on a different surface |
| R-006 no fixture-only smokes | MED | open | **partial** | `scripts/g2_smoke_fixtures.sh` (`fb08232`) closes embed half via fake_ollama; storage half deferred |
| R-007 zero auth middleware | MED | open | **partial** | `internal/shared/auth.go` shipped with 4 tests (`fa56134`); opt-in by default until deploy posture decision |
| R-008 queryd/db.go untested | MED | open | **CLOSED** | `internal/queryd/db_test.go` + `registrar_test.go` (`125e1c8`) |
| R-009 registrar.go fmt.Sprintf SQL | LOW | open | open | unchanged — escaping via `quoteIdent`+`sqlEscape` is correct, regression test still missing |
| R-010 no CORS posture | LOW | open | open | unchanged — no `Access-Control-*` headers anywhere |
| R-011 g2 smoke model assertion | LOW | note | note | unchanged |
| R-012 empty tests/ dir | LOW | closed (R1) | **CLOSED** | unchanged from R1 |
**Net since R1: 3 closed (R-002, R-003, R-008), 3 advanced to partial (R-001, R-006, R-007), R-005 stays partial on a different surface, 3 unchanged and still open or noted (R-009, R-010, R-011), and 2 remain closed from R1 (R-004, R-012).**
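
The R-001 mitigation is a fail-loud check on non-loopback binds. As a rough illustration of that idea (not the code in `6af0520`), a startup guard could look like the following. `checkBind` and `bindAllowed` are hypothetical names, and letting a configured auth token lift the restriction is an assumption, since the deploy posture decision is still open.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

var errNonLoopback = errors.New("refusing non-loopback bind without auth")

// checkBind returns an error when addr would expose the service beyond loopback
// and no auth is configured. The authConfigured escape hatch is an assumption.
func checkBind(addr string, authConfigured bool) error {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return fmt.Errorf("bad bind address %q: %w", addr, err)
	}
	if authConfigured || host == "localhost" {
		return nil
	}
	// An empty host (":8080") binds every interface, so it is not loopback.
	if ip := net.ParseIP(host); ip != nil && ip.IsLoopback() {
		return nil
	}
	return fmt.Errorf("%w: %s", errNonLoopback, addr)
}

// bindAllowed is a convenience wrapper for callers that only need a yes/no answer.
func bindAllowed(addr string, authConfigured bool) bool {
	return checkBind(addr, authConfigured) == nil
}

func main() {
	for _, addr := range []string{"127.0.0.1:8080", "[::1]:8080", ":8080", "0.0.0.0:8080"} {
		if err := checkBind(addr, false); err != nil {
			fmt.Println("FAIL:", err)
			continue
		}
		fmt.Println("ok:", addr)
	}
}
```

A binary would call `checkBind` before `http.ListenAndServe` and exit non-zero on error, so a misconfigured address is caught at startup rather than discovered in production.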
---
## Sprint backlog progress
### Sprint 0 — Reproducibility Gate
| Story | R1 | R2 |
|---|---|---|
| S0.1 `just doctor` | DONE | DONE |
| S0.2 `just smoke-fixtures` | open | **partial** (`smoke-g2-fixtures`) |
| S0.3 `just verify` + pre-push | DONE | DONE |
| S0.4 `cmd/<bin>/main_test.go` × 6 | partial | **partial → mostly DONE** (6 of original 7; 3 new daemons absent) |
| S0.5 internal/shared, storeclient, queryd/db tests | open | **DONE** |
| S0.6 `tests/` dir cleanup | DONE | DONE |
**4 of 6 done, 2 partial.** Highest-leverage open work: tests for the 3 new daemons + storage-half of fixture mode.
### Sprint 1 — Trust Boundary Gate
- Replace SQL string interp with parameterized: still 1 site, properly escaped (R-009 LOW)
- Observer fail-open → `degraded`/`cycle`: not yet codified — observer is ported but ADR-002-style fail-safe ADR not written
- Auth/localhost-only guardrails: **shipped** (ADR-003 + auth.go), opt-in posture
- Schema validation per public endpoint: per-handler validation exists (validateKey etc.); not framework-level
**Status: ~60% of Sprint 1 closed, observer fail-safe semantics ADR is the outstanding doc-only piece.**
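
R-009 notes that the remaining `fmt.Sprintf` SQL site is escaped correctly but has no regression test. The sketch below shows the kind of check that would pin the behaviour. `quoteIdentSketch` and `sqlEscapeSketch` are hypothetical re-implementations using standard SQL quote doubling, not the repo's `quoteIdent` and `sqlEscape`, and the statement shape in `buildStmt` is purely illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdentSketch wraps an identifier in double quotes and doubles embedded quotes.
func quoteIdentSketch(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// sqlEscapeSketch doubles single quotes so the value stays inside a string literal.
func sqlEscapeSketch(s string) string {
	return strings.ReplaceAll(s, `'`, `''`)
}

// buildStmt is an illustrative statement builder; the real registrar query is not shown in this report.
func buildStmt(table, source string) string {
	return fmt.Sprintf("SELECT * FROM %s WHERE source = '%s'",
		quoteIdentSketch(table), sqlEscapeSketch(source))
}

func main() {
	// Hostile inputs: each tries to break out of its quoting context.
	fmt.Println(buildStmt(`t"; DROP TABLE x; --`, `a' OR '1'='1`))
}
```

A table-driven test over inputs like these would fail as soon as either helper stops doubling its quote character.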
### Sprint 2 — Memory Correctness Gate
| Story | R1 | R2 |
|---|---|---|
| ADD/UPDATE/REVISE/RETIRE/HISTORY tests | design-bar | **DONE** (`internal/pathway/store_test.go`) |
| Cycle detection tests | design-bar | **DONE** (`TestHistory_CycleDetected`) |
| Retired-trace exclusion tests | design-bar | **DONE** (`TestRetire_ExcludedFromSearch`) |
| Duplicate trace replay_count tests | design-bar | partial (`TestAddIdempotent_RejectsEmptyUID`; replay_count semantics) |
| Corrupted memory row recovery test | design-bar | open |
**Status: Sprint 2 acceptance criteria mostly green — the core invariants are tested. Audit/event receipt on every memory mutation is the missing piece.**
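
To make the HISTORY invariants concrete, here is a minimal sketch of a backward walk with cycle detection and truncation on a missing predecessor, mirroring what `TestHistory_CycleDetected` and `TestHistory_PredecessorMissing_TruncatesChain` are named for. The `trace` type, its fields, and the map-backed store are assumptions; `internal/pathway/store.go` may model this differently.

```go
package main

import (
	"errors"
	"fmt"
)

// trace is a hypothetical, pared-down version of a pathway record.
type trace struct {
	UID            string
	PredecessorUID string // empty for the first version in a chain
}

var errCycle = errors.New("history cycle detected")

// history walks predecessor links from uid and returns the chain newest-first.
// A missing predecessor truncates the chain; a repeated UID returns errCycle
// along with the chain collected so far.
func history(store map[string]trace, uid string) ([]string, error) {
	seen := make(map[string]bool)
	var chain []string
	for cur := uid; cur != ""; {
		if seen[cur] {
			return chain, errCycle
		}
		t, ok := store[cur]
		if !ok {
			break // predecessor missing: stop without an error
		}
		seen[cur] = true
		chain = append(chain, cur)
		cur = t.PredecessorUID
	}
	return chain, nil
}

func main() {
	store := map[string]trace{
		"v1": {UID: "v1"},
		"v2": {UID: "v2", PredecessorUID: "v1"},
		"v3": {UID: "v3", PredecessorUID: "v2"},
	}
	chain, err := history(store, "v3")
	fmt.Println(chain, err)

	cyclic := map[string]trace{
		"a": {UID: "a", PredecessorUID: "b"},
		"b": {UID: "b", PredecessorUID: "a"},
	}
	chain, err = history(cyclic, "a")
	fmt.Println(chain, err)
}
```

The visited set bounds the walk at one step per stored trace, so a corrupted link can never loop forever.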
### Sprint 3 — Agent Loop Reality Gate
- Deterministic mini corpus: `tests/proof/fixtures/` exists
- search → verify → observer review → playbook seal → second-run retrieval: `scripts/multi_corpus_e2e.sh` + `scripts/playbook_smoke.sh` exercise this; full chain via `scripts/workflow_smoke.sh`
- Negative case observer rejects hallucinated claim: covered by observer_smoke (semantics open for review)
- Health endpoint content-type regression: covered by proof harness `00_health`
**Status: Sprint 3 has working substrate; explicit "single command proves the full loop" with input/output/verdict/receipt evidence is partial.**
### Sprint 4 — Deployment Gate
**Status: unchanged from R1.** No `REPLICATION.md`, no `.env.example`, no `*.service` units, no `Dockerfile`. `just doctor` is the closest piece. This is the largest open Sprint.
---
## New findings from this rerun
Two real findings worth recording.
### F1 — 3 new daemons lack `cmd/<bin>/main_test.go`
- **Where:** `cmd/matrixd/`, `cmd/observerd/`, `cmd/pathwayd/`
- **What:** Same gap-class as R-005 was, just on net-new code. Each daemon mounts ≥4 routes (matrixd: 6, observerd: 4, pathwayd: 9 → 19 routes total) with no wiring test.
- **Severity:** MEDIUM. The internal packages backing each daemon (`internal/matrix`, `internal/observer`, `internal/pathway`) have full unit tests — but no test proves `cmd/pathwayd/main.go` actually wires `/pathway/revise` to `(*pathway.Store).Revise`. A handler-rename refactor would silently break the route surface.
- **Action:** Re-open R-005 against the new daemons. ~1 hr to add three `main_test.go` files patterned on `cmd/storaged/main_test.go`.
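
A wiring test of the kind F1 asks for could look roughly like this. It is a sketch under assumptions: `newMux`, the `reviser` interface, `spyStore`, and the `uid` query parameter are hypothetical, and the real `cmd/pathwayd/main.go` may build its routes differently. In the repo this would live in a `main_test.go`; it is shown as a standalone program so it runs on its own.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// reviser is the slice of the store that the route depends on (hypothetical).
type reviser interface {
	Revise(uid string) error
}

// spyStore records calls so the test can prove the route reached the store.
type spyStore struct{ revised []string }

func (s *spyStore) Revise(uid string) error {
	s.revised = append(s.revised, uid)
	return nil
}

// newMux stands in for the route wiring a daemon's main.go would perform.
func newMux(store reviser) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/pathway/revise", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		if err := store.Revise(r.URL.Query().Get("uid")); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	})
	return mux
}

// probe sends one request to the handler and returns the status code.
func probe(h http.Handler, method, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code
}

func main() {
	spy := &spyStore{}
	mux := newMux(spy)
	fmt.Println(probe(mux, http.MethodPost, "/pathway/revise?uid=t1"), spy.revised)
	fmt.Println(probe(mux, http.MethodPost, "/pathway/renamed"))
}
```

The point of the spy is that a status code alone does not prove wiring; asserting on `spy.revised` shows the request reached the intended store method.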
### F2 — `scripts/staffing_*/main.go` has hardcoded data paths in flag defaults
- **Where:** `scripts/staffing_candidates/main.go:217` and `scripts/staffing_workers/main.go:269` reference `/home/profit/lakehouse/data/datasets/{candidates,workers_500k}.parquet`.
- **What:** Flag defaults reach into the Rust legacy tree at `/home/profit/lakehouse/...`. Throwaway driver scripts (not services), and the values are flag-overridable, but they couple the Go repo to the Rust filesystem layout.
- **Severity:** LOW. Doesn't affect any service. Worth noting because audit Sprint 4 explicitly calls out "no hardcoded `/home/profit` paths" as an acceptance criterion.
- **Action:** Either move the parquet under `golangLAKEHOUSE/data/` (preferred for self-containment) or document the cross-tree dependency in `RESEARCH_LOG_2026-04-28.md` and accept it.
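
If the parquet files do move under the Go repo, the flag default can be derived instead of hardcoded. The following is one possible shape, not a prescribed fix: the `LAKEHOUSE_DATA_DIR` environment variable, the `-parquet` flag name, and `defaultDataset` are all hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"path/filepath"
)

// defaultDataset builds a flag default from an optional data-dir override.
// getenv is injected so the lookup can be tested without touching the process environment.
func defaultDataset(getenv func(string) string, name string) string {
	dir := getenv("LAKEHOUSE_DATA_DIR") // hypothetical variable name
	if dir == "" {
		dir = "data" // repo-relative fallback, no absolute home-directory path
	}
	return filepath.Join(dir, "datasets", name)
}

func main() {
	fs := flag.NewFlagSet("staffing_candidates", flag.ContinueOnError)
	parquet := fs.String("parquet",
		defaultDataset(os.Getenv, "candidates.parquet"),
		"path to the candidates parquet file")
	if err := fs.Parse(os.Args[1:]); err != nil {
		os.Exit(2)
	}
	fmt.Println("reading", *parquet)
}
```

An operator who still needs the legacy tree can point the environment variable or the flag at it explicitly, which keeps the cross-tree dependency visible rather than baked into a default.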
---
## What this rerun does NOT change
- **Sprint 4 (deployment) remains the largest open gap.** R1 said this; R2 says this; without `REPLICATION.md` + systemd units, the cutover from Rust at `devop.live/lakehouse/` (G5) cannot be operator-validated.
- **Auth is opt-in.** Empty-token default is fine for G0 development but means the moment any Go binary binds non-loopback in prod, a posture decision is required. R-001 + R-007 cannot fully close until that decision is recorded.
- **CORS posture (R-010) is still unspecified.** The Bun-served Rust UI handles browser CORS today; if a Go service ever fronts a browser, this needs a decision.
- **Distillation and drift are first-port-only.** `57d0df1` ships scorer + contamination firewall (E partial); `be65f85` ships scorer-drift only (F first slice). The full distillation pipeline (sample export, audit_baselines lineage) and full drift signal are not yet ported.
---
## Recommended next moves (ordered by leverage / cost)
1. **Three `main_test.go` files for `matrixd` + `observerd` + `pathwayd`** (~1 hr). Closes the regenerated R-005, ratchets every future route addition through `just verify`.
2. **ADR-005: observer fail-safe semantics** (~30 min, doc-only). The observer is ported (`internal/observer/store.go`), but the upstream "verdict:accept on crash" anti-pattern still has no Go-side decision locked. Doing this now is half the cost of doing it after a regression.
3. **Auth posture decision for non-loopback deploy** (~1 hr, ADR or annotated decision in `RESEARCH_LOG`). Locks R-001 + R-007 from "opt-in middleware exists" to "wired-by-default for X, opt-in for Y". Required input for any G5 cutover plan.
4. **Sprint 4 minimal first slice** (~3 hr): `secrets-go.toml.example` + `deploy/systemd/<bin>.service.tmpl` × 12 binaries + `REPLICATION.md` skeleton. Highest-leverage Sprint 4 starter; the systemd units mostly mirror Rust's layout.
5. **Storage-half of fixture mode** (~3 hr): a `MockS3Storage` type satisfying `internal/storaged.Bucket`, plus a smoke variant that points storaged at it. Closes R-006 fully and decouples CI from MinIO.
The remaining items (full drift port, full distillation port, observer audit-event receipt, corrupted-memory recovery test) are real engineering — Sprint 2/3 followups, not Sprint-0 polish.
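
For move 5, an in-memory bucket is usually enough to stand in for MinIO in a fixture-mode smoke. The sketch below assumes a three-method `bucket` interface; the actual `internal/storaged.Bucket` method set is not shown in this report and may differ, so `bucket`, `memBucket`, and `errNotFound` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
	"sync"
)

// bucket is an assumed minimal storage interface for illustration only.
type bucket interface {
	Put(key string, data []byte) error
	Get(key string) ([]byte, error)
	List(prefix string) []string
}

var errNotFound = errors.New("object not found")

// memBucket keeps objects in a map guarded by a mutex.
type memBucket struct {
	mu      sync.RWMutex
	objects map[string][]byte
}

// Compile-time check that memBucket satisfies the assumed interface.
var _ bucket = (*memBucket)(nil)

func newMemBucket() *memBucket {
	return &memBucket{objects: make(map[string][]byte)}
}

// Put stores a copy so later mutation of the caller's slice cannot change the object.
func (b *memBucket) Put(key string, data []byte) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.objects[key] = append([]byte(nil), data...)
	return nil
}

// Get returns a copy of the stored bytes, or errNotFound wrapped with the key.
func (b *memBucket) Get(key string) ([]byte, error) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	data, ok := b.objects[key]
	if !ok {
		return nil, fmt.Errorf("%w: %s", errNotFound, key)
	}
	return append([]byte(nil), data...), nil
}

// List returns the keys under prefix in sorted order.
func (b *memBucket) List(prefix string) []string {
	b.mu.RLock()
	defer b.mu.RUnlock()
	var keys []string
	for k := range b.objects {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	b := newMemBucket()
	_ = b.Put("_vectors/a.bin", []byte("aaa"))
	_ = b.Put("_vectors/b.bin", []byte("bbb"))
	_ = b.Put("catalog/x.json", []byte("{}"))
	fmt.Println(b.List("_vectors/"))
	_, err := b.Get("missing")
	fmt.Println(errors.Is(err, errNotFound))
}
```

Copying on both `Put` and `Get` keeps the fake honest about aliasing, which matters when a smoke test reuses buffers across requests.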
---
## Methodology note — same as prior reports
All claims cite a file, line, or command. Evidence captured under `reports/scrum/_evidence/rerun2/`:
- `just_verify.log` — full vet + 30 packages × `go test -short` + 9 core smokes, exit 0, 31s wall
- `just_doctor.log` — 5 dependency probes, all green
- `govet.log`: `go vet ./...` exit 0
- `gotest_short.log` — full short-test pass
- `just_list.log` — recipe inventory
- `smoke_{pathway,matrix,relevance,downgrade,observer,playbook,workflow,storaged_cap}.log` — 8 additional domain smokes, all PASS
What was NOT inspected this round (deferred):
- Cross-binary failure cascades (kill matrixd mid-search, observe observerd state) — Sprint 1 follow-up
- Supply-chain audit of go.sum diffs since R1
- Performance regression vs the perf baseline shipped in `1ec85b0`: `just proof performance` exists, not run here
---
_Rerun-2 produced under the same "no vibes" rule as the original audit. The 50/60 reflects what's verifiably shipped at `c7e3124`, not what's planned. Working tree restored from stash after audit completion._