Adapts the docs/SCRUM.md framework (originally written for the matrix-agent-validated repo) to the Go rewrite.

Five deliverables:
- golang-lakehouse-scrum-test.md: top-line + scoring + verdict
- risk-register.md: 12 findings, R-001..R-012
- claim-coverage-table.md: claim/test/risk for Sprint 2
- sprint-backlog.md: 5 sprints, ~2 weeks of work
- acceptance-gates.md: DoD as runnable commands

Every claim cites file:line, command output, or "missing evidence." Smoke chain ran clean (33s wall, all 9 PASS) and is captured in reports/scrum/_evidence/smoke_chain.log (gitignored; runtime artifact).

Scoring:
- Reproducibility 7/10: 9 smokes deterministic, no just/CI gate
- Test Coverage 6/10: internal/ packages tested, 6/7 cmd/ aren't
- Trust Boundary 7/10: escapes ok, zero auth, /sql is RCE-eq off-loopback
- Memory Correctness 3/10: pathway/playbook/observer not yet ported
- Deployment Readiness 4/10: no REPLICATION, no env template, no systemd
- Maintainability 8/10: no god-files, 7 lean binaries, ADRs current

Top risks:
- R-001 HIGH: queryd /sql + DuckDB + non-loopback bind = RCE-equivalent
- R-002 HIGH: internal/shared (server.go + config.go) zero tests
- R-003 HIGH: internal/storeclient zero tests, used by 2 services
- R-004 MED: 9-smoke chain green but not gated (no justfile/hook)

The audit is the work; refactors come after. Sprint 0 owns coverage + CI gating; Sprint 1 owns trust-boundary decisions; Sprints 2-3 are mostly design-bar work for unbuilt agent components.

.gitignore exception: /reports/* + !/reports/scrum/ keeps reports/ a runtime-artifact directory while exposing reports/scrum/ as tracked documentation. Mirrors the pattern future audit passes will land in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
golangLAKEHOUSE — Sprint Backlog
Five sprints adapted from SCRUM.md's framework. Each sprint has a goal, user stories, and acceptance criteria. Risk IDs reference risk-register.md. Acceptance-of-done details live in acceptance-gates.md.
The audit is the work of this turn; these sprints are the next turns. Order matters — Sprint 0 unblocks the rest by making the substrate provably runnable on a clean box.
Sprint 0 — Reproducibility Gate
Goal: make the repo provably runnable, with structural protection against silent regressions in the load-bearing-but-untested layers.
Risks closed: R-002, R-003, R-004, R-005, R-006, R-008, R-012.
Stories
- S0.1: As an operator, I can run one command and know exactly which dependencies are missing or wrong-versioned.
  - Concrete: `just doctor` checks Go ≥1.25, gcc, MinIO at `:9000`, Ollama at `:11434` with `nomic-embed-text` loaded, `secrets-go.toml` present + readable. Output is structured JSON on `--json` flag. Non-zero exit on any missing dep.
- S0.2: As an operator, I can run a minimal fixture test without MinIO or Ollama.
  - Concrete: `just smoke-fixtures` runs against in-process fakes (`MockS3Storage` + `MockEmbedProvider`). Smokes split into two tiers: `*_smoke.sh` (real services, slow) vs `*_smoke_fixtures.sh` (fakes, runs anywhere).
- S0.3: As an operator, I can verify the whole substrate with one command, and I cannot push a regression past it.
  - Concrete: `just verify` runs `go vet` + `go test` + the 9-smoke chain. `.git/hooks/pre-push` calls `just verify` and aborts on non-zero exit. Failure output is structured.
- S0.4: As a reviewer, I can read coverage at a glance and see where wiring layers lack tests.
  - Concrete: `cmd/<bin>/main_test.go` exists for all 7 binaries (today: only `storaged`). Each tests routes registered, body-cap rejection, schema-validation rejection, happy-path with mocked dependency.
- S0.5: Load-bearing internal packages have unit-test coverage proportional to their blast radius.
  - Concrete: `internal/shared/{server,config}_test.go` exist (R-002). `internal/storeclient/client_test.go` exists (R-003). `internal/queryd/db_test.go` exists with adversarial `sqlEscape` + exhaustive `redactCreds` cases (R-008).
- S0.6: Empty `tests/` directory either claimed or removed.
  - Concrete: pick. If claimed for fixture-mode wiring (S0.2), document its purpose in README. If not, delete in the same commit as S0.1.
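The body-cap and schema-validation checks named in S0.4 can be sketched in-process with `net/http/httptest`. This is a minimal sketch, not the repo's code: `handleSQL`, `sqlRequest`, `post`, `oversized`, and the 1 KiB `maxBody` limit are all hypothetical names and values chosen for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxBody is a hypothetical cap; the real per-route limits are not given in this backlog.
const maxBody = 1 << 10 // 1 KiB

// sqlRequest is a hypothetical request shape used only for this sketch.
type sqlRequest struct {
	Query string `json:"query"`
}

// handleSQL caps the body, rejects unknown fields, and requires a non-empty query.
func handleSQL(w http.ResponseWriter, r *http.Request) {
	r.Body = http.MaxBytesReader(w, r.Body, maxBody)
	dec := json.NewDecoder(r.Body)
	dec.DisallowUnknownFields()
	var req sqlRequest
	if err := dec.Decode(&req); err != nil {
		var tooBig *http.MaxBytesError
		if errors.As(err, &tooBig) {
			http.Error(w, "body too large", http.StatusRequestEntityTooLarge)
			return
		}
		http.Error(w, "invalid body", http.StatusBadRequest)
		return
	}
	if strings.TrimSpace(req.Query) == "" {
		http.Error(w, "missing field: query", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// post sends body to handleSQL without opening a socket and returns the status code.
func post(body string) int {
	req := httptest.NewRequest(http.MethodPost, "/sql", strings.NewReader(body))
	rec := httptest.NewRecorder()
	handleSQL(rec, req)
	return rec.Code
}

// oversized builds a syntactically valid body that is larger than maxBody.
func oversized() string {
	return `{"query":"` + strings.Repeat("a", 2*maxBody) + `"}`
}

func main() {
	fmt.Println("happy path:   ", post(`{"query":"SELECT 1"}`))
	fmt.Println("unknown field:", post(`{"query":"SELECT 1","extra":true}`))
	fmt.Println("empty query:  ", post(`{"query":""}`))
	fmt.Println("oversized:    ", post(oversized()))
}
```

In a real `main_test.go` the four calls would become table-driven test cases against the binary's actual router, with the dependency behind the handler mocked.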
Acceptance
- `just --list` shows `verify`, `smoke-fixtures`, `doctor`, plus shortcuts for `fmt` / `vet` / `test` / `smoke <day>`.
- `just verify` exits 0 on a clean clone with deps present.
- `just smoke-fixtures` exits 0 on a clean clone with no MinIO and no Ollama.
- Pre-push hook present at `.git/hooks/pre-push`, executable, calls `just verify`.
- `go test ./...` shows non-empty test count for every package in `internal/` (no more `[no test files]` lines for shared/storeclient).
- Test count for cmd/ binaries: 7/7 (today: 1/7).
- Failure output structured: any `just doctor` failure prints JSON describing what's missing, no claim of success.
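One possible shape for the structured doctor output is sketched below. It assumes the check could be written as a small Go program; the backlog does not say how `just doctor` is implemented. The `check` and `report` field names are illustrative rather than a fixed schema, and only the TCP and file probes are shown (the Go, gcc, and model-loaded checks are omitted).

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"os"
	"time"
)

// check is one probe result. Field names are illustrative, not a fixed schema.
type check struct {
	Name   string `json:"name"`
	OK     bool   `json:"ok"`
	Detail string `json:"detail,omitempty"`
}

// report is the top-level document; OK is true only if every check passed.
type report struct {
	OK     bool    `json:"ok"`
	Checks []check `json:"checks"`
}

// probeTCP reports whether something accepts connections at addr.
func probeTCP(name, addr string) check {
	conn, err := net.DialTimeout("tcp", addr, 500*time.Millisecond)
	if err != nil {
		return check{Name: name, Detail: err.Error()}
	}
	conn.Close()
	return check{Name: name, OK: true}
}

// probeFile reports whether path exists and can be opened for reading.
func probeFile(name, path string) check {
	f, err := os.Open(path)
	if err != nil {
		return check{Name: name, Detail: err.Error()}
	}
	f.Close()
	return check{Name: name, OK: true}
}

// summarize never claims success while any single check has failed.
func summarize(checks []check) report {
	r := report{OK: true, Checks: checks}
	for _, c := range checks {
		if !c.OK {
			r.OK = false
		}
	}
	return r
}

// exitCode maps the report to the process exit status.
func exitCode(r report) int {
	if r.OK {
		return 0
	}
	return 1
}

func main() {
	r := summarize([]check{
		probeTCP("minio", "127.0.0.1:9000"),
		probeTCP("ollama", "127.0.0.1:11434"),
		probeFile("secrets", "secrets-go.toml"),
	})
	out, _ := json.MarshalIndent(r, "", "  ")
	fmt.Println(string(out))
	// A real doctor would call os.Exit(exitCode(r)); the sketch only prints it.
	fmt.Println("exit code:", exitCode(r))
}
```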
Estimate
- S0.1 doctor: ~1 hr
- S0.2 fixture-mode: ~3 hr (interface plumbing + fakes + new smokes)
- S0.3 verify + hook: ~30 min
- S0.4 cmd-level tests: ~3 hr (6 binaries × ~30 min)
- S0.5 internal tests: ~3 hr
- S0.6 tests/ dir: ~5 min
Total: ~1.5 days focused. Single bundled PR with one commit per story.
Sprint 1 — Trust Boundary Gate
Goal: prevent agent trust collapse. Make the SQL surface not be RCE-equivalent on accidental non-localhost binding. Decide auth posture once and apply uniformly.
Risks closed: R-001, R-007, R-009 (regression test only), R-010.
Stories
- S1.1: As an operator, I cannot accidentally expose `POST /sql` to the network.
  - Concrete: `cmd/queryd/main.go` startup asserts bind starts with `127.` or `[::1]`. If the bind is non-loopback and env `LH_QUERYD_ALLOW_NONLOOPBACK=1` is set, log a warning and continue. Otherwise `os.Exit(1)`. Same gate added to vectord, storaged, ingestd until S1.2 lands.
- S1.2: As an operator, I have one configurable auth posture across all 7 binaries.
  - Concrete: ADR-003 picks Bearer-token + IP allow-list (or an alternative; decide in the ADR). `internal/shared/auth.go` provides middleware; each `cmd/<bin>/main.go` adds `r.Use(authMiddleware)` in one line. Token sourced from `secrets-go.toml`'s new `[auth].token` field. Empty token = local-mode (no auth, only `127.` bind allowed).
- S1.3: As an operator, every public endpoint validates schema on input.
  - Concrete: each handler decoding a JSON body has explicit struct tags + missing-field detection. Unknown fields rejected (`json.Decoder.DisallowUnknownFields`). Empty-required-field rejected with structured 400. Today's coverage is partial; this story closes it uniformly.
- S1.4: As a reviewer, I have a regression test against SQL injection in dataset names.
  - Concrete: `internal/queryd/registrar_test.go` gains a test where catalogd returns a manifest with `name: 'foo"; DROP TABLE x; --'`. The test asserts `quoteIdent` quoting prevents the DROP from executing: the view name is `"foo""; DROP TABLE x; --"`, which is a single quoted identifier (R-009 latent guard).
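The S1.1 startup gate could look roughly like the sketch below. Note one deliberate difference from the story text: instead of a string-prefix check for `127.` or `[::1]`, it parses the address and uses `net.IP.IsLoopback`, which also rejects an empty host such as `:3214` (that form binds every interface). `isLoopbackBind`, `checkBind`, and the port number are hypothetical.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"os"
)

// isLoopbackBind reports whether addr (host:port) binds only to a loopback IP.
// An empty host (":3214") binds all interfaces and is therefore rejected.
func isLoopbackBind(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

// checkBind returns nil when the bind is safe or explicitly overridden.
func checkBind(addr string, allowOverride bool) error {
	if isLoopbackBind(addr) {
		return nil
	}
	if allowOverride {
		log.Printf("WARNING: binding %s off loopback; POST /sql is reachable from the network", addr)
		return nil
	}
	return fmt.Errorf("refusing non-loopback bind %q (set LH_QUERYD_ALLOW_NONLOOPBACK=1 to override)", addr)
}

func main() {
	addr := "127.0.0.1:3214" // hypothetical bind address for the sketch
	override := os.Getenv("LH_QUERYD_ALLOW_NONLOOPBACK") == "1"
	if err := checkBind(addr, override); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("bind ok:", addr)
}
```

Keeping the decision in a pure function (`checkBind`) rather than inline in `main` is what makes the fail-loud behaviour unit-testable in each binary.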
Acceptance
- All 7 binaries fail-loud on non-loopback bind without explicit override env.
- ADR-003 in `docs/DECISIONS.md` documents the auth model with rationale.
- Auth middleware is one `r.Use()` line per binary; adding it to a new binary takes one import.
- Every JSON-decoding handler uses `DisallowUnknownFields` + missing-required-field rejection.
- R-009 regression test passes; assertion would fail if `quoteIdent` is removed.
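The quoting rule behind the R-009 test is the standard SQL one: wrap the name in double quotes and double every embedded double quote. The sketch below assumes the repo's `quoteIdent` follows that rule (its source is not shown here); `isSingleIdent` is a hypothetical helper that stands in for the test's assertion.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps name in double quotes and doubles embedded double quotes.
// Assumed to mirror the repo helper of the same name; this is not a copy of it.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// isSingleIdent reports whether s is exactly one quoted identifier: it starts
// and ends with a quote, and every quote in between is part of a doubled pair.
func isSingleIdent(s string) bool {
	if len(s) < 2 || s[0] != '"' || s[len(s)-1] != '"' {
		return false
	}
	inner := s[1 : len(s)-1]
	return !strings.Contains(strings.ReplaceAll(inner, `""`, ""), `"`)
}

func main() {
	hostile := `foo"; DROP TABLE x; --`

	quoted := quoteIdent(hostile)
	fmt.Println(quoted, isSingleIdent(quoted))

	// Naive concatenation is what the test would see if the quoting were removed.
	naive := `"` + hostile + `"`
	fmt.Println(naive, isSingleIdent(naive))
}
```

The second case is the one that matters for the acceptance line: with the escaping removed, the identifier closes early and the `DROP` becomes a separate statement, so the assertion goes red.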
Estimate
~2 days focused. ADR-003 is the gating decision; once written, S1.1 + S1.2 are mechanical.
Sprint 2 — Memory Correctness Gate
Goal: prove pathway / playbook memory cannot poison itself, with the test fixture covering Mem0 semantics on day one. This sprint is design-bar work for components that haven't been ported from Rust yet: the memory layer will still not exist when Sprint 1 finishes.
Risks closed: all DESIGN-BAR rows in claim-coverage-table.md for Mem0 + persistence-at-scale.
Stories
- S2.1: As an architect, I have an ADR fixing the pathway-memory data model in Go before code lands.
  - Concrete: ADR-004 documents trace shape, history-chain rules, retire semantics, replay-count rules. Cites the Rust `pathway_memory` crate as reference but does NOT carry forward the 88-trace state per ADR-001 (clean start ratified).
- S2.2: As a developer, the pathway-memory port lands with a deterministic fixture corpus and full test coverage on day one.
  - Concrete: `tests/fixtures/pathway/` has known-shape JSON entries covering ADD / UPDATE / REVISE / RETIRE / HISTORY / cycle-attempt / replay-duplicate / corrupted-row. New `internal/pathway/` package implements the data model. Test count: ≥7 functions in `pathway_test.go`, one per fixture row.
- S2.3: As a developer, retired traces are excluded from retrieval, and the test would fail without the exclusion.
  - Concrete: integration test does ADD → RETIRE → SEARCH → assert returned set excludes the retired UID. Removing the retirement filter must turn this test red.
- S2.4: As an architect, vectord persistence works above 256 MiB single-key (the gap from the 500K staffing test).
  - Concrete: either bump storaged's `MaxBytesReader` for vector-content paths, or split LHV1 across N fixed-size keys with a manifest pointer, or add multipart upload to storaged. Decision in ADR-005. Smoke variant `g1p_scale_smoke.sh` ingests 200K vectors @ d=768 + asserts kill-restart preserves state at that size.
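The ADD → RETIRE → SEARCH sequence from S2.3 can be illustrated with a toy in-memory store. Everything here is hypothetical: the real trace shape and retire semantics are what ADR-004 is meant to decide, and substring matching stands in for whatever retrieval the port ends up using.

```go
package main

import (
	"fmt"
	"strings"
)

// trace is a hypothetical minimal pathway-memory entry.
type trace struct {
	UID     string
	Text    string
	Retired bool
}

// store keeps traces by UID plus insertion order for deterministic results.
type store struct {
	traces map[string]*trace
	order  []string
}

func newStore() *store { return &store{traces: map[string]*trace{}} }

// add inserts a trace; a duplicate UID is ignored in this sketch.
func (s *store) add(uid, text string) {
	if _, exists := s.traces[uid]; exists {
		return
	}
	s.traces[uid] = &trace{UID: uid, Text: text}
	s.order = append(s.order, uid)
}

// retire marks a trace as retired and reports whether the UID was known.
func (s *store) retire(uid string) bool {
	t, ok := s.traces[uid]
	if ok {
		t.Retired = true
	}
	return ok
}

// search returns matching UIDs in insertion order, skipping retired traces.
func (s *store) search(query string) []string {
	var out []string
	for _, uid := range s.order {
		t := s.traces[uid]
		if t.Retired { // the filter the regression test must protect
			continue
		}
		if strings.Contains(t.Text, query) {
			out = append(out, uid)
		}
	}
	return out
}

func main() {
	s := newStore()
	s.add("t1", "forklift operator")
	s.add("t2", "forklift mechanic")
	fmt.Println(s.search("forklift"))
	s.retire("t1")
	fmt.Println(s.search("forklift"))
}
```

Deleting the `if t.Retired` branch makes the second search return both UIDs again, which is exactly the red state the story asks the integration test to detect.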
Acceptance
- ADR-004 and ADR-005 in `docs/DECISIONS.md`.
- `internal/pathway/` package with ≥7 covering tests; `go test ./internal/pathway/` passes.
- Retire-exclusion regression test passes; would fail if filter logic removed.
- `g1p_scale_smoke.sh` passes at 200K vectors.
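A quick size estimate shows why 200K vectors at d=768 is a meaningful bar for the 256 MiB single-key limit. The sketch assumes vectors are stored as 4-byte float32 values and ignores any LHV1 header or index overhead; neither detail is stated in this backlog, so treat the result as a lower bound.

```go
package main

import "fmt"

const (
	bytesPerFloat32 = 4         // assumption: float32 components
	keyCap          = 256 << 20 // the 256 MiB single-key limit, in bytes
)

// rawVectorBytes is the payload size of n vectors of dimension dim.
func rawVectorBytes(n, dim int64) int64 {
	return n * dim * bytesPerFloat32
}

// keysNeeded is the ceiling of total/limit: how many fixed-size keys a split would use.
func keysNeeded(total, limit int64) int64 {
	return (total + limit - 1) / limit
}

func main() {
	for _, n := range []int64{200_000, 500_000} {
		total := rawVectorBytes(n, 768)
		fmt.Printf("%d vectors: %d bytes (%.1f MiB), %d keys at 256 MiB each\n",
			n, total, float64(total)/(1<<20), keysNeeded(total, keyCap))
	}
}
```

Under those assumptions 200K vectors come to 614,400,000 bytes (about 586 MiB), so the smoke exercises a payload that cannot fit in one key and would need at least 3 keys under the split option.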
Estimate
~1 week. ADR-004 is the design anchor; the test fixtures derive from it.
Sprint 3 — Agent Loop Reality Gate
Goal: prove the full agent loop works across an actual workflow. End-to-end deterministic: search → verify → observer review → playbook seal → second-run retrieval surfaces the prior playbook.
Risks closed: all DESIGN-BAR rows for observer + playbook seal + agent loop closure. The Rust system's r.json() on text/plain crash-loop bug (memory 54689d5) gets a regression test.
Stories
- S3.1: As an architect, ADR-002 fixes observer fail-safe semantics before observer is ported.
  - Concrete: doc-only. Default verdict = `cycle`, `degraded: true` on internal error. Explicit `LH_OBSERVER_FAIL_OPEN=1` env to opt into fail-open in dev only. Reference the Rust mcp-server's `verdict: "accept"` on observer error as the anti-pattern being designed away.
- S3.2: As a developer, the observer port ships with tests covering the four states (accept / reject / cycle / degraded).
  - Concrete: `internal/observer/` package + `cmd/observerd` binary. Test fixture: hallucinated claim → reject; valid claim with SQL truth → accept; SQL truth unreachable → degraded+cycle (NEVER accept).
- S3.3: As a developer, playbook seal + second-run retrieval is a single end-to-end smoke.
  - Concrete: `agent_loop_smoke.sh` does ingest → search → verify → observer review → seal → second-run retrieval. Assertions: second run surfaces prior playbook UID; report includes input hash, output hash, verdict, and memory-mutation receipt.
- S3.4: As a reviewer, the Rust health-endpoint content-type bug cannot recur.
  - Concrete: regression test that consumes `/health` from each of the 7 binaries via the gateway and asserts: response is text/plain, body matches `<service> ok` pattern, never silently parses as JSON.
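The fail-safe rule from S3.1 and the three fixture outcomes from S3.2 fit in one small decision function. This is a sketch under stated assumptions: `verdict`, `truthFn`, and `review` are hypothetical names, and the fail-open branch is assumed to return `accept` with `degraded: true` (the story only says fail-open is a dev-only opt-in).

```go
package main

import (
	"errors"
	"fmt"
)

// verdict is a hypothetical observer result; ADR-002 will fix the real shape.
type verdict struct {
	Verdict  string // "accept", "reject", or "cycle"
	Degraded bool
}

// truthFn checks a claim against SQL truth; an error means truth was unreachable.
type truthFn func(claim string) (bool, error)

// review never accepts on an internal error unless failOpen was explicitly set.
func review(claim string, check truthFn, failOpen bool) verdict {
	ok, err := check(claim)
	if err != nil {
		if failOpen { // dev-only opt-in, e.g. LH_OBSERVER_FAIL_OPEN=1
			return verdict{Verdict: "accept", Degraded: true}
		}
		return verdict{Verdict: "cycle", Degraded: true}
	}
	if ok {
		return verdict{Verdict: "accept"}
	}
	return verdict{Verdict: "reject"}
}

// Three stand-in truth sources, one per fixture outcome.
func truthYes(string) (bool, error)  { return true, nil }
func truthNo(string) (bool, error)   { return false, nil }
func truthDown(string) (bool, error) { return false, errors.New("sql truth unreachable") }

func main() {
	fmt.Printf("%+v\n", review("valid claim", truthYes, false))
	fmt.Printf("%+v\n", review("hallucinated claim", truthNo, false))
	fmt.Printf("%+v\n", review("any claim", truthDown, false))
}
```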
Acceptance
- ADR-002 in `docs/DECISIONS.md`.
- `internal/observer/` with ≥4 covering tests.
- `agent_loop_smoke.sh` passes deterministically; tagged report includes input/output hashes + verdict + receipt.
- `health_contenttype_test.go` exists, would fail if any binary regresses to JSON.
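The content-type assertion could be sketched as a response validator exercised in-process. The names `validateHealth`, `probe`, and `handler` are hypothetical, and the body regexp is only one reading of the `<service> ok` pattern; the real test would go through the gateway over HTTP rather than calling a handler directly.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"net/http"
	"net/http/httptest"
	"regexp"
)

// healthBody is an assumed reading of the "<service> ok" pattern.
var healthBody = regexp.MustCompile(`^[a-z]+ ok\n?$`)

// validateHealth checks the text/plain contract on a /health response.
func validateHealth(resp *http.Response) error {
	defer resp.Body.Close()
	raw := resp.Header.Get("Content-Type")
	mt, _, err := mime.ParseMediaType(raw)
	if err != nil || mt != "text/plain" {
		return fmt.Errorf("content type %q, want text/plain", raw)
	}
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1024))
	if err != nil {
		return err
	}
	if !healthBody.Match(body) {
		return fmt.Errorf("unexpected health body %q", body)
	}
	return nil
}

// probe calls h for GET /health without opening a socket and validates the result.
func probe(h http.Handler) error {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
	return validateHealth(rec.Result())
}

// handler builds a stand-in health endpoint with the given content type and body.
func handler(contentType, body string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", contentType)
		io.WriteString(w, body)
	})
}

func main() {
	fmt.Println(probe(handler("text/plain; charset=utf-8", "storaged ok")))
	fmt.Println(probe(handler("application/json", `{"status":"ok"}`)))
}
```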
Estimate
~1 week. ADR-002 is short; observer port is the bulk; agent-loop wiring is real engineering.
Sprint 4 — Deployment Gate
Goal: turn deployment from tribal-knowledge into executable validation. Fresh box → green smoke chain in one command.
Risks closed: R-006 (cloud-only Provider), all deployment-readiness gaps (no REPLICATION, no env template, no systemd, no doctor).
Stories
- S4.1: As an operator on a fresh Debian box, `just doctor` tells me exactly what to install.
  - Concrete: structured JSON output describing each missing dep with the `apt install` / `curl ... | tar` command to fix it. Cross-checked against `README.md` "Cold-start dependencies" as the single source of truth.
- S4.2: As an operator, `REPLICATION.md` is executable, not narrative.
  - Concrete: every step in `REPLICATION.md` is either a copy-pasteable command block or a reference to a `just <target>` invocation. Validation steps from the upstream `REPLICATION.md` (health checks, embed probe, vector probe, agent test) become `just smoke-replication`.
- S4.3: As an operator, I have an env template for `secrets-go.toml`.
  - Concrete: `secrets-go.toml.example` in repo with all required keys + comments documenting each. `just doctor` checks for unfilled placeholder values.
- S4.4: As an operator, systemd units in repo wire each binary cleanly.
  - Concrete: `deploy/systemd/{gateway,storaged,catalogd,ingestd,queryd,vectord,embedd}.service` with `After=`, `Restart=on-failure`, `MemoryMax=`, environment loading. `just install-systemd` symlinks them.
- S4.5: As an operator deploying to AWS S3 instead of MinIO, no code changes are required.
  - Concrete: `just smoke-aws-s3` variant that points the bucket config at real S3. Existing smokes pass against real S3 (validates the aws-sdk-go-v2 path).
Acceptance
- `just doctor` on fresh Debian 13 box reports actionable JSON with install commands.
- `just smoke-replication` succeeds on first run after `just doctor` shows green.
- `secrets-go.toml.example` present with documented keys.
- 7 systemd unit files in `deploy/systemd/`; `systemctl status lakehouse-go-*` shows green after install.
- `just smoke-aws-s3` succeeds against a real bucket (manual: requires AWS creds).
Estimate
~3 days focused. S4.4 + S4.5 are most of the time.
Cross-sprint dependencies
Sprint 0 ─────────────────────────────────────► (unblocks all)
│
├─► Sprint 1 ───► Sprint 2 ───► Sprint 3 ───► Sprint 4
│ │ │ │
│ ▼ ▼ ▼
└──── auth ADR ── memory ADR ── observer ADR
- Sprint 0 is the gate. None of the others should ship without `just verify` reliably catching regressions.
- Sprint 1 should land before Sprint 2 because R-001 (queryd /sql) is HIGH severity and the fix is mostly mechanical.
- Sprint 2 / 3 are real engineering; estimates are floors not ceilings.
- Sprint 4 can land in parallel with Sprint 2/3 — its stories don't depend on the agent-loop port.