root 91edd43164 scrum audit: 5 reports under reports/scrum/ · score 35/60

Adapts docs/SCRUM.md framework (originally written for the
matrix-agent-validated repo) to the Go rewrite. Five deliverables:

  golang-lakehouse-scrum-test.md  top-line + scoring + verdict
  risk-register.md                12 findings, R-001..R-012
  claim-coverage-table.md         claim/test/risk for Sprint 2
  sprint-backlog.md               5 sprints, ~2 weeks of work
  acceptance-gates.md             DoD as runnable commands

Every claim cites file:line, command output, or "missing evidence."
Smoke chain ran clean (33s wall, all 9 PASS) and is captured in
reports/scrum/_evidence/smoke_chain.log (gitignored — runtime artifact).

Scoring:
  Reproducibility       7/10  9 smokes deterministic, no just/CI gate
  Test Coverage         6/10  internal/ packages tested, 6/7 cmd/ aren't
  Trust Boundary        7/10  escapes ok, zero auth, /sql is RCE-eq off-loopback
  Memory Correctness    3/10  pathway/playbook/observer not yet ported
  Deployment Readiness  4/10  no REPLICATION, no env template, no systemd
  Maintainability       8/10  no god-files, 7 lean binaries, ADRs current

Top risks:
  R-001 HIGH  queryd /sql + DuckDB + non-loopback bind = RCE-equivalent
  R-002 HIGH  internal/shared (server.go + config.go) zero tests
  R-003 HIGH  internal/storeclient zero tests, used by 2 services
  R-004 MED   9-smoke chain green but not gated (no justfile/hook)

The audit is the work; refactors come after. Sprint 0 owns coverage
+ CI gating; Sprint 1 owns trust-boundary decisions; Sprints 2-3 are
mostly design-bar work for unbuilt agent components.

.gitignore exception: /reports/* + !/reports/scrum/ keeps reports/
a runtime-artifact directory while exposing reports/scrum/ as
tracked documentation. Mirrors the pattern future audit passes will
land in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-29 04:51:47 -05:00


golangLAKEHOUSE — Sprint Backlog

Five sprints adapted from SCRUM.md's framework. Each sprint has a goal, user stories, and acceptance criteria. Risk IDs reference risk-register.md. Acceptance-of-done details live in acceptance-gates.md.

The audit is the work of this turn; these sprints are the next turns. Order matters — Sprint 0 unblocks the rest by making the substrate provably runnable on a clean box.

Sprint 0 — Reproducibility Gate

Goal: make the repo provably runnable, with structural protection against silent regressions in the load-bearing-but-untested layers.

Risks closed: R-002, R-003, R-004, R-005, R-006, R-008, R-012.

Stories

S0.1 — As an operator, I can run one command and know exactly which dependencies are missing or wrong-versioned.
- Concrete: just doctor checks Go ≥1.25, gcc, MinIO at :9000, Ollama at :11434 with nomic-embed-text loaded, secrets-go.toml present + readable. Output is structured JSON on --json flag. Non-zero exit on any missing dep.
S0.2 — As an operator, I can run a minimal fixture test without MinIO or Ollama.
- Concrete: just smoke-fixtures runs against in-process fakes (MockS3Storage + MockEmbedProvider). Smokes split into two tiers: *_smoke.sh (real services, slow) vs *_smoke_fixtures.sh (fakes, runs anywhere).
S0.3 — As an operator, I can verify the whole substrate with one command, and I cannot push a regression past it.
- Concrete: just verify runs go vet + go test + the 9-smoke chain. .git/hooks/pre-push calls just verify and aborts on non-zero exit. Failure output is structured.
S0.4 — As a reviewer, I can read coverage at a glance and see where wiring layers lack tests.
- Concrete: cmd/<bin>/main_test.go exists for all 7 binaries (today: only storaged). Each tests routes registered, body-cap rejection, schema-validation rejection, happy-path with mocked dependency.
S0.5 — Load-bearing internal packages have unit-test coverage proportional to their blast radius.
- Concrete: internal/shared/{server,config}_test.go exist (R-002). internal/storeclient/client_test.go exists (R-003). internal/queryd/db_test.go exists with adversarial sqlEscape + exhaustive redactCreds cases (R-008).
S0.6 — Empty tests/ directory either claimed or removed.
- Concrete: pick one. If the directory is claimed for fixture-mode wiring (S0.2), document its purpose in the README. If not, delete it in the same commit as S0.1.
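
The body-cap rejection check named in S0.4 could look roughly like the sketch below. It is a minimal illustration, not the repo's code: `cappedHandler`, the `/ingest` path, and the 1 KiB `maxBody` value are all hypothetical, since this backlog does not state the real per-binary limits or routes.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxBody is a hypothetical cap; the real per-binary limit is not stated in this backlog.
const maxBody = 1 << 10 // 1 KiB

// cappedHandler is a hypothetical stand-in for a cmd/<bin> route.
// Any read error is treated as an oversized body here; a fuller handler
// would distinguish *http.MaxBytesError from other failures via errors.As.
func cappedHandler(w http.ResponseWriter, r *http.Request) {
	r.Body = http.MaxBytesReader(w, r.Body, maxBody)
	if _, err := io.ReadAll(r.Body); err != nil {
		http.Error(w, "body too large", http.StatusRequestEntityTooLarge)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// postStatus sends an n-byte body through the handler and returns the status code.
func postStatus(n int) int {
	body := strings.NewReader(strings.Repeat("a", n))
	req := httptest.NewRequest(http.MethodPost, "/ingest", body)
	rec := httptest.NewRecorder()
	cappedHandler(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(postStatus(16), postStatus(maxBody+1)) // prints "200 413"
}
```

In a real `main_test.go` the two calls would become table-driven cases against the binary's actual router, with the mocked dependency injected.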

Acceptance

just --list shows verify, smoke-fixtures, doctor, plus shortcuts for fmt/vet/test/smoke <day>.
just verify exits 0 on a clean clone with deps present.
just smoke-fixtures exits 0 on a clean clone with no MinIO and no Ollama.
Pre-push hook present at .git/hooks/pre-push, executable, calls just verify.
go test ./... shows non-empty test count for every package in internal/ (no more [no test files] lines for shared/storeclient).
Test count for cmd/ binaries: 7/7 (today: 1/7).
Failure output structured: any just doctor failure prints JSON describing what's missing, no claim of success.
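
One possible shape for the structured doctor output, assuming the checks end up implemented in Go. The `Check` fields, the top-level `ok` key, and the `report` helper are hypothetical; only the requirements (JSON describing what is missing, non-zero exit on any missing dep) come from the criteria above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Check is a hypothetical result row; the field names are illustrative,
// not a schema taken from the repo.
type Check struct {
	Name   string `json:"name"`
	OK     bool   `json:"ok"`
	Detail string `json:"detail,omitempty"`
}

// report renders the checks as JSON and returns the process exit code:
// 0 only if every check passed, 1 otherwise.
func report(checks []Check) (string, int) {
	code := 0
	for _, c := range checks {
		if !c.OK {
			code = 1
		}
	}
	out, err := json.Marshal(map[string]any{"ok": code == 0, "checks": checks})
	if err != nil {
		return `{"ok":false}`, 1
	}
	return string(out), code
}

func main() {
	out, code := report([]Check{
		{Name: "go", OK: true},
		{Name: "minio", OK: false, Detail: "nothing listening on :9000"},
	})
	// A real doctor command would pass code to os.Exit instead of printing it.
	fmt.Println(out)
	fmt.Println("exit code:", code)
}
```

Deriving the exit code and the `ok` field from the same loop keeps the output from ever claiming success while a check is red.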

Estimate

S0.1 doctor: ~1 hr
S0.2 fixture-mode: ~3 hr (interface plumbing + fakes + new smokes)
S0.3 verify + hook: ~30 min
S0.4 cmd-level tests: ~3 hr (6 binaries × ~30 min)
S0.5 internal tests: ~3 hr
S0.6 tests/ dir: ~5 min

Total: ~1.5 days focused. Single bundled PR with one commit per story.

Sprint 1 — Trust Boundary Gate

Goal: prevent agent trust collapse. Make the SQL surface not be RCE-equivalent on accidental non-localhost binding. Decide auth posture once and apply uniformly.

Risks closed: R-001, R-007, R-009 (regression test only), R-010.

Stories

S1.1 — As an operator, I cannot accidentally expose POST /sql to the network.
- Concrete: cmd/queryd/main.go startup asserts bind starts with 127. or [::1]. If the bind is non-loopback and env LH_QUERYD_ALLOW_NONLOOPBACK=1 is set, log a warning and continue; if it is non-loopback and the env is unset, os.Exit(1). Same gate added to vectord, storaged, ingestd until S1.2 lands.
S1.2 — As an operator, I have one configurable auth posture across all 7 binaries.
- Concrete: ADR-003 picks Bearer-token + IP allow-list (or alternative — decide in the ADR). internal/shared/auth.go provides middleware; each cmd/<bin>/main.go adds r.Use(authMiddleware) in one line. Token sourced from secrets-go.toml's new [auth].token field. Empty token = local-mode (no auth, only 127. bind allowed).
S1.3 — As an operator, every public endpoint validates schema on input.
- Concrete: each handler decoding a JSON body has explicit struct tags + missing-field detection. Unknown fields rejected (json.Decoder.DisallowUnknownFields). Empty-required-field rejected with structured 400. Today's coverage is partial; this story closes it uniformly.
S1.4 — As a reviewer, I have a regression test against SQL injection in dataset names.
- Concrete: internal/queryd/registrar_test.go gains a test where catalogd returns a manifest with name: 'foo"; DROP TABLE x; --'. The test asserts quoteIdent quoting prevents the DROP from executing — view name is "foo""; DROP TABLE x; --" which is a single quoted identifier (R-009 latent guard).
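
The loopback gate from S1.1 can be sketched as below. Note one deliberate difference from the story text: this version parses the address with the standard library instead of matching the `127.` / `[::1]` prefix, so a hostname such as `127.example.com` would not slip through. The helper name `bindAllowed`, its `allowNonLoopback` parameter (standing in for the override env), and the port numbers are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// bindAllowed reports whether a listen address may be used. Loopback IPs
// always pass; anything else passes only when the override is set.
// The helper name and the override parameter are hypothetical.
func bindAllowed(addr string, allowNonLoopback bool) (bool, error) {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false, err
	}
	if ip := net.ParseIP(host); ip != nil && ip.IsLoopback() {
		return true, nil
	}
	// Empty host (":3000"), 0.0.0.0, hostnames, and routable IPs all land here.
	return allowNonLoopback, nil
}

func main() {
	for _, addr := range []string{"127.0.0.1:3000", "[::1]:3000", "0.0.0.0:3000", ":3000"} {
		ok, err := bindAllowed(addr, false)
		fmt.Println(addr, ok, err)
	}
}
```

This sketch also rejects `localhost:3000` without the override, because it never resolves names; whether that strictness is wanted is a call for the story's implementer.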

Acceptance

All 7 binaries fail-loud on non-loopback bind without explicit override env.
ADR-003 in docs/DECISIONS.md documents the auth model with rationale.
Auth middleware is one r.Use() line per binary; adding it to a new binary takes one import.
Every JSON-decoding handler uses DisallowUnknownFields + missing-required-field rejection.
R-009 regression test passes; assertion would fail if quoteIdent is removed.
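
The assertion behind the R-009 regression test can be illustrated with a stand-in for `quoteIdent`. The repo's actual implementation is not shown in this backlog; the version below is an assumption that applies standard SQL identifier quoting (wrap in double quotes, double any embedded double quote).

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent is a stand-in for the repo's helper of the same name, whose
// real implementation is not shown in this backlog. It wraps the name in
// double quotes and doubles any embedded double quote.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

func main() {
	// The hostile dataset name from S1.4.
	fmt.Println(quoteIdent(`foo"; DROP TABLE x; --`)) // prints "foo""; DROP TABLE x; --"
}
```

Because the embedded quote is doubled, the whole string stays inside one quoted identifier and the `DROP` never reaches the parser as a statement. A test comparing against that exact string goes red if the quoting is removed.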

Estimate

~2 days focused. ADR-003 is the gating decision; once written, S1.1 + S1.2 are mechanical.

Sprint 2 — Memory Correctness Gate

Goal: prove pathway / playbook memory cannot poison itself, with the test fixture covering Mem0 semantics on day one. This sprint is design-bar work for components that haven't been ported from Rust yet: even once Sprint 1 is complete, the memory layer will still not exist in the Go rewrite.

Risks closed: all DESIGN-BAR rows in claim-coverage-table.md for Mem0 + persistence-at-scale.

Stories

S2.1 — As an architect, I have an ADR fixing the pathway-memory data model in Go before code lands.
- Concrete: ADR-004 documents trace shape, history-chain rules, retire semantics, replay-count rules. Cites the Rust pathway_memory crate as reference but does NOT carry forward the 88-trace state per ADR-001 (clean start ratified).
S2.2 — As a developer, the pathway-memory port lands with a deterministic fixture corpus and full test coverage on day one.
- Concrete: tests/fixtures/pathway/ has known-shape JSON entries covering ADD / UPDATE / REVISE / RETIRE / HISTORY / cycle-attempt / replay-duplicate / corrupted-row. New internal/pathway/ package implements the data model. Test count: ≥7 functions in pathway_test.go, one per fixture row.
S2.3 — As a developer, retired traces are excluded from retrieval — and the test would fail without the exclusion.
- Concrete: integration test does ADD → RETIRE → SEARCH → assert returned set excludes the retired UID. Removing the retirement filter must turn this test red.
S2.4 — As an architect, vectord persistence works above 256 MiB single-key (the gap from the 500K staffing test).
- Concrete: either bump storaged's MaxBytesReader for vector-content paths, or split LHV1 across N fixed-size keys with a manifest pointer, or add multipart upload to storaged. Decision in ADR-005. Smoke variant g1p_scale_smoke.sh ingests 200K vectors @ d=768 + asserts kill-restart preserves state at that size.
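
A quick size check shows why the 200K-vector smoke exercises the S2.4 gap. The sketch assumes float32 components (4 bytes each) and counts raw payload only; any LHV1 header or index overhead is ignored because the format is not described here.

```go
package main

import "fmt"

const (
	capBytes      = 256 << 20 // 256 MiB single-key ceiling named in S2.4
	bytesPerFloat = 4         // assumption: float32 components
)

// payloadBytes is the raw vector payload, ignoring any LHV1 header or index overhead.
func payloadBytes(vectors, dim int64) int64 {
	return vectors * dim * bytesPerFloat
}

// chunksNeeded is the number of fixed-size keys required to hold size bytes.
func chunksNeeded(size, chunk int64) int64 {
	return (size + chunk - 1) / chunk
}

func main() {
	size := payloadBytes(200_000, 768)
	// prints "614400000 true 3"
	fmt.Println(size, size > capBytes, chunksNeeded(size, capBytes))
}
```

Under these assumptions the payload is about 586 MiB, so the split-key option would need at least three 256 MiB keys plus a manifest pointer.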

Acceptance

ADR-004 and ADR-005 in docs/DECISIONS.md.
internal/pathway/ package with ≥7 covering tests; go test ./internal/pathway/ passes.
Retire-exclusion regression test passes; would fail if filter logic removed.
g1p_scale_smoke.sh passes at 200K vectors.
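
The retire-exclusion property (S2.3) can be shown against a toy in-memory store. `Trace`, `Store`, and their methods are hypothetical shapes for illustration only; the real data model is deferred to ADR-004.

```go
package main

import "fmt"

// Trace and Store are hypothetical shapes; the real model is deferred to ADR-004.
type Trace struct {
	UID     string
	Retired bool
}

type Store struct {
	traces map[string]*Trace
}

func NewStore() *Store { return &Store{traces: map[string]*Trace{}} }

// Add registers a live trace under uid.
func (s *Store) Add(uid string) { s.traces[uid] = &Trace{UID: uid} }

// Retire marks a trace as retired; it reports false if the UID is unknown.
func (s *Store) Retire(uid string) bool {
	t, ok := s.traces[uid]
	if ok {
		t.Retired = true
	}
	return ok
}

// Search returns the set of live UIDs. The Retired check is the filter the
// regression test guards: delete it and the check in main panics.
func (s *Store) Search() map[string]bool {
	live := map[string]bool{}
	for uid, t := range s.traces {
		if t.Retired {
			continue
		}
		live[uid] = true
	}
	return live
}

func main() {
	s := NewStore()
	s.Add("t1")
	s.Add("t2")
	s.Retire("t1")
	hits := s.Search()
	if hits["t1"] {
		panic("retired trace returned by Search")
	}
	fmt.Println(hits) // prints "map[t2:true]"
}
```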

Estimate

~1 week. ADR-004 is the design anchor; the test fixtures derive from it.

Sprint 3 — Agent Loop Reality Gate

Goal: prove the full agent loop works across an actual workflow. End-to-end deterministic: search → verify → observer review → playbook seal → second-run retrieval surfaces the prior playbook.

Risks closed: all DESIGN-BAR rows for observer + playbook seal + agent loop closure. The Rust system's r.json() on text/plain crash-loop bug (memory 54689d5) gets a regression test.

Stories

S3.1 — As an architect, ADR-002 fixes observer fail-safe semantics before observer is ported.
- Concrete: doc-only. Default verdict = cycle, degraded: true on internal error. Explicit LH_OBSERVER_FAIL_OPEN=1 env to opt into fail-open in dev only. Reference the Rust mcp-server's verdict: "accept" on observer error as the anti-pattern being designed away.
S3.2 — As a developer, the observer port ships with tests covering the four states (accept / reject / cycle / degraded).
- Concrete: internal/observer/ package + cmd/observerd binary. Test fixture: hallucinated claim → reject; valid claim with SQL truth → accept; SQL truth unreachable → degraded+cycle (NEVER accept).
S3.3 — As a developer, playbook seal + second-run retrieval is a single end-to-end smoke.
- Concrete: agent_loop_smoke.sh does ingest → search → verify → observer review → seal → second-run retrieval. Assertions: second run surfaces prior playbook UID; report includes input hash, output hash, verdict, and memory-mutation receipt.
S3.4 — As a reviewer, the Rust health-endpoint content-type bug cannot recur.
- Concrete: regression test that consumes /health from each of the 7 binaries via the gateway and asserts: response is text/plain, body matches <service> ok pattern, never silently parses as JSON.
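
The fail-safe rule from S3.1 and S3.2 reduces to a small decision function. This is a sketch under stated assumptions: `Verdict`, `review`, and `errTruthUnreachable` are hypothetical names, and fail-open is taken to mean "accept on error", which matches the Rust anti-pattern described above but is not spelled out in the story.

```go
package main

import (
	"errors"
	"fmt"
)

// Verdict is a hypothetical shape for the observer's answer.
type Verdict struct {
	Decision string // "accept", "reject", or "cycle"
	Degraded bool
}

// errTruthUnreachable is a hypothetical sentinel for "SQL truth could not be queried".
var errTruthUnreachable = errors.New("sql truth unreachable")

// review wraps a truth check. check reports whether the claim is supported,
// or an error if the truth source could not be consulted. failOpen models
// LH_OBSERVER_FAIL_OPEN=1 and is assumed to mean accept-on-error.
func review(check func() (bool, error), failOpen bool) Verdict {
	supported, err := check()
	if err != nil {
		if failOpen {
			return Verdict{Decision: "accept", Degraded: true}
		}
		// Default: never accept when the truth source is unavailable.
		return Verdict{Decision: "cycle", Degraded: true}
	}
	if supported {
		return Verdict{Decision: "accept"}
	}
	return Verdict{Decision: "reject"}
}

func main() {
	valid := func() (bool, error) { return true, nil }
	hallucinated := func() (bool, error) { return false, nil }
	unreachable := func() (bool, error) { return false, errTruthUnreachable }

	fmt.Printf("%+v\n", review(valid, false))
	fmt.Printf("%+v\n", review(hallucinated, false))
	fmt.Printf("%+v\n", review(unreachable, false))
}
```

Keeping the error branch first makes the default path impossible to reach with an unverified claim, which is the property the observer tests are meant to pin down.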

Acceptance

ADR-002 in docs/DECISIONS.md.
internal/observer/ with ≥4 covering tests.
agent_loop_smoke.sh passes deterministically; tagged report includes input/output hashes + verdict + receipt.
health_contenttype_test.go exists, would fail if any binary regresses to JSON.
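
A possible core for `health_contenttype_test.go`, shown against in-process handlers. `checkHealth` and the `health` helper are hypothetical, and the exact-match on a whitespace-trimmed body is an assumption about what the `<service> ok` pattern means. The real test would issue the requests through the gateway, not call handlers directly.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"net/http"
	"net/http/httptest"
	"strings"
)

// checkHealth calls the handler's /health route in-process and verifies the
// S3.4 contract: text/plain media type and a "<service> ok" body.
func checkHealth(h http.Handler, service string) error {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
	res := rec.Result()
	defer res.Body.Close()

	mt, _, err := mime.ParseMediaType(res.Header.Get("Content-Type"))
	if err != nil {
		return fmt.Errorf("unparseable content type: %w", err)
	}
	if mt != "text/plain" {
		return fmt.Errorf("content type is %q, want text/plain", mt)
	}
	body, err := io.ReadAll(res.Body)
	if err != nil {
		return err
	}
	if got := strings.TrimSpace(string(body)); got != service+" ok" {
		return fmt.Errorf("body is %q, want %q", got, service+" ok")
	}
	return nil
}

// health builds a throwaway handler that answers with the given content type and body.
func health(contentType, body string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", contentType)
		io.WriteString(w, body)
	})
}

func main() {
	good := health("text/plain; charset=utf-8", "queryd ok\n")
	bad := health("application/json", `{"status":"ok"}`)
	fmt.Println(checkHealth(good, "queryd"))
	fmt.Println(checkHealth(bad, "queryd"))
}
```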

Estimate

~1 week. ADR-002 is short; observer port is the bulk; agent-loop wiring is real engineering.

Sprint 4 — Deployment Gate

Goal: turn deployment from tribal-knowledge into executable validation. Fresh box → green smoke chain in one command.

Risks closed: R-006 (cloud-only Provider), all deployment-readiness gaps (no REPLICATION, no env template, no systemd, no doctor).

Stories

S4.1 — As an operator on a fresh Debian box, just doctor tells me exactly what to install.
- Concrete: structured JSON output describing each missing dep with the apt install / curl ... | tar command to fix it. Cross-checked against README.md "Cold-start dependencies" — single source of truth.
S4.2 — As an operator, REPLICATION.md is executable, not narrative.
- Concrete: every step in REPLICATION.md is either a copy-pasteable command block or a reference to a just <target> invocation. Validation steps from the upstream REPLICATION.md (health checks, embed probe, vector probe, agent test) become just smoke-replication.
S4.3 — As an operator, I have an env template for secrets-go.toml.
- Concrete: secrets-go.toml.example in repo with all required keys + comments documenting each. just doctor checks for unfilled placeholder values.
S4.4 — As an operator, systemd units in repo wire each binary cleanly.
- Concrete: deploy/systemd/{gateway,storaged,catalogd,ingestd,queryd,vectord,embedd}.service with After=, Restart=on-failure, MemoryMax=, environment loading. just install-systemd symlinks them.
S4.5 — As an operator deploying to AWS S3 instead of MinIO, no code changes are required.
- Concrete: just smoke-aws-s3 variant that points the bucket config at real S3. Existing smokes pass against real S3 (validates the aws-sdk-go-v2 path).

Acceptance

just doctor on fresh Debian 13 box reports actionable JSON with install commands.
just smoke-replication succeeds on first run after just doctor shows green.
secrets-go.toml.example present with documented keys.
7 systemd unit files in deploy/systemd/; systemctl status lakehouse-go-* shows green after install.
just smoke-aws-s3 succeeds against a real bucket (manual: requires AWS creds).

Estimate

~3 days focused. S4.4 + S4.5 are most of the time.

Cross-sprint dependencies

Sprint 0 ─────────────────────────────────────► (unblocks all)
   │
   ├─► Sprint 1 ───► Sprint 2 ───► Sprint 3 ───► Sprint 4
   │       │              │              │
   │       ▼              ▼              ▼
   └──── auth ADR ── memory ADR ── observer ADR

Sprint 0 is the gate. None of the others should ship without just verify reliably catching regressions.
Sprint 1 should land before Sprint 2 because R-001 (queryd /sql) is HIGH severity and the fix is mostly mechanical.
Sprint 2 / 3 are real engineering; estimates are floors not ceilings.
Sprint 4 can land in parallel with Sprint 2/3 — its stories don't depend on the agent-loop port.
