root 91edd43164 scrum audit: 5 reports under reports/scrum/ · score 35/60
Adapts docs/SCRUM.md framework (originally written for the
matrix-agent-validated repo) to the Go rewrite. Five deliverables:

  golang-lakehouse-scrum-test.md  top-line + scoring + verdict
  risk-register.md                12 findings, R-001..R-012
  claim-coverage-table.md         claim/test/risk for Sprint 2
  sprint-backlog.md               5 sprints, ~2 weeks of work
  acceptance-gates.md             DoD as runnable commands

Every claim cites file:line, command output, or "missing evidence."
Smoke chain ran clean (33s wall, all 9 PASS) and is captured in
reports/scrum/_evidence/smoke_chain.log (gitignored — runtime artifact).

Scoring:
  Reproducibility       7/10  9 smokes deterministic, no just/CI gate
  Test Coverage         6/10  internal/ packages tested, 6/7 cmd/ aren't
  Trust Boundary        7/10  escapes ok, zero auth, /sql is RCE-eq off-loopback
  Memory Correctness    3/10  pathway/playbook/observer not yet ported
  Deployment Readiness  4/10  no REPLICATION, no env template, no systemd
  Maintainability       8/10  no god-files, 7 lean binaries, ADRs current

Top risks:
  R-001 HIGH  queryd /sql + DuckDB + non-loopback bind = RCE-equivalent
  R-002 HIGH  internal/shared (server.go + config.go) zero tests
  R-003 HIGH  internal/storeclient zero tests, used by 2 services
  R-004 MED   9-smoke chain green but not gated (no justfile/hook)

The audit is the work; refactors come after. Sprint 0 owns coverage
+ CI gating; Sprint 1 owns trust-boundary decisions; Sprints 2-3 are
mostly design-bar work for unbuilt agent components.

.gitignore exception: /reports/* + !/reports/scrum/ keeps reports/
a runtime-artifact directory while exposing reports/scrum/ as
tracked documentation. Mirrors the pattern future audit passes will
land in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 04:51:47 -05:00


# Scrum Test: Matrix Agent Validated Hardening Sprint
## Mission
Run a Scrum-style technical validation against this repository:
https://git.agentview.dev/profit/matrix-agent-validated.git
Do not add features first. Treat the codebase as a validated prototype that now needs production-hardening pressure.
The goal is to produce a hard evidence report and a prioritized sprint backlog.
## Core Questions
1. Can the repo be cloned, built, and smoke-tested from a clean environment?
2. Are the claimed validated paths actually covered by repeatable tests?
3. Where does the system rely on demo assumptions, hardcoded paths, permissive fallbacks, or unsafe string construction?
4. Which failures would corrupt trust in the agent loop?
5. What must be fixed before this becomes a reusable agent-memory substrate?
## Required Inspection Areas
### 1. Build and Test Surface
Inspect:
- Cargo workspace
- Rust crates
- Bun/TypeScript MCP server
- Python sidecar
- tests/
- justfile
- REPLICATION.md
- systemd units
- scripts/
Run or prepare the following commands where possible:
```bash
just --list
cargo check --workspace
cargo test --workspace
# Subshell keeps the working directory at the repo root for the next command.
(cd mcp-server && bun install && bun test) || true
bun run tests/agent_test/agent_harness.ts || true
```
If heavy data or external services are missing, do not fake success. Record the blocker and define a mock/minimal fixture path.
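
One way to keep a missing tool from being read as a pass is to classify every command result explicitly. The sketch below is illustrative only: `run_step` and the status names `passed`, `failed`, and `blocked` are assumptions, not part of the repo.

```python
import shutil
import subprocess


def run_step(argv, timeout=600):
    """Run one command and classify the result without ever faking success."""
    if shutil.which(argv[0]) is None:
        # A missing tool is a blocker: neither a pass nor a test failure.
        return {"cmd": argv, "status": "blocked",
                "reason": f"{argv[0]} not found on PATH"}
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return {"cmd": argv, "status": "failed",
                "reason": f"timed out after {timeout}s"}
    status = "passed" if proc.returncode == 0 else "failed"
    return {"cmd": argv, "status": status, "exit_code": proc.returncode,
            "stderr_tail": proc.stderr[-500:]}
```

Calling `run_step(["cargo", "check", "--workspace"])` on a machine without cargo would return a `blocked` record naming the missing tool, which can go straight into the report.
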
### 2. Security and Trust Boundary Review
Search for:
- raw SQL interpolation
- shell command execution
- open CORS
- unauthenticated mutation endpoints
- pass-through proxy routes
- hardcoded absolute paths
- secrets in repo
- fail-open review behavior
- unbounded file reads/writes
- unsafe JSON parsing assumptions

Pay special attention to:
- mcp-server/index.ts
- mcp-server/observer.ts
- crates/vectord/src/pathway_memory.rs
- crates/vectord/src/playbook_memory.rs
- scripts/
- sidecar/
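
A first pass over several of these search items can be mechanical. The sketch below is a rough heuristic, not a security scanner: the regexes, the `PATTERNS` table, and the function names are all assumptions chosen for illustration, and every hit still needs manual review.

```python
import re
from pathlib import Path

# Illustrative patterns only; each maps a checklist label to a line-level regex.
PATTERNS = {
    "raw SQL interpolation": re.compile(
        r"\b(SELECT|INSERT\s+INTO|UPDATE|DELETE\s+FROM)\b.*(\$\{|\{\w*\})", re.I),
    "shell command execution": re.compile(
        r"child_process|Bun\.spawn|subprocess\.|os\.system|Command::new"),
    "open CORS": re.compile(
        r"Access-Control-Allow-Origin[\"']?\s*[:,]\s*[\"']\*", re.I),
    "hardcoded absolute path": re.compile(
        r"[\"'`]/(?:home|Users|opt|var|etc)/"),
}


def scan_text(path, text):
    """Return one finding per (line, pattern) match, with file and line number."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append({"file": path, "line": lineno,
                                 "finding": label, "text": line.strip()})
    return findings


def scan_tree(root, suffixes=(".rs", ".ts", ".py", ".sh")):
    """Scan every source file under root whose suffix is in suffixes."""
    findings = []
    for p in sorted(Path(root).rglob("*")):
        if p.is_file() and p.suffix in suffixes:
            findings.extend(scan_text(str(p), p.read_text(errors="replace")))
    return findings
```

Each finding carries a file and line number, so it can be cited directly in the risk register.
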
### 3. Agent Validation Review
Verify whether the following claims are actually enforced by tests:
- vector retrieval across corpora
- observer hand-review gates candidates
- successful playbooks are sealed
- retrieval surfaces prior playbooks on later runs
- Mem0-style ADD / UPDATE / REVISE / RETIRE / HISTORY behavior works
- retired traces are excluded from retrieval
- history chains are cycle-safe
- agent claims can be verified against SQL truth
- cloud-only adaptation works without local Ollama

Create a table:

| Claim | Code Location | Existing Test | Missing Test | Risk |
|---|---|---|---|---|
### 4. Scrum Backlog Output
Create a prioritized backlog using this format:
#### Sprint 0: Reproducibility Gate
Goal: make the repo provably runnable.

Stories:
- As an operator, I can run one command and know which dependencies are missing.
- As an operator, I can run a minimal fixture test without the 470MB data payload.
- As an operator, I can verify gateway, sidecar, observer, and MCP health with one command.

Acceptance:
- `just verify` exists.
- `just smoke` runs without large datasets.
- failure output is structured JSON.
- no test claims success when dependencies are missing.
#### Sprint 1: Trust Boundary Gate
Goal: prevent agent trust collapse.

Stories:
- Replace raw SQL string interpolation with validated query builders or parameterized calls.
- Change observer /review failure from fail-open accept to explicit degraded/cycle verdict.
- Add auth or localhost-only guardrails for mutation endpoints.
- Add schema validation for every public endpoint.
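
The fail-open story is the easiest to picture in code. The sketch below assumes a reviewer callable that returns `True` to approve; `Verdict`, `review_fail_closed`, and the status strings are hypothetical names, not the observer's actual interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Verdict:
    status: str  # "accept", "reject", or "degraded"
    reason: str


def review_fail_closed(candidate: str, reviewer: Callable[[str], bool]) -> Verdict:
    """Only an explicit True from the reviewer yields an accept."""
    try:
        approved = reviewer(candidate)
    except Exception as exc:
        # Reviewer crash or outage: surface it instead of auto-accepting.
        return Verdict("degraded", f"observer unavailable: {exc!r}")
    if approved is True:
        return Verdict("accept", "observer approved")
    return Verdict("reject", "observer did not approve")
```

The design choice is that the accept path is the only one requiring positive evidence; every other outcome, including an exception, lands on a non-accept verdict.
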
Acceptance:
- SQL injection tests fail before fix and pass after fix.
- observer crash cannot auto-accept unsafe candidate output.
- mutation endpoints require configured token or local-only mode.
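
The "fail before fix, pass after fix" shape of the SQL injection test can be shown with a small stand-in. This sketch uses Python's built-in `sqlite3` purely for illustration; the `traces` table, the function names, and the payload are assumptions and do not reflect the repo's actual schema or database engine.

```python
import sqlite3

PAYLOAD = "nope' OR '1'='1"


def make_db():
    """Build a throwaway in-memory database with two rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE traces (id TEXT PRIMARY KEY, body TEXT)")
    db.executemany("INSERT INTO traces VALUES (?, ?)",
                   [("t1", "alpha"), ("t2", "beta")])
    return db


def find_unsafe(db, trace_id):
    # Vulnerable: caller input is spliced into the SQL text itself.
    return db.execute(f"SELECT id FROM traces WHERE id = '{trace_id}'").fetchall()


def find_safe(db, trace_id):
    # Parameterized: the driver binds the value, so it is never parsed as SQL.
    return db.execute("SELECT id FROM traces WHERE id = ?", (trace_id,)).fetchall()
```

With `PAYLOAD` as input, `find_unsafe` returns every row because the injected `OR` clause is always true, while `find_safe` returns no rows because no trace has that literal id.
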
#### Sprint 2: Memory Correctness Gate
Goal: prove Mem0/pathway memory cannot poison itself.

Stories:
- Add tests for ADD, UPDATE, REVISE, RETIRE, HISTORY.
- Add cycle detection tests.
- Add retired-trace retrieval exclusion tests.
- Add duplicate trace replay_count tests.
- Add corrupted memory row recovery test.

Acceptance:
- deterministic fixture corpus
- all memory operations covered
- every memory mutation emits audit/event receipt
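
Two of these stories, cycle detection and retired-trace exclusion, reduce to small invariants that a fixture test can pin down. The sketch below uses a hypothetical in-memory model: `Trace`, its `supersedes` link, and both functions are assumptions for illustration, not the actual pathway or playbook memory types.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Trace:
    id: str
    text: str
    supersedes: Optional[str] = None  # id of the previous revision, if any
    retired: bool = False


def history(store: Dict[str, Trace], start_id: str) -> List[str]:
    """Walk the revision chain from newest to oldest, refusing to loop."""
    seen, chain = set(), []
    current = start_id
    while current is not None and current in store:
        if current in seen:
            raise ValueError(f"history cycle detected at {current}")
        seen.add(current)
        chain.append(current)
        current = store[current].supersedes
    return chain


def retrievable(store: Dict[str, Trace]) -> List[str]:
    """Ids eligible for retrieval: anything not marked retired."""
    return sorted(t.id for t in store.values() if not t.retired)
```

A test fixture would build a short chain, retire one trace, and then deliberately corrupt a `supersedes` link to confirm the walk raises instead of spinning.
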
#### Sprint 3: Agent Loop Reality Gate
Goal: prove the agent loop works across actual workflows.

Stories:
- Build deterministic mini corpus.
- Run search → verify → observer review → playbook seal → second-run retrieval.
- Add negative case where observer rejects hallucinated claim.
- Add regression for health endpoint content-type mismatch.

Acceptance:
- single command proves the full loop
- generated report includes input hash, output hash, verdict, and memory mutation receipt
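
The report fields named in the acceptance criteria could be assembled as follows. This is a minimal sketch: the field names, the choice of SHA-256, and the shape of the mutation receipt are assumptions, since the source only lists what the report must include.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest used to fingerprint loop inputs and outputs."""
    return hashlib.sha256(data).hexdigest()


def build_receipt(input_bytes: bytes, output_bytes: bytes,
                  verdict: str, mutation_receipt: dict) -> dict:
    """Bundle the four required report fields into one JSON-serializable record."""
    return {
        "input_sha256": sha256_hex(input_bytes),
        "output_sha256": sha256_hex(output_bytes),
        "verdict": verdict,
        "memory_mutation_receipt": mutation_receipt,
    }
```

Because the corpus is meant to be deterministic, two runs over the same fixture should produce identical `input_sha256` values, which makes drift easy to spot.
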
#### Sprint 4: Deployment Gate
Goal: turn REPLICATION.md into executable deployment validation.

Stories:
- Convert REPLICATION.md validation section into scripts.
- Add env var template.
- Add config validation.
- Remove hardcoded `/home/profit/lakehouse` paths.
- Add systemd readiness checks.

Acceptance:
- fresh clone can run `just doctor`
- missing env vars are reported clearly
- no absolute path assumptions remain unless configured
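
Config validation for this gate can start as a plain function that returns a list of problems. The sketch below is hypothetical: the source does not name any environment variables, so `LAKEHOUSE_ROOT` in the usage note is a placeholder, as is the function itself.

```python
import os


def validate_env(env, required, path_vars=()):
    """Return human-readable problems; an empty list means the config is usable."""
    problems = []
    for name in required:
        if not env.get(name):
            problems.append(f"{name}: missing or empty")
    for name in path_vars:
        value = env.get(name)
        # Only check paths that are set; unset ones are reported above.
        if value and not os.path.isdir(value):
            problems.append(f"{name}: {value!r} is not an existing directory")
    return problems
```

For example, `validate_env(os.environ, ["LAKEHOUSE_ROOT"], path_vars=["LAKEHOUSE_ROOT"])` would report a missing variable by name rather than failing later on a hardcoded path.
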
## Required Final Deliverables
Create:
- reports/scrum/matrix-agent-scrum-test.md
- reports/scrum/risk-register.md
- reports/scrum/claim-coverage-table.md
- reports/scrum/sprint-backlog.md
- reports/scrum/acceptance-gates.md

Do not rewrite the system yet.
First produce the reports only.
## Scoring Model
Use this scoring:
- Reproducibility: 0-10
- Test Coverage: 0-10
- Trust Boundary Safety: 0-10
- Agent Memory Correctness: 0-10
- Deployment Readiness: 0-10
- Maintainability: 0-10

Mark each score with evidence.
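
The scoring rule (six categories, each scored from 0 to 10, each backed by evidence) is simple enough to check mechanically. The sketch below is one possible encoding; the `(points, evidence)` tuple shape and the function name are assumptions.

```python
CATEGORIES = [
    "Reproducibility", "Test Coverage", "Trust Boundary Safety",
    "Agent Memory Correctness", "Deployment Readiness", "Maintainability",
]


def total_score(scores):
    """scores maps category -> (points, evidence). Returns (total, maximum)."""
    total = 0
    for name in CATEGORIES:
        if name not in scores:
            raise ValueError(f"missing score for {name}")
        points, evidence = scores[name]
        if not 0 <= points <= 10:
            raise ValueError(f"{name}: {points} is outside 0-10")
        if not evidence.strip():
            # A score without evidence is rejected rather than counted.
            raise ValueError(f"{name}: score has no evidence")
        total += points
    return total, 10 * len(CATEGORIES)
```

Rejecting an evidence-free score outright, instead of counting it as zero, keeps an unsupported number from ever reaching the final report.
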
## Final Rule
No vibes. No “appears to work.” Every claim must point to:
- file path
- line/function
- command output
- test result
- missing evidence

That's the move: **don't refactor yet. Put the repo under oath first.**