Adapts the docs/SCRUM.md framework (originally written for the matrix-agent-validated repo) to the Go rewrite.

Five deliverables:

- golang-lakehouse-scrum-test.md: top-line + scoring + verdict
- risk-register.md: 12 findings, R-001..R-012
- claim-coverage-table.md: claim/test/risk for Sprint 2
- sprint-backlog.md: 5 sprints, ~2 weeks of work
- acceptance-gates.md: DoD as runnable commands

Every claim cites file:line, command output, or "missing evidence." Smoke chain ran clean (33s wall, all 9 PASS) and is captured in reports/scrum/_evidence/smoke_chain.log (gitignored, since it is a runtime artifact).

Scoring:

- Reproducibility 7/10: 9 smokes deterministic, no just/CI gate
- Test Coverage 6/10: internal/ packages tested, 6/7 cmd/ aren't
- Trust Boundary 7/10: escapes ok, zero auth, /sql is RCE-eq off-loopback
- Memory Correctness 3/10: pathway/playbook/observer not yet ported
- Deployment Readiness 4/10: no REPLICATION, no env template, no systemd
- Maintainability 8/10: no god-files, 7 lean binaries, ADRs current

Top risks:

- R-001 HIGH: queryd /sql + DuckDB + non-loopback bind = RCE-equivalent
- R-002 HIGH: internal/shared (server.go + config.go) zero tests
- R-003 HIGH: internal/storeclient zero tests, used by 2 services
- R-004 MED: 9-smoke chain green but not gated (no justfile/hook)

The audit is the work; refactors come after. Sprint 0 owns coverage + CI gating; Sprint 1 owns trust-boundary decisions; Sprints 2-3 are mostly design-bar work for unbuilt agent components.

.gitignore exception: /reports/* + !/reports/scrum/ keeps reports/ a runtime-artifact directory while exposing reports/scrum/ as tracked documentation. Mirrors the pattern future audit passes will land in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

215 lines
5.5 KiB
Markdown
# Scrum Test: Matrix Agent Validated Hardening Sprint

## Mission

Run a Scrum-style technical validation against this repository:

https://git.agentview.dev/profit/matrix-agent-validated.git

Do not add features first. Treat the codebase as a validated prototype that now needs production-hardening pressure.

The goal is to produce a hard evidence report and a prioritized sprint backlog.

## Core Questions

1. Can the repo be cloned, built, and smoke-tested from a clean environment?
2. Are the claimed validated paths actually covered by repeatable tests?
3. Where does the system rely on demo assumptions, hardcoded paths, permissive fallbacks, or unsafe string construction?
4. Which failures would corrupt trust in the agent loop?
5. What must be fixed before this becomes a reusable agent-memory substrate?

## Required Inspection Areas

### 1. Build and Test Surface

Inspect:

- Cargo workspace
- Rust crates
- Bun/TypeScript MCP server
- Python sidecar
- tests/
- justfile
- REPLICATION.md
- systemd units
- scripts/

Run or prepare the following commands where possible:

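```bash
# List the recipes the justfile exposes.
just --list
# Type-check, then test, every crate in the Cargo workspace.
cargo check --workspace
cargo test --workspace
# Subshell keeps the directory change from leaking into the next command.
# `|| true` only keeps the sequence going; record any failure as a blocker.
(cd mcp-server && bun install && bun test) || true
bun run tests/agent_test/agent_harness.ts || true
```
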
If heavy data or external services are missing, do not fake success. Record the blocker and define a mock/minimal fixture path.

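
One way to honor the "do not fake success" rule is to wrap each command so that a missing tool is recorded as a blocker instead of a pass. The sketch below is illustrative only: `run_step`, `StepResult`, and the three status names are hypothetical and not part of the repo.

```python
import shutil
import subprocess
from dataclasses import dataclass


@dataclass
class StepResult:
    command: list[str]
    status: str  # "pass", "fail", or "blocked" (hypothetical status names)
    detail: str


def run_step(command: list[str], timeout_s: int = 600) -> StepResult:
    """Run one command; a missing executable is a blocker, never a pass."""
    if shutil.which(command[0]) is None:
        return StepResult(command, "blocked", f"{command[0]} not found on PATH")
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return StepResult(command, "fail", f"timed out after {timeout_s}s")
    if proc.returncode != 0:
        # Keep the tail of stderr as evidence for the report.
        return StepResult(command, "fail", proc.stderr[-500:])
    return StepResult(command, "pass", "exit code 0")
```

A caller would pass each command as a list, for example `run_step(["cargo", "check", "--workspace"])`, and copy every `blocked` result into the report as a blocker.
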
### 2. Security and Trust Boundary Review

Search for:

- raw SQL interpolation
- shell command execution
- open CORS
- unauthenticated mutation endpoints
- pass-through proxy routes
- hardcoded absolute paths
- secrets in repo
- fail-open review behavior
- unbounded file reads/writes
- unsafe JSON parsing assumptions

Pay special attention to:

- mcp-server/index.ts
- mcp-server/observer.ts
- crates/vectord/src/pathway_memory.rs
- crates/vectord/src/playbook_memory.rs
- scripts/
- sidecar/

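
A first pass over the search list can be automated with line-based pattern checks. The sketch below covers three of the items (raw SQL interpolation, hardcoded absolute paths, open CORS). The patterns are rough heuristics chosen for illustration, and `PATTERNS` and `scan_text` are hypothetical names. Every hit still has to be confirmed by reading the cited line.

```python
import re

# Rough heuristics only: each pattern produces false positives and misses,
# so a hit is a lead to inspect, not a finding.
PATTERNS = {
    "raw_sql_interpolation": re.compile(
        r"(format!\s*\(|f[\"']|`)[^\n]*\b(SELECT|INSERT|UPDATE|DELETE)\b[^\n]*(\{|\$\{)",
        re.IGNORECASE,
    ),
    "hardcoded_absolute_path": re.compile(r"[\"'](/home/|/Users/|/root/)[^\"']*[\"']"),
    "open_cors": re.compile(r"Access-Control-Allow-Origin[\"']?\s*[:,]\s*[\"']\*[\"']"),
}


def scan_text(path: str, text: str) -> list[dict]:
    """Return one {file, line, check} record per pattern hit, in line order."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append({"file": path, "line": lineno, "check": name})
    return findings
```

Each record carries a file and line number, which is the citation form the report requires.
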
### 3. Agent Validation Review

Verify whether the following claims are actually enforced by tests:

- vector retrieval across corpora
- observer hand-review gates candidates
- successful playbooks are sealed
- retrieval surfaces prior playbooks on later runs
- Mem0-style ADD / UPDATE / REVISE / RETIRE / HISTORY behavior works
- retired traces are excluded from retrieval
- history chains are cycle-safe
- agent claims can be verified against SQL truth
- cloud-only adaptation works without local Ollama

Create a table:

| Claim | Code Location | Existing Test | Missing Test | Risk |
| --- | --- | --- | --- | --- |

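
The table can be generated from structured records so that an empty cell is written out explicitly instead of being left blank. This is a minimal sketch; `ClaimRow` and `render_table` are hypothetical names, and the wording used for empty cells is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

HEADER = ["Claim", "Code Location", "Existing Test", "Missing Test", "Risk"]


@dataclass
class ClaimRow:
    claim: str
    code_location: Optional[str]
    existing_test: Optional[str]
    missing_test: Optional[str]
    risk: str

    def cells(self) -> list[str]:
        # An unknown location or test is spelled out as "missing evidence"
        # so a gap is never mistaken for coverage.
        return [
            self.claim,
            self.code_location or "missing evidence",
            self.existing_test or "missing evidence",
            self.missing_test or "none",
            self.risk,
        ]


def render_table(rows: list[ClaimRow]) -> str:
    """Render the rows as a Markdown table under the fixed header."""
    lines = ["| " + " | ".join(HEADER) + " |", "|" + " --- |" * len(HEADER)]
    lines += ["| " + " | ".join(row.cells()) + " |" for row in rows]
    return "\n".join(lines)
```
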
### 4. Scrum Backlog Output

Create a prioritized backlog using this format:

#### Sprint 0: Reproducibility Gate

Goal: make the repo provably runnable.

Stories:

- As an operator, I can run one command and know which dependencies are missing.
- As an operator, I can run a minimal fixture test without the 470MB data payload.
- As an operator, I can verify gateway, sidecar, observer, and MCP health with one command.

Acceptance:

- `just verify` exists.
- `just smoke` runs without large datasets.
- failure output is structured JSON.
- no test claims success when dependencies are missing.

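
The "one command, structured JSON" requirement could be met by a small dependency check like the sketch below. The tool list is taken from the commands listed earlier in this document; `check_dependencies`, `report`, and the JSON field names are hypothetical.

```python
import json
import shutil

# Tools named by the build and test commands; extend as needed.
REQUIRED_TOOLS = ["just", "cargo", "bun"]


def check_dependencies(tools: list[str]) -> dict:
    """Report which of the given executables cannot be found on PATH."""
    missing = [tool for tool in tools if shutil.which(tool) is None]
    return {"ok": not missing, "checked": tools, "missing": missing}


def report(tools: list[str]) -> tuple[str, int]:
    """Return (JSON text, exit code); the exit code is non-zero if anything is missing."""
    result = check_dependencies(tools)
    return json.dumps(result, sort_keys=True), 0 if result["ok"] else 1
```

A `just` recipe could print the JSON from `report(REQUIRED_TOOLS)` and exit with the returned code, so a missing dependency can never be reported as success.
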
#### Sprint 1: Trust Boundary Gate

Goal: prevent agent trust collapse.

Stories:

- Replace raw SQL string interpolation with validated query builders or parameterized calls.
- Change observer `/review` failure from fail-open accept to explicit degraded/cycle verdict.
- Add auth or localhost-only guardrails for mutation endpoints.
- Add schema validation for every public endpoint.

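
The difference between interpolated and parameterized SQL is easiest to see side by side. The sketch below uses Python's built-in `sqlite3` as a stand-in, since the repo's actual query layer is not shown here. The `traces` table and both function names are hypothetical.

```python
import sqlite3


def find_unsafe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Vulnerable: the input is spliced into the SQL text itself.
    return conn.execute(f"SELECT id FROM traces WHERE name = '{name}'").fetchall()


def find_safe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Parameterized: the value travels separately from the SQL text.
    return conn.execute("SELECT id FROM traces WHERE name = ?", (name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traces (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO traces (name) VALUES (?)", [("alpha",), ("beta",)])

payload = "nope' OR '1'='1"
leaked = find_unsafe(conn, payload)  # the injected OR clause matches every row
blocked = find_safe(conn, payload)   # the payload is treated as a literal name
```

An injection test built this way fails against the interpolated version and passes against the parameterized one.
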
Acceptance:

- SQL injection tests fail before fix and pass after fix.
- observer crash cannot auto-accept unsafe candidate output.
- mutation endpoints require configured token or local-only mode.

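
The observer requirement amounts to failing closed: an error in the reviewer must map to an explicit non-accept verdict. A minimal sketch follows, assuming the observer can be modeled as a callable that returns `True` to approve. The verdict strings and `review_candidate` are hypothetical.

```python
from typing import Callable

ACCEPT, REJECT, DEGRADED = "accept", "reject", "degraded"


def review_candidate(candidate: str, observer: Callable[[str], bool]) -> dict:
    """Fail closed: an observer error yields 'degraded', never 'accept'."""
    try:
        approved = observer(candidate)
    except Exception as exc:  # observer crashed or was unreachable
        return {"verdict": DEGRADED, "reason": f"observer error: {exc}"}
    if approved is True:
        return {"verdict": ACCEPT, "reason": "observer approved"}
    # Anything other than an explicit True (False, None, junk) is a rejection.
    return {"verdict": REJECT, "reason": "observer did not approve"}
```

Requiring an explicit `True` also closes the quieter failure where a malformed observer response is treated as approval.
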
#### Sprint 2: Memory Correctness Gate

Goal: prove Mem0/pathway memory cannot poison itself.

Stories:

- Add tests for ADD, UPDATE, REVISE, RETIRE, HISTORY.
- Add cycle detection tests.
- Add retired-trace retrieval exclusion tests.
- Add duplicate trace `replay_count` tests.
- Add corrupted memory row recovery test.

Acceptance:

- deterministic fixture corpus
- all memory operations covered
- every memory mutation emits audit/event receipt

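
Two of these properties, cycle-safe history and retired-trace exclusion, can be stated against a tiny in-memory model before any real store is involved. The sketch below is such a model, not the repo's data structure: `Trace`, its fields, `history`, and `retrievable` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trace:
    trace_id: str
    text: str
    supersedes: Optional[str] = None  # id of the trace this one revised
    retired: bool = False


class CycleError(Exception):
    pass


def history(store: dict[str, Trace], trace_id: str) -> list[str]:
    """Walk the supersedes chain newest-first; raise instead of looping forever."""
    seen: list[str] = []
    current: Optional[str] = trace_id
    while current is not None and current in store:
        if current in seen:
            raise CycleError(f"cycle at {current}")
        seen.append(current)
        current = store[current].supersedes
    return seen


def retrievable(store: dict[str, Trace]) -> list[str]:
    # Retired traces stay in the store for HISTORY but never surface in retrieval.
    return sorted(t.trace_id for t in store.values() if not t.retired)
```

A fixture with two traces, one retired and one superseding it, is enough to exercise both the happy path and a deliberately corrupted cycle.
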
#### Sprint 3: Agent Loop Reality Gate

Goal: prove the agent loop works across actual workflows.

Stories:

- Build deterministic mini corpus.
- Run search → verify → observer review → playbook seal → second-run retrieval.
- Add negative case where observer rejects hallucinated claim.
- Add regression for health endpoint content-type mismatch.

Acceptance:

- single command proves the full loop
- generated report includes input hash, output hash, verdict, and memory mutation receipt

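
The report record described in the acceptance list could be as small as the sketch below. SHA-256 is an assumed choice of hash, and `build_report` and the field names are hypothetical.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_report(input_bytes: bytes, output_bytes: bytes, verdict: str, receipt_id: str) -> str:
    """Serialize one loop run; sorted keys keep the JSON byte-stable across runs."""
    record = {
        "input_hash": sha256_hex(input_bytes),
        "output_hash": sha256_hex(output_bytes),
        "verdict": verdict,
        "memory_mutation_receipt": receipt_id,
    }
    return json.dumps(record, sort_keys=True)
```

Because the same inputs always serialize to the same bytes, two runs over the deterministic mini corpus can be compared with a plain diff.
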
#### Sprint 4: Deployment Gate

Goal: turn REPLICATION.md into executable deployment validation.

Stories:

- Convert REPLICATION.md validation section into scripts.
- Add env var template.
- Add config validation.
- Remove hardcoded /home/profit/lakehouse paths.
- Add systemd readiness checks.

Acceptance:

- fresh clone can run `just doctor`
- missing env vars are reported clearly
- no absolute path assumptions remain unless configured

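
Config validation for this gate can start with a check that names every missing variable and refuses to fall back to a hardcoded path. In the sketch below the variable names `LAKEHOUSE_ROOT` and `OBSERVER_URL` are invented for illustration; the real list would come from the env var template.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names; the real list would come from the env template.
REQUIRED_VARS = ["LAKEHOUSE_ROOT", "OBSERVER_URL"]


def missing_env(required: list[str], env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return required variables that are unset or blank, in the order given."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


def resolve_root(env: Mapping[str, str]) -> str:
    # No fallback to a hardcoded home directory: the path must be configured.
    root = env.get("LAKEHOUSE_ROOT", "").strip()
    if not root:
        raise ValueError("LAKEHOUSE_ROOT is not set")
    return root
```

Passing the environment in as a mapping keeps the check testable without touching the real process environment.
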
## Required Final Deliverables

Create:

- reports/scrum/matrix-agent-scrum-test.md
- reports/scrum/risk-register.md
- reports/scrum/claim-coverage-table.md
- reports/scrum/sprint-backlog.md
- reports/scrum/acceptance-gates.md

Do not rewrite the system yet.

First produce the reports only.

## Scoring Model

Use this scoring:

- Reproducibility: 0–10
- Test Coverage: 0–10
- Trust Boundary Safety: 0–10
- Agent Memory Correctness: 0–10
- Deployment Readiness: 0–10
- Maintainability: 0–10

Mark each score with evidence.

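
The "score plus evidence" rule can be enforced mechanically, so a score without a citation cannot be recorded at all. This is a minimal sketch: the `Score` type is hypothetical, and the category names are copied from the list above.

```python
from dataclasses import dataclass, field

CATEGORIES = [
    "Reproducibility",
    "Test Coverage",
    "Trust Boundary Safety",
    "Agent Memory Correctness",
    "Deployment Readiness",
    "Maintainability",
]


@dataclass
class Score:
    category: str
    value: int
    evidence: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if not 0 <= self.value <= 10:
            raise ValueError("score must be between 0 and 10")
        if not self.evidence:
            # A score with nothing behind it is exactly what the model forbids.
            raise ValueError(f"{self.category}: score has no evidence")
```
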
## Final Rule

No vibes. No “appears to work.” Every claim must point to:

- file path
- line/function
- command output
- test result
- missing evidence

That’s the move: **don’t refactor yet. Put the repo under oath first.**