Adapts the docs/SCRUM.md framework (originally written for the matrix-agent-validated repo) to the Go rewrite.

Five deliverables:

- golang-lakehouse-scrum-test.md: top-line + scoring + verdict
- risk-register.md: 12 findings, R-001..R-012
- claim-coverage-table.md: claim/test/risk for Sprint 2
- sprint-backlog.md: 5 sprints, ~2 weeks of work
- acceptance-gates.md: DoD as runnable commands

Every claim cites file:line, command output, or "missing evidence." The smoke chain ran clean (33s wall, all 9 PASS) and is captured in reports/scrum/_evidence/smoke_chain.log (gitignored — runtime artifact).

Scoring:

- Reproducibility 7/10: 9 smokes deterministic, no just/CI gate
- Test Coverage 6/10: internal/ packages tested, 6 of 7 cmd/ aren't
- Trust Boundary 7/10: escapes ok, zero auth, /sql is RCE-equivalent off-loopback
- Memory Correctness 3/10: pathway/playbook/observer not yet ported
- Deployment Readiness 4/10: no REPLICATION, no env template, no systemd
- Maintainability 8/10: no god-files, 7 lean binaries, ADRs current

Top risks:

- R-001 (HIGH): queryd /sql + DuckDB + non-loopback bind = RCE-equivalent
- R-002 (HIGH): internal/shared (server.go + config.go) has zero tests
- R-003 (HIGH): internal/storeclient has zero tests, used by 2 services
- R-004 (MED): 9-smoke chain green but not gated (no justfile/hook)

The audit is the work; refactors come after. Sprint 0 owns coverage + CI gating; Sprint 1 owns trust-boundary decisions; Sprints 2-3 are mostly design-bar work for unbuilt agent components.

.gitignore exception: `/reports/*` + `!/reports/scrum/` keeps reports/ a runtime-artifact directory while exposing reports/scrum/ as tracked documentation. Mirrors the pattern future audit passes will land in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Scrum Test: Matrix Agent Validated Hardening Sprint

## Mission
Run a Scrum-style technical validation against this repository:
https://git.agentview.dev/profit/matrix-agent-validated.git
Do not add features first. Treat the codebase as a validated prototype that now needs production-hardening pressure.
The goal is to produce a hard evidence report and a prioritized sprint backlog.
## Core Questions
- Can the repo be cloned, built, and smoke-tested from a clean environment?
- Are the claimed validated paths actually covered by repeatable tests?
- Where does the system rely on demo assumptions, hardcoded paths, permissive fallbacks, or unsafe string construction?
- Which failures would corrupt trust in the agent loop?
- What must be fixed before this becomes a reusable agent-memory substrate?
## Required Inspection Areas

### 1. Build and Test Surface
Inspect:

- Cargo workspace
- Rust crates
- Bun/TypeScript MCP server
- Python sidecar
- tests/
- justfile
- REPLICATION.md
- systemd units
- scripts/
Run or prepare the following commands where possible:

```sh
just --list
cargo check --workspace
cargo test --workspace
cd mcp-server && bun install && bun test || true
bun run tests/agent_test/agent_harness.ts || true
```
If heavy data or external services are missing, do not fake success. Record the blocker and define a mock/minimal fixture path.
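The "record the blocker" rule can be sketched as a small doctor script. This is a hedged illustration, not the repo's actual interface: the tool list and JSON field names are assumptions.

```sh
#!/bin/sh
# Sketch of a blocker-recording dependency check. Instead of faking
# success when a tool is absent, each dependency is reported as a
# structured JSON line an operator or CI job can parse.
check_dep() {
  if command -v "$1" >/dev/null 2>&1; then
    printf '{"dep":"%s","status":"ok"}\n' "$1"
  else
    # Missing dependency: record the blocker, never claim success.
    printf '{"dep":"%s","status":"missing","action":"install or define mock fixture"}\n' "$1"
  fi
}

# Tool list assumed from the inspection areas above.
for tool in just cargo bun python3; do
  check_dep "$tool"
done
```

A `just verify` target could simply call a script like this and exit nonzero if any line reports `"missing"`.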
### 2. Security and Trust Boundary Review

Search for:

- raw SQL interpolation
- shell command execution
- open CORS
- unauthenticated mutation endpoints
- pass-through proxy routes
- hardcoded absolute paths
- secrets in the repo
- fail-open review behavior
- unbounded file reads/writes
- unsafe JSON parsing assumptions
Pay special attention to:

- mcp-server/index.ts
- mcp-server/observer.ts
- crates/vectord/src/pathway_memory.rs
- crates/vectord/src/playbook_memory.rs
- scripts/
- sidecar/
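A first sweep for these patterns can be mechanical. The sketch below greps for a few of the risky constructs listed above; the pattern list is illustrative, not exhaustive, and a real pass would also use language-aware tooling.

```sh
#!/bin/sh
# Hedged sketch of the trust-boundary sweep: fixed-string grep over a
# source tree for a handful of risky constructs (SQL built via format!,
# shell spawning, hardcoded home paths, permissive CORS).
sweep() {
  dir="$1"
  for pat in 'format!(' 'Command::new' 'child_process' '/home/' 'Access-Control-Allow-Origin'; do
    grep -rnF "$pat" "$dir" 2>/dev/null || true
  done
}

# Demo against a throwaway fixture so the output is deterministic.
fixture=$(mktemp -d)
printf 'let q = format!("SELECT * FROM {}", table);\n' > "$fixture/demo.rs"
sweep "$fixture"
rm -rf "$fixture"
```

Each hit becomes a candidate finding for the risk register; absence of hits is not evidence of safety.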
### 3. Agent Validation Review

Verify whether the following claims are actually enforced by tests:

- vector retrieval across corpora
- observer hand-review gates candidates
- successful playbooks are sealed
- retrieval surfaces prior playbooks on later runs
- Mem0-style ADD / UPDATE / REVISE / RETIRE / HISTORY behavior works
- retired traces are excluded from retrieval
- history chains are cycle-safe
- agent claims can be verified against SQL truth
- cloud-only adaptation works without local Ollama
Create a table:

| Claim | Code Location | Existing Test | Missing Test | Risk |
| --- | --- | --- | --- | --- |
### 4. Scrum Backlog Output

Create a prioritized backlog using this format:

#### Sprint 0 — Reproducibility Gate
Goal: make the repo provably runnable.
Stories:

- As an operator, I can run one command and know which dependencies are missing.
- As an operator, I can run a minimal fixture test without the 470 MB data payload.
- As an operator, I can verify gateway, sidecar, observer, and MCP health with one command.

Acceptance:

- `just verify` exists.
- `just smoke` runs without large datasets.
- Failure output is structured JSON.
- No test claims success when dependencies are missing.
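The structured-JSON failure criterion can be sketched as a step runner that reports every step's outcome instead of hiding failures. The step names and any `just` wiring are assumptions for illustration.

```sh
#!/bin/sh
# Sketch of a smoke-step runner: each step emits one JSON line, pass or
# fail, so a failing dependency can never masquerade as success.
run_step() {
  name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    printf '{"step":"%s","status":"pass"}\n' "$name"
  else
    # $? here is the exit status of the failed step command.
    printf '{"step":"%s","status":"fail","exit":%d}\n' "$name" "$?"
  fi
}

run_step "shell-available" true
run_step "demo-failure" false   # deliberately failing step for illustration
```

A `just smoke` recipe could chain real health checks through `run_step` and let CI parse the JSON lines.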
#### Sprint 1 — Trust Boundary Gate
Goal: prevent agent trust collapse.
Stories:

- Replace raw SQL string interpolation with validated query builders or parameterized calls.
- Change observer /review failure from fail-open accept to an explicit degraded/cycle verdict.
- Add auth or localhost-only guardrails for mutation endpoints.
- Add schema validation for every public endpoint.

Acceptance:

- SQL injection tests fail before the fix and pass after it.
- An observer crash cannot auto-accept unsafe candidate output.
- Mutation endpoints require a configured token or local-only mode.
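The token-or-local-only acceptance criterion can be expressed as a preflight check. This is a sketch under assumed names: `BIND_ADDR` and `MUTATION_TOKEN` are placeholders, not the repo's actual config keys.

```sh
#!/bin/sh
# Sketch of the mutation-endpoint guardrail gate: pass only when the
# service binds to loopback, or a mutation token is configured.
check_guardrails() {
  bind="${BIND_ADDR:-0.0.0.0:8080}"
  case "$bind" in
    127.0.0.1:*|localhost:*) echo "ok: loopback bind ($bind)"; return 0 ;;
  esac
  if [ -n "${MUTATION_TOKEN:-}" ]; then
    echo "ok: token configured for non-loopback bind"
    return 0
  fi
  echo "FAIL: non-loopback bind ($bind) with no MUTATION_TOKEN"
  return 1
}

# Demo: loopback bind passes the gate.
( BIND_ADDR="127.0.0.1:9000"; check_guardrails )
```

Run as a CI step before deployment, this turns the guardrail from a convention into a gate.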
#### Sprint 2 — Memory Correctness Gate
Goal: prove Mem0/pathway memory cannot poison itself.
Stories:

- Add tests for ADD, UPDATE, REVISE, RETIRE, HISTORY.
- Add cycle detection tests.
- Add retired-trace retrieval exclusion tests.
- Add duplicate trace replay_count tests.
- Add a corrupted-memory-row recovery test.

Acceptance:

- deterministic fixture corpus
- all memory operations covered
- every memory mutation emits an audit/event receipt
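The cycle-safety story above amounts to walking a revision chain with a visited set. The sketch below uses a stand-in encoding (a file of "id parent" lines, `-` for no parent) purely to make the walk testable; the real check would run against the history table.

```sh
#!/bin/sh
# Sketch of a history-chain cycle check: follow parent pointers from a
# starting id and fail if any id repeats.
walk_chain() {
  file="$1"; cur="$2"; seen=""
  while [ -n "$cur" ] && [ "$cur" != "-" ]; do
    case " $seen " in
      *" $cur "*) echo "CYCLE detected at $cur"; return 1 ;;
    esac
    seen="$seen $cur"
    cur=$(awk -v id="$cur" '$1 == id { print $2 }' "$file")
  done
  echo "chain ok"
}

# Demo: a two-revision linear chain terminates cleanly.
chain=$(mktemp)
printf 'rev2 rev1\nrev1 -\n' > "$chain"
walk_chain "$chain" rev2
rm -f "$chain"
```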
#### Sprint 3 — Agent Loop Reality Gate
Goal: prove the agent loop works across actual workflows.
Stories:

- Build a deterministic mini corpus.
- Run search → verify → observer review → playbook seal → second-run retrieval.
- Add a negative case where the observer rejects a hallucinated claim.
- Add a regression test for the health endpoint content-type mismatch.

Acceptance:

- a single command proves the full loop
- the generated report includes input hash, output hash, verdict, and a memory mutation receipt
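The report criterion can be sketched as a single JSON emitter. File paths, the verdict source, and the receipt format are assumptions for illustration, not the repo's real report schema.

```sh
#!/bin/sh
# Sketch of the loop report: hash the loop's input and output files and
# emit one JSON line tying them to a verdict and a mutation receipt id.
loop_report() {
  in_file="$1"; out_file="$2"; verdict="$3"; receipt="$4"
  in_hash=$(sha256sum "$in_file" | cut -d' ' -f1)
  out_hash=$(sha256sum "$out_file" | cut -d' ' -f1)
  printf '{"input_sha256":"%s","output_sha256":"%s","verdict":"%s","mutation_receipt":"%s"}\n' \
    "$in_hash" "$out_hash" "$verdict" "$receipt"
}

# Demo with throwaway files standing in for real loop artifacts.
in=$(mktemp); out=$(mktemp)
echo "query" > "$in"; echo "answer" > "$out"
loop_report "$in" "$out" pass receipt-001
rm -f "$in" "$out"
```

Because the hashes are content-addressed, a second run over the same corpus must reproduce the same input hash, which makes the loop's determinism itself testable.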
#### Sprint 4 — Deployment Gate
Goal: turn REPLICATION.md into executable deployment validation.
Stories:

- Convert the REPLICATION.md validation section into scripts.
- Add an env var template.
- Add config validation.
- Remove hardcoded /home/profit/lakehouse paths.
- Add systemd readiness checks.

Acceptance:

- a fresh clone can run `just doctor`
- missing env vars are reported clearly
- no absolute path assumptions remain unless configured
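The "missing env vars are reported clearly" gate can be sketched as a doctor step that names each absent variable instead of letting a service fail deep inside startup. The variable name and the `.env.example` reference are placeholders; the real list would come from the env template this sprint adds.

```sh
#!/bin/sh
# Sketch of an env-var doctor: check each required variable and report
# every missing one by name, returning nonzero if any is absent.
require_env() {
  missing=0
  for var in "$@"; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "MISSING: $var (see .env.example)"
      missing=1
    fi
  done
  return "$missing"
}

# Demo: an unset placeholder variable produces a clear, named report.
require_env LAKEHOUSE_DATA_DIR || echo "doctor: fix the variables above"
```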
## Required Final Deliverables

Create:

- `reports/scrum/matrix-agent-scrum-test.md`
- `reports/scrum/risk-register.md`
- `reports/scrum/claim-coverage-table.md`
- `reports/scrum/sprint-backlog.md`
- `reports/scrum/acceptance-gates.md`
Do not rewrite the system yet.
First produce the reports only.
## Scoring Model

Use this scoring:

- Reproducibility: 0–10
- Test Coverage: 0–10
- Trust Boundary Safety: 0–10
- Agent Memory Correctness: 0–10
- Deployment Readiness: 0–10
- Maintainability: 0–10
Mark each score with evidence.
## Final Rule

No vibes. No “appears to work.” Every claim must point to:

- a file path
- a line/function
- command output
- a test result
- or an explicit “missing evidence” note
That’s the move: **don’t refactor yet. Put the repo under oath first.**