root 91edd43164 scrum audit: 5 reports under reports/scrum/ · score 35/60
Adapts docs/SCRUM.md framework (originally written for the
matrix-agent-validated repo) to the Go rewrite. Five deliverables:

  golang-lakehouse-scrum-test.md  top-line + scoring + verdict
  risk-register.md                12 findings, R-001..R-012
  claim-coverage-table.md         claim/test/risk for Sprint 2
  sprint-backlog.md               5 sprints, ~2 weeks of work
  acceptance-gates.md             DoD as runnable commands

Every claim cites file:line, command output, or "missing evidence."
Smoke chain ran clean (33s wall, all 9 PASS) and is captured in
reports/scrum/_evidence/smoke_chain.log (gitignored — runtime artifact).

Scoring:
  Reproducibility       7/10  9 smokes deterministic, no just/CI gate
  Test Coverage         6/10  internal/ packages tested, 6/7 cmd/ aren't
  Trust Boundary        7/10  escapes ok, zero auth, /sql is RCE-eq off-loopback
  Memory Correctness    3/10  pathway/playbook/observer not yet ported
  Deployment Readiness  4/10  no REPLICATION, no env template, no systemd
  Maintainability       8/10  no god-files, 7 lean binaries, ADRs current

Top risks:
  R-001 HIGH  queryd /sql + DuckDB + non-loopback bind = RCE-equivalent
  R-002 HIGH  internal/shared (server.go + config.go) zero tests
  R-003 HIGH  internal/storeclient zero tests, used by 2 services
  R-004 MED   9-smoke chain green but not gated (no justfile/hook)

The audit is the work; refactors come after. Sprint 0 owns coverage
+ CI gating; Sprint 1 owns trust-boundary decisions; Sprints 2-3 are
mostly design-bar work for unbuilt agent components.

.gitignore exception: /reports/* + !/reports/scrum/ keeps reports/
a runtime-artifact directory while exposing reports/scrum/ as
tracked documentation. Mirrors the pattern future audit passes will
land in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 04:51:47 -05:00


Scrum Test: Matrix Agent Validated Hardening Sprint

Mission

Run a Scrum-style technical validation against this repository:

https://git.agentview.dev/profit/matrix-agent-validated.git

Do not add features first. Treat the codebase as a validated prototype that now needs production-hardening pressure.

The goal is to produce a hard evidence report and a prioritized sprint backlog.

Core Questions

  1. Can the repo be cloned, built, and smoke-tested from a clean environment?
  2. Are the claimed validated paths actually covered by repeatable tests?
  3. Where does the system rely on demo assumptions, hardcoded paths, permissive fallbacks, or unsafe string construction?
  4. Which failures would corrupt trust in the agent loop?
  5. What must be fixed before this becomes a reusable agent-memory substrate?

Required Inspection Areas

1. Build and Test Surface

Inspect:

  • Cargo workspace
  • Rust crates
  • Bun/TypeScript MCP server
  • Python sidecar
  • tests/
  • justfile
  • REPLICATION.md
  • systemd units
  • scripts/

Run or prepare the following commands where possible:

just --list
cargo check --workspace
cargo test --workspace
cd mcp-server && bun install && bun test || true
bun run tests/agent_test/agent_harness.ts || true

If heavy data or external services are missing, do not fake success. Record the blocker and define a mock/minimal fixture path.
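One way to record the blocker as data rather than faking success — a minimal sketch; the tool names and JSON field names here are assumptions, not the repo's real contract:

```shell
# Probe for required tools; emit structured JSON either way instead of a
# silent pass. Tool list is illustrative — adjust to the repo's actual deps.
probe() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing$tool "
  done
  if [ -n "$missing" ]; then
    printf '{"status":"blocked","missing":"%s"}\n' "${missing% }"
  else
    printf '{"status":"ready"}\n'
  fi
}
probe just cargo bun
```

A blocked result is still a recorded result; downstream fixture paths can key off the `missing` field.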

2. Security and Trust Boundary Review

Search for:

raw SQL interpolation
shell command execution
open CORS
unauthenticated mutation endpoints
pass-through proxy routes
hardcoded absolute paths
secrets in repo
fail-open review behavior
unbounded file reads/writes
unsafe JSON parsing assumptions

Pay special attention to:

mcp-server/index.ts
mcp-server/observer.ts
crates/vectord/src/pathway_memory.rs
crates/vectord/src/playbook_memory.rs
scripts/
sidecar/
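The search list above can be seeded with a plain grep sweep. A sketch against a throwaway fixture — the regex is an illustrative starting point, not an exhaustive detector; point the same command at crates/, mcp-server/, scripts/, and sidecar/ in the real tree:

```shell
# Build a tiny fixture with a deliberately unsafe SQL interpolation, then
# demonstrate the sweep that should flag it.
mkdir -p /tmp/sweep-demo
cat > /tmp/sweep-demo/bad.rs <<'EOF'
let q = format!("SELECT * FROM traces WHERE id = {}", user_input);
EOF
# Flags Rust format!-built SQL; add sibling patterns for CORS, hardcoded
# /home paths, shell exec, etc.
grep -rn -E 'format!\(.*(SELECT|INSERT|UPDATE|DELETE)' /tmp/sweep-demo
```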

3. Agent Validation Review

Verify whether the following claims are actually enforced by tests:

vector retrieval across corpora
observer hand-review gates candidates
successful playbooks are sealed
retrieval surfaces prior playbooks on later runs
Mem0-style ADD / UPDATE / REVISE / RETIRE / HISTORY behavior works
retired traces are excluded from retrieval
history chains are cycle-safe
agent claims can be verified against SQL truth
cloud-only adaptation works without local Ollama

Create a table:

| Claim | Code Location | Existing Test | Missing Test | Risk |
| ----- | ------------- | ------------- | ------------ | ---- |

4. Scrum Backlog Output

Create a prioritized backlog using this format:

Sprint 0 — Reproducibility Gate

Goal: make the repo provably runnable.

Stories:

As an operator, I can run one command and know which dependencies are missing.
As an operator, I can run a minimal fixture test without the 470MB data payload.
As an operator, I can verify gateway, sidecar, observer, and MCP health with one command.

Acceptance:

just verify exists.
just smoke runs without large datasets.
failure output is structured JSON.
no test claims success when dependencies are missing.
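One shape the structured-JSON failure output could take — a sketch only; the field names are assumptions, not a fixed schema:

```shell
# Emit a machine-readable failure record for a named check.
fail_json() {
  # $1 = check name, $2 = human-readable reason
  printf '{"check":"%s","ok":false,"reason":"%s"}\n' "$1" "$2"
}
fail_json smoke-gateway "connection refused on :8080"
```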

Sprint 1 — Trust Boundary Gate

Goal: prevent agent trust collapse.

Stories:

Replace raw SQL string interpolation with validated query builders or parameterized calls.
Change observer /review failure from fail-open accept to explicit degraded/cycle verdict.
Add auth or localhost-only guardrails for mutation endpoints.
Add schema validation for every public endpoint.

Acceptance:

SQL injection tests fail before fix and pass after fix.
observer crash cannot auto-accept unsafe candidate output.
mutation endpoints require configured token or local-only mode.
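A unit-level guard such an injection test could target — a sketch only. The real fix is parameterized queries; this merely illustrates the "fails before fix, passes after" shape with an allow-list identifier check:

```shell
# Reject any id that is not purely [A-Za-z0-9_] before it can reach
# string-built SQL. Returns 0 for safe input, 1 otherwise.
is_safe_id() {
  case "$1" in
    *[!a-zA-Z0-9_]*|'') return 1 ;;
    *) return 0 ;;
  esac
}
is_safe_id "trace_42" && echo "accepted"
is_safe_id "1; DROP TABLE traces--" || echo "rejected"
```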

Sprint 2 — Memory Correctness Gate

Goal: prove Mem0/pathway memory cannot poison itself.

Stories:

Add tests for ADD, UPDATE, REVISE, RETIRE, HISTORY.
Add cycle detection tests.
Add retired-trace retrieval exclusion tests.
Add duplicate trace replay_count tests.
Add corrupted memory row recovery test.

Acceptance:

deterministic fixture corpus
all memory operations covered
every memory mutation emits audit/event receipt
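A sketch of what the cycle-safety test could assert. History rows are modeled here as "child parent" lines in a variable; the real test would read them from the store:

```shell
# Follow parent pointers from a start node; report a cycle if any node
# is revisited.
parent_of() {            # $1 = node; prints its parent, empty if none
  echo "$CHAIN" | awk -v n="$1" '$1 == n { print $2 }'
}
walk_is_cyclic() {       # $1 = start node; exit 0 iff the chain loops
  node=$1 seen=" "
  while [ -n "$node" ]; do
    case "$seen" in *" $node "*) return 0 ;; esac
    seen="$seen$node "
    node=$(parent_of "$node")
  done
  return 1
}
CHAIN="b a
c b
a c"                     # a <- b <- c <- a : a deliberate cycle
walk_is_cyclic b && echo "cycle detected"
```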

Sprint 3 — Agent Loop Reality Gate

Goal: prove the agent loop works across actual workflows.

Stories:

Build deterministic mini corpus.
Run search → verify → observer review → playbook seal → second-run retrieval.
Add negative case where observer rejects hallucinated claim.
Add regression for health endpoint content-type mismatch.

Acceptance:

single command proves the full loop
generated report includes input hash, output hash, verdict, and memory mutation receipt
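A sketch of the single-command receipt this gate asks for; `sha256sum` and the JSON field names are assumptions about the eventual report format:

```shell
# Hash the fixture input, run the loop (a canned verdict stands in for it
# here), hash the output, and emit the receipt.
echo '{"query":"demo"}' > /tmp/loop_input.json
in_hash=$(sha256sum /tmp/loop_input.json | cut -d' ' -f1)
# ... real loop: search -> verify -> observer review -> seal -> retrieve ...
echo '{"verdict":"pass"}' > /tmp/loop_output.json
out_hash=$(sha256sum /tmp/loop_output.json | cut -d' ' -f1)
printf '{"input_hash":"%s","output_hash":"%s","verdict":"pass"}\n' \
  "$in_hash" "$out_hash"
```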

Sprint 4 — Deployment Gate

Goal: turn REPLICATION.md into executable deployment validation.

Stories:

Convert REPLICATION.md validation section into scripts.
Add env var template.
Add config validation.
Remove hardcoded /home/profit/lakehouse paths.
Add systemd readiness checks.

Acceptance:

fresh clone can run just doctor
missing env vars are reported clearly
no absolute path assumptions remain unless configured
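A sketch of the "missing env vars are reported clearly" check a `just doctor` recipe could call; the variable names below are invented placeholders, not the repo's real configuration keys:

```shell
# Report every unset variable by name; succeed only when all are present.
require_env() {
  missing=""
  for v in "$@"; do
    eval "val=\${$v:-}"
    [ -n "$val" ] || missing="$missing $v"
  done
  if [ -n "$missing" ]; then
    echo "missing env:$missing"
    return 1
  fi
  echo "env ok"
}
```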

Required Final Deliverables

Create:

reports/scrum/matrix-agent-scrum-test.md
reports/scrum/risk-register.md
reports/scrum/claim-coverage-table.md
reports/scrum/sprint-backlog.md
reports/scrum/acceptance-gates.md

Do not rewrite the system yet.

First produce the reports only.

Scoring Model

Use this scoring:

Reproducibility: 0–10
Test Coverage: 0–10
Trust Boundary Safety: 0–10
Agent Memory Correctness: 0–10
Deployment Readiness: 0–10
Maintainability: 0–10

Mark each score with evidence.

Final Rule

No vibes. No “appears to work.” Every claim must point to:

file path
line/function
command output
test result
missing evidence

That's the move: **don't refactor yet. Put the repo under oath first.**