# Scrum Test: Matrix Agent Validated Hardening Sprint

## Mission

Run a Scrum-style technical validation against this repository:

https://git.agentview.dev/profit/matrix-agent-validated.git

Do not add features first. Treat the codebase as a validated prototype that now needs production-hardening pressure. The goal is to produce a hard-evidence report and a prioritized sprint backlog.

## Core Questions

1. Can the repo be cloned, built, and smoke-tested from a clean environment?
2. Are the claimed validated paths actually covered by repeatable tests?
3. Where does the system rely on demo assumptions, hardcoded paths, permissive fallbacks, or unsafe string construction?
4. Which failures would corrupt trust in the agent loop?
5. What must be fixed before this becomes a reusable agent-memory substrate?

## Required Inspection Areas

### 1. Build and Test Surface

Inspect:

- Cargo workspace
- Rust crates
- Bun/TypeScript MCP server
- Python sidecar
- tests/
- justfile
- REPLICATION.md
- systemd units
- scripts/

Run or prepare the following commands where possible:

```bash
just --list
cargo check --workspace
cargo test --workspace
cd mcp-server && bun install && bun test || true
bun run tests/agent_test/agent_harness.ts || true
```

If heavy data or external services are missing, do not fake success. Record the blocker and define a mock/minimal fixture path.

### 2. Security and Trust Boundary Review

Search for:

- raw SQL interpolation
- shell command execution
- open CORS
- unauthenticated mutation endpoints
- pass-through proxy routes
- hardcoded absolute paths
- secrets in the repo
- fail-open review behavior
- unbounded file reads/writes
- unsafe JSON parsing assumptions

Pay special attention to:

- mcp-server/index.ts
- mcp-server/observer.ts
- crates/vectord/src/pathway_memory.rs
- crates/vectord/src/playbook_memory.rs
- scripts/
- sidecar/
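To make the "raw SQL interpolation" item concrete, here is a minimal, self-contained sketch of the failure mode and the parameterized fix. It uses Python's `sqlite3` purely as a stand-in; the repo's actual query layer (in the MCP server or sidecar) is an assumption, not verified.

```python
import sqlite3

# Illustration only: sqlite3 stands in for whatever SQL engine the repo uses.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE traces (id INTEGER PRIMARY KEY, claim TEXT)")
db.execute("INSERT INTO traces (claim) VALUES ('validated path A')")

malicious = "x' OR '1'='1"

# Fail case: raw string interpolation lets attacker-controlled text widen the query.
interpolated = f"SELECT claim FROM traces WHERE claim = '{malicious}'"
leaked = db.execute(interpolated).fetchall()  # matches every row

# Fix: a parameterized placeholder treats the input as data, not SQL.
safe = db.execute(
    "SELECT claim FROM traces WHERE claim = ?", (malicious,)
).fetchall()

print(len(leaked), len(safe))  # 1 0
```

A Sprint 1 injection test can follow exactly this shape: assert the payload leaks rows through the interpolated path and matches nothing through the parameterized one.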
### 3. Agent Validation Review

Verify whether the following claims are actually enforced by tests:

- vector retrieval across corpora
- observer hand-review gates candidates
- successful playbooks are sealed
- retrieval surfaces prior playbooks on later runs
- Mem0-style ADD / UPDATE / REVISE / RETIRE / HISTORY behavior works
- retired traces are excluded from retrieval
- history chains are cycle-safe
- agent claims can be verified against SQL truth
- cloud-only adaptation works without local Ollama

Create a table:

| Claim | Code Location | Existing Test | Missing Test | Risk |
| --- | --- | --- | --- | --- |

### 4. Scrum Backlog Output

Create a prioritized backlog using this format:

#### Sprint 0 — Reproducibility Gate

Goal: make the repo provably runnable.

Stories:

- As an operator, I can run one command and know which dependencies are missing.
- As an operator, I can run a minimal fixture test without the 470 MB data payload.
- As an operator, I can verify gateway, sidecar, observer, and MCP health with one command.

Acceptance:

- `just verify` exists.
- `just smoke` runs without large datasets.
- Failure output is structured JSON.
- No test claims success when dependencies are missing.

#### Sprint 1 — Trust Boundary Gate

Goal: prevent agent trust collapse.

Stories:

- Replace raw SQL string interpolation with validated query builders or parameterized calls.
- Change observer /review failure from fail-open accept to an explicit degraded/cycle verdict.
- Add auth or localhost-only guardrails for mutation endpoints.
- Add schema validation for every public endpoint.

Acceptance:

- SQL injection tests fail before the fix and pass after it.
- An observer crash cannot auto-accept unsafe candidate output.
- Mutation endpoints require a configured token or local-only mode.

#### Sprint 2 — Memory Correctness Gate

Goal: prove Mem0/pathway memory cannot poison itself.

Stories:

- Add tests for ADD, UPDATE, REVISE, RETIRE, HISTORY.
- Add cycle detection tests.
- Add retired-trace retrieval exclusion tests.
- Add duplicate trace replay_count tests.
- Add a corrupted-memory-row recovery test.
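The Sprint 1 fail-open story above can be sketched as a wrapper that converts observer failure into an explicit degraded verdict instead of a silent accept. The function and verdict names here are hypothetical, not the repo's actual API.

```python
# Hypothetical observer client; here it always fails, simulating a crash.
def call_observer(candidate: str) -> str:
    raise ConnectionError("observer unreachable")

def review_candidate(candidate: str) -> dict:
    try:
        verdict = call_observer(candidate)
    except Exception as exc:
        # Fail-open would `return {"verdict": "accept"}` here, silently trusting
        # unreviewed output. Fail-closed surfaces the failure and holds the candidate.
        return {"verdict": "degraded", "reason": str(exc), "candidate_held": True}
    return {"verdict": verdict, "candidate_held": False}

result = review_candidate("playbook-candidate-42")
print(result["verdict"])  # degraded
```

The Sprint 1 acceptance test then becomes: kill the observer, submit a candidate, and assert the verdict is `degraded` with the candidate held, never `accept`.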
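For the "history chains are cycle-safe" claim and the Sprint 2 cycle-detection story, a test can walk the revision chain with a visited set and report a loop instead of iterating forever. The `parent`-pointer representation is an assumption about the memory schema, used only to show the test's shape.

```python
# Hypothetical history-chain walker: each revision id maps to its predecessor
# (None at the root). A revisited node means REVISE/UPDATE created a loop.
def chain_is_cycle_safe(history: dict, start: str) -> bool:
    seen = set()
    node = start
    while node is not None:
        if node in seen:
            return False
        seen.add(node)
        node = history.get(node)
    return True

linear = {"rev3": "rev2", "rev2": "rev1", "rev1": None}
looped = {"rev3": "rev2", "rev2": "rev3"}

print(chain_is_cycle_safe(linear, "rev3"), chain_is_cycle_safe(looped, "rev3"))
# True False
```

A fixture with a deliberately looped chain gives the "fails before fix, passes after fix" evidence the scoring model demands.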
Acceptance:

- deterministic fixture corpus
- all memory operations covered
- every memory mutation emits an audit/event receipt

#### Sprint 3 — Agent Loop Reality Gate

Goal: prove the agent loop works across actual workflows.

Stories:

- Build a deterministic mini corpus.
- Run search → verify → observer review → playbook seal → second-run retrieval.
- Add a negative case where the observer rejects a hallucinated claim.
- Add a regression test for the health endpoint content-type mismatch.

Acceptance:

- A single command proves the full loop.
- The generated report includes input hash, output hash, verdict, and memory mutation receipt.

#### Sprint 4 — Deployment Gate

Goal: turn REPLICATION.md into executable deployment validation.

Stories:

- Convert the REPLICATION.md validation section into scripts.
- Add an env var template.
- Add config validation.
- Remove hardcoded /home/profit/lakehouse paths.
- Add systemd readiness checks.

Acceptance:

- A fresh clone can run `just doctor`.
- Missing env vars are reported clearly.
- No absolute path assumptions remain unless configured.

## Required Final Deliverables

Create:

- reports/scrum/matrix-agent-scrum-test.md
- reports/scrum/risk-register.md
- reports/scrum/claim-coverage-table.md
- reports/scrum/sprint-backlog.md
- reports/scrum/acceptance-gates.md

Do not rewrite the system yet. First produce the reports only.

## Scoring Model

Use this scoring:

- Reproducibility: 0–10
- Test Coverage: 0–10
- Trust Boundary Safety: 0–10
- Agent Memory Correctness: 0–10
- Deployment Readiness: 0–10
- Maintainability: 0–10

Mark each score with evidence.

## Final Rule

No vibes. No "appears to work." Every claim must point to:

- a file path
- a line/function
- command output
- a test result
- or explicitly missing evidence

That's the move: **don't refactor yet. Put the repo under oath first.**
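The Sprint 0 and Sprint 4 gates ("failure output is structured JSON", "missing env vars are reported clearly") can be sketched as a small doctor script. The tool and env var names below are guesses at what this repo needs, not a verified list.

```python
import json
import os
import shutil

# Hypothetical `just doctor` core. REQUIRED_TOOLS and REQUIRED_ENV are
# illustrative assumptions, not audited against the actual repo.
REQUIRED_TOOLS = ["just", "cargo", "bun", "python3"]
REQUIRED_ENV = ["LAKEHOUSE_ROOT", "OBSERVER_URL"]

def doctor() -> dict:
    report = {
        "missing_tools": [t for t in REQUIRED_TOOLS if shutil.which(t) is None],
        "missing_env": [v for v in REQUIRED_ENV if not os.environ.get(v)],
    }
    report["ok"] = not (report["missing_tools"] or report["missing_env"])
    return report

# Structured JSON either way: a gap is reported, never papered over as success.
print(json.dumps(doctor(), indent=2))
```

This is exactly the evidence shape the scoring model asks for: machine-readable output that names each missing dependency instead of a test that quietly passes.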