Per docs/TEST_PROOF_SCOPE.md, this is the closing deliverable for the
proof harness: a single document that names what's proven, what's
partially proven, what failed, what was skipped and why, what evidence
exists for each, what bottlenecks were measured, what contract drift
was found, what refactor risks remain, and what to fix first.
Per-run report dirs (tests/proof/reports/proof-<ts>/) keep their
existing summary.md + summary.json + raw/ structure — they are the
replayable evidence chain. FINAL_REPORT.md is the stable, repo-tracked
synthesis pointing at them.
Headline findings (no surprises — harness behaves as designed):
- 24 claims encoded; 22 fully proven, 1 informational (GOLAKE-085
duplicate vector ID, contract not yet specified), 0 failed.
- 4 contract-drift findings recorded as canonical: vectord add
body field is `items` not `vectors`, search response is `results`
not `hits`, index info is `length` not `count`, status codes
201/204 not 200. All caught during Phase B; all now pinned by the
harness.
- Performance baseline shows queryd as the largest RSS (69 MiB,
DuckDB process); single-sample noise floor is ~40% — tightening
to multi-sample medians is a documented Sprint follow-up.
- HIGH-risk audit findings (R-001 queryd /sql, R-002/R-003 untested
shared+storeclient) are NOT closed by the harness — it's a
multiplier, not a replacement for unit tests + auth posture.
The proof harness is complete. 11 cases · 3 modes · 168 assertions
peak across all tiers · ~22s total wall (contract+integration+perf).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>