Start building the claims-verification tier above the smoke chain,
per docs/TEST_PROOF_SCOPE.md. This commit lays the scaffolding and
proves the orchestrator end-to-end with one canary case (00_health).

What landed:

  tests/proof/
    README.md                how to read a report, layout, modes
    claims.yaml              24 claims enumerated (GOLAKE-001..100)
    run_proof.sh             orchestrator with --mode {contract|integration|performance}
                             and --no-bootstrap / --regenerate-{rankings,baseline}
    lib/
      env.sh                 service URLs, report dir, mode, git context
      http.sh                curl wrappers writing per-probe JSON + body + headers
      assert.sh              proof_assert_{eq,ne,contains,lt,gt,status,json_eq} +
                             proof_skip; each emits one JSONL record per call
      metrics.sh             start/stop timers, value capture, RSS sampling,
                             percentile compute (for Phase D)
    cases/
      00_health.sh           canary: gateway + 6 services /health → 200,
                             body identifies service, latency < 500ms (21 assertions)
    fixtures/
      csv/workers.csv        spec's 5-row deterministic CSV
      text/docs.txt          4 deterministic vector docs
      expected/queries.json  expected results for the 5 SQL assertions
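
To illustrate the assert.sh contract listed above (one JSONL record per call), a helper in that family might look roughly like the sketch below. It is not the code in lib/assert.sh: PROOF_EVIDENCE_FILE, _json_escape, and the record field names are assumptions made for the example, the status codes in the demo calls are made up, and the escaping covers only backslashes and double quotes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of one assertion helper. Only the name proof_assert_eq
# comes from the commit; everything else here is assumed for illustration.

PROOF_EVIDENCE_FILE=${PROOF_EVIDENCE_FILE:-$(mktemp)}  # assumed JSONL sink
CASE_ID=${CASE_ID:-00_health}

# Escape backslashes and double quotes so the value can sit inside a JSON
# string. Control characters are NOT handled in this simplified version.
_json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# proof_assert_eq NAME EXPECTED ACTUAL
# Appends exactly one JSONL record, then returns 0 on pass and 1 on fail.
proof_assert_eq() {
  local name=$1 expected=$2 actual=$3 status=fail
  [ "$expected" = "$actual" ] && status=pass
  printf '{"case":"%s","assert":"%s","status":"%s","expected":"%s","actual":"%s"}\n' \
    "$(_json_escape "$CASE_ID")" "$(_json_escape "$name")" "$status" \
    "$(_json_escape "$expected")" "$(_json_escape "$actual")" \
    >> "$PROOF_EVIDENCE_FILE"
  [ "$status" = pass ]
}

# Demo calls with made-up values: one passing and one failing assertion.
proof_assert_eq "gateway /health status" 200 200
proof_assert_eq "storage /health status" 200 503 || true
cat "$PROOF_EVIDENCE_FILE"
```

Because every call appends a record whether it passes or fails, an aggregator can derive pass/fail totals from the evidence file alone instead of relying on exit codes.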

Wired into the task runner:

  just proof contract      # canary only in this commit
  just proof integration   # Phase C
  just proof performance   # Phase D

.gitignore: /tests/proof/reports/* with !.gitkeep, the same pattern as
reports/scrum/_evidence/. Per-run output is a runtime artifact.

Specs landed alongside (J's drops):

  docs/TEST_PROOF_SCOPE.md             the harness contract this implements
  docs/CLAUDE_REFACTOR_GUARDRAILS.md   process discipline this harness obeys

Verified end-to-end (cached binaries):

  just proof contract   wall < 2s, 21 pass / 0 fail / 0 skip
  just verify           wall 31s, vet + test + 9 smokes still green

Two bugs fixed during the canary run, both in run_proof.sh aggregation:

- grep -c still prints "0" on zero matches but exits 1, so the
  `|| echo 0` form appended a second "0". The captured value became
  "0\n0", which broke jq --argjson and the integer comparisons. Fixed
  via a _count helper that captures count-or-zero cleanly.
- The per-case table iterated case scripts (filename-based), but cases
  write evidence under CASE_ID. Switched to JSONL-file iteration so
  multi-case scripts work and the mapping is faithful.
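
The first bug can be reproduced in a few lines. The sketch below contrasts the broken `|| echo 0` form with a count-or-zero helper; the name _count comes from this commit, but the body shown here is an assumption and may differ from the helper in run_proof.sh.

```shell
#!/usr/bin/env bash
# Sketch of a count-or-zero helper. The body is illustrative, not a copy of
# the real _count in run_proof.sh.

# _count PATTERN FILE
# grep -c prints the count itself (even "0") but exits 1 on zero matches and
# 2 on errors such as a missing file. Keep the output, ignore the status, and
# fall back to 0 only when grep printed nothing at all.
_count() {
  local pattern=$1 file=$2 n
  n=$(grep -c -- "$pattern" "$file" 2>/dev/null) || true
  printf '%s\n' "${n:-0}"
}

evidence=$(mktemp)
printf '%s\n' '{"status":"pass"}' '{"status":"pass"}' > "$evidence"

# Broken form: grep prints "0" and exits 1, then echo appends another "0".
broken=$(grep -c '"status":"fail"' "$evidence" || echo 0)

# Fixed form: always a single integer.
fixed=$(_count '"status":"fail"' "$evidence")

# %q makes the embedded newline in the broken value visible.
printf 'broken=%q fixed=%q\n' "$broken" "$fixed"
rm -f "$evidence"
```

The helper deliberately discards grep's exit status, so a missing evidence file also counts as zero. Whether that is the right policy for the real orchestrator is a design choice this sketch does not settle.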

Phase B (contract cases) lands next: 05_embedding, 06_vector_add,
08_gateway_contracts, 09_failure_modes. Each sources the same lib
helpers and writes to the same report shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
37 lines
1.1 KiB
JSON
{
  "_comment": "Expected results for the 5 SQL assertions in 04_query_correctness against fixtures/csv/workers.csv. The CSV is content-addressed; if its hash changes, this file must be re-derived.",
  "fixture_sha256": "computed at runtime by 03_ingest_csv_to_parquet — see actual.fixture_sha in evidence",
  "queries": [
    {
      "id": "Q1",
      "claim": "row count = 5",
      "sql": "SELECT count(*) AS n FROM workers",
      "expected": {"n": 5}
    },
    {
      "id": "Q2",
      "claim": "Chicago row count = 2",
      "sql": "SELECT count(*) AS n FROM workers WHERE city = 'Chicago'",
      "expected": {"n": 2}
    },
    {
      "id": "Q3",
      "claim": "max score = 95",
      "sql": "SELECT max(score) AS m FROM workers",
      "expected": {"m": 95}
    },
    {
      "id": "Q4",
      "claim": "role = safety belongs to Barbara",
      "sql": "SELECT name FROM workers WHERE role = 'safety'",
      "expected": {"name": "Barbara"}
    },
    {
      "id": "Q5",
      "claim": "Houston average score = 89.5",
      "sql": "SELECT avg(score) AS avg FROM workers WHERE city = 'Houston'",
      "expected": {"avg": 89.5}
    }
  ]
}
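
A case script could consume this fixture along the following lines. This is a sketch under stated assumptions: list_expected_queries, json_equal, and check_queries are hypothetical names, the runner argument stands in for whatever command executes SQL and prints a single JSON row, and jq is assumed to be on PATH (the orchestrator already uses it).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading expected/queries.json. None of these
# function names come from the repository.

# Print "id<TAB>sql<TAB>expected-json" for every entry in the fixture.
list_expected_queries() {
  jq -r '.queries[] | [.id, .sql, (.expected | tojson)] | @tsv' "$1"
}

# Succeed when two JSON values are equal. jq compares objects by content,
# so key order does not matter.
json_equal() {
  jq -n -e --argjson a "$1" --argjson b "$2" '$a == $b' >/dev/null
}

# check_queries FIXTURE RUNNER
# RUNNER is an assumed command or function that takes one SQL string and
# prints a single JSON object. Prints "pass ID" or "fail ID" per query and
# returns non-zero if any query failed.
check_queries() {
  local fixture=$1 runner=$2 id sql expected actual rc=0
  while IFS=$'\t' read -r id sql expected; do
    actual=$("$runner" "$sql")
    if json_equal "$expected" "$actual"; then
      echo "pass $id"
    else
      echo "fail $id"
      rc=1
    fi
  done < <(list_expected_queries "$fixture")
  return "$rc"
}
```

Comparing parsed JSON rather than raw strings keeps the check independent of whitespace and key order in the runner's output. A numeric form such as 89.50 versus 89.5 should also compare equal, since jq compares numbers by value.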