# tests/proof — claims-verification harness

Per `docs/TEST_PROOF_SCOPE.md`. The 9 smokes prove that the system *runs*; this harness proves that the system *does what it claims to do*.

## Why this exists

Smokes verify that services boot, talk, and pass deterministic round-trips. They do not verify:

- contract drift (a route silently changes its response shape)
- semantic correctness (the SQL query says what we claim it says)
- failure-mode discipline (a malformed request returns 4xx, not a silent 200)
- performance regressions (vectors/sec drops 30% on a refactor)

The proof harness produces evidence, not pass/fail. Each case writes input/output hashes, latencies, status codes, log paths, git SHA → a future auditor can re-run + diff.

## Layout

```
tests/proof/
  README.md                  ← you are here
  claims.yaml                ← enumeration of every claim, with id + type + routes
  run_proof.sh               ← orchestrator (--mode contract|integration|performance)
  lib/
    env.sh                   ← service URLs, report dir, mode, git context
    http.sh                  ← curl wrappers (latency + status + body capture)
    assert.sh                ← structured assertions writing JSONL evidence
    metrics.sh               ← rss/cpu/timing capture for performance mode
  cases/
    00_health.sh
    01_storage_roundtrip.sh
    …
    10_perf_baseline.sh
  fixtures/
    csv/workers.csv          ← canonical 5-row fixture (sha-pinned)
    text/docs.txt            ← 4 deterministic vector docs
    expected/queries.json    ← expected results for the 5 SQL assertions
    expected/rankings.json   ← stored top-K rankings for vector search
  reports/
    proof-YYYYMMDD-HHMMSSZ/  ← per-run; gitignored
      summary.md
      summary.json
      raw/
        context.json         ← git_sha, hostname, timestamp, mode
        cases/<case_id>.jsonl          ← one JSONL line per assertion
        http/<case_id>/*.{json,body,headers}
        logs/<service>.log             ← captured stdout+stderr from booted services
        metrics/<case_id>.jsonl
```

## Modes

```bash
just proof contract      # APIs, schemas, status codes; no big data; ~30s
just proof integration   # full chain CSV→storaged→…→queryd, text→embedd→vectord
just proof performance   # measurements only; runs after contract+integration
```
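The structured-assertion idea behind `lib/assert.sh` — every check appends one JSONL evidence line instead of just exiting — can be sketched roughly as follows. The function name, field set, and `EVIDENCE_FILE` path here are hypothetical illustrations, not the harness's actual API:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a JSONL-evidence assertion; the real lib/assert.sh
# defines its own record schema and file layout.
set -euo pipefail

EVIDENCE_FILE="${EVIDENCE_FILE:-/tmp/case.jsonl}"

# assert_eq <case_id> <claim_id> <expected> <actual>
# Appends one JSON line of evidence, then returns pass/fail as exit status.
assert_eq() {
  local case_id=$1 claim_id=$2 expected=$3 actual=$4 status
  if [ "$expected" = "$actual" ]; then status=pass; else status=fail; fi
  printf '{"case_id":"%s","claim_id":"%s","status":"%s","expected":"%s","actual":"%s","git_sha":"%s","ts":"%s"}\n' \
    "$case_id" "$claim_id" "$status" "$expected" "$actual" \
    "$(git rev-parse --short HEAD 2>/dev/null || echo unknown)" \
    "$(date -u +%Y%m%dT%H%M%SZ)" >> "$EVIDENCE_FILE"
  [ "$status" = pass ]
}

assert_eq 00_health storaged-http-200 200 200
```

Because each line carries `case_id`, `git_sha`, and a timestamp, a later run at the same SHA can be diffed line-by-line against this evidence.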
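The rss/cpu capture that `lib/metrics.sh` performs in performance mode can be sketched with plain `ps` sampling. This is a minimal illustration under assumed names (`sample_metrics` and the JSON fields are invented for this example):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of per-process metric sampling; the real lib/metrics.sh
# defines the harness's actual capture format.
set -euo pipefail

# sample_metrics <pid>: emit one JSONL line with resident-set size (KB) and CPU%
sample_metrics() {
  local pid=$1 rss cpu
  rss=$(ps -o rss= -p "$pid" | tr -d ' ')
  cpu=$(ps -o %cpu= -p "$pid" | tr -d ' ')
  printf '{"pid":%s,"rss_kb":%s,"cpu_pct":%s,"ts":%s}\n' \
    "$pid" "$rss" "$cpu" "$(date -u +%s)"
}

sample_metrics $$   # sample the current shell as a demo
```

Sampling into JSONL keeps performance evidence in the same append-and-diff shape as the assertion records.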
The `just` recipes wrap `tests/proof/run_proof.sh` with `--mode <mode>`. Use the script directly for advanced flags (`--no-bootstrap`, `--regenerate-rankings`, `--regenerate-baseline`).

## Hard rules (from TEST_PROOF_SCOPE.md)

- Don't claim performance without before/after metrics.
- Detect Ollama unavailability; mark embedding tests skipped or degraded, with an explanation.
- Skipped tests do not appear as passed.
- No silent ignoring of missing services.
- No external cloud dependencies.
- No "HTTP 200" assertions unless the claim is health-only.
- No random data without a seed.

## How to read a report

After `just proof integration`:

1. Open `tests/proof/reports/proof-<timestamp>/summary.md` for the human view.
2. `summary.json` is the machine-readable counterpart.
3. To investigate a single failed assertion:
   - find its `case_id` in `summary.md`
   - read `raw/cases/<case_id>.jsonl` (each line is one assertion)
   - cross-reference `raw/http/<case_id>/*.{json,body,headers}` for the underlying HTTP round-trip

Every record cites the git SHA at run time; a clean re-run of the same SHA against the same fixtures must produce identical evidence (modulo timestamps + non-deterministic embedding noise).

## Reading order for new contributors

1. `docs/TEST_PROOF_SCOPE.md` — the spec this harness implements.
2. `docs/CLAUDE_REFACTOR_GUARDRAILS.md` — process discipline this harness must obey when extended.
3. `tests/proof/claims.yaml` — what's claimed.
4. `tests/proof/cases/00_health.sh` — canonical case shape; copy-paste to add new cases.
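The canonical case shape referenced above might look roughly like the skeleton below. This is a non-runnable template sketch: the `source` targets mirror the `lib/` layout described earlier, but the variable names and comments are illustrative assumptions, not the actual contents of `cases/00_health.sh`:

```shell
#!/usr/bin/env bash
# Hypothetical skeleton for a new case script; copy-paste and adapt.
set -euo pipefail

source "$(dirname "$0")/../lib/env.sh"     # service URLs, report dir, mode
source "$(dirname "$0")/../lib/http.sh"    # curl wrappers (latency/status/body)
source "$(dirname "$0")/../lib/assert.sh"  # structured JSONL-evidence assertions

case_id="99_my_new_case"
claim_id="my-new-claim"   # must be enumerated in claims.yaml

# 1. Exercise the system (fixtures live in tests/proof/fixtures/).
# 2. Assert on the result; every assertion appends one JSONL evidence line
#    under reports/.../raw/cases/, never a bare exit code.
```

Per the hard rules, a new case must detect unavailable dependencies explicitly and record a skip with an explanation rather than passing silently.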