Per docs/TEST_PROOF_SCOPE.md, building the claims-verification tier
above the smoke chain. This commit lays the scaffolding and proves
the orchestrator end-to-end with one canary case (00_health).

What landed:

  tests/proof/
    README.md                how to read a report, layout, modes
    claims.yaml              24 claims enumerated (GOLAKE-001..100)
    run_proof.sh             orchestrator with --mode {contract|integration|performance}
                             and --no-bootstrap / --regenerate-{rankings,baseline}
    lib/
      env.sh                 service URLs, report dir, mode, git context
      http.sh                curl wrappers writing per-probe JSON + body + headers
      assert.sh              proof_assert_{eq,ne,contains,lt,gt,status,json_eq} +
                             proof_skip — each emits one JSONL record per call
      metrics.sh             start/stop timers, value capture, RSS sampling,
                             percentile compute (for Phase D)
    cases/
      00_health.sh           canary — gateway + 6 services /health → 200,
                             body identifies service, latency < 500ms (21 assertions)
    fixtures/
      csv/workers.csv        spec's 5-row deterministic CSV
      text/docs.txt          4 deterministic vector docs
      expected/queries.json  expected results for the 5 SQL assertions
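
As a rough illustration of the "one JSONL record per call" convention listed for assert.sh, the sketch below shows one way such a helper could look. Only the name proof_assert_eq comes from the listing above; the record fields, the PROOF_JSONL variable, the default CASE_ID, and the _json_escape helper are assumptions made for this example, and the real lib/assert.sh may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; not the actual lib/assert.sh.

# Assumed output location and case identifier (both are illustrative).
PROOF_JSONL="${PROOF_JSONL:-$(mktemp)}"
CASE_ID="${CASE_ID:-00_health}"

# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Newlines and other control characters are not handled in this sketch.
_json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# proof_assert_eq NAME EXPECTED ACTUAL
# Appends exactly one JSONL record and returns 0 on pass, 1 on fail.
proof_assert_eq() {
  local name=$1 expected=$2 actual=$3 result=fail
  [ "$expected" = "$actual" ] && result=pass
  printf '{"case":"%s","name":"%s","result":"%s","expected":"%s","actual":"%s"}\n' \
    "$(_json_escape "$CASE_ID")" "$(_json_escape "$name")" "$result" \
    "$(_json_escape "$expected")" "$(_json_escape "$actual")" >> "$PROOF_JSONL"
  [ "$result" = pass ]
}
```

Writing the record before returning the status means a failed assertion still leaves evidence behind, which is what lets an aggregator count pass and fail lines after the fact.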

Wired into the task runner:

  just proof contract       # canary only this commit
  just proof integration    # Phase C
  just proof performance    # Phase D

.gitignore: /tests/proof/reports/* with !.gitkeep — same pattern as
reports/scrum/_evidence/. Per-run output is a runtime artifact.

Specs landed alongside (J's drops):

  docs/TEST_PROOF_SCOPE.md             the harness contract this implements
  docs/CLAUDE_REFACTOR_GUARDRAILS.md   process discipline this harness obeys

Verified end-to-end (cached binaries):

  just proof contract    wall < 2s, 21 pass / 0 fail / 0 skip
  just verify            wall 31s, vet + test + 9 smokes still green

Two bugs fixed during canary run, both in run_proof.sh aggregation:

  - grep -c still prints 0 but exits 1 on zero matches, so the
    `|| echo 0` fallback also ran and the captured value became
    "0\n0", which broke jq --argjson + integer comparison. Fixed via
    a _count helper that captures count-or-zero cleanly.
  - per-case table iterated case scripts (filename-based) but cases
    write evidence under CASE_ID. Switched to JSONL-file iteration so
    multi-case scripts work and the mapping is faithful.
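
The grep -c pitfall described in the first bullet can be reproduced in a few lines. The sketch below contrasts the broken form with one possible count-or-zero helper. The name _count is taken from this message, but its body here is an assumption rather than the code in run_proof.sh, and broken_count exists only to demonstrate the failure.

```shell
#!/usr/bin/env bash
# Illustrative sketch; the actual _count in run_proof.sh may be written differently.

# Broken form: on zero matches grep -c prints "0" AND exits 1, so the
# fallback echo runs too and the substitution captures two lines ("0\n0").
broken_count() {
  echo "$(grep -c "$1" "$2" || echo 0)"
}

# Count-or-zero: keep whatever grep printed, ignore its exit status, and
# fall back to 0 only when nothing was printed (for example, a missing file).
_count() {
  local pattern=$1 file=$2 n
  n=$(grep -c -- "$pattern" "$file" 2>/dev/null) || true
  printf '%s\n' "${n:-0}"
}
```

With a helper of this shape the result is always a single integer, so it can be passed to jq --argjson or compared with -eq without further cleanup.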

Phase B (contract cases) lands next: 05_embedding, 06_vector_add,
08_gateway_contracts, 09_failure_modes. Each sources the same lib
helpers and writes to the same report shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
215 lines
7.6 KiB
YAML
# claims.yaml — what the Go lakehouse claims, enumerated.
#
# Each claim has an id, name, type, the services + routes it touches,
# the evidence shape, and whether failure is fatal (required: true) or
# advisory (required: false).
#
# Source of truth for what cases/*.sh actually verify is the case
# scripts themselves; this file is the human-readable enumeration that
# the spec mandates as a deliverable. run_proof.sh validates that every
# claim here has a matching case with the same CASE_ID at startup.
#
# Modes:
#   contract    — fast; APIs + schemas + status codes; no big data
#   integration — full chain CSV→storaged→catalogd→ingestd→queryd, text→embedd→vectord
#   performance — measurements only; runs after contract+integration green

claims:
  - id: GOLAKE-001
    name: Gateway health route responds
    type: contract
    services: [gateway]
    routes: [GET /health]
    evidence: [status_code, response_body, latency_ms]
    required: true

  - id: GOLAKE-002
    name: Each backing service health route responds
    type: contract
    services: [storaged, catalogd, ingestd, queryd, vectord, embedd]
    routes: [GET /health]
    evidence: [status_code, response_body, latency_ms, service_field_match]
    required: true

  - id: GOLAKE-003
    name: Gateway proxies /v1/* to the right upstream and preserves status codes
    type: contract
    services: [gateway]
    routes: [GET /v1/storage/list, GET /v1/catalog/list, POST /v1/sql with empty]
    evidence: [upstream_match, status_passthrough, latency_ms]
    required: true

  - id: GOLAKE-010
    name: Storage put/get round-trip preserves bytes
    type: integration
    services: [storaged]
    routes: [PUT /storage/put/*, GET /storage/get/*]
    evidence: [input_sha256, output_sha256, status_code, latency_ms]
    required: true

  - id: GOLAKE-011
    name: Storage list returns the just-put key
    type: integration
    services: [storaged]
    routes: [PUT /storage/put/*, GET /storage/list]
    evidence: [list_contains_key, latency_ms]
    required: true

  - id: GOLAKE-012
    name: Storage delete removes the key (subsequent GET 404)
    type: integration
    services: [storaged]
    routes: [DELETE /storage/delete/*, GET /storage/get/*]
    evidence: [delete_status, get_after_status]
    required: true

  - id: GOLAKE-020
    name: Catalog register is idempotent on identical fingerprint
    type: integration
    services: [catalogd]
    routes: [POST /catalog/register]
    evidence: [first_existing_false, second_existing_true, dataset_id_stable]
    required: true

  - id: GOLAKE-021
    name: Catalog manifest read matches what was registered
    type: integration
    services: [catalogd]
    routes: [POST /catalog/register, GET /catalog/manifest/*]
    evidence: [manifest_equality, schema_fingerprint_match]
    required: true

  - id: GOLAKE-022
    name: Catalog list contains the registered dataset
    type: integration
    services: [catalogd]
    routes: [GET /catalog/list]
    evidence: [list_contains_dataset_id]
    required: true

  - id: GOLAKE-030
    name: Ingest CSV → Parquet writes a parquet object that catalogd manifests
    type: integration
    services: [ingestd, storaged, catalogd]
    routes: [POST /ingest, GET /storage/list, GET /catalog/manifest/*]
    evidence: [parquet_object_exists, manifest_row_count, content_addressed_key]
    required: true

  - id: GOLAKE-040
    name: Query correctness — 5 SQL assertions against the workers CSV fixture
    type: integration
    services: [queryd]
    routes: [POST /sql]
    evidence: [Q1_count_5, Q2_chicago_2, Q3_max_95, Q4_safety_barbara, Q5_houston_avg_89_5]
    required: true

  - id: GOLAKE-050
    name: Embedding contract — request returns dim=768, non-empty vector
    type: contract
    services: [embedd]
    routes: [POST /embed]
    evidence: [dimension, vector_nonempty, model_echoed]
    required: true

  - id: GOLAKE-051
    name: Embedding integration — top-K ranking matches stored fixture
    type: integration
    services: [embedd, vectord]
    routes: [POST /embed, POST /vectors/index/<n>/add, POST /vectors/index/<n>/search]
    evidence: [top_k_id_set, top_1_id_match]
    required: true
    notes: |
      Ollama embeddings are stable-but-not-bit-identical across runs.
      Ranking-by-cosine is deterministic at our scale; this case asserts
      the top-K ID set matches expected/rankings.json. Regenerable via
      run_proof.sh --regenerate-rankings.

  - id: GOLAKE-060
    name: Vector add + lookup-by-ID round-trip
    type: contract
    services: [vectord]
    routes: [POST /vectors/index, POST /vectors/index/<n>/add, GET /vectors/index/<n>]
    evidence: [add_status, lookup_returns_inserted_ids]
    required: true

  - id: GOLAKE-061
    # Quoted: in a plain scalar, " #" would start a YAML comment and truncate the name.
    name: "Vector search nearest-neighbor — inserted vector ranks #1 against itself"
    type: contract
    services: [vectord]
    routes: [POST /vectors/index/<n>/add, POST /vectors/index/<n>/search]
    evidence: [top_1_id, top_1_distance]
    required: true

  - id: GOLAKE-070
    name: Vector persistence — kill+restart preserves index state
    type: integration
    services: [vectord, storaged]
    routes: [POST /vectors/index/<n>/add, POST /vectors/index/<n>/search]
    evidence: [pre_restart_search, post_restart_search_dist_zero]
    required: true

  - id: GOLAKE-080
    name: Failure mode — malformed JSON returns 4xx, never 5xx, never silent 200
    type: contract
    services: [storaged, catalogd, ingestd, queryd, vectord, embedd]
    routes: [POST endpoints with invalid body]
    evidence: [per_service_status_codes, error_body_shape]
    required: true

  - id: GOLAKE-081
    name: Failure mode — missing required field rejected with structured 400
    type: contract
    services: [catalogd, vectord, embedd]
    routes: [POST endpoints with valid JSON but missing fields]
    evidence: [per_service_status_codes]
    required: true

  - id: GOLAKE-082
    name: Failure mode — bad SQL returns 4xx, error message present
    type: contract
    services: [queryd]
    routes: [POST /sql with syntax error]
    evidence: [status_code, error_body_present]
    required: true

  - id: GOLAKE-083
    name: Failure mode — vector dim mismatch returns 4xx
    type: contract
    services: [vectord]
    routes: [POST /vectors/index/<n>/add with wrong dim]
    evidence: [status_code]
    required: true

  - id: GOLAKE-084
    name: Failure mode — missing storage object returns 404
    type: contract
    services: [storaged]
    routes: [GET /storage/get/<unseen-key>]
    evidence: [status_code]
    required: true

  - id: GOLAKE-085
    name: Failure mode — duplicate vector ID — record actual behavior (informational)
    type: contract
    services: [vectord]
    routes: [POST /vectors/index/<n>/add with same id twice]
    evidence: [first_status, second_status, search_returns_count]
    required: false
    notes: |
      Spec asks us to verify duplicate-ID handling. Current behavior is
      not yet documented; this case records what happens so we can
      decide the contract. Required:false → does not fail the gate.

  - id: GOLAKE-100
    name: Performance baseline — rows/sec ingest, vectors/sec add, query latency
    type: performance
    services: [ingestd, vectord, queryd, embedd]
    routes: [POST /ingest, POST /vectors/index/<n>/add, POST /sql, POST /embed]
    evidence: [rows_per_sec, vectors_per_sec, query_p50_ms, query_p95_ms,
               vector_search_p50_ms, vector_search_p95_ms, rss_peak_mb, cpu_peak_pct]
    required: false
    notes: |
      First run writes tests/proof/baseline.json. Subsequent runs diff
      against it; a regression ≥10% on any metric warns but does not
      fail the gate. Use --regenerate-baseline to overwrite.
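
The GOLAKE-100 notes describe a warn-only comparison against a stored baseline. A minimal sketch of that rule for a single metric follows. The function name check_regression, its direction argument, and the output format are all hypothetical; how run_proof.sh actually reads baseline.json and reports the diff is not shown in this file.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the ">=10% regression warns, never fails" rule.

# check_regression NAME BASELINE CURRENT DIRECTION
# DIRECTION is "higher_is_better" (throughput) or "lower_is_better" (latency, RSS).
# Prints one status line and always returns 0, because the claim is advisory.
check_regression() {
  local name=$1 baseline=$2 current=$3 direction=$4
  awk -v n="$name" -v b="$baseline" -v c="$current" -v d="$direction" 'BEGIN {
    if (b == 0) { printf "SKIP %s baseline is zero\n", n; exit 0 }
    # Positive pct always means "got worse", whatever the metric direction.
    pct = (d == "higher_is_better") ? (b - c) / b * 100 : (c - b) / b * 100
    status = (pct >= 10) ? "WARN" : "OK"
    printf "%s %s %.1f%%\n", status, n, pct
  }'
}

check_regression rows_per_sec 1000 850 higher_is_better   # WARN rows_per_sec 15.0%
check_regression query_p95_ms 40 42 lower_is_better       # OK query_p95_ms 5.0%
```

The direction argument is needed because a drop in rows/sec and a rise in latency are both regressions, so a single signed difference cannot cover every metric in the evidence list.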