# claims.yaml — what the Go lakehouse claims, enumerated.
#
# Each claim has an id, name, type, the services + routes it touches,
# the evidence shape, and whether failure is fatal (required: true) or
# advisory (required: false).
#
# Source of truth for what cases/*.sh actually verify is the case
# scripts themselves; this file is the human-readable enumeration that
# the spec mandates as a deliverable. At startup, run_proof.sh validates
# that every claim here has a matching case with the same CASE_ID.
#
# Modes:
#   contract    — fast; APIs + schemas + status codes; no big data
#   integration — full chain CSV→storaged→catalogd→ingestd→queryd, text→embedd→vectord
#   performance — measurements only; runs after contract+integration green

claims:
  - id: GOLAKE-001
    name: Gateway health route responds
    type: contract
    services: [gateway]
    routes: [GET /health]
    evidence: [status_code, response_body, latency_ms]
    required: true

  - id: GOLAKE-002
    name: Each backing service health route responds
    type: contract
    services: [storaged, catalogd, ingestd, queryd, vectord, embedd]
    routes: [GET /health]
    evidence: [status_code, response_body, latency_ms, service_field_match]
    required: true

  - id: GOLAKE-003
    name: Gateway proxies /v1/* to the right upstream and preserves status codes
    type: contract
    services: [gateway]
    routes: [GET /v1/storage/list, GET /v1/catalog/list, POST /v1/sql with empty]
    evidence: [upstream_match, status_passthrough, latency_ms]
    required: true

  - id: GOLAKE-010
    name: Storage put/get round-trip preserves bytes
    type: integration
    services: [storaged]
    routes: [PUT /storage/put/*, GET /storage/get/*]
    evidence: [input_sha256, output_sha256, status_code, latency_ms]
    required: true

  - id: GOLAKE-011
    name: Storage list returns the just-put key
    type: integration
    services: [storaged]
    routes: [PUT /storage/put/*, GET /storage/list]
    evidence: [list_contains_key, latency_ms]
    required: true

  - id: GOLAKE-012
    name: Storage delete removes the key (subsequent GET 404)
    type: integration
    services: [storaged]
    routes: [DELETE /storage/delete/*, GET /storage/get/*]
    evidence: [delete_status, get_after_status]
    required: true

  - id: GOLAKE-020
    name: Catalog register is idempotent on identical fingerprint
    type: integration
    services: [catalogd]
    routes: [POST /catalog/register]
    evidence: [first_existing_false, second_existing_true, dataset_id_stable]
    required: true

  - id: GOLAKE-021
    name: Catalog manifest read matches what was registered
    type: integration
    services: [catalogd]
    routes: [POST /catalog/register, GET /catalog/manifest/*]
    evidence: [manifest_equality, schema_fingerprint_match]
    required: true

  - id: GOLAKE-022
    name: Catalog list contains the registered dataset
    type: integration
    services: [catalogd]
    routes: [GET /catalog/list]
    evidence: [list_contains_dataset_id]
    required: true

  - id: GOLAKE-030
    name: Ingest CSV → Parquet writes a parquet object that catalogd manifests
    type: integration
    services: [ingestd, storaged, catalogd]
    routes: [POST /ingest, GET /storage/list, GET /catalog/manifest/*]
    evidence: [parquet_object_exists, manifest_row_count, content_addressed_key]
    required: true

  - id: GOLAKE-040
    name: Query correctness — 5 SQL assertions against the workers CSV fixture
    type: integration
    services: [queryd]
    routes: [POST /sql]
    evidence: [Q1_count_5, Q2_chicago_2, Q3_max_95, Q4_safety_barbara, Q5_houston_avg_89_5]
    required: true

  - id: GOLAKE-050
    name: Embedding contract — request returns dim=768, non-empty vector
    type: contract
    services: [embedd]
    routes: [POST /embed]
    evidence: [dimension, vector_nonempty, model_echoed]
    required: true

  - id: GOLAKE-051
    name: Embedding integration — top-K ranking matches stored fixture
    type: integration
    services: [embedd, vectord]
    routes: [POST /embed, POST /vectors/index//add, POST /vectors/index//search]
    evidence: [top_k_id_set, top_1_id_match]
    required: true
    notes: |
      Ollama embeddings are stable-but-not-bit-identical across runs.
      Ranking-by-cosine is deterministic at our scale; this case asserts
      the top-K ID set matches expected/rankings.json. Regenerable via
      run_proof.sh --regenerate-rankings.

  - id: GOLAKE-060
    name: Vector add + lookup-by-ID round-trip
    type: contract
    services: [vectord]
    routes: [POST /vectors/index, POST /vectors/index//add, GET /vectors/index/]
    evidence: [add_status, lookup_returns_inserted_ids]
    required: true

  - id: GOLAKE-061
    # Quoted: in a plain scalar, " #" would start a comment and truncate the name.
    name: "Vector search nearest-neighbor — inserted vector ranks #1 against itself"
    type: contract
    services: [vectord]
    routes: [POST /vectors/index//add, POST /vectors/index//search]
    evidence: [top_1_id, top_1_distance]
    required: true

  - id: GOLAKE-070
    name: Vector persistence — kill+restart preserves index state
    type: integration
    services: [vectord, storaged]
    routes: [POST /vectors/index//add, POST /vectors/index//search]
    evidence: [pre_restart_search, post_restart_search_dist_zero]
    required: true

  - id: GOLAKE-080
    name: Failure mode — malformed JSON returns 4xx, never 5xx, never silent 200
    type: contract
    services: [storaged, catalogd, ingestd, queryd, vectord, embedd]
    routes: [POST endpoints with invalid body]
    evidence: [per_service_status_codes, error_body_shape]
    required: true

  - id: GOLAKE-081
    name: Failure mode — missing required field rejected with structured 400
    type: contract
    services: [catalogd, vectord, embedd]
    routes: [POST endpoints with valid JSON but missing fields]
    evidence: [per_service_status_codes]
    required: true

  - id: GOLAKE-082
    name: Failure mode — bad SQL returns 4xx, error message present
    type: contract
    services: [queryd]
    routes: [POST /sql with syntax error]
    evidence: [status_code, error_body_present]
    required: true

  - id: GOLAKE-083
    name: Failure mode — vector dim mismatch returns 4xx
    type: contract
    services: [vectord]
    routes: [POST /vectors/index//add with wrong dim]
    evidence: [status_code]
    required: true

  - id: GOLAKE-084
    name: Failure mode — missing storage object returns 404
    type: contract
    services: [storaged]
    routes: [GET /storage/get/]
    evidence: [status_code]
    required: true

  - id: GOLAKE-085
    name: Failure mode — duplicate vector ID — record actual behavior (informational)
    type: contract
    services: [vectord]
    routes: [POST /vectors/index//add with same id twice]
    evidence: [first_status, second_status, search_returns_count]
    required: false
    notes: |
      Spec asks us to verify duplicate-ID handling. Current behavior is
      not yet documented; this case records what happens so we can
      decide the contract. required: false → does not fail the gate.

  - id: GOLAKE-100
    name: Performance baseline — rows/sec ingest, vectors/sec add, query latency
    type: performance
    services: [ingestd, vectord, queryd, embedd]
    routes: [POST /ingest, POST /vectors/index//add, POST /sql, POST /embed]
    evidence: [rows_per_sec, vectors_per_sec, query_p50_ms, query_p95_ms,
               vector_search_p50_ms, vector_search_p95_ms, rss_peak_mb, cpu_peak_pct]
    required: false
    notes: |
      First run writes tests/proof/baseline.json. Subsequent runs diff
      against it; a regression ≥10% on any metric warns but does not
      fail the gate. Use --regenerate-baseline to overwrite.