Create `docs/TEST_PROOF_SCOPE.md`. Purpose: design a serious proof harness for the Go lakehouse refactor. You are not writing production features yet. You are designing and implementing a claims-verification test suite that proves or disproves what this system currently claims.

## System Claims To Prove

The Go lakehouse claims:

1. Gateway-fronted services work as a coherent system.
2. CSV data can be ingested into Parquet.
3. Catalog manifests remain consistent.
4. The DuckDB query path returns correct results.
5. The embedding path works through Ollama or the configured embedding backend.
6. Vector add/search works.
7. Vector persistence survives restart.
8. Service contracts are stable.
9. The refactor preserved behavior.
10. Performance claims are measurable, not vibes.

## Required Output

Create a proof harness under:

```text
tests/proof/
  README.md
  claims.yaml
  run_proof.sh
  lib/
    env.sh
    http.sh
    assert.sh
    metrics.sh
  cases/
    00_health.sh
    01_storage_roundtrip.sh
    02_catalog_manifest.sh
    03_ingest_csv_to_parquet.sh
    04_query_correctness.sh
    05_embedding_contract.sh
    06_vector_add_search.sh
    07_vector_persistence_restart.sh
    08_gateway_contracts.sh
    09_failure_modes.sh
    10_perf_baseline.sh
  fixtures/
    csv/
    expected/
  reports/
    .gitkeep
```

## Test Design Requirements

Each test must produce evidence, not just pass/fail. For every case, record:

- claim tested
- service routes called
- input fixture hash
- output hash
- expected result
- actual result
- pass/fail
- latency
- status codes
- logs location
- timestamp
- git commit hash

Write results to `tests/proof/reports/proof-YYYYMMDD-HHMMSS/`. Each run must produce:

- summary.md
- summary.json
- raw/
- http/
- logs/
- outputs/
- metrics/

## Claims File

Create `tests/proof/claims.yaml`. Each claim should have:

```yaml
id: GOLAKE-001
name: Gateway health routes respond
type: contract
services:
  - gateway
routes:
  - GET /health
evidence:
  - status_code
  - response_body
  - latency_ms
required: true
```

Include claims for:

- gateway health
- each service health
- storage put/get/list/delete, if supported
- catalog create/read/update/list, if supported
- ingest job creation
- Parquet output existence
- query correctness against a known CSV fixture
- embedding vector dimension
- vector add/search nearest-neighbor correctness
- vector restart persistence
- invalid request rejection
- missing object behavior
- duplicate vector ID behavior
- malformed CSV behavior
- unavailable downstream service behavior
- latency baseline
- throughput baseline

## Fixtures

Create deterministic fixtures. Minimum CSV fixture:

```csv
id,name,role,city,score
1,Ada,welder,Chicago,91
2,Grace,electrician,Detroit,88
3,Linus,operator,Chicago,77
4,Ken,pipefitter,Houston,84
5,Barbara,safety,Houston,95
```

Expected query assertions:

- row count = 5
- rows with city Chicago = 2
- max score = 95
- the role "safety" belongs to Barbara
- average score for Houston = 89.5

For vector tests, use deterministic text fixtures:

- doc-001: industrial staffing for welders in Chicago
- doc-002: safety compliance for warehouse crews
- doc-003: electrical contractors assigned to Detroit
- doc-004: pipefitters and heavy equipment operators in Houston

Search assertions should verify that semantically related queries return the expected top candidates where embeddings are enabled. If embeddings are not deterministic enough, support a contract-only mode that verifies:

- vector dimension is non-empty
- vector add succeeds
- search returns known inserted IDs
- persistence survives restart

## Modes

Support three modes:

```bash
tests/proof/run_proof.sh --mode contract
tests/proof/run_proof.sh --mode integration
tests/proof/run_proof.sh --mode performance
```
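As a reference point only, here is a minimal sketch of how `run_proof.sh` might dispatch these modes. The mapping of case files to modes, the `CASES_DIR`/`REPORT_DIR` names, and the convention of passing the report directory as the first argument to each case are assumptions, not requirements of this spec.

```bash
#!/usr/bin/env bash
# Sketch of run_proof.sh mode dispatch. The case-to-mode mapping and the
# variable names below are illustrative assumptions.
set -euo pipefail

MODE="contract"
while [[ $# -gt 0 ]]; do
  case "$1" in
    --mode) MODE="$2"; shift 2 ;;
    *) echo "unknown argument: $1" >&2; exit 2 ;;
  esac
done

HARNESS_DIR="$(cd "$(dirname "$0")" && pwd)"
CASES_DIR="$HARNESS_DIR/cases"
REPORT_DIR="$HARNESS_DIR/reports/proof-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$REPORT_DIR"/{raw,http,logs,outputs,metrics}

case "$MODE" in
  contract)    cases=("$CASES_DIR"/0[0-8]_*.sh) ;;          # fast API/schema/correctness checks
  integration) cases=("$CASES_DIR"/0*_*.sh) ;;              # full gateway -> service chain
  performance) cases=("$CASES_DIR"/10_perf_baseline.sh) ;;  # baseline metrics only
  *) echo "unknown mode: $MODE" >&2; exit 2 ;;
esac

failed=0
for case_script in "${cases[@]}"; do
  bash "$case_script" "$REPORT_DIR" || failed=1   # each case writes its own evidence record
done
exit "$failed"
```

The real script also has to aggregate the per-case evidence into `summary.md` and `summary.json` for the run directory.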
### Contract mode

Fast, with no massive data. Verifies APIs, schemas, status codes, and basic correctness.

### Integration mode

Runs the full gateway → service chain. Must prove:

- CSV fixture → storaged → ingestd → catalogd → queryd
- text fixture → embedd → vectord → search

### Performance mode

Measures a baseline only. Do not fake claims. Record:

- rows ingested/sec
- vectors added/sec
- p50/p95 query latency
- p50/p95 vector search latency
- memory usage, if available
- CPU usage, if available
- service restart time, if available

## Failure-Mode Tests

Add tests proving the system fails cleanly. Required:

- malformed JSON
- missing required field
- invalid vector dimension
- missing object
- bad SQL query
- duplicate vector ID
- downstream service unavailable, if easy to simulate
- restart before the persistence load completes, if relevant

Do not hide failures behind retries unless the system explicitly documents retry behavior.

## Hard Rules

- Do not add production features unless needed to expose testable behavior.
- Do not change public route contracts without documenting it.
- Do not write tests that merely check "HTTP 200" unless the claim is health-only.
- Do not use random data unless it is seeded and recorded.
- Do not make performance claims without before/after metrics.
- Do not assume Ollama is available; detect it and mark embedding tests skipped or degraded, with an explanation.
- Do not let skipped tests appear as passed.
- Do not silently ignore missing services.
- Do not make the proof harness depend on external cloud services.

## Final Deliverables

After implementation, produce:

- tests/proof/README.md
- tests/proof/claims.yaml
- tests/proof/run_proof.sh
- tests/proof/cases/*.sh
- tests/proof/reports/proof-YYYYMMDD-HHMMSS/summary.md
- tests/proof/reports/proof-YYYYMMDD-HHMMSS/summary.json

## Final Report Must Answer

At the end, write a clear report:

- Which claims are proven?
- Which claims are partially proven?
- Which claims failed?
- Which claims were skipped, and why?
- What evidence supports each claim?
- What bottlenecks were measured?
- What contract drift was found?
- What refactor risks remain?
- What should be fixed first?

## Execution Plan

1. Inspect the repo.
2. Produce a short implementation plan.
3. Build the proof harness.
4. Run contract mode.
5. Run integration mode, if services can be started.
6. Run performance mode, only if contract and integration pass.

Do not declare success without evidence files.
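## Appendix: Case Sketch

For orientation only, a minimal sketch of the shape a contract-mode case could take, following the report-directory calling convention sketched above. `GATEWAY_URL`, the `evidence.jsonl` file name, and the exact JSON fields are assumptions; a real case sources the shared helpers in `lib/` and records the full evidence list from Test Design Requirements.

```bash
#!/usr/bin/env bash
# cases/00_health.sh -- illustrative sketch only. GATEWAY_URL and the
# evidence.jsonl layout are assumptions; a real case would source
# lib/env.sh, lib/http.sh, and lib/assert.sh.
set -euo pipefail
REPORT_DIR="$1"

CLAIM="GOLAKE-001"                                   # claim tested (see claims.yaml)
URL="${GATEWAY_URL:-http://localhost:8080}/health"   # route under test: GET /health
BODY="$REPORT_DIR/http/${CLAIM}_health_body.json"

# Single request; capture status code and latency without retries.
metrics=$(curl -s -o "$BODY" -w '%{http_code} %{time_total}' --max-time 5 "$URL" || true)
status=${metrics%% *}
latency_s=${metrics##* }

if [[ "$status" == "200" ]]; then result="pass"; else result="fail"; fi

# One evidence record per case; summary.json is aggregated from these records.
printf '{"claim":"%s","route":"GET /health","status_code":"%s","latency_s":"%s","result":"%s","output_hash":"%s","commit":"%s","timestamp":"%s"}\n' \
  "$CLAIM" "$status" "$latency_s" "$result" \
  "$(sha256sum "$BODY" 2>/dev/null | cut -d' ' -f1)" \
  "$(git rev-parse HEAD 2>/dev/null || echo unknown)" \
  "$(date -u +%FT%TZ)" >> "$REPORT_DIR/raw/evidence.jsonl"

[[ "$result" == "pass" ]]
```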