proof harness Phase A: scaffolding + canary case green
Per docs/TEST_PROOF_SCOPE.md, building the claims-verification tier
above the smoke chain. This commit lays the scaffolding and proves
the orchestrator end-to-end with one canary case (00_health).
What landed:
tests/proof/
  README.md        how to read a report, layout, modes
  claims.yaml      24 claims enumerated (GOLAKE-001..100)
  run_proof.sh     orchestrator with --mode {contract|integration|performance}
                   and --no-bootstrap / --regenerate-{rankings,baseline}
  lib/
    env.sh         service URLs, report dir, mode, git context
    http.sh        curl wrappers writing per-probe JSON + body + headers
    assert.sh      proof_assert_{eq,ne,contains,lt,gt,status,json_eq} +
                   proof_skip — each emits one JSONL record per call
    metrics.sh     start/stop timers, value capture, RSS sampling,
                   percentile compute (for Phase D)
  cases/
    00_health.sh   canary — gateway + 6 services /health → 200,
                   body identifies service, latency < 500ms (21 assertions)
  fixtures/
    csv/workers.csv        spec's 5-row deterministic CSV
    text/docs.txt          4 deterministic vector docs
    expected/queries.json  expected results for the 5 SQL assertions
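The percentile compute in metrics.sh is the usual nearest-rank approach over sorted samples; a standalone sketch (illustrative, not the committed code):

```shell
# Nearest-rank percentile over newline-separated values on stdin.
# Sketch only; the metrics.sh implementation may differ in detail.
percentile() {
  local p="$1"
  sort -n | awk -v p="$p" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      idx = int((p / 100) * NR + 0.999999)   # ceil(p/100 * NR)
      if (idx < 1) idx = 1
      print v[idx]
    }'
}

printf '%s\n' 10 20 30 40 50 60 70 80 90 100 | percentile 50   # → 50
printf '%s\n' 10 20 30 40 50 60 70 80 90 100 | percentile 95   # → 100
```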
Wired into the task runner:
  just proof contract      # canary only (this commit)
  just proof integration   # Phase C
  just proof performance   # Phase D
.gitignore: /tests/proof/reports/* with !.gitkeep — same pattern as
reports/scrum/_evidence/. Per-run output is a runtime artifact.
Specs landed alongside (J's drops):
docs/TEST_PROOF_SCOPE.md the harness contract this implements
docs/CLAUDE_REFACTOR_GUARDRAILS.md process discipline this harness obeys
Verified end-to-end (cached binaries):
  just proof contract   wall < 2s, 21 pass / 0 fail / 0 skip
  just verify           wall 31s, vet + test + 9 smokes still green
Two bugs fixed during canary run, both in run_proof.sh aggregation:
- grep -c exits 1 on zero matches; the `|| echo 0` form concatenated
"0\n0" and broke jq --argjson + integer comparison. Fixed via a
_count helper that captures count-or-zero cleanly.
- per-case table iterated case scripts (filename-based) but cases
write evidence under CASE_ID. Switched to JSONL-file iteration so
multi-case scripts work and the mapping is faithful.
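The first fix, sketched (illustrative; the committed _count helper may differ in detail):

```shell
# grep -c prints "0" AND exits 1 on zero matches, so the old
#   c=$(grep -c pat file || echo 0)
# captured "0\n0". Capture first, guard the exit status, and the
# result is always a single clean integer.
_count() {
  local pattern="$1" file="$2" n
  n=$(grep -c -- "$pattern" "$file") || true
  printf '%s\n' "${n:-0}"
}

printf 'pass\nfail\npass\n' > /tmp/probe.jsonl
_count '^pass' /tmp/probe.jsonl   # → 2
_count '^skip' /tmp/probe.jsonl   # → 0 (one line, exit 0)
```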
Phase B (contract cases) lands next: 05_embedding, 06_vector_add,
08_gateway_contracts, 09_failure_modes. Each sourcing the same lib
helpers and writing to the same report shape.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent e31638204d
commit a81291e38c

.gitignore (5 additions)
@@ -44,6 +44,11 @@ vendor/
 /reports/scrum/_evidence/*
 !/reports/scrum/_evidence/.gitkeep
 
+# Proof harness runtime output — same pattern as reports/scrum/_evidence.
+# Track the directory but ignore per-run subdirs.
+/tests/proof/reports/*
+!/tests/proof/reports/.gitkeep
+
 # Secrets — never commit. Resolved via SecretsProvider per ADR-001 §1.x.
 *.env
 secrets.toml
docs/CLAUDE_REFACTOR_GUARDRAILS.md (new file, 281 lines)
@@ -0,0 +1,281 @@
# Claude Refactor Guardrails — Go Lakehouse

## Mission

Continue the Go refactor without recreating the Rust-era complexity.

The Go rewrite exists to make the lakehouse operationally legible:

- small binaries
- clear service boundaries
- gateway-fronted APIs
- smoke-testable behavior
- fast build/run/debug loop
- no accidental framework bureaucracy

The Rust repo had maturity, but also accumulated control-plane weight: validator, auditor, provider routing, iteration loops, UI, truth layers, and agent-era scaffolding. Do not blindly port all of that back.

## Prime Directive

Preserve the Go spine:

```text
gateway
  → storaged
  → catalogd
  → ingestd
  → queryd
  → vectord
  → embedd
```

Only add complexity when there is measured evidence, a failing smoke, or a documented feature-parity requirement.

## Refactor Rules

### 1. No silent behavior changes

Before changing any service behavior, identify:

- current route
- current request schema
- current response schema
- current status-code behavior
- current smoke test covering it

If no smoke exists, add one before or with the refactor.

### 2. Keep service boundaries hard

Do not let services reach into each other’s internals.

Allowed:

- HTTP client calls
- shared request/response structs when stable
- small internal packages for config/secrets/logging

Avoid:

- importing another service’s implementation package
- hidden global state
- “just for now” shared mutable registries
- circular service knowledge

### 3. Go is not Rust

Do not imitate Rust patterns mechanically.

Prefer Go-native clarity:

- simple structs
- explicit errors
- small interfaces at the consumer side
- context-aware HTTP handlers
- table-driven tests
- boring package names
- no abstraction tax unless repeated 3+ times

### 4. Validation replaces the borrow checker

Rust caught many problems at compile time. Go will not.

Therefore every refactor must preserve or improve:

- input validation
- dimension checks
- duplicate handling
- restart persistence checks
- schema drift detection
- error status mapping
- smoke coverage

### 5. Performance work must be measured

Do not optimize by vibes.

For each performance change, record:

- baseline command
- baseline result
- changed code path
- new result
- regression risk
- rollback plan

Current known bottleneck:

vectord Add is RWMutex-serialized.
500K vectors: ~35m36s, ~234/sec avg.
GPU around 65%, so embedding is not the only bottleneck.

Do not claim concurrency improvements unless the HNSW library thread-safety is audited or writes are safely batched/sharded.
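As a quick sanity check on the stated baseline, 35m36s is 2136 seconds:

```shell
# 500K vectors over 35m36s works out to the cited ~234 adds/sec.
secs=$(( 35 * 60 + 36 ))
echo "elapsed_s=${secs} rate=$(( 500000 / secs ))"
# → elapsed_s=2136 rate=234
```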
## File/Package Expectations

### cmd/

One binary per service. Keep main files thin.

Main should only:

- load config
- construct dependencies
- wire routes
- start server
- handle shutdown

### internal/

Shared code belongs here only when it is genuinely shared.

Good internal packages:

- config
- secrets
- storeclient
- catalogclient
- gateway routing helpers
- logging
- request/response contracts

Bad internal packages:

- vague “utils”
- giant “common”
- cross-service god objects
- hidden dependency containers

### scripts/

Every major behavior needs a runnable smoke.

Smokes are not decoration. They are the replacement nervous system.

Existing smoke pattern must remain:

- `d1` skeleton/health/gateway
- `d2` storaged
- `d3` catalogd
- `d4` ingestd
- `d5` queryd
- `d6` full ingest/query
- `g1` vectord
- `g1p` vectord persistence
- `g2` embed → vector add → search

New functionality needs a new smoke or an extension to the closest existing one.

## Refactor Checklist

Before editing:

- Read README.md
- Read docs/PRD.md
- Read docs/SPEC.md
- Read docs/DECISIONS.md
- Read docs/PHASE_G0_KICKOFF.md
- Identify affected services
- Identify affected smokes

During editing:

- Keep public API stable unless explicitly changing it
- Keep errors explicit
- Keep logs useful but not noisy
- Avoid package sprawl
- Avoid premature generic interfaces
- Preserve restart behavior
- Preserve gateway-only acceptance path

After editing:

- Run `go test ./...`
- Run the relevant smoke script
- Run the full smoke loop when service contracts changed
- Record evidence in a short refactor note

Full smoke loop:

```bash
for s in scripts/{d1,d2,d3,d4,d5,d6,g1,g1p,g2}_smoke.sh; do
  "$s" || break
done
```

## Refactor Note Format

Create or update:

```text
docs/refactor-notes/YYYYMMDD-<short-name>.md
```

Use this structure:

```md
# Refactor Note: <name>

## Goal

## Files changed

## Behavior changed

## Behavior preserved

## Tests run

## Smoke results

## Performance before/after

## Risks

## Rollback
```

## Anti-Patterns To Reject

Reject these unless specifically requested:

- porting Rust modules 1:1
- adding orchestration before service parity
- adding AI/agent logic inside core services
- making gateway business-aware
- hiding failures behind retries
- swallowing errors
- “temporary” global maps
- changing route contracts without smoke updates
- adding dependencies for trivial code
- optimizing vector ingestion without measurement
- rebuilding the Rust bureaucracy in Go clothing

## Preferred Next Targets

Prioritize in this order:

1. Stability of existing Go service contracts
2. Better smoke coverage
3. Persistence limits and large object handling
4. vectord ingestion bottleneck analysis
5. gateway observability
6. feature parity with Rust only where needed
7. UI/agent/auditor layers later, not now

## Architectural Position

The Go rewrite should remain the production spine.

The Rust system remains a historical reference and a possible source for:

- validation ideas
- audit semantics
- provider-routing lessons
- prior acceptance criteria
- edge cases

But Rust is not the shape to copy.

Go owns the clean operational path.
Rust owns historical scar tissue and high-performance lessons.

Do not confuse the two.

Also add this shorter command prompt when you hand this to Claude Code:

```md
Read `docs/CLAUDE_REFACTOR_GUARDRAILS.md` first.

Then inspect the current Go lakehouse repo and produce a refactor plan only. Do not edit code yet.

Your plan must identify:

1. affected services
2. affected routes
3. affected request/response contracts
4. affected smoke scripts
5. risks of accidentally reintroducing Rust-era complexity
6. exact tests/smokes you will run after changes

Do not port Rust structure blindly. Preserve the Go service spine.
```
docs/TEST_PROOF_SCOPE.md (new file, 258 lines)
@@ -0,0 +1,258 @@
Create `docs/TEST_PROOF_SCOPE.md`.

Purpose: design a serious proof harness for the Go lakehouse refactor.

You are not writing production features yet. You are designing and implementing a claims-verification test suite that proves or disproves what this system currently claims.

## System Claims To Prove

The Go lakehouse claims:

1. Gateway-fronted services work as a coherent system.
2. CSV data can ingest into Parquet.
3. Catalog manifests remain consistent.
4. DuckDB query path returns correct results.
5. Embedding path works through Ollama or configured embedding backend.
6. Vector add/search works.
7. Vector persistence survives restart.
8. Service contracts are stable.
9. Refactor preserved behavior.
10. Performance claims are measurable, not vibes.

## Required Output

Create a proof harness under:

```text
tests/proof/
  README.md
  claims.yaml
  run_proof.sh
  lib/
    env.sh
    http.sh
    assert.sh
    metrics.sh
  cases/
    00_health.sh
    01_storage_roundtrip.sh
    02_catalog_manifest.sh
    03_ingest_csv_to_parquet.sh
    04_query_correctness.sh
    05_embedding_contract.sh
    06_vector_add_search.sh
    07_vector_persistence_restart.sh
    08_gateway_contracts.sh
    09_failure_modes.sh
    10_perf_baseline.sh
  fixtures/
    csv/
    expected/
  reports/
    .gitkeep
```

## Test Design Requirements

Each test must produce evidence, not just pass/fail.

For every case, record:

- claim tested
- service routes called
- input fixture hash
- output hash
- expected result
- actual result
- pass/fail
- latency
- status codes
- logs location
- timestamp
- git commit hash
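One such record could look like the JSONL line below; the field names are illustrative, the spec mandates only what must be recorded, not this exact shape:

```shell
# Emit one evidence record per assertion as a single JSONL line.
# Field names are an assumption for illustration.
emit_evidence() {
  local claim=$1 route=$2 expected=$3 actual=$4 latency=$5 sha=$6 result
  [ "$expected" = "$actual" ] && result=pass || result=fail
  printf '{"claim":"%s","route":"%s","expected":"%s","actual":"%s","result":"%s","latency_ms":%s,"git_sha":"%s"}\n' \
    "$claim" "$route" "$expected" "$actual" "$result" "$latency" "$sha"
}

emit_evidence GOLAKE-001 "GET /health" 200 200 12 abc1234
# → {"claim":"GOLAKE-001","route":"GET /health","expected":"200","actual":"200","result":"pass","latency_ms":12,"git_sha":"abc1234"}
```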
Write results to:

```text
tests/proof/reports/proof-YYYYMMDD-HHMMSS/
```

Each run must produce:

```text
summary.md
summary.json
raw/
  http/
  logs/
  outputs/
  metrics/
```

## Claims File

Create `tests/proof/claims.yaml`.

Each claim should have:

```yaml
id: GOLAKE-001
name: Gateway health routes respond
type: contract
services:
  - gateway
routes:
  - GET /health
evidence:
  - status_code
  - response_body
  - latency_ms
required: true
```

Include claims for:

- gateway health
- each service health
- storage put/get/list/delete if supported
- catalog create/read/update/list if supported
- ingest job creation
- Parquet output existence
- query correctness against known CSV fixture
- embedding vector dimension
- vector add/search nearest-neighbor correctness
- vector restart persistence
- invalid request rejection
- missing object behavior
- duplicate vector ID behavior
- malformed CSV behavior
- unavailable downstream service behavior
- latency baseline
- throughput baseline

## Fixtures

Create deterministic fixtures.

Minimum CSV fixture:

```csv
id,name,role,city,score
1,Ada,welder,Chicago,91
2,Grace,electrician,Detroit,88
3,Linus,operator,Chicago,77
4,Ken,pipefitter,Houston,84
5,Barbara,safety,Houston,95
```

Expected query assertions:

- count rows = 5
- city Chicago = 2
- max score = 95
- role safety belongs to Barbara
- Houston average score = 89.5
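All five expectations are mechanically derivable from the fixture itself; a quick standalone awk pass over the same rows:

```shell
# Re-derive the five expected values straight from the fixture.
cat > /tmp/workers.csv <<'EOF'
id,name,role,city,score
1,Ada,welder,Chicago,91
2,Grace,electrician,Detroit,88
3,Linus,operator,Chicago,77
4,Ken,pipefitter,Houston,84
5,Barbara,safety,Houston,95
EOF

summary=$(awk -F, 'BEGIN { max = 0 }
NR > 1 {
  rows++
  if ($4 == "Chicago") chicago++
  if ($5 + 0 > max) max = $5 + 0
  if ($3 == "safety") safety = $2
  if ($4 == "Houston") { hsum += $5; hn++ }
}
END {
  printf "rows=%d chicago=%d max=%d safety=%s houston_avg=%.1f",
         rows, chicago, max, safety, hsum / hn
}' /tmp/workers.csv)
echo "$summary"
# → rows=5 chicago=2 max=95 safety=Barbara houston_avg=89.5
```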
For vector tests, use deterministic text fixtures:

- doc-001: industrial staffing for welders in Chicago
- doc-002: safety compliance for warehouse crews
- doc-003: electrical contractors assigned to Detroit
- doc-004: pipefitters and heavy equipment operators in Houston

Search assertions should verify that semantically related queries return expected top candidates where embeddings are enabled.

If embeddings are not deterministic enough, support a contract-only mode that verifies:

- vector dimension
- non-empty vector
- add succeeds
- search returns known inserted IDs
- persistence survives restart

## Modes

Support three modes:

```bash
tests/proof/run_proof.sh --mode contract
tests/proof/run_proof.sh --mode integration
tests/proof/run_proof.sh --mode performance
```

### Contract mode

Fast. No massive data. Verifies APIs, schemas, status codes, basic correctness.

### Integration mode

Runs full gateway → service chain.

Must prove:

- CSV fixture → storaged → ingestd → catalogd → queryd
- text fixture → embedd → vectord → search

### Performance mode

Measures baseline only. Do not fake claims.

Record:

- rows ingested/sec
- vectors added/sec
- p50/p95 query latency
- p50/p95 vector search latency
- memory usage if available
- CPU usage if available
- service restart time if available

## Failure-Mode Tests

Add tests proving the system fails cleanly.

Required:

- malformed JSON
- missing required field
- invalid vector dimension
- missing object
- bad SQL query
- duplicate vector ID
- downstream service unavailable, if easy to simulate
- restart before persistence load completes, if relevant

Do not hide failures behind retries unless the system explicitly documents retry behavior.

## Hard Rules

- Do not add production features unless needed to expose testable behavior.
- Do not change public route contracts without documenting it.
- Do not write tests that merely check “HTTP 200” unless the claim is health-only.
- Do not use random data unless seeded and recorded.
- Do not make performance claims without before/after metrics.
- Do not assume Ollama is available; detect it and mark embedding tests skipped or degraded with explanation.
- Do not let skipped tests appear as passed.
- Do not silently ignore missing services.
- Do not make the proof harness depend on external cloud services.

## Final Deliverables

After implementation, produce:

- tests/proof/README.md
- tests/proof/claims.yaml
- tests/proof/run_proof.sh
- tests/proof/cases/*.sh
- tests/proof/reports/<latest>/summary.md
- tests/proof/reports/<latest>/summary.json

## Final Report Must Answer

At the end, write a clear report:

- Which claims are proven?
- Which claims are partially proven?
- Which claims failed?
- Which claims were skipped, and why?
- What evidence supports each claim?
- What bottlenecks were measured?
- What contract drift was found?
- What refactor risks remain?
- What should be fixed first?

## Execution Plan

1. First inspect the repo.
2. Then produce a short implementation plan.
3. Then build the proof harness.
4. Then run contract mode.
5. Then run integration mode if services can be started.
6. Then run performance mode only if contract and integration pass.

Do not declare success without evidence files.
justfile (8 additions)
@@ -75,6 +75,14 @@ smoke-all:
 doctor *args:
     @bash scripts/doctor.sh {{args}}
 
+# Proof harness — claims-verification tier above the smoke chain.
+# See tests/proof/README.md and docs/TEST_PROOF_SCOPE.md.
+#   just proof contract      fast: APIs + status codes + dim/nonempty
+#   just proof integration   full: CSV→Parquet→SQL, text→vector→search
+#   just proof performance   measurements; runs only after contract+integration
+proof mode *flags:
+    @bash tests/proof/run_proof.sh --mode {{mode}} {{flags}}
+
 # Install pre-push hook so `git push` runs `just verify` first.
 install-hooks:
     #!/usr/bin/env bash
tests/proof/README.md (new file, 91 lines)
@@ -0,0 +1,91 @@
# tests/proof — claims-verification harness

Per `docs/TEST_PROOF_SCOPE.md`. The 9 smokes prove that the system *runs*; this harness proves that the system *does what it claims to do*.

## Why this exists

Smokes verify that services boot, talk, and pass deterministic round-trips.
They do not verify:

- contract drift (a route silently changes its response shape)
- semantic correctness (the SQL query says what we claim it says)
- failure-mode discipline (a malformed request returns 4xx, not a silent 200)
- performance regressions (vectors/sec drops 30% on a refactor)

The proof harness produces evidence, not just pass/fail. Each case writes input/output hashes, latencies, status codes, log paths, and the git SHA, so a future auditor can re-run and diff.

## Layout

```
tests/proof/
  README.md                  ← you are here
  claims.yaml                ← enumeration of every claim, with id + type + routes
  run_proof.sh               ← orchestrator (--mode contract|integration|performance)
  lib/
    env.sh                   ← service URLs, report dir, mode, git context
    http.sh                  ← curl wrappers (latency + status + body capture)
    assert.sh                ← structured assertions writing JSONL evidence
    metrics.sh               ← rss/cpu/timing capture for performance mode
  cases/
    00_health.sh
    01_storage_roundtrip.sh
    …
    10_perf_baseline.sh
  fixtures/
    csv/workers.csv          ← canonical 5-row fixture (sha-pinned)
    text/docs.txt            ← 4 deterministic vector docs
    expected/queries.json    ← expected results for the 5 SQL assertions
    expected/rankings.json   ← stored top-K rankings for vector search
  reports/
    proof-YYYYMMDD-HHMMSSZ/  ← per-run; gitignored
      summary.md
      summary.json
      raw/
        context.json         ← git_sha, hostname, timestamp, mode
        cases/<id>.jsonl     ← one JSONL line per assertion
        http/<id>/*.{json,body,headers}
        logs/<svc>.log       ← captured stdout+stderr from booted services
        metrics/<id>.jsonl
```

## Modes

```bash
just proof contract      # APIs, schemas, status codes; no big data; ~30s
just proof integration   # full chain CSV→storaged→…→queryd, text→embedd→vectord
just proof performance   # measurements only; runs after contract+integration
```

The `just` recipes wrap `tests/proof/run_proof.sh` with `--mode <X>`. Use the script directly for advanced flags (`--no-bootstrap`, `--regenerate-rankings`, `--regenerate-baseline`).

## Hard rules (from TEST_PROOF_SCOPE.md)

- Don't claim performance without before/after metrics
- Detect Ollama unavailability; mark embedding tests skipped or degraded with explanation
- Skipped tests do not appear as passed
- No silent ignoring of missing services
- No external cloud dependencies
- No "HTTP 200" assertions unless the claim is health-only
- No random data without a seed

## How to read a report

After `just proof integration`:

1. Open `tests/proof/reports/proof-<ts>/summary.md` for the human view.
2. `summary.json` is the machine-readable counterpart.
3. To investigate a single failed assertion:
   - find its `case_id` in `summary.md`
   - read `raw/cases/<case_id>.jsonl` (each line is one assertion)
   - cross-reference `raw/http/<case_id>/<probe>.{json,body,headers}` for the underlying HTTP round-trip

Every record cites the git SHA at run time; a clean re-run of the same SHA against the same fixtures must produce identical evidence (modulo timestamps and non-deterministic embedding noise).
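To jump straight to the failing cases, grep the per-case JSONL files; a self-contained sketch (the `"result":"fail"` field name is an assumption, so match whatever assert.sh actually emits):

```bash
# Build a throwaway report shape, then list case files with a failure.
mkdir -p /tmp/proof-demo/raw/cases
printf '{"name":"a","result":"pass"}\n{"name":"b","result":"fail"}\n' \
  > /tmp/proof-demo/raw/cases/GOLAKE-001-002.jsonl
printf '{"name":"c","result":"pass"}\n' \
  > /tmp/proof-demo/raw/cases/GOLAKE-010.jsonl

grep -l '"result":"fail"' /tmp/proof-demo/raw/cases/*.jsonl
# → /tmp/proof-demo/raw/cases/GOLAKE-001-002.jsonl
```

Against a real run, point the same `grep -l` at `tests/proof/reports/proof-<ts>/raw/cases/*.jsonl`.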
## Reading order for new contributors

1. `docs/TEST_PROOF_SCOPE.md` — the spec this harness implements.
2. `docs/CLAUDE_REFACTOR_GUARDRAILS.md` — process discipline this harness must obey when extended.
3. `tests/proof/claims.yaml` — what's claimed.
4. `tests/proof/cases/00_health.sh` — canonical case shape; copy-paste to add new cases.
tests/proof/cases/00_health.sh (new executable file, 51 lines)
@@ -0,0 +1,51 @@
#!/usr/bin/env bash
# 00_health.sh — GOLAKE-001 + GOLAKE-002.
# Verifies that gateway and each backing service answer GET /health
# with 200 and a body that includes the service name. Canonical case
# shape — copy this file when adding new cases.

set -uo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=../lib/env.sh
source "${SCRIPT_DIR}/../lib/env.sh"
# shellcheck source=../lib/http.sh
source "${SCRIPT_DIR}/../lib/http.sh"
# shellcheck source=../lib/assert.sh
source "${SCRIPT_DIR}/../lib/assert.sh"

CASE_ID="GOLAKE-001-002"
CASE_NAME="health endpoints respond"
CASE_TYPE="contract"

# Allow run_proof.sh to read metadata without executing.
if [ "${1:-}" = "--metadata-only" ]; then return 0 2>/dev/null || exit 0; fi

# Each entry: <name>:<port>. The service name must appear in the /health body.
SERVICES=(
  "gateway:3110"
  "storaged:3211"
  "catalogd:3212"
  "ingestd:3213"
  "queryd:3214"
  "vectord:3215"
  "embedd:3216"
)

for spec in "${SERVICES[@]}"; do
  name="${spec%:*}"
  port="${spec#*:}"
  probe="${name}_health"

  # Probe — captures status, body, latency to raw/http/<case>/<probe>.json
  proof_get "$CASE_ID" "$probe" "http://127.0.0.1:${port}/health" >/dev/null

  status=$(proof_status_of "$CASE_ID" "$probe")
  body=$(proof_body_of "$CASE_ID" "$probe")
  latency=$(proof_latency_of "$CASE_ID" "$probe")

  proof_assert_eq "$CASE_ID" "${name} /health → 200" "200" "$status"
  proof_assert_contains "$CASE_ID" "${name} body identifies service" "$name" "$body"
  # Latency budget — generous so we don't get spurious failures from
  # cold-start or system jitter; tighten if a real budget emerges.
  proof_assert_lt "$CASE_ID" "${name} health latency < 500ms" "$latency" "500"
done
tests/proof/claims.yaml (new file, 214 lines)
@@ -0,0 +1,214 @@
# claims.yaml — what the Go lakehouse claims, enumerated.
#
# Each claim has an id, name, type, the services + routes it touches,
# the evidence shape, and whether failure is fatal (required: true) or
# advisory (required: false).
#
# Source of truth for what cases/*.sh actually verify is the case
# scripts themselves; this file is the human-readable enumeration that
# the spec mandates as a deliverable. run_proof.sh validates that every
# claim here has a matching case with the same CASE_ID at startup.
#
# Modes:
#   contract    — fast; APIs + schemas + status codes; no big data
#   integration — full chain CSV→storaged→catalogd→ingestd→queryd, text→embedd→vectord
#   performance — measurements only; runs after contract+integration green

claims:
  - id: GOLAKE-001
    name: Gateway health route responds
    type: contract
    services: [gateway]
    routes: [GET /health]
    evidence: [status_code, response_body, latency_ms]
    required: true

  - id: GOLAKE-002
    name: Each backing service health route responds
    type: contract
    services: [storaged, catalogd, ingestd, queryd, vectord, embedd]
    routes: [GET /health]
    evidence: [status_code, response_body, latency_ms, service_field_match]
    required: true

  - id: GOLAKE-003
    name: Gateway proxies /v1/* to the right upstream and preserves status codes
    type: contract
    services: [gateway]
    routes: [GET /v1/storage/list, GET /v1/catalog/list, POST /v1/sql with empty]
    evidence: [upstream_match, status_passthrough, latency_ms]
    required: true

  - id: GOLAKE-010
    name: Storage put/get round-trip preserves bytes
    type: integration
    services: [storaged]
    routes: [PUT /storage/put/*, GET /storage/get/*]
    evidence: [input_sha256, output_sha256, status_code, latency_ms]
    required: true

  - id: GOLAKE-011
    name: Storage list returns the just-put key
    type: integration
    services: [storaged]
    routes: [PUT /storage/put/*, GET /storage/list]
    evidence: [list_contains_key, latency_ms]
    required: true

  - id: GOLAKE-012
    name: Storage delete removes the key (subsequent GET 404)
    type: integration
    services: [storaged]
    routes: [DELETE /storage/delete/*, GET /storage/get/*]
    evidence: [delete_status, get_after_status]
    required: true

  - id: GOLAKE-020
    name: Catalog register is idempotent on identical fingerprint
    type: integration
    services: [catalogd]
    routes: [POST /catalog/register]
    evidence: [first_existing_false, second_existing_true, dataset_id_stable]
    required: true

  - id: GOLAKE-021
    name: Catalog manifest read matches what was registered
    type: integration
    services: [catalogd]
    routes: [POST /catalog/register, GET /catalog/manifest/*]
    evidence: [manifest_equality, schema_fingerprint_match]
    required: true

  - id: GOLAKE-022
    name: Catalog list contains the registered dataset
    type: integration
    services: [catalogd]
    routes: [GET /catalog/list]
    evidence: [list_contains_dataset_id]
    required: true

  - id: GOLAKE-030
    name: Ingest CSV → Parquet writes a parquet object that catalogd manifests
    type: integration
    services: [ingestd, storaged, catalogd]
    routes: [POST /ingest, GET /storage/list, GET /catalog/manifest/*]
    evidence: [parquet_object_exists, manifest_row_count, content_addressed_key]
    required: true

  - id: GOLAKE-040
    name: Query correctness — 5 SQL assertions against the workers CSV fixture
    type: integration
    services: [queryd]
    routes: [POST /sql]
    evidence: [Q1_count_5, Q2_chicago_2, Q3_max_95, Q4_safety_barbara, Q5_houston_avg_89_5]
    required: true

  - id: GOLAKE-050
    name: Embedding contract — request returns dim=768, non-empty vector
    type: contract
    services: [embedd]
    routes: [POST /embed]
    evidence: [dimension, vector_nonempty, model_echoed]
    required: true

  - id: GOLAKE-051
    name: Embedding integration — top-K ranking matches stored fixture
|
||||||
|
type: integration
|
||||||
|
services: [embedd, vectord]
|
||||||
|
routes: [POST /embed, POST /vectors/index/<n>/add, POST /vectors/index/<n>/search]
|
||||||
|
evidence: [top_k_id_set, top_1_id_match]
|
||||||
|
required: true
|
||||||
|
notes: |
|
||||||
|
Ollama embeddings are stable-but-not-bit-identical across runs.
|
||||||
|
Ranking-by-cosine is deterministic at our scale; this case asserts
|
||||||
|
the top-K ID set matches expected/rankings.json. Regenerable via
|
||||||
|
run_proof.sh --regenerate-rankings.
|
||||||
|
|
||||||
|
- id: GOLAKE-060
|
||||||
|
name: Vector add + lookup-by-ID round-trip
|
||||||
|
type: contract
|
||||||
|
services: [vectord]
|
||||||
|
routes: [POST /vectors/index, POST /vectors/index/<n>/add, GET /vectors/index/<n>]
|
||||||
|
evidence: [add_status, lookup_returns_inserted_ids]
|
||||||
|
required: true
|
||||||
|
|
||||||
|
- id: GOLAKE-061
|
||||||
|
name: Vector search nearest-neighbor — inserted vector ranks #1 against itself
|
||||||
|
type: contract
|
||||||
|
services: [vectord]
|
||||||
|
routes: [POST /vectors/index/<n>/add, POST /vectors/index/<n>/search]
|
||||||
|
evidence: [top_1_id, top_1_distance]
|
||||||
|
required: true
|
||||||
|
|
||||||
|
- id: GOLAKE-070
|
||||||
|
name: Vector persistence — kill+restart preserves index state
|
||||||
|
type: integration
|
||||||
|
services: [vectord, storaged]
|
||||||
|
routes: [POST /vectors/index/<n>/add, POST /vectors/index/<n>/search]
|
||||||
|
evidence: [pre_restart_search, post_restart_search_dist_zero]
|
||||||
|
required: true
|
||||||
|
|
||||||
|
- id: GOLAKE-080
|
||||||
|
name: Failure mode — malformed JSON returns 4xx, never 5xx, never silent 200
|
||||||
|
type: contract
|
||||||
|
services: [storaged, catalogd, ingestd, queryd, vectord, embedd]
|
||||||
|
routes: [POST endpoints with invalid body]
|
||||||
|
evidence: [per_service_status_codes, error_body_shape]
|
||||||
|
required: true
|
||||||
|
|
||||||
|
- id: GOLAKE-081
|
||||||
|
name: Failure mode — missing required field rejected with structured 400
|
||||||
|
type: contract
|
||||||
|
services: [catalogd, vectord, embedd]
|
||||||
|
routes: [POST endpoints with valid JSON but missing fields]
|
||||||
|
evidence: [per_service_status_codes]
|
||||||
|
required: true
|
||||||
|
|
||||||
|
- id: GOLAKE-082
|
||||||
|
name: Failure mode — bad SQL returns 4xx, error message present
|
||||||
|
type: contract
|
||||||
|
services: [queryd]
|
||||||
|
routes: [POST /sql with syntax error]
|
||||||
|
evidence: [status_code, error_body_present]
|
||||||
|
required: true
|
||||||
|
|
||||||
|
- id: GOLAKE-083
|
||||||
|
name: Failure mode — vector dim mismatch returns 4xx
|
||||||
|
type: contract
|
||||||
|
services: [vectord]
|
||||||
|
routes: [POST /vectors/index/<n>/add with wrong dim]
|
||||||
|
evidence: [status_code]
|
||||||
|
required: true
|
||||||
|
|
||||||
|
- id: GOLAKE-084
|
||||||
|
name: Failure mode — missing storage object returns 404
|
||||||
|
type: contract
|
||||||
|
services: [storaged]
|
||||||
|
routes: [GET /storage/get/<unseen-key>]
|
||||||
|
evidence: [status_code]
|
||||||
|
required: true
|
||||||
|
|
||||||
|
- id: GOLAKE-085
|
||||||
|
name: Failure mode — duplicate vector ID — record actual behavior (informational)
|
||||||
|
type: contract
|
||||||
|
services: [vectord]
|
||||||
|
routes: [POST /vectors/index/<n>/add with same id twice]
|
||||||
|
evidence: [first_status, second_status, search_returns_count]
|
||||||
|
required: false
|
||||||
|
notes: |
|
||||||
|
Spec asks us to verify duplicate-ID handling. Current behavior is
|
||||||
|
not yet documented; this case records what happens so we can
|
||||||
|
decide the contract. Required:false → does not fail the gate.
|
||||||
|
|
||||||
|
- id: GOLAKE-100
|
||||||
|
name: Performance baseline — rows/sec ingest, vectors/sec add, query latency
|
||||||
|
type: performance
|
||||||
|
services: [ingestd, vectord, queryd, embedd]
|
||||||
|
routes: [POST /ingest, POST /vectors/index/<n>/add, POST /sql, POST /embed]
|
||||||
|
evidence: [rows_per_sec, vectors_per_sec, query_p50_ms, query_p95_ms,
|
||||||
|
vector_search_p50_ms, vector_search_p95_ms, rss_peak_mb, cpu_peak_pct]
|
||||||
|
required: false
|
||||||
|
notes: |
|
||||||
|
First run writes tests/proof/baseline.json. Subsequent runs diff
|
||||||
|
against it; a regression ≥10% on any metric warns but does not
|
||||||
|
fail the gate. Use --regenerate-baseline to overwrite.
|
||||||
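The `required` flag is what separates gate-blocking claims from informational ones (only GOLAKE-085 and GOLAKE-100 are `required: false`). A minimal sketch of that split, counting over a hard-coded two-claim fragment rather than the real claims.yaml (the harness's actual aggregation uses jq; this grep-based version is purely illustrative):

```shell
#!/usr/bin/env bash
# Count gate-blocking vs informational claims in an inline YAML fragment.
claims=$(cat <<'YAML'
- id: GOLAKE-084
  required: true
- id: GOLAKE-085
  required: false
YAML
)
gating=$(printf '%s\n' "$claims" | grep -c 'required: true')
informational=$(printf '%s\n' "$claims" | grep -c 'required: false')
echo "gating=${gating} informational=${informational}"
```

A failing assertion under a `required: false` claim would show up in the report but leave the gate green, which is the point of the split.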
6  tests/proof/fixtures/csv/workers.csv  Normal file
@@ -0,0 +1,6 @@
id,name,role,city,score
1,Ada,welder,Chicago,91
2,Grace,electrician,Detroit,88
3,Linus,operator,Chicago,77
4,Ken,pipefitter,Houston,84
5,Barbara,safety,Houston,95
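The fixture is small enough that the expected query results can be re-derived by hand. As a sanity sketch (not part of the harness), the Q5 expectation — Houston average score = 89.5 — falls out of the two Houston rows directly:

```shell
#!/usr/bin/env bash
# Recompute the Houston average from the fixture rows: (84 + 95) / 2.
avg=$(awk -F, '$4 == "Houston" { s += $5; n++ } END { printf "%.1f", s / n }' <<'CSV'
1,Ada,welder,Chicago,91
2,Grace,electrician,Detroit,88
3,Linus,operator,Chicago,77
4,Ken,pipefitter,Houston,84
5,Barbara,safety,Houston,95
CSV
)
echo "$avg"    # prints 89.5
```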
36  tests/proof/fixtures/expected/queries.json  Normal file
@@ -0,0 +1,36 @@
{
  "_comment": "Expected results for the 5 SQL assertions in 04_query_correctness against fixtures/csv/workers.csv. The CSV is content-addressed; if its hash changes, this file must be re-derived.",
  "fixture_sha256": "computed at runtime by 03_ingest_csv_to_parquet — see actual.fixture_sha in evidence",
  "queries": [
    {
      "id": "Q1",
      "claim": "row count = 5",
      "sql": "SELECT count(*) AS n FROM workers",
      "expected": {"n": 5}
    },
    {
      "id": "Q2",
      "claim": "Chicago row count = 2",
      "sql": "SELECT count(*) AS n FROM workers WHERE city = 'Chicago'",
      "expected": {"n": 2}
    },
    {
      "id": "Q3",
      "claim": "max score = 95",
      "sql": "SELECT max(score) AS m FROM workers",
      "expected": {"m": 95}
    },
    {
      "id": "Q4",
      "claim": "role = safety belongs to Barbara",
      "sql": "SELECT name FROM workers WHERE role = 'safety'",
      "expected": {"name": "Barbara"}
    },
    {
      "id": "Q5",
      "claim": "Houston average score = 89.5",
      "sql": "SELECT avg(score) AS avg FROM workers WHERE city = 'Houston'",
      "expected": {"avg": 89.5}
    }
  ]
}
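A case script compares each `expected` value against queryd's response. The harness does this with jq; a dependency-free sketch of pulling one expected value out of a single query record (the inline JSON line below mirrors Q1, and the sed pattern is illustrative, not harness code):

```shell
#!/usr/bin/env bash
# Extract the expected row count from a Q1-shaped JSON record with sed.
expected_n=$(sed -n 's/.*"expected": {"n": \([0-9]*\)}.*/\1/p' <<'JSON'
{"id": "Q1", "claim": "row count = 5", "sql": "SELECT count(*) AS n FROM workers", "expected": {"n": 5}}
JSON
)
echo "$expected_n"    # prints 5
```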
4  tests/proof/fixtures/text/docs.txt  Normal file
@@ -0,0 +1,4 @@
doc-001 industrial staffing for welders in Chicago
doc-002 safety compliance for warehouse crews
doc-003 electrical contractors assigned to Detroit
doc-004 pipefitters and heavy equipment operators in Houston
118  tests/proof/lib/assert.sh  Normal file
@@ -0,0 +1,118 @@
#!/usr/bin/env bash
# lib/assert.sh — assertions that record evidence per the spec.
#
# Each assertion appends one record to:
#   $PROOF_REPORT_DIR/raw/cases/<case_id>.jsonl
#
# Each line is a self-describing JSON object — case ID, claim, expected,
# actual, result {pass|fail|skip}, optional evidence pointers. Cases
# call multiple assertions; run_proof.sh aggregates JSONL → summary.
#
# Functions:
#   proof_assert_eq       <case_id> <claim> <expected> <actual>
#   proof_assert_ne       <case_id> <claim> <not_expected> <actual>
#   proof_assert_contains <case_id> <claim> <substring> <haystack>
#   proof_assert_lt       <case_id> <claim> <a> <b>   # passes if a < b
#   proof_assert_gt       <case_id> <claim> <a> <b>   # passes if a > b
#   proof_assert_status   <case_id> <claim> <expected_status> <probe_name>
#   proof_assert_json_eq  <case_id> <claim> <jq_path> <expected> <body_or_file>
#   proof_skip            <case_id> <claim> <reason>
#
# All return 0 (case scripts decide their own halt-on-fail policy).

_proof_record() {
  local case_id="$1" claim="$2" result="$3" expected="$4" actual="$5" detail="$6"
  local out="${PROOF_REPORT_DIR}/raw/cases/${case_id}.jsonl"
  mkdir -p "$(dirname "$out")"
  # JSON-escape the variable inputs.
  local e_claim e_expected e_actual e_detail
  e_claim=$(printf '%s' "$claim" | jq -Rs .)
  e_expected=$(printf '%s' "$expected" | jq -Rs .)
  e_actual=$(printf '%s' "$actual" | jq -Rs .)
  e_detail=$(printf '%s' "$detail" | jq -Rs .)
  local ts
  ts=$(date -u -Iseconds)
  cat >> "$out" <<JSON
{"case_id":"${case_id}","claim":${e_claim},"result":"${result}","expected":${e_expected},"actual":${e_actual},"detail":${e_detail},"timestamp":"${ts}","git_sha":"${PROOF_GIT_SHA}"}
JSON
}

proof_assert_eq() {
  local case_id="$1" claim="$2" expected="$3" actual="$4"
  if [ "$expected" = "$actual" ]; then
    _proof_record "$case_id" "$claim" pass "$expected" "$actual" ""
    return 0
  fi
  _proof_record "$case_id" "$claim" fail "$expected" "$actual" "values differ"
  return 0
}

proof_assert_ne() {
  local case_id="$1" claim="$2" not_expected="$3" actual="$4"
  if [ "$not_expected" != "$actual" ]; then
    _proof_record "$case_id" "$claim" pass "!= ${not_expected}" "$actual" ""
    return 0
  fi
  _proof_record "$case_id" "$claim" fail "!= ${not_expected}" "$actual" "values matched (should differ)"
  return 0
}

proof_assert_contains() {
  local case_id="$1" claim="$2" substring="$3" haystack="$4"
  if [[ "$haystack" == *"$substring"* ]]; then
    _proof_record "$case_id" "$claim" pass "contains: ${substring}" "$haystack" ""
    return 0
  fi
  _proof_record "$case_id" "$claim" fail "contains: ${substring}" "$haystack" "substring not found"
  return 0
}

proof_assert_lt() {
  local case_id="$1" claim="$2" a="$3" b="$4"
  # awk handles ints + floats uniformly
  if awk -v a="$a" -v b="$b" 'BEGIN{exit !(a < b)}'; then
    _proof_record "$case_id" "$claim" pass "${a} < ${b}" "${a}" ""
    return 0
  fi
  _proof_record "$case_id" "$claim" fail "${a} < ${b}" "${a}" "${a} is not less than ${b}"
  return 0
}

proof_assert_gt() {
  local case_id="$1" claim="$2" a="$3" b="$4"
  if awk -v a="$a" -v b="$b" 'BEGIN{exit !(a > b)}'; then
    _proof_record "$case_id" "$claim" pass "${a} > ${b}" "${a}" ""
    return 0
  fi
  _proof_record "$case_id" "$claim" fail "${a} > ${b}" "${a}" "${a} is not greater than ${b}"
  return 0
}

# proof_assert_status compares the status from a previously-recorded
# probe against an expected value. Probe must have run via lib/http.sh.
proof_assert_status() {
  local case_id="$1" claim="$2" expected="$3" probe_name="$4"
  local actual
  actual=$(proof_status_of "$case_id" "$probe_name" 2>/dev/null || echo missing)
  proof_assert_eq "$case_id" "$claim" "$expected" "$actual"
}

# proof_assert_json_eq: jq-based equality on response body or a file.
# body_or_file: if it starts with @, read from the file; otherwise treat
# it as a literal JSON string.
proof_assert_json_eq() {
  local case_id="$1" claim="$2" jq_path="$3" expected="$4" source="$5"
  local actual
  if [[ "$source" == @* ]]; then
    actual=$(jq -r "$jq_path" "${source#@}" 2>/dev/null || echo "<jq error>")
  else
    actual=$(printf '%s' "$source" | jq -r "$jq_path" 2>/dev/null || echo "<jq error>")
  fi
  proof_assert_eq "$case_id" "$claim" "$expected" "$actual"
}

proof_skip() {
  local case_id="$1" claim="$2" reason="$3"
  _proof_record "$case_id" "$claim" skip "" "" "$reason"
  return 0
}
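The always-return-0 design means an assertion never aborts a case; pass/fail lives only in the JSONL record. A self-contained miniature of that shape (hypothetical `record`/`assert_eq` stand-ins, much simpler than the real `_proof_record`, which also JSON-escapes inputs and stamps timestamp + git SHA):

```shell
#!/usr/bin/env bash
# Minimal sketch of the one-JSONL-record-per-assertion pattern.
record() {
  # $1=case_id $2=claim $3=result — emit one self-describing JSON line.
  printf '{"case_id":"%s","claim":"%s","result":"%s"}\n' "$1" "$2" "$3"
}
assert_eq() {
  # Record pass or fail, but never signal failure via exit status.
  if [ "$3" = "$4" ]; then record "$1" "$2" pass; else record "$1" "$2" fail; fi
  return 0
}
out=$(assert_eq 00_health "gateway status" 200 200)
echo "$out"
```

Because every assertion returns 0, a case script can run all of its probes and let the aggregator decide the verdict, rather than dying at the first mismatch.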
60  tests/proof/lib/env.sh  Normal file
@@ -0,0 +1,60 @@
#!/usr/bin/env bash
# lib/env.sh — proof harness environment.
#
# Sourced once by run_proof.sh and by every case script. Establishes:
#   - service URLs (gateway and direct ports)
#   - report directory paths
#   - run context (git SHA, hostname, timestamp)
#   - mode (contract|integration|performance)
#
# Cases read from these vars; never re-set them.

# Repo root — every path the harness emits is anchored here so report
# JSON is portable across reviewer machines.
PROOF_REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../.." && pwd)"
export PROOF_REPO_ROOT

# Service endpoints. Match internal/shared/config.go defaults; every
# binary binds 127.0.0.1 per the audit's R-007 mitigation.
export PROOF_GATEWAY_URL="${PROOF_GATEWAY_URL:-http://127.0.0.1:3110}"
export PROOF_STORAGED_URL="${PROOF_STORAGED_URL:-http://127.0.0.1:3211}"
export PROOF_CATALOGD_URL="${PROOF_CATALOGD_URL:-http://127.0.0.1:3212}"
export PROOF_INGESTD_URL="${PROOF_INGESTD_URL:-http://127.0.0.1:3213}"
export PROOF_QUERYD_URL="${PROOF_QUERYD_URL:-http://127.0.0.1:3214}"
export PROOF_VECTORD_URL="${PROOF_VECTORD_URL:-http://127.0.0.1:3215}"
export PROOF_EMBEDD_URL="${PROOF_EMBEDD_URL:-http://127.0.0.1:3216}"

# Mode + report directory — set by run_proof.sh before sourcing cases.
# Defaulted here so cases sourced standalone for debug still work.
export PROOF_MODE="${PROOF_MODE:-contract}"

if [ -z "${PROOF_REPORT_DIR:-}" ]; then
  ts="$(date -u +%Y%m%d-%H%M%SZ)"
  export PROOF_REPORT_DIR="${PROOF_REPO_ROOT}/tests/proof/reports/proof-${ts}"
fi

mkdir -p \
  "${PROOF_REPORT_DIR}/raw/http" \
  "${PROOF_REPORT_DIR}/raw/logs" \
  "${PROOF_REPORT_DIR}/raw/outputs" \
  "${PROOF_REPORT_DIR}/raw/metrics" \
  "${PROOF_REPORT_DIR}/raw/cases"

# Run context — captured once per run by run_proof.sh, but recomputed
# here as fallback if a case is invoked standalone.
if [ ! -f "${PROOF_REPORT_DIR}/raw/context.json" ]; then
  git_sha="$(cd "$PROOF_REPO_ROOT" && git rev-parse HEAD 2>/dev/null || echo unknown)"
  git_dirty="$(cd "$PROOF_REPO_ROOT" && [ -n "$(git status --porcelain 2>/dev/null)" ] && echo true || echo false)"
  cat > "${PROOF_REPORT_DIR}/raw/context.json" <<JSON
{
  "git_sha": "${git_sha}",
  "git_dirty": ${git_dirty},
  "hostname": "$(hostname)",
  "timestamp_utc": "$(date -u -Iseconds)",
  "mode": "${PROOF_MODE}",
  "harness_version": "1"
}
JSON
fi

export PROOF_GIT_SHA="$(cd "$PROOF_REPO_ROOT" && git rev-parse HEAD 2>/dev/null || echo unknown)"
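Every URL above uses the same `${VAR:-default}` expansion: a caller's export wins, otherwise the config.go default applies, which is what lets a reviewer point the harness at a non-default port without editing the library. A standalone demonstration of the pattern (`DEMO_URL` is a hypothetical variable, not one the harness defines):

```shell
#!/usr/bin/env bash
# ${VAR:-default}: use the existing value if set and non-empty,
# otherwise fall back to the default.
unset DEMO_URL
url="${DEMO_URL:-http://127.0.0.1:3110}"
echo "$url"    # unset → default applies

DEMO_URL="http://127.0.0.1:9999"
url="${DEMO_URL:-http://127.0.0.1:3110}"
echo "$url"    # set → override wins
```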
111  tests/proof/lib/http.sh  Normal file
@@ -0,0 +1,111 @@
#!/usr/bin/env bash
# lib/http.sh — curl wrappers that capture latency, status, body.
#
# Each request emits a JSON file under raw/http/<case_id>/<probe>.json
# describing the round-trip. Cases consume the JSON via assert.sh.
#
# Why JSON files instead of bash variables: gives the final report a
# diffable, replayable record. Future runs can compare on disk without
# re-executing the case.
#
# Functions:
#   proof_get    <case_id> <probe_name> <url> [extra-curl-args...]
#   proof_post   <case_id> <probe_name> <url> <content-type> <body> [extra-curl-args...]
#   proof_put    <case_id> <probe_name> <url> <content-type> <body|@file> [extra-curl-args...]
#   proof_delete <case_id> <probe_name> <url> [extra-curl-args...]
#
# Returns 0 always (capture is independent of HTTP outcome).
# Stores result at: $PROOF_REPORT_DIR/raw/http/<case_id>/<probe>.json
# Stores body at:   $PROOF_REPORT_DIR/raw/http/<case_id>/<probe>.body

_proof_http_emit() {
  local case_id="$1" probe="$2" method="$3" url="$4" status="$5" latency_ms="$6" body_path="$7" headers_path="$8"
  local dir="${PROOF_REPORT_DIR}/raw/http/${case_id}"
  mkdir -p "$dir"
  local body_sha=""
  [ -s "$body_path" ] && body_sha="$(sha256sum "$body_path" | awk '{print $1}')"
  cat > "${dir}/${probe}.json" <<JSON
{
  "case_id": "${case_id}",
  "probe": "${probe}",
  "method": "${method}",
  "url": "${url}",
  "status": ${status},
  "latency_ms": ${latency_ms},
  "body_path": "raw/http/${case_id}/${probe}.body",
  "body_sha256": "${body_sha}",
  "headers_path": "raw/http/${case_id}/${probe}.headers"
}
JSON
}

# Internal common runner — populates a temp body+headers file, times
# the request, emits the per-probe JSON, prints the body to stdout for
# convenience (cases can capture or discard).
_proof_http_run() {
  local case_id="$1" probe="$2" method="$3" url="$4"; shift 4
  local dir="${PROOF_REPORT_DIR}/raw/http/${case_id}"
  mkdir -p "$dir"
  local body_path="${dir}/${probe}.body"
  local headers_path="${dir}/${probe}.headers"
  local start_ms end_ms
  start_ms=$(date +%s%3N)
  local status
  status=$(curl -sS -X "$method" -o "$body_path" -D "$headers_path" -w "%{http_code}" "$@" "$url" 2>/dev/null || echo 0)
  end_ms=$(date +%s%3N)
  local latency_ms=$((end_ms - start_ms))
  _proof_http_emit "$case_id" "$probe" "$method" "$url" "$status" "$latency_ms" "$body_path" "$headers_path"
  cat "$body_path"
}

proof_get() {
  local case_id="$1" probe="$2" url="$3"; shift 3
  _proof_http_run "$case_id" "$probe" GET "$url" "$@"
}

proof_post() {
  local case_id="$1" probe="$2" url="$3" content_type="$4" body="$5"; shift 5
  _proof_http_run "$case_id" "$probe" POST "$url" \
    -H "Content-Type: ${content_type}" \
    --data "$body" \
    "$@"
}

# proof_put accepts either an inline body or an @-prefixed file path
# (curl --upload-file semantics for streaming).
proof_put() {
  local case_id="$1" probe="$2" url="$3" content_type="$4" body="$5"; shift 5
  if [[ "$body" == @* ]]; then
    local file="${body#@}"
    _proof_http_run "$case_id" "$probe" PUT "$url" \
      -H "Content-Type: ${content_type}" \
      --upload-file "$file" \
      "$@"
  else
    _proof_http_run "$case_id" "$probe" PUT "$url" \
      -H "Content-Type: ${content_type}" \
      --data "$body" \
      "$@"
  fi
}

proof_delete() {
  local case_id="$1" probe="$2" url="$3"; shift 3
  _proof_http_run "$case_id" "$probe" DELETE "$url" "$@"
}

# Helper accessors — read the per-probe JSON.
proof_status_of() {
  local case_id="$1" probe="$2"
  jq -r '.status' "${PROOF_REPORT_DIR}/raw/http/${case_id}/${probe}.json"
}

proof_body_of() {
  local case_id="$1" probe="$2"
  cat "${PROOF_REPORT_DIR}/raw/http/${case_id}/${probe}.body"
}

proof_latency_of() {
  local case_id="$1" probe="$2"
  jq -r '.latency_ms' "${PROOF_REPORT_DIR}/raw/http/${case_id}/${probe}.json"
}
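Latency capture in `_proof_http_run` is just two millisecond timestamps around the curl call: GNU date's `%3N` appends milliseconds to the epoch seconds, so the subtraction is plain integer arithmetic. The same trick in isolation (assumes GNU coreutils `date`; the `sleep` stands in for the HTTP round-trip):

```shell
#!/usr/bin/env bash
# Millisecond wall-clock timing via date +%s%3N (GNU coreutils).
start_ms=$(date +%s%3N)
sleep 0.05                      # stand-in for the timed request
end_ms=$(date +%s%3N)
latency_ms=$((end_ms - start_ms))
echo "${latency_ms}ms"
```

One-millisecond resolution is plenty for a sub-500ms health-check budget; finer-grained timing would need `%N` and a different comparison in the assertions.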
82  tests/proof/lib/metrics.sh  Normal file
@@ -0,0 +1,82 @@
#!/usr/bin/env bash
# lib/metrics.sh — performance measurements for performance mode.
#
# Functions:
#   proof_metric_start <case_id> <metric_name>
#   proof_metric_stop  <case_id> <metric_name>          # writes elapsed_ms
#   proof_metric_value <case_id> <metric_name> <value> [unit]
#   proof_sample_rss <case_id> <process_pattern>        # MB
#   proof_compute_percentile <values_file> <percentile> # 50, 95, 99
#
# All metrics emit to:
#   $PROOF_REPORT_DIR/raw/metrics/<case_id>.jsonl

_proof_metric_emit() {
  local case_id="$1" name="$2" value="$3" unit="$4" detail="$5"
  local out="${PROOF_REPORT_DIR}/raw/metrics/${case_id}.jsonl"
  mkdir -p "$(dirname "$out")"
  local e_detail
  e_detail=$(printf '%s' "$detail" | jq -Rs .)
  cat >> "$out" <<JSON
{"case_id":"${case_id}","metric":"${name}","value":${value},"unit":"${unit}","detail":${e_detail},"timestamp":"$(date -u -Iseconds)"}
JSON
}

proof_metric_start() {
  local case_id="$1" name="$2"
  local f="${PROOF_REPORT_DIR}/raw/metrics/_timer_${case_id}_${name}"
  date +%s%3N > "$f"
}

proof_metric_stop() {
  local case_id="$1" name="$2"
  local f="${PROOF_REPORT_DIR}/raw/metrics/_timer_${case_id}_${name}"
  if [ ! -f "$f" ]; then
    echo "[metrics] timer ${name} for case ${case_id} not started" >&2
    return 1
  fi
  local start_ms end_ms elapsed_ms
  start_ms=$(cat "$f")
  end_ms=$(date +%s%3N)
  elapsed_ms=$((end_ms - start_ms))
  rm -f "$f"
  _proof_metric_emit "$case_id" "${name}_ms" "$elapsed_ms" "ms" ""
  echo "$elapsed_ms"
}

proof_metric_value() {
  local case_id="$1" name="$2" value="$3" unit="${4:-}"
  _proof_metric_emit "$case_id" "$name" "$value" "$unit" ""
}

# Sample resident-set-size (MB) for the first matching process.
proof_sample_rss() {
  local case_id="$1" pattern="$2"
  local pid rss_kb rss_mb
  pid=$(pgrep -f "$pattern" | head -1)
  if [ -z "$pid" ]; then
    _proof_metric_emit "$case_id" "rss_${pattern//\//_}_mb" 0 "MB" "process not found"
    return 1
  fi
  rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status" 2>/dev/null || echo 0)
  rss_mb=$(awk -v k="$rss_kb" 'BEGIN{printf "%.1f", k/1024}')
  _proof_metric_emit "$case_id" "rss_${pattern//\//_}_mb" "$rss_mb" "MB" "pid=${pid}"
  echo "$rss_mb"
}

# proof_compute_percentile: streams values from a file (one number per
# line), prints the requested percentile.
proof_compute_percentile() {
  local file="$1" pct="$2"
  sort -n "$file" | awk -v pct="$pct" '
    { v[NR] = $1 }
    END {
      n = NR
      if (n == 0) { print "0"; exit }
      idx = int((pct/100.0) * n)
      if (idx < 1) idx = 1
      if (idx > n) idx = n
      print v[idx]
    }
  '
}
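The percentile rule above is nearest-rank with truncation: sort, take `idx = int(pct/100 * n)` clamped to `[1, n]`. Run over a fixed five-value sample, that index rule is checkable by hand, since `int(0.5 * 5) = 2` selects the second-smallest value:

```shell
#!/usr/bin/env bash
# p50 over {10,20,30,40,50} with the same sort|awk pipeline:
# n=5, idx=int(2.5)=2 → second value in sorted order.
p50=$(printf '%s\n' 30 10 50 20 40 | sort -n | awk -v pct=50 '
  { v[NR] = $1 }
  END {
    n = NR
    idx = int((pct/100.0) * n)
    if (idx < 1) idx = 1
    if (idx > n) idx = n
    print v[idx]
  }')
echo "p50=${p50}"    # prints p50=20
```

Worth noting for Phase D: `int()` truncates, so on small samples this formula biases low (here p50 is 20, not the middle value 30); at the sample sizes the performance cases collect, the difference is negligible.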
0  tests/proof/reports/.gitkeep  Normal file
257
tests/proof/run_proof.sh
Executable file
257
tests/proof/run_proof.sh
Executable file
@ -0,0 +1,257 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
# run_proof.sh — orchestrator for the proof harness.
|
||||||
|
#
|
||||||
|
# Usage:
|
||||||
|
# tests/proof/run_proof.sh --mode contract
|
||||||
|
# tests/proof/run_proof.sh --mode integration
|
||||||
|
# tests/proof/run_proof.sh --mode performance
|
||||||
|
# tests/proof/run_proof.sh --mode integration --no-bootstrap # assume services up
|
||||||
|
# tests/proof/run_proof.sh --regenerate-rankings # rebuild expected/rankings.json
|
||||||
|
#
|
||||||
|
# Bootstraps services (storaged → catalogd → ingestd → queryd →
|
||||||
|
# vectord → embedd → gateway) once at the start unless --no-bootstrap.
|
||||||
|
# Iterates matching cases in numerical order. Aggregates per-case JSONL
|
||||||
|
# evidence into summary.md + summary.json under tests/proof/reports/proof-<ts>/.
|
||||||
|
#
|
||||||
|
# Designed per CLAUDE_REFACTOR_GUARDRAILS.md: bash + curl + jq only,
|
||||||
|
# no Go test framework, no DSL. Each case is a thin shell script that
|
||||||
|
# sources lib/*.sh and writes evidence; this harness orchestrates them.
|
||||||
|
|
||||||
|
set -uo pipefail
|
||||||
|
|
||||||
|
# ── arg parsing ────────────────────────────────────────────────────────────
|
||||||
|
MODE="contract"
|
||||||
|
NO_BOOTSTRAP=0
|
||||||
|
REGENERATE_RANKINGS=0
|
||||||
|
REGENERATE_BASELINE=0
|
||||||
|
|
||||||
|
while [ $# -gt 0 ]; do
|
||||||
|
case "$1" in
|
||||||
|
--mode) MODE="$2"; shift 2 ;;
|
||||||
|
--mode=*) MODE="${1#--mode=}"; shift ;;
|
||||||
|
--no-bootstrap) NO_BOOTSTRAP=1; shift ;;
|
||||||
|
--regenerate-rankings) REGENERATE_RANKINGS=1; shift ;;
|
||||||
|
--regenerate-baseline) REGENERATE_BASELINE=1; shift ;;
|
||||||
|
-h|--help)
|
||||||
|
sed -n '1,16p' "$0" | sed 's/^# *//'
|
||||||
|
exit 0 ;;
|
||||||
|
*) echo "unknown arg: $1" >&2; exit 2 ;;
|
||||||
|
esac
|
||||||
|
done
|
||||||
|
|
||||||
|
case "$MODE" in
|
||||||
|
contract|integration|performance) ;;
|
||||||
|
*) echo "[run_proof] invalid --mode '$MODE' (must be contract|integration|performance)" >&2; exit 2 ;;
|
||||||
|
esac
|
||||||
|
|
||||||
|
export PROOF_MODE="$MODE"
|
||||||
|
export PROOF_REGENERATE_RANKINGS="$REGENERATE_RANKINGS"
|
||||||
|
export PROOF_REGENERATE_BASELINE="$REGENERATE_BASELINE"
|
||||||
|
|
||||||
|
# ── env setup ─────────────────────────────────────────────────────────────
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
cd "$SCRIPT_DIR/../.."
|
||||||
|
|
||||||
|
# Establish the report directory before sourcing env.sh so cases see it.
|
||||||
|
ts="$(date -u +%Y%m%d-%H%M%SZ)"
|
||||||
|
export PROOF_REPORT_DIR="$(pwd)/tests/proof/reports/proof-${ts}"
|
||||||
|
mkdir -p "$PROOF_REPORT_DIR"
|
||||||
|
|
||||||
|
# shellcheck source=lib/env.sh
|
||||||
|
source "${SCRIPT_DIR}/lib/env.sh"
|
||||||
|
# shellcheck source=lib/http.sh
|
||||||
|
source "${SCRIPT_DIR}/lib/http.sh"
|
||||||
|
# shellcheck source=lib/assert.sh
|
||||||
|
source "${SCRIPT_DIR}/lib/assert.sh"
|
||||||
|
# shellcheck source=lib/metrics.sh
|
||||||
|
source "${SCRIPT_DIR}/lib/metrics.sh"
|
||||||
|
|
||||||
|
echo "[run_proof] mode=${MODE} report=${PROOF_REPORT_DIR}"
|
||||||
|
echo "[run_proof] git_sha=${PROOF_GIT_SHA}"
|
||||||
|
|
||||||
|
# ── service lifecycle ────────────────────────────────────────────────────
|
||||||
|
PIDS=()
|
||||||
|
WE_BOOTED=0
|
||||||
|
|
||||||
|
cleanup() {
|
||||||
|
if [ "$WE_BOOTED" -eq 1 ] && [ "${#PIDS[@]}" -gt 0 ]; then
|
||||||
|
echo "[run_proof] cleanup: killing ${#PIDS[@]} services we started"
|
||||||
|
kill "${PIDS[@]}" 2>/dev/null || true
|
||||||
|
wait 2>/dev/null || true
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
trap cleanup EXIT INT TERM
|
||||||
|
|
||||||
|
poll_health() {
|
||||||
|
local name="$1" port="$2" deadline=$(($(date +%s) + 8))
|
||||||
|
while [ "$(date +%s)" -lt "$deadline" ]; do
|
||||||
|
if curl -sS --max-time 1 "http://127.0.0.1:${port}/health" >/dev/null 2>&1; then
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
sleep 0.1
|
||||||
|
done
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
bootstrap_services() {
|
||||||
|
echo "[run_proof] bootstrap: building binaries..."
|
||||||
|
export PATH="/usr/local/go/bin:${PATH}"
|
||||||
|
if ! go build -o bin/ ./cmd/... > "${PROOF_REPORT_DIR}/raw/logs/build.log" 2>&1; then
|
||||||
|
echo "[run_proof] BUILD FAILED — see raw/logs/build.log"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "[run_proof] bootstrap: launching services in dep order..."
|
||||||
|
for SPEC in "storaged:3211" "catalogd:3212" "ingestd:3213" "queryd:3214" "vectord:3215" "embedd:3216" "gateway:3110"; do
|
||||||
|
local NAME="${SPEC%:*}" PORT="${SPEC#*:}"
|
||||||
|
# Skip if already up.
|
||||||
|
if curl -sS --max-time 1 "http://127.0.0.1:${PORT}/health" >/dev/null 2>&1; then
|
||||||
|
echo " ✓ ${NAME} (:${PORT}) already up — leaving as-is"
|
||||||
|
continue
|
||||||
|
fi
|
||||||
|
./bin/"$NAME" > "${PROOF_REPORT_DIR}/raw/logs/${NAME}.log" 2>&1 &
|
||||||
|
PIDS+=("$!")
|
||||||
|
if poll_health "$NAME" "$PORT"; then
|
||||||
|
echo " ✓ ${NAME} (:${PORT}) booted"
|
||||||
|
WE_BOOTED=1
|
||||||
|
else
|
||||||
|
echo " ✗ ${NAME} (:${PORT}) failed to bind in 8s — see raw/logs/${NAME}.log"
|
||||||
|
tail -20 "${PROOF_REPORT_DIR}/raw/logs/${NAME}.log" | sed 's/^/ /'
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
}
|
||||||
|
|
||||||
|
if [ "$NO_BOOTSTRAP" -eq 0 ]; then
|
||||||
|
if ! bootstrap_services; then
|
||||||
|
echo "[run_proof] FATAL — bootstrap failed"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
echo "[run_proof] --no-bootstrap — assuming services already up"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# ── case discovery + filtering ───────────────────────────────────────────
discover_cases() {
  # Returns case files matching the current mode, sorted by NN prefix.
  # Each case declares CASE_TYPE; we re-source in a subshell to read it.
  local f case_type
  for f in "${SCRIPT_DIR}/cases/"*.sh; do
    [ -e "$f" ] || continue
    case_type=$(bash -c "source '$f' --metadata-only 2>/dev/null; echo \${CASE_TYPE:-}" 2>/dev/null || echo "")
    # contract mode runs contract cases only;
    # integration mode runs contract + integration;
    # performance mode runs contract + integration + performance.
    case "$MODE:$case_type" in
      contract:contract|\
      integration:contract|integration:integration|\
      performance:contract|performance:integration|performance:performance)
        echo "$f" ;;
    esac
  done
}
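
# Illustrative sketch of the metadata contract discover_cases relies on.
# CASE_TYPE is the variable named in the comment above; the
# `--metadata-only` early-exit guard shown here is an assumed convention,
# not verified against a real case file:
#
#   CASE_TYPE=contract
#   if [ "${1:-}" = "--metadata-only" ]; then return 0 2>/dev/null || exit 0; fi
#   # ... probes and assertions follow ...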
CASES=()
while IFS= read -r line; do CASES+=("$line"); done < <(discover_cases)

echo "[run_proof] cases for mode=${MODE}: ${#CASES[@]}"

# ── case execution ───────────────────────────────────────────────────────
CASE_PASS=0
CASE_FAIL=0
CASE_SKIP=0
REQUIRED_FAIL=0

for case_file in "${CASES[@]}"; do
  case_name=$(basename "$case_file" .sh)
  echo ""
  echo "[run_proof] running ${case_name} ..."
  SECONDS=0
  if bash "$case_file" >> "${PROOF_REPORT_DIR}/raw/logs/${case_name}.log" 2>&1; then
    echo "  → wrapper exit 0 (${SECONDS}s)"
  else
    echo "  → wrapper exit non-zero (${SECONDS}s) — see raw/logs/${case_name}.log"
  fi
done

# ── aggregation ──────────────────────────────────────────────────────────
echo ""
echo "[run_proof] aggregating evidence..."

ALL_RECORDS_FILE="${PROOF_REPORT_DIR}/raw/all_records.jsonl"
> "$ALL_RECORDS_FILE"
for f in "${PROOF_REPORT_DIR}/raw/cases/"*.jsonl; do
  [ -e "$f" ] || continue
  cat "$f" >> "$ALL_RECORDS_FILE"
done
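
# Assumed shape of one assertion record, inferred from the '"result":"..."'
# greps below and the jq failure formatter in summary.md (field names are
# not independently verified):
#
#   {"case_id":"GOLAKE-001","claim":"...","result":"pass",
#    "expected":"200","actual":"200"}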
# grep -c exits 1 with output "0" when no matches; the `|| echo 0` form
# concatenates "0\n0" and breaks jq --argjson + arithmetic. Capture the
# count and force a clean integer fallback on non-zero exit.
_count() {
  local pattern="$1" file="$2" n
  n=$(grep -c "$pattern" "$file" 2>/dev/null) || n=0
  echo "$n"
}
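
# Minimal repro of the pitfall _count avoids (illustrative; demo.txt is a
# hypothetical file containing no matching lines):
#
#   n=$(grep -c zzz demo.txt || echo 0)   # grep prints "0" AND echo prints "0" → n="0\n0"
#   n=$(grep -c zzz demo.txt) || n=0      # capture first, fall back on exit status → n="0"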
if [ -s "$ALL_RECORDS_FILE" ]; then
  pass=$(_count '"result":"pass"' "$ALL_RECORDS_FILE")
  fail=$(_count '"result":"fail"' "$ALL_RECORDS_FILE")
  skip=$(_count '"result":"skip"' "$ALL_RECORDS_FILE")
else
  pass=0; fail=0; skip=0
fi

# summary.json
jq -n \
  --arg mode "$MODE" \
  --arg ts "$(date -u -Iseconds)" \
  --arg sha "$PROOF_GIT_SHA" \
  --argjson pass "$pass" \
  --argjson fail "$fail" \
  --argjson skip "$skip" \
  --argjson cases "${#CASES[@]}" \
  '{mode: $mode, timestamp_utc: $ts, git_sha: $sha,
    counts: {pass: $pass, fail: $fail, skip: $skip},
    cases_run: $cases, evidence_dir: "raw/"}' \
  > "${PROOF_REPORT_DIR}/summary.json"
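
# For a canary run with one case and 21 passing assertions, the emitted
# summary.json should look roughly like (sha and timestamp illustrative):
#
#   {"mode":"contract","timestamp_utc":"2025-01-01T00:00:00+00:00",
#    "git_sha":"<sha>","counts":{"pass":21,"fail":0,"skip":0},
#    "cases_run":1,"evidence_dir":"raw/"}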
# summary.md
ts=$(date -u -Iseconds)  # single timestamp shared by the heading and the metadata line
{
  echo "# proof-${ts} — ${MODE} mode"
  echo ""
  echo "- git_sha: \`${PROOF_GIT_SHA}\`"
  echo "- timestamp: ${ts}"
  echo "- cases run: ${#CASES[@]}"
  echo "- assertions: ${pass} pass · ${fail} fail · ${skip} skip"
  echo ""
  echo "## per-case-id"
  echo ""
  echo "| case_id | pass | fail | skip |"
  echo "|---|---:|---:|---:|"
  # Iterate JSONL files (one per CASE_ID), not case scripts — a single
  # case file may emit under multiple CASE_IDs and this preserves the
  # mapping faithfully.
  for jsonl in "${PROOF_REPORT_DIR}/raw/cases/"*.jsonl; do
    [ -e "$jsonl" ] || continue
    cid=$(basename "$jsonl" .jsonl)
    cp=$(_count '"result":"pass"' "$jsonl")
    cfl=$(_count '"result":"fail"' "$jsonl")
    cs=$(_count '"result":"skip"' "$jsonl")
    echo "| ${cid} | ${cp} | ${cfl} | ${cs} |"
  done
  echo ""
  if [ "$fail" -gt 0 ]; then
    echo "## failed assertions"
    echo ""
    grep '"result":"fail"' "$ALL_RECORDS_FILE" | jq -r '"- **\(.case_id)** — \(.claim) — expected: \(.expected) actual: \(.actual)"'
  fi
} > "${PROOF_REPORT_DIR}/summary.md"

# ── exit ─────────────────────────────────────────────────────────────────
echo ""
echo "[run_proof] DONE — summary: ${PROOF_REPORT_DIR}/summary.md"
echo "  ${pass} pass · ${fail} fail · ${skip} skip"

if [ "$fail" -gt 0 ]; then exit 1; fi
exit 0