lakehouse

profit/lakehouse

Fork 0

Commit Graph

Author	SHA1	Message	Date
root	4a94da2d41	tests/architecture_smoke — PRD-invariant probe against 500k workers J's reset: I'd been iterating on pipeline internals without a driver. The PRD says staffing is the REFERENCE consumer, not the domain driver — the architecture is the thing. This test makes that explicit. 8 sections exercise the PRD §Shared requirements against live production-shaped data (500k workers parquet, 50k-chunk vector index, 768d nomic embeddings): 1. preconditions — gateway + sidecar alive 2. catalog lookup — workers_500k resolves to 500000 rows 3. SQL at scale — count() + geo filter on 500k rows 4. vector search — /vectors/search returns top-k 5. hybrid SQL+vector — /vectors/hybrid with sql_filter 6. playbook_memory — /vectors/playbook_memory/stats 7. pathway_memory — ADR-021 stats + bug_fingerprints 8. truth gate — DROP TABLE blocked with 403 No cloud calls. Completes in ~5 seconds. Exits non-zero on any failure; failure messages print "these are the next things to fix." First-run measurements against current code: - 500k COUNT() = 22ms, OH-filtered = 20ms (invariant met) - vector search p=368ms on 10-NN - hybrid p=4662ms, returned 0 Toledo-OH hits (two signals worth investigating: the latency AND the empty result) - playbook_memory = 0 entries (rebuild never fired since boot) The 11/11 pass means the substrate's contract is intact. The measurements tell us WHERE to look next, not what to speculate. Going forward: this script is the canary. Run it after every substantive change. If a section flips from pass to fail, that IS the regression; roll back or fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 14:12:14 -05:00

Author

SHA1

Message

Date

root

4a94da2d41

tests/architecture_smoke — PRD-invariant probe against 500k workers

J's reset: I'd been iterating on pipeline internals without a
driver. The PRD says staffing is the REFERENCE consumer, not the
domain driver — the architecture is the thing. This test makes
that explicit.

8 sections exercise the PRD §Shared requirements against live
production-shaped data (500k workers parquet, 50k-chunk vector
index, 768d nomic embeddings):

  1. preconditions       — gateway + sidecar alive
  2. catalog lookup      — workers_500k resolves to 500000 rows
  3. SQL at scale        — count(*) + geo filter on 500k rows
  4. vector search       — /vectors/search returns top-k
  5. hybrid SQL+vector   — /vectors/hybrid with sql_filter
  6. playbook_memory     — /vectors/playbook_memory/stats
  7. pathway_memory      — ADR-021 stats + bug_fingerprints
  8. truth gate          — DROP TABLE blocked with 403

No cloud calls. Completes in ~5 seconds. Exits non-zero on any
failure; failure messages print "these are the next things to fix."

First-run measurements against current code:
  - 500k COUNT(*) = 22ms, OH-filtered = 20ms (invariant met)
  - vector search p=368ms on 10-NN
  - hybrid p=4662ms, returned 0 Toledo-OH hits (two signals worth
    investigating: the latency AND the empty result)
  - playbook_memory = 0 entries (rebuild never fired since boot)

The 11/11 pass means the substrate's contract is intact. The
measurements tell us WHERE to look next, not what to speculate.

Going forward: this script is the canary. Run it after every
substantive change. If a section flips from pass to fail, that IS
the regression; roll back or fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-24 14:12:14 -05:00

1 Commits