Added the contract tier above the 00_health canary. All 5 contract cases
now cover GOLAKE-001-003, 050, 060-061, 080-085: 53 assertions pass,
1 informational skip, 0 failures. Wall clock: 4s end-to-end (cached binaries).
Cases:
05_embedding_contract.sh
GOLAKE-050. POST /v1/embed with one short text → asserts dim=768,
one vector returned, vector length matches dimension, sum of
squared elements > 0 (proxy for non-zero), response.model echoed.
Skips with explicit reason if Ollama is unreachable (502 from
embedd) — per spec hard rule "skipped tests do not appear as
passed."
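The dimension and non-zero checks above don't need a JSON library once the vector's components are extracted; a hedged sketch of that arithmetic (helper name and the extraction step are illustrative, not the harness code):

```shell
# Hypothetical sketch: verify an embedding vector has the expected length
# and is not all-zero, given its components as whitespace-separated numbers.
# The real test first extracts the vector from the /v1/embed JSON response.
check_vector() {
  dim="$1"; shift
  echo "$@" | awk -v want="$dim" '{
    ss = 0
    for (i = 1; i <= NF; i++) ss += $i * $i
    # pass (exit 0) only if length matches and sum of squares is positive
    exit !(NF == want && ss > 0)
  }'
}

check_vector 4 0.5 0 0 0.25 && echo "vector ok"
check_vector 4 0 0 0 0 || echo "all-zero vector rejected"
```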
06_vector_add_search.sh
GOLAKE-060 + GOLAKE-061. Synthetic dim=4 unit basis vectors.
Create index → add 3 vectors → get-index returns length=3 →
search([1,0,0,0],k=3) returns v1 at rank 1 with distance < 0.001.
Cleanup with DELETE. No embedd dependency — pure contract layer.
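The rank-1 assertion reduces to a Euclidean distance between the query and the stored basis vector; an illustrative sketch of that computation (comma-separated vectors, not the harness code):

```shell
# Sketch: L2 distance between two same-length comma-separated vectors,
# mirroring the "v1 at rank 1 with distance < 0.001" check.
l2_distance() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); split(b, y, ",")
    for (i = 1; i <= n; i++) { d = x[i] - y[i]; s += d * d }
    printf "%.6f\n", sqrt(s)
  }'
}

l2_distance "1,0,0,0" "1,0,0,0"   # exact match -> 0.000000
```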
08_gateway_contracts.sh
GOLAKE-003. For each /v1/* route, asserts gateway and direct
upstream return identical status AND identical response body
(sha256 match). Confirms gateway is a proxy not a transformer.
Status passthrough verified on both 200 path (storage/list,
catalog/list) and 4xx path (sql empty body → 400 from queryd).
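The body-identity check can be sketched as a sha256 comparison of two saved response bodies (assumed shape; file paths and helper name are illustrative):

```shell
# Sketch of the proxy-transparency assertion: the gateway response body and
# the direct upstream response body must hash identically.
bodies_identical() {
  h1=$(sha256sum "$1" | cut -d' ' -f1)
  h2=$(sha256sum "$2" | cut -d' ' -f1)
  [ "$h1" = "$h2" ]
}

printf '{"ok":true}' > /tmp/gw_body
printf '{"ok":true}' > /tmp/up_body
bodies_identical /tmp/gw_body /tmp/up_body && echo "proxy transparent"
```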
09_failure_modes.sh
GOLAKE-080..085. Six failure-mode contracts:
080 malformed JSON → 4xx on catalog/ingest/sql/embed
081 missing required field → 4xx on catalog/vectors/embed
082 bad SQL → 4xx with non-empty error body
083 vector dim mismatch → 4xx
084 missing storage object → 404
085 duplicate vector ID → INFORMATIONAL (spec says required:false)
first/second statuses recorded as evidence; the contract verdict is
decided later from the recorded evidence.
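The record-now, decide-later pattern for the informational case might look like the following hedged sketch (helper names and evidence format are illustrative):

```shell
# Sketch: record observed statuses as evidence, then report the
# informational (required:false) contract without pass/fail semantics.
EVIDENCE_FILE=/tmp/evidence.txt

record_evidence() {  # record_evidence <key> <value>
  echo "$1=$2" >> "$EVIDENCE_FILE"
}

decide_informational() {  # reports what was observed; never fails the run
  first=$(grep '^dup_first=' "$EVIDENCE_FILE" | cut -d= -f2)
  second=$(grep '^dup_second=' "$EVIDENCE_FILE" | cut -d= -f2)
  echo "INFO GOLAKE-085: first add -> $first, duplicate add -> $second"
}

: > "$EVIDENCE_FILE"
record_evidence dup_first 201
record_evidence dup_second 409
decide_informational
```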
Two new lib helpers in lib/assert.sh:
proof_assert_status_in <id> <claim> "200 201 204" <probe>
pass if status is in the space-separated list. Used for
delete-returns-200-or-204 case where vectord returns 204.
proof_assert_status_4xx <id> <claim> <probe>
pass if status in [400, 500). Used for failure modes where the
specific 4xx code may vary (400 vs 422 vs 409). Records actual
code as evidence.
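Minimal sketches of the two helpers follow. These are simplified to take the observed status directly rather than running a `<probe>`, and the real versions in lib/assert.sh also record proof IDs and evidence:

```shell
# Simplified sketches (assumed shapes, not the lib/assert.sh source).
proof_assert_status_in() {  # <id> <claim> "<allowed list>" <actual status>
  for s in $3; do
    [ "$4" = "$s" ] && { echo "PASS $1: $2 (got $4)"; return 0; }
  done
  echo "FAIL $1: $2 (got $4, wanted one of: $3)"; return 1
}

proof_assert_status_4xx() {  # <id> <claim> <actual status>
  if [ "$3" -ge 400 ] && [ "$3" -lt 500 ]; then
    echo "PASS $1: $2 (got $3)"; return 0
  fi
  echo "FAIL $1: $2 (got $3, wanted 4xx)"; return 1
}

proof_assert_status_in GOLAKE-061 "delete returns 200 or 204" "200 204" 204
proof_assert_status_4xx GOLAKE-082 "bad SQL rejected" 400
```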
Two real contract findings recorded by the harness during build:
- vectord add expects {"items": [...]}, not {"vectors": [...]}.
My initial test sent the wrong field, which would have masked the bug
in CI indefinitely; the harness caught it via the assertion failure.
- vectord create returns 201 Created, delete returns 204 No Content.
Documented in the test fixtures as canonical.
Regression: just verify completes in 33s wall; vet + tests + 9 smokes still green.
Phase C (integration) lands next: 01_storage_roundtrip, 02_catalog_manifest,
03_ingest_csv_to_parquet, 04_query_correctness, 05/06 integration extends,
07_vector_persistence_restart.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
golangLAKEHOUSE
Go reimplementation of the Lakehouse — a versioned knowledge substrate for staffing analytics + local AI workloads.
Status
Phase G0 complete; G1, G1P, and G2 shipped. Six binaries from G0, with vectord (seventh) and embedd (eighth) on top, fronted by a single gateway. Acceptance smokes are green for D1-D6 + G1 + G1P + G2.
End-to-end staffing co-pilot pipeline functional through the gateway:
text → /v1/embed → /v1/vectors/index/<name>/add
text → /v1/embed → /v1/vectors/index/<name>/search → top-K hits
Plus the SQL path:
CSV → /v1/ingest (parses, writes Parquet via storaged, registers
manifest with catalogd)
SQL → /v1/sql (DuckDB over the registered Parquets via httpfs)
See docs/PHASE_G0_KICKOFF.md for the day-by-day record (D1-D6 +
real-scale validation + G1/G1P/G2 pointer at the bottom).
Service inventory
| Bin | Port | Role |
|---|---|---|
| gateway | 3110 | Reverse proxy fronting all backing services |
| storaged | 3211 | Object I/O over S3 (MinIO in dev) |
| catalogd | 3212 | Parquet manifest registry, ADR-020 idempotency |
| ingestd | 3213 | CSV → Parquet → register loop |
| queryd | 3214 | DuckDB SELECT over registered Parquets via httpfs |
| vectord | 3215 | HNSW vector search (+ optional persistence to storaged) |
| embedd | 3216 | Text → vector via Ollama (default nomic-embed-text, 768-d) |
Acceptance smokes
scripts/d1_smoke.sh # 5-binary skeleton + chi /health + gateway proxy probes
scripts/d2_smoke.sh # storaged GET/PUT/LIST/DELETE + 256 MiB cap + concurrency cap
scripts/d3_smoke.sh # catalogd register/manifest/list + rehydrate-across-restart
scripts/d4_smoke.sh # ingestd CSV → Parquet round-trip + schema-drift 409
scripts/d5_smoke.sh # queryd DuckDB SELECT through httpfs over MinIO
scripts/d6_smoke.sh # full ingest → query through gateway only
scripts/g1_smoke.sh # vectord HNSW recall + dim mismatch + duplicate-create 409
scripts/g1p_smoke.sh # vectord state survives kill+restart via storaged
scripts/g2_smoke.sh # embed → vectord add → search round-trip
Or run the full gate via the task runner (see below):
just verify # vet + tests + 9 smokes; ~33s wall
Task runner
just # show available recipes
just verify # full Sprint 0 gate (vet + tests + 9 smokes)
just smoke <day> # single smoke (d1..d6, g1, g1p, g2)
just doctor # check cold-start deps; --json for CI
just install-hooks # install pre-push hook that runs just verify
After a fresh clone, run just install-hooks once so git push is
gated on the same green chain that ran here. Hook lives in
.git/hooks/pre-push (not tracked; recreated by the recipe).
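A plausible shape for the recreated hook, written here to a temp path for illustration (the recipe's actual content may differ):

```shell
# Illustrative sketch of the pre-push hook the recipe installs.
cat > /tmp/pre-push.example <<'EOF'
#!/bin/sh
# Run the full verification gate before every push; a non-zero exit
# aborts the push.
exec just verify
EOF
chmod +x /tmp/pre-push.example
```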
Cold-start dependencies
- Go 1.25+ at `/usr/local/go/bin` (arrow-go pulled the 1.25 floor)
- gcc + libc-dev for the DuckDB cgo binding (ADR-001 §1.1)
- `just` task runner (`apt install just` on Debian 13+)
- MinIO running on `:9000` with bucket `lakehouse-go-primary`
- Ollama running on `:11434` with `nomic-embed-text` loaded (G2)
- `/etc/lakehouse/secrets-go.toml` with `[s3.primary]` credentials (storaged + queryd both read this)
just doctor probes all of the above and reports the fix command
for each missing dep. CI / scripts can use just doctor --json.
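A hedged sketch of the kind of binary probe `just doctor` might run (probe names and fix hints are illustrative, not the recipe's actual output format):

```shell
# Sketch: check that a command is on PATH and print a fix hint if not.
check_bin() {  # check_bin <command> <fix hint>
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok      $1"
  else
    echo "MISSING $1  (fix: $2)"
  fi
}

check_bin go  "install Go 1.25+ under /usr/local/go/bin"
check_bin gcc "apt install gcc libc-dev"
```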
Layout
docs/ Direction + spec + ADRs + day-by-day
cmd/ One main package per binary
internal/ Shared packages — storeclient, catalogclient,
secrets, shared, embed, gateway, plus
per-service implementation packages
scripts/ Smokes + ancillary tooling
Reading order
- docs/PRD.md — what we're building and why
- docs/SPEC.md — how, per-component
- docs/DECISIONS.md — ADRs (ADR-001 foundational)
- docs/PHASE_G0_KICKOFF.md — day-by-day from D1 through G2
- docs/RUST_PATHWAY_MEMORY_NOTE.md — historical reference for the Rust era's pathway memory (not migrated, by ADR-001 #5)
Predecessor
The Rust Lakehouse this rewrite supersedes lives at
git.agentview.dev/profit/lakehouse. It remains the live system
serving devop.live/lakehouse/ until this Go implementation reaches
feature parity per docs/SPEC.md §7. Then Rust enters
maintenance-only mode.