Sprint 0 / R-004 / GATE-0.4 — the 9-smoke chain is no longer
documentation only. One command (`just verify`) runs vet + tests +
all 9 smokes; pre-push hook calls it; a regression cannot leave
this machine without explicit --no-verify override.
Recipes:
  just verify          full gate (33s wall on this box)
  just smoke <day>     single smoke (d1..d6, g1, g1p, g2)
  just smoke-all       all 9 smokes only
  just doctor          dep probe with structured output
                       (--json for CI / pre-push)
  just install-hooks   install .git/hooks/pre-push
  just fmt|vet|test|build|clean
scripts/doctor.sh probes Go ≥1.25, gcc, MinIO at :9000 with bucket
lakehouse-go-primary, Ollama at :11434 with nomic-embed-text loaded,
/etc/lakehouse/secrets-go.toml with [s3.primary]. Each missing dep
prints its install fix command. JSON mode emits the same shape for
CI / pre-push consumers.
README updated with the task-runner section + just install-hooks
on cold-start. Hooks live in .git/hooks/ (untracked); install
recipe recreates them on a fresh clone.
PATH note: justfile prepends /usr/local/go/bin so recipes find Go
without depending on the parent shell's PATH (per ADR-001 §1.x, Go
lives there).
Verified: just verify exits 0 in 33s wall (vet ~0.1s + test ~0.1s +
9 smokes deterministic per audit baseline). Pre-push hook is installed
and passes bash -n.
Closes audit risk R-004 (smokes not gated).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# golangLAKEHOUSE

Go reimplementation of the Lakehouse, a versioned knowledge
substrate for staffing analytics + local AI workloads.

## Status

**Phase G0 complete + G1/G1P/G2 shipped.** Six binaries plus a
seventh (vectord) and an eighth (embedd) on top, fronted by a
single gateway. Acceptance smokes green for D1-D6 + G1 + G1P + G2.

End-to-end staffing co-pilot pipeline functional through the
gateway:

```
text → /v1/embed → /v1/vectors/index/<name>/add
text → /v1/embed → /v1/vectors/index/<name>/search → top-K hits
```

Plus the SQL path:
```
CSV → /v1/ingest  (parses, writes Parquet via storaged, registers
                   manifest with catalogd)
SQL → /v1/sql     (DuckDB over the registered Parquets via httpfs)
```
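
The embed-then-search flow above could be driven from a small Go client along the following lines. This is only a sketch: the endpoint paths and the gateway port (3110) come from this README, but the JSON field names (`text`, `vector`, `k`, `hits`, `id`, `score`), the index name `demo`, and the helper names are hypothetical placeholders, so check `docs/SPEC.md` for the real payloads before relying on it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// gatewayBase points at the gateway port listed in the service inventory.
const gatewayBase = "http://localhost:3110"

// HYPOTHETICAL payload shapes: the README does not specify them.
type embedRequest struct {
	Text string `json:"text"`
}

type embedResponse struct {
	Vector []float32 `json:"vector"`
}

type searchRequest struct {
	Vector []float32 `json:"vector"`
	K      int       `json:"k"`
}

type hit struct {
	ID    string  `json:"id"`
	Score float32 `json:"score"`
}

type searchResponse struct {
	Hits []hit `json:"hits"`
}

// searchURL builds the search endpoint for a named index.
func searchURL(base, index string) string {
	return base + "/v1/vectors/index/" + url.PathEscape(index) + "/search"
}

// postJSON sends in as a JSON body and decodes the JSON response into out.
func postJSON(endpoint string, in, out any) error {
	body, err := json.Marshal(in)
	if err != nil {
		return err
	}
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("%s: unexpected status %s", endpoint, resp.Status)
	}
	return json.NewDecoder(resp.Body).Decode(out)
}

// search embeds the text, then queries the index with the resulting vector.
func search(base, index, text string, k int) ([]hit, error) {
	var emb embedResponse
	if err := postJSON(base+"/v1/embed", embedRequest{Text: text}, &emb); err != nil {
		return nil, err
	}
	var res searchResponse
	req := searchRequest{Vector: emb.Vector, K: k}
	if err := postJSON(searchURL(base, index), req, &res); err != nil {
		return nil, err
	}
	return res.Hits, nil
}

func main() {
	hits, err := search(gatewayBase, "demo", "forklift operator, night shift", 5)
	if err != nil {
		fmt.Println("search failed:", err)
		return
	}
	for _, h := range hits {
		fmt.Println(h.ID, h.Score)
	}
}
```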

See `docs/PHASE_G0_KICKOFF.md` for the day-by-day record (D1-D6 +
real-scale validation + G1/G1P/G2 pointer at the bottom).

## Service inventory

| Bin | Port | Role |
|---|---|---|
| `gateway` | 3110 | Reverse proxy fronting all backing services |
| `storaged` | 3211 | Object I/O over S3 (MinIO in dev) |
| `catalogd` | 3212 | Parquet manifest registry, ADR-020 idempotency |
| `ingestd` | 3213 | CSV → Parquet → register loop |
| `queryd` | 3214 | DuckDB SELECT over registered Parquets via httpfs |
| `vectord` | 3215 | HNSW vector search (+ optional persistence to storaged) |
| `embedd` | 3216 | Text → vector via Ollama (default `nomic-embed-text` 768-d) |

## Acceptance smokes

```
scripts/d1_smoke.sh    # 5-binary skeleton + chi /health + gateway proxy probes
scripts/d2_smoke.sh    # storaged GET/PUT/LIST/DELETE + 256 MiB cap + concurrency cap
scripts/d3_smoke.sh    # catalogd register/manifest/list + rehydrate-across-restart
scripts/d4_smoke.sh    # ingestd CSV → Parquet round-trip + schema-drift 409
scripts/d5_smoke.sh    # queryd DuckDB SELECT through httpfs over MinIO
scripts/d6_smoke.sh    # full ingest → query through gateway only
scripts/g1_smoke.sh    # vectord HNSW recall + dim mismatch + duplicate-create 409
scripts/g1p_smoke.sh   # vectord state survives kill+restart via storaged
scripts/g2_smoke.sh    # embed → vectord add → search round-trip
```

Or run the full gate via the task runner (see below):
```
just verify            # vet + tests + 9 smokes; ~33s wall
```
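
The gate is a fail-fast chain: each step must exit 0 before the next one runs. The sketch below shows that shape in Go. It is not the contents of the `justfile`: the `go vet ./...` and `go test ./...` invocations and the `bash scripts/<name>_smoke.sh` form are assumptions, and `step`, `gateSteps`, and `runGate` are hypothetical names. By default it only prints the plan; pass `run` as the first argument to execute it.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// step is one command in the gate.
type step struct {
	name string
	argv []string
}

// gateSteps lists vet, tests, and the nine smokes in order.
// ASSUMPTION: the exact commands are illustrative, not the real recipe.
func gateSteps() []step {
	steps := []step{
		{"vet", []string{"go", "vet", "./..."}},
		{"test", []string{"go", "test", "./..."}},
	}
	smokes := []string{"d1", "d2", "d3", "d4", "d5", "d6", "g1", "g1p", "g2"}
	for _, s := range smokes {
		script := "scripts/" + s + "_smoke.sh"
		steps = append(steps, step{"smoke " + s, []string{"bash", script}})
	}
	return steps
}

// runGate runs the steps in order and stops at the first failure.
func runGate(steps []step) error {
	for _, s := range steps {
		cmd := exec.Command(s.argv[0], s.argv[1:]...)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			return fmt.Errorf("%s failed: %w", s.name, err)
		}
	}
	return nil
}

func main() {
	steps := gateSteps()
	if len(os.Args) > 1 && os.Args[1] == "run" {
		if err := runGate(steps); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		return
	}
	// Dry run: show what would be executed.
	for _, s := range steps {
		fmt.Println(s.name, s.argv)
	}
}
```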

## Task runner

```
just                 # show available recipes
just verify          # full Sprint 0 gate (vet + tests + 9 smokes)
just smoke <day>     # single smoke (d1..d6, g1, g1p, g2)
just doctor          # check cold-start deps; --json for CI
just install-hooks   # install pre-push hook that runs just verify
```

After a fresh clone, run `just install-hooks` once so `git push` is
gated on the same green chain that ran here. Hook lives in
`.git/hooks/pre-push` (not tracked; recreated by the recipe).
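
Because `.git/hooks/` is not tracked, the hook has to be written out again on each fresh clone. As a rough illustration of what an install step does, the Go sketch below writes an executable `pre-push` file. The hook body shown is a hypothetical stand-in, not the script the real recipe installs, and `hookScript` and `installHook` are made-up names. Git skips the hook when `git push --no-verify` is used.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// hookScript is a HYPOTHETICAL hook body: it blocks the push unless
// `just verify` exits 0.
const hookScript = `#!/usr/bin/env bash
# pre-push: run the full gate before anything leaves this machine.
set -euo pipefail
exec just verify
`

// installHook writes the hook into <repoRoot>/.git/hooks/pre-push and
// marks it executable. It returns the path it wrote.
func installHook(repoRoot string) (string, error) {
	hooksDir := filepath.Join(repoRoot, ".git", "hooks")
	if err := os.MkdirAll(hooksDir, 0o755); err != nil {
		return "", err
	}
	path := filepath.Join(hooksDir, "pre-push")
	if err := os.WriteFile(path, []byte(hookScript), 0o755); err != nil {
		return "", err
	}
	// WriteFile keeps the old mode when the file already exists, so set
	// the executable bit explicitly.
	if err := os.Chmod(path, 0o755); err != nil {
		return "", err
	}
	return path, nil
}

func main() {
	// Demonstrate against a throwaway directory rather than a real clone.
	root, err := os.MkdirTemp("", "hook-demo-")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer os.RemoveAll(root)

	path, err := installHook(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("installed", path)
}
```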

## Cold-start dependencies

- Go 1.25+ at `/usr/local/go/bin` (arrow-go pulled the 1.25 floor)
- `gcc` + `libc-dev` for the DuckDB cgo binding (ADR-001 §1.1)
- `just` task runner (`apt install just` on Debian 13+)
- MinIO running on `:9000` with bucket `lakehouse-go-primary`
- Ollama running on `:11434` with `nomic-embed-text` loaded (G2)
- `/etc/lakehouse/secrets-go.toml` with `[s3.primary]` credentials
  (storaged + queryd both read this)

`just doctor` probes all of the above and reports the fix command
for each missing dep. CI / scripts can use `just doctor --json`.
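
The probe-and-report pattern can be sketched in Go as below. This is an illustration of the idea, not a port of `scripts/doctor.sh`: the JSON shape (`name`, `ok`, `fix`) and the fix strings are hypothetical, and the sketch checks only that binaries, ports, and the secrets file exist. It does not verify the Go version, the MinIO bucket, the loaded Ollama model, or the `[s3.primary]` section.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"os"
	"os/exec"
	"time"
)

// check is one probe result. HYPOTHETICAL shape, not doctor.sh's output.
type check struct {
	Name string `json:"name"`
	OK   bool   `json:"ok"`
	Fix  string `json:"fix,omitempty"`
}

// result attaches the fix hint only when the probe failed.
func result(name string, ok bool, fix string) check {
	c := check{Name: name, OK: ok}
	if !ok {
		c.Fix = fix
	}
	return c
}

// probeBinary checks that an executable is on PATH.
func probeBinary(name, fix string) check {
	_, err := exec.LookPath(name)
	return result(name, err == nil, fix)
}

// probeTCP checks that something accepts connections on addr.
func probeTCP(name, addr, fix string) check {
	conn, err := net.DialTimeout("tcp", addr, 500*time.Millisecond)
	if err == nil {
		conn.Close()
	}
	return result(name, err == nil, fix)
}

// probeFile checks that a file exists.
func probeFile(name, path, fix string) check {
	_, err := os.Stat(path)
	return result(name, err == nil, fix)
}

// allOK reports whether every probe passed.
func allOK(checks []check) bool {
	for _, c := range checks {
		if !c.OK {
			return false
		}
	}
	return true
}

func main() {
	checks := []check{
		probeBinary("gcc", "apt install gcc libc-dev"),
		probeBinary("just", "apt install just"),
		probeTCP("minio", "127.0.0.1:9000", "start MinIO on :9000"),
		probeTCP("ollama", "127.0.0.1:11434", "start Ollama on :11434"),
		probeFile("secrets", "/etc/lakehouse/secrets-go.toml",
			"create the file with an [s3.primary] section"),
	}
	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "  ")
	report := map[string]any{"ok": allOK(checks), "checks": checks}
	if err := enc.Encode(report); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	// A gating probe would exit non-zero here when allOK is false.
}
```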

## Layout

```
docs/       Direction + spec + ADRs + day-by-day
cmd/        One main package per binary
internal/   Shared packages: storeclient, catalogclient,
            secrets, shared, embed, gateway, plus
            per-service implementation packages
scripts/    Smokes + ancillary tooling
```

## Reading order

1. `docs/PRD.md`: what we're building and why
2. `docs/SPEC.md`: how, per-component
3. `docs/DECISIONS.md`: ADRs (ADR-001 foundational)
4. `docs/PHASE_G0_KICKOFF.md`: day-by-day from D1 through G2
5. `docs/RUST_PATHWAY_MEMORY_NOTE.md`: historical reference for the
   Rust era's pathway memory (not migrated, by ADR-001 #5)

## Predecessor

The Rust Lakehouse this rewrite supersedes lives at
`git.agentview.dev/profit/lakehouse`. It remains the live system
serving `devop.live/lakehouse/` until this Go implementation reaches
feature parity per `docs/SPEC.md` §7. Then Rust enters
maintenance-only mode.