# golangLAKEHOUSE

Go reimplementation of the Lakehouse: a versioned knowledge
substrate for staffing analytics + local AI workloads.

## Status

**Phase G0 complete + G1/G1P/G2 shipped.** Seven binaries: the five
from G0 (gateway, storaged, catalogd, ingestd, queryd) plus vectord
(G1) and embedd (G2), with the single gateway fronting the rest.
Acceptance smokes are green for D1-D6 + G1 + G1P + G2.

The end-to-end staffing co-pilot pipeline is functional through the
gateway:

```
text → /v1/embed → /v1/vectors/index/<name>/add
text → /v1/embed → /v1/vectors/index/<name>/search → top-K hits
```

Plus the SQL path:
```
CSV → /v1/ingest  (parses, writes Parquet via storaged, registers
                   manifest with catalogd)
SQL → /v1/sql     (DuckDB over the registered Parquets via httpfs)
```

See `docs/PHASE_G0_KICKOFF.md` for the day-by-day record (D1-D6 +
real-scale validation + G1/G1P/G2 pointer at the bottom).

## Service inventory

| Binary | Port | Role |
|---|---|---|
| `gateway` | 3110 | Reverse proxy fronting all backing services |
| `storaged` | 3211 | Object I/O over S3 (MinIO in dev) |
| `catalogd` | 3212 | Parquet manifest registry, ADR-020 idempotency |
| `ingestd` | 3213 | CSV → Parquet → register loop |
| `queryd` | 3214 | DuckDB SELECT over registered Parquets via httpfs |
| `vectord` | 3215 | HNSW vector search (+ optional persistence to storaged) |
| `embedd` | 3216 | Text → vector via Ollama (default `nomic-embed-text`, 768-d) |
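The gateway's job in this inventory is prefix routing. A sketch using the standard library's `httputil.ReverseProxy`, assuming the path prefixes used earlier in this README map to the listed ports and that paths are forwarded unchanged; storaged and catalogd are left out because their public prefixes aren't documented here, and the real gateway may be structured quite differently.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// routes maps path prefixes seen in this README to the inventory's ports.
// storaged (3211) and catalogd (3212) are omitted: their public prefixes
// aren't documented here.
var routes = []struct{ prefix, backend string }{
	{"/v1/embed", "http://127.0.0.1:3216"},   // embedd
	{"/v1/vectors", "http://127.0.0.1:3215"}, // vectord
	{"/v1/ingest", "http://127.0.0.1:3213"},  // ingestd
	{"/v1/sql", "http://127.0.0.1:3214"},     // queryd
}

// backendFor matches whole path segments, so "/v1/embeddings" does not
// fall through to embedd.
func backendFor(path string) (string, bool) {
	for _, r := range routes {
		if path == r.prefix || strings.HasPrefix(path, r.prefix+"/") {
			return r.backend, true
		}
	}
	return "", false
}

// newGateway builds one reverse proxy per backend and dispatches by prefix.
// Paths are forwarded unchanged (an assumption about the real gateway).
func newGateway() (http.Handler, error) {
	proxies := make(map[string]*httputil.ReverseProxy, len(routes))
	for _, r := range routes {
		target, err := url.Parse(r.backend)
		if err != nil {
			return nil, err
		}
		proxies[r.backend] = httputil.NewSingleHostReverseProxy(target)
	}
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		backend, ok := backendFor(req.URL.Path)
		if !ok {
			http.NotFound(w, req)
			return
		}
		proxies[backend].ServeHTTP(w, req)
	}), nil
}

func main() {
	for _, p := range []string{"/v1/embed", "/v1/vectors/index/candidates/search", "/v1/sql", "/v1/embeddings"} {
		backend, ok := backendFor(p)
		fmt.Printf("%-38s -> %q (routed=%v)\n", p, backend, ok)
	}
	gw, err := newGateway()
	if err != nil {
		log.Fatal(err)
	}
	// To actually serve: log.Fatal(http.ListenAndServe(":3110", gw))
	_ = gw
}
```

Keeping the route table as an ordered slice rather than a map makes the matching order explicit if overlapping prefixes are ever added.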

## Acceptance smokes

```
scripts/d1_smoke.sh    # 5-binary skeleton + chi /health + gateway proxy probes
scripts/d2_smoke.sh    # storaged GET/PUT/LIST/DELETE + 256 MiB cap + concurrency cap
scripts/d3_smoke.sh    # catalogd register/manifest/list + rehydrate-across-restart
scripts/d4_smoke.sh    # ingestd CSV → Parquet round-trip + schema-drift 409
scripts/d5_smoke.sh    # queryd DuckDB SELECT through httpfs over MinIO
scripts/d6_smoke.sh    # full ingest → query through gateway only
scripts/g1_smoke.sh    # vectord HNSW recall + dim mismatch + duplicate-create 409
scripts/g1p_smoke.sh   # vectord state survives kill+restart via storaged
scripts/g2_smoke.sh    # embed → vectord add → search round-trip
```

The smokes are independent, so they can run in any order. To run
them all and stop at the first failure:

```
for s in scripts/{d1,d2,d3,d4,d5,d6,g1,g1p,g2}_smoke.sh; do
  "$s" || { echo "FAILED: $s" >&2; break; }
done
```
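The recall half of `g1_smoke.sh` boils down to comparing HNSW hits against an exact search. A minimal sketch of that metric, recall@k = |approx ∩ exact| / |exact|, assuming cosine similarity (the smoke's actual distance function and recall threshold aren't stated here); `exactTopK` and `recallAtK` are hypothetical helpers, and the dimension check loosely mirrors the dim-mismatch case the smoke exercises.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine similarity of two equal-length vectors (0 if either is all zeros).
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// exactTopK is the brute-force ground truth: score every stored vector
// against q and keep the best k. Length mismatches are rejected up front.
func exactTopK(q []float32, store map[string][]float32, k int) ([]string, error) {
	ids := make([]string, 0, len(store))
	for id, v := range store {
		if len(v) != len(q) {
			return nil, fmt.Errorf("dimension mismatch for %q: got %d, want %d", id, len(v), len(q))
		}
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		si, sj := cosine(q, store[ids[i]]), cosine(q, store[ids[j]])
		if si != sj {
			return si > sj
		}
		return ids[i] < ids[j] // deterministic tie-break
	})
	if k < len(ids) {
		ids = ids[:k]
	}
	return ids, nil
}

// recallAtK = |approx ∩ exact| / |exact|; result order does not matter.
func recallAtK(approx, exact []string) float64 {
	if len(exact) == 0 {
		return 1
	}
	want := make(map[string]bool, len(exact))
	for _, id := range exact {
		want[id] = true
	}
	hits := 0
	for _, id := range approx {
		if want[id] {
			hits++
			delete(want, id) // count each id at most once
		}
	}
	return float64(hits) / float64(len(exact))
}

func main() {
	store := map[string][]float32{"a": {1, 0}, "b": {0.9, 0.1}, "c": {0, 1}, "d": {-1, 0}}
	exact, err := exactTopK([]float32{1, 0}, store, 2)
	if err != nil {
		panic(err)
	}
	approx := []string{"a", "c"} // pretend the HNSW search missed "b"
	fmt.Println(exact, recallAtK(approx, exact))
}
```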

## Cold-start dependencies

- Go 1.25+ at `/usr/local/go/bin` (arrow-go sets the 1.25 minimum)
- `gcc` + `libc-dev` for the DuckDB cgo binding (ADR-001 §1.1)
- MinIO running on `:9000` with bucket `lakehouse-go-primary`
- Ollama running on `:11434` with `nomic-embed-text` pulled (G2)
- `/etc/lakehouse/secrets-go.toml` with `[s3.primary]` credentials
  (storaged and queryd both read this)

## Layout

```
docs/       Direction + spec + ADRs + day-by-day
cmd/        One main package per binary
internal/   Shared packages: storeclient, catalogclient,
            secrets, shared, embed, gateway, plus
            per-service implementation packages
scripts/    Smokes + ancillary tooling
```

## Reading order

1. `docs/PRD.md`: what we're building and why
2. `docs/SPEC.md`: how, per-component
3. `docs/DECISIONS.md`: ADRs (ADR-001 foundational)
4. `docs/PHASE_G0_KICKOFF.md`: day-by-day from D1 through G2
5. `docs/RUST_PATHWAY_MEMORY_NOTE.md`: historical reference for the
   Rust era's pathway memory (not migrated, per ADR-001 #5)

## Predecessor

The Rust Lakehouse this rewrite supersedes lives at
`git.agentview.dev/profit/lakehouse`. It remains the live system
serving `devop.live/lakehouse/` until this Go implementation reaches
feature parity per `docs/SPEC.md` §7. At that point the Rust system
enters maintenance-only mode.