golangLAKEHOUSE/README.md
root 0cb29cda15 docs: README + PHASE_G0_KICKOFF reflect post-G0 state (G1, G1P, G2)
README was stuck on "Pre-Phase G0, implementation has not started"
while we shipped through G2. Updated to reflect the current 7-binary
service inventory, the 9 acceptance smokes, the cold-start deps
(MinIO bucket, Ollama with nomic-embed-text, secrets-go.toml).

PHASE_G0_KICKOFF gains a "Post-G0 work" pointer at the end —
brief table mapping each G1+/G2 commit to its smoke + scrum-fix
count. Full per-day detail stays in commit messages and the
project memory file.

No code changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 01:45:59 -05:00


# golangLAKEHOUSE
Go reimplementation of the Lakehouse — a versioned knowledge
substrate for staffing analytics + local AI workloads.
## Status
**Phase G0 complete + G1/G1P/G2 shipped.** Seven binaries: the
original five-binary skeleton plus `vectord` and `embedd`, with the
`gateway` fronting the other six. Acceptance smokes green for
D1-D6 + G1 + G1P + G2.
End-to-end staffing co-pilot pipeline functional through the
gateway:
```
text → /v1/embed → /v1/vectors/index/<name>/add
text → /v1/embed → /v1/vectors/index/<name>/search → top-K hits
```
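
As a rough illustration of the search pipeline above, here is a minimal Go client sketch. It assumes the gateway is reachable at `localhost:3110`. The JSON field names (`text`, `vector`, `k`), the index name `candidates`, and the sample query text are hypothetical, since this README does not specify request or response schemas.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/url"
)

// gatewayBase assumes the gateway listens locally on its inventory port.
const gatewayBase = "http://localhost:3110"

// indexURL builds /v1/vectors/index/<name>/<op>, where op is "add" or "search".
func indexURL(base, name, op string) string {
	return fmt.Sprintf("%s/v1/vectors/index/%s/%s", base, url.PathEscape(name), op)
}

// postJSON sends in as a JSON body and decodes the JSON reply into out.
func postJSON(endpoint string, in, out any) error {
	body, err := json.Marshal(in)
	if err != nil {
		return err
	}
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("%s: unexpected status %s", endpoint, resp.Status)
	}
	return json.NewDecoder(resp.Body).Decode(out)
}

func main() {
	// Hypothetical response shape: the real field names are not given here.
	var embedded struct {
		Vector []float32 `json:"vector"`
	}
	in := map[string]string{"text": "forklift operator, night shift"}
	if err := postJSON(gatewayBase+"/v1/embed", in, &embedded); err != nil {
		log.Fatalf("embed: %v", err)
	}

	// Hypothetical request shape; the hits are kept opaque and printed as-is.
	var hits json.RawMessage
	query := map[string]any{"vector": embedded.Vector, "k": 5}
	if err := postJSON(indexURL(gatewayBase, "candidates", "search"), query, &hits); err != nil {
		log.Fatalf("search: %v", err)
	}
	fmt.Println(string(hits))
}
```

The add path would follow the same pattern with `indexURL(gatewayBase, name, "add")`.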
Plus the SQL path:
```
CSV → /v1/ingest  (parses, writes Parquet via storaged, registers
                   manifest with catalogd)
SQL → /v1/sql     (DuckDB over the registered Parquets via httpfs)
```
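
To make "DuckDB over the registered Parquets via httpfs" concrete, the sketch below builds the kind of statement DuckDB accepts for exposing an S3-hosted Parquet object as a named view. It illustrates the technique only and is not necessarily how `queryd` does it. The dataset name and object key are hypothetical, and the statement presumes the httpfs extension is loaded with S3 credentials configured.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent double-quotes a SQL identifier, doubling embedded double quotes.
func quoteIdent(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

// quoteLit single-quotes a SQL string literal, doubling embedded single quotes.
func quoteLit(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// viewDDL returns a statement exposing one Parquet object as a named view.
func viewDDL(dataset, bucket, key string) string {
	return fmt.Sprintf("CREATE OR REPLACE VIEW %s AS SELECT * FROM read_parquet(%s)",
		quoteIdent(dataset), quoteLit("s3://"+bucket+"/"+key))
}

func main() {
	// The dataset name and key are hypothetical; the bucket is the dev
	// bucket listed under the cold-start dependencies.
	fmt.Println(viewDDL("placements", "lakehouse-go-primary", "datasets/placements/part-0.parquet"))
}
```

Once such a view exists, a plain `SELECT ... FROM placements` reads the Parquet object directly over S3.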
See `docs/PHASE_G0_KICKOFF.md` for the day-by-day record (D1-D6 +
real-scale validation + G1/G1P/G2 pointer at the bottom).
## Service inventory
| Bin | Port | Role |
|---|---|---|
| `gateway` | 3110 | Reverse proxy fronting all backing services |
| `storaged` | 3211 | Object I/O over S3 (MinIO in dev) |
| `catalogd` | 3212 | Parquet manifest registry, ADR-020 idempotency |
| `ingestd` | 3213 | CSV → Parquet → register loop |
| `queryd` | 3214 | DuckDB SELECT over registered Parquets via httpfs |
| `vectord` | 3215 | HNSW vector search (+ optional persistence to storaged) |
| `embedd` | 3216 | Text → vector via Ollama (default `nomic-embed-text` 768-d) |
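
The table above can be mirrored in a small Go status probe, sketched here under two assumptions: every service runs on `localhost`, and every binary answers `GET /health`. The D1 smoke mentions `/health` for the original skeleton; that `vectord` and `embedd` expose the same route is a guess.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strconv"
	"time"
)

// service pairs a binary name with its port from the inventory table.
type service struct {
	Name string
	Port int
}

// services mirrors the inventory table, in port order.
var services = []service{
	{"gateway", 3110}, {"storaged", 3211}, {"catalogd", 3212},
	{"ingestd", 3213}, {"queryd", 3214}, {"vectord", 3215}, {"embedd", 3216},
}

// healthURL builds the assumed health endpoint for one service.
func healthURL(host string, port int) string {
	return "http://" + net.JoinHostPort(host, strconv.Itoa(port)) + "/health"
}

// probe returns the HTTP status line, or "unreachable" on any transport error.
func probe(c *http.Client, u string) string {
	resp, err := c.Get(u)
	if err != nil {
		return "unreachable"
	}
	defer resp.Body.Close()
	return resp.Status
}

func main() {
	c := &http.Client{Timeout: 2 * time.Second}
	for _, s := range services {
		u := healthURL("localhost", s.Port)
		fmt.Printf("%-9s %s  %s\n", s.Name, u, probe(c, u))
	}
}
```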
## Acceptance smokes
```
scripts/d1_smoke.sh # 5-binary skeleton + chi /health + gateway proxy probes
scripts/d2_smoke.sh # storaged GET/PUT/LIST/DELETE + 256 MiB cap + concurrency cap
scripts/d3_smoke.sh # catalogd register/manifest/list + rehydrate-across-restart
scripts/d4_smoke.sh # ingestd CSV → Parquet round-trip + schema-drift 409
scripts/d5_smoke.sh # queryd DuckDB SELECT through httpfs over MinIO
scripts/d6_smoke.sh # full ingest → query through gateway only
scripts/g1_smoke.sh # vectord HNSW recall + dim mismatch + duplicate-create 409
scripts/g1p_smoke.sh # vectord state survives kill+restart via storaged
scripts/g2_smoke.sh # embed → vectord add → search round-trip
```
Run them all in any order. The loop below stops at the first failure:
```
for s in scripts/{d1,d2,d3,d4,d5,d6,g1,g1p,g2}_smoke.sh; do "$s" || break; done
```
## Cold-start dependencies
- Go 1.25+ at `/usr/local/go/bin` (the arrow-go dependency sets the 1.25 minimum)
- `gcc` + `libc-dev` for the DuckDB cgo binding (ADR-001 §1.1)
- MinIO running on `:9000` with bucket `lakehouse-go-primary`
- Ollama running on `:11434` with `nomic-embed-text` loaded (G2)
- `/etc/lakehouse/secrets-go.toml` with `[s3.primary]` credentials
  (storaged + queryd both read this)
## Layout
```
docs/      Direction + spec + ADRs + day-by-day
cmd/       One main package per binary
internal/  Shared packages: storeclient, catalogclient,
           secrets, shared, embed, gateway, plus
           per-service implementation packages
scripts/   Smokes + ancillary tooling
```
## Reading order
1. `docs/PRD.md` — what we're building and why
2. `docs/SPEC.md` — how, per-component
3. `docs/DECISIONS.md` — ADRs (ADR-001 foundational)
4. `docs/PHASE_G0_KICKOFF.md` — day-by-day from D1 through G2
5. `docs/RUST_PATHWAY_MEMORY_NOTE.md` — historical reference for the
Rust era's pathway memory (not migrated, by ADR-001 #5)
## Predecessor
The Rust Lakehouse this rewrite supersedes lives at
`git.agentview.dev/profit/lakehouse`. It remains the live system
serving `devop.live/lakehouse/` until this Go implementation reaches
feature parity per `docs/SPEC.md` §7. Then Rust enters
maintenance-only mode.