# golangLAKEHOUSE

Go reimplementation of the Lakehouse: a versioned knowledge substrate for staffing analytics + local AI workloads.

## Status

**Phase G0 complete + G1/G1P/G2 shipped.** Six binaries plus a seventh (vectord) and an eighth (embedd) on top, fronted by a single gateway. Acceptance smokes green for D1-D6 + G1 + G1P + G2.

End-to-end staffing co-pilot pipeline functional through the gateway:

```
text → /v1/embed → /v1/vectors/index//add
text → /v1/embed → /v1/vectors/index//search → top-K hits
```

Plus the SQL path:

```
CSV → /v1/ingest   (parses, writes Parquet via storaged, registers manifest with catalogd)
SQL → /v1/sql      (DuckDB over the registered Parquets via httpfs)
```

See `docs/PHASE_G0_KICKOFF.md` for the day-by-day record (D1-D6 + real-scale validation + G1/G1P/G2 pointer at the bottom).

## Service inventory

| Bin | Port | Role |
|---|---|---|
| `gateway` | 3110 | Reverse proxy fronting all backing services |
| `storaged` | 3211 | Object I/O over S3 (MinIO in dev) |
| `catalogd` | 3212 | Parquet manifest registry, ADR-020 idempotency |
| `ingestd` | 3213 | CSV → Parquet → register loop |
| `queryd` | 3214 | DuckDB SELECT over registered Parquets via httpfs |
| `vectord` | 3215 | HNSW vector search (+ optional persistence to storaged) |
| `embedd` | 3216 | Text → vector via Ollama (default `nomic-embed-text` 768-d) |
| `mcpd` | stdio | Model Context Protocol server (Claude Desktop / Code consumers) |

## MCP server

`bin/mcpd` exposes Lakehouse capabilities as MCP tools over stdio: `list_datasets`, `get_manifest`, `query_sql`, `embed_text`, `search_vectors`. All tools proxy to the gateway, so the gateway must be up first.

Wire into Claude Desktop / Claude Code by adding to the MCP config:

```json
{
  "mcpServers": {
    "lakehouse": {
      "command": "/path/to/golangLAKEHOUSE/bin/mcpd",
      "args": ["--gateway", "http://127.0.0.1:3110"]
    }
  }
}
```

Replaces the Bun `mcp-server.ts` MCP-tool surface from the Rust system.
HTTP demo routes (the staffing co-pilot UI) stay Bun until G5.

## Acceptance smokes

```
scripts/d1_smoke.sh    # 5-binary skeleton + chi /health + gateway proxy probes
scripts/d2_smoke.sh    # storaged GET/PUT/LIST/DELETE + 256 MiB cap + concurrency cap
scripts/d3_smoke.sh    # catalogd register/manifest/list + rehydrate-across-restart
scripts/d4_smoke.sh    # ingestd CSV → Parquet round-trip + schema-drift 409
scripts/d5_smoke.sh    # queryd DuckDB SELECT through httpfs over MinIO
scripts/d6_smoke.sh    # full ingest → query through gateway only
scripts/g1_smoke.sh    # vectord HNSW recall + dim mismatch + duplicate-create 409
scripts/g1p_smoke.sh   # vectord state survives kill+restart via storaged
scripts/g2_smoke.sh    # embed → vectord add → search round-trip
```

Or run the full gate via the task runner (see below):

```
just verify            # vet + tests + 9 smokes; ~33s wall
```

## Task runner

```
just                   # show available recipes
just verify            # full Sprint 0 gate (vet + tests + 9 smokes)
just smoke             # single smoke (d1..d6, g1, g1p, g2)
just doctor            # check cold-start deps; --json for CI
just install-hooks     # install pre-push hook that runs just verify
```

After a fresh clone, run `just install-hooks` once so `git push` is gated on the same green chain that ran here. Hook lives in `.git/hooks/pre-push` (not tracked; recreated by the recipe).

## Cold-start dependencies

- Go 1.25+ at `/usr/local/go/bin` (arrow-go pulled the 1.25 floor)
- `gcc` + `libc-dev` for the DuckDB cgo binding (ADR-001 §1.1)
- `just` task runner (`apt install just` on Debian 13+)
- MinIO running on `:9000` with bucket `lakehouse-go-primary`
- Ollama running on `:11434` with `nomic-embed-text` loaded (G2)
- `/etc/lakehouse/secrets-go.toml` with `[s3.primary]` credentials (storaged + queryd both read this)

`just doctor` probes all of the above and reports the fix command for each missing dep. CI / scripts can use `just doctor --json`.
## Layout

```
docs/       Direction + spec + ADRs + day-by-day
cmd/        One main package per binary
internal/   Shared packages: storeclient, catalogclient, secrets, shared,
            embed, gateway, plus per-service implementation packages
scripts/    Smokes + ancillary tooling
```

## Reading order

1. `docs/PRD.md`: what we're building and why
2. `docs/SPEC.md`: how, per-component
3. `docs/DECISIONS.md`: ADRs (ADR-001 foundational)
4. `docs/PHASE_G0_KICKOFF.md`: day-by-day from D1 through G2
5. `docs/RUST_PATHWAY_MEMORY_NOTE.md`: historical reference for the Rust era's pathway memory (not migrated, by ADR-001 #5)

## Predecessor

The Rust Lakehouse this rewrite supersedes lives at `git.agentview.dev/profit/lakehouse`. It remains the live system serving `devop.live/lakehouse/` until this Go implementation reaches feature parity per `docs/SPEC.md` §7. Then Rust enters maintenance-only mode.