# Phase G0 Kickoff Plan

**Goal:** smallest end-to-end ingest+query path in Go. Upload a CSV via `POST /ingest`, query it via `POST /sql`, get rows back. No vector, no profile, no UI yet.

**Estimated duration:** 1 engineer-week (5 working days + gate day + cleanup). Plan is calibrated for solo work; cut by ~40% with two engineers in parallel on storaged/catalogd vs ingestd/queryd.

**Cutoff for G0:** the closing acceptance gate (Day 6) passes end-to-end against `workers_500k.csv` as the test fixture.

---

## Day 0 — One-time setup

Done by an operator with sudo on the dev box. ~15 minutes.

| # | Step | Verify |
|---|---|---|
| 0.1 | Install Go 1.23+: `curl -L https://go.dev/dl/go1.23.linux-amd64.tar.gz \| sudo tar -C /usr/local -xz` | `go version` shows 1.23+ |
| 0.2 | Add `/usr/local/go/bin` to `PATH` (in `~/.bashrc`) | new shell sees `go` |
| 0.3 | Install cgo toolchain: `apt install build-essential` | `gcc --version` works |
| 0.4 | Clone repo: `git clone https://git.agentview.dev/profit/golangLAKEHOUSE.git` | `cd golangLAKEHOUSE && go version` from inside |
| 0.5 | Bring up MinIO locally (or point at existing) | `mc ls local/` lists buckets, or whatever the dev S3 is |
| 0.6 | Verify DuckDB cgo path with a real compile-and-run smoke (NOT `go install pkg@latest` — that requires a main package and would fail on the duckdb library before exercising cgo). Steps: `mkdir /tmp/duckdb-smoke && cd /tmp/duckdb-smoke && go mod init smoke && go get github.com/duckdb/duckdb-go/v2 && cat > main.go <<<'package main; import (_ "github.com/duckdb/duckdb-go/v2"; "database/sql"); func main(){db,_:=sql.Open("duckdb","");db.Ping();db.Close()}' && go run main.go` — proves cgo linker chain + static-linked duckdb-go-bindings work on this platform | exits 0, no link errors |

**Day 0 acceptance:** `go version` shows 1.23+, `gcc --version` works, MinIO is reachable on `localhost:9000`, and the cgo compile-and-run smoke above succeeded. (`go mod tidy` is intentionally NOT run here — no imports yet; verification moves to D1.)

---

## Day 1 — Skeleton + chi + /health × 5 binaries

**Goal:** five binaries build, each binds to its port, `/health` returns `{"status":"ok","service":"<name>"}`.

| # | File | What |
|---|---|---|
| 1.1 | `internal/shared/server.go` | chi router factory, slog setup, graceful shutdown via `signal.NotifyContext` |
| 1.2 | `internal/shared/config.go` | TOML loader using `pelletier/go-toml/v2`, default + override pattern |
| 1.3 | `cmd/gateway/main.go` | port 3100, `/health` |
| 1.4 | `cmd/storaged/main.go` | port 3201, `/health` |
| 1.5 | `cmd/catalogd/main.go` | port 3202, `/health` |
| 1.6 | `cmd/ingestd/main.go` | port 3203, `/health` |
| 1.7 | `cmd/queryd/main.go` | port 3204, `/health` |
| 1.8 | `lakehouse.toml` | bind addresses, log level — sample committed |
| 1.9 | `Makefile` | `build`, `run-gateway`, etc. — convenience |
| 1.10 | `cmd/gateway/main.go` adds STUB routes `POST /v1/ingest` and `POST /v1/sql` returning `501 Not Implemented` with a header `X-Lakehouse-Stub: g0`. Real reverse-proxy wiring lands on Day 6, but the routes exist from D1 so D6 is just a behavior change, not new endpoints. | `curl -X POST :3100/v1/ingest` returns 501 with the stub header |

**Acceptance D1:** `go mod tidy` populates `go.sum` cleanly; `go build ./cmd/...` exits 0; with each binary running in a separate terminal, `curl :3100/health` through `:3204/health` all return `200 OK` with the expected JSON; the gateway's stub `/v1/*` routes return 501.

**Dependencies pulled:** `go-chi/chi/v5`, `pelletier/go-toml/v2`.

---

## Day 2 — storaged: S3 GET/PUT/LIST

**Goal:** put a file, get it back, list it.

| # | File | What |
|---|---|---|
| 2.1 | `internal/storaged/bucket.go` | `aws-sdk-go-v2/service/s3` wrapper — `Get`, `Put`, `List`, `Delete` |
| 2.2 | `internal/storaged/registry.go` | `BucketRegistry` skeleton (per Rust ADR-017) — single bucket only in G0; multi-bucket lands in G2 |
| 2.3 | `internal/secrets/provider.go` | `SecretsProvider` interface + `FileSecretsProvider` reading `/etc/lakehouse/secrets.toml` |
| 2.4 | `cmd/storaged/main.go` | wire routes — `GET /storage/get/{key}`, `PUT /storage/put/{key}`, `GET /storage/list?prefix=...`. Bind to `127.0.0.1:3201` only (G0 is dev-only, no auth). Apply `http.MaxBytesReader` with a 2 GiB cap on PUT to bound memory + reject runaway uploads |

**Acceptance D2:** `curl -T sample.csv 127.0.0.1:3201/storage/put/test/sample.csv` returns 200; `curl 127.0.0.1:3201/storage/get/test/sample.csv` echoes the file bytes; `curl 127.0.0.1:3201/storage/list?prefix=test/` lists `sample.csv`; a PUT exceeding 2 GiB returns `413 Payload Too Large`.

**Open question:** error journal (Rust ADR-018 append-log pattern) — defer to G2 with multi-bucket federation, or wire it now? Plan says defer; revisit if errors surface during D3-D5.

---

## Day 3 — catalogd: Parquet manifests

**Goal:** register a dataset, persist to storaged, restart, manifest still visible.

| # | File | What |
|---|---|---|
| 3.1 | `internal/catalogd/manifest.go` | Parquet read/write using `arrow-go/v18/parquet/pqarrow`. Schema: `dataset_id`, `name`, `schema_fingerprint`, `objects`, `created_at`, `updated_at`, `row_count` |
| 3.2 | `internal/catalogd/registry.go` | in-memory index (`map[name]Manifest`), rehydrated on startup from `primary://_catalog/manifests/*.parquet` |
| 3.3 | `cmd/catalogd/main.go` | wire routes — `POST /catalog/register` (idempotent by name + fingerprint per Rust ADR-020), `GET /catalog/manifest/{name}`, `GET /catalog/list` |
| 3.4 | `internal/catalogd/store_client.go` | thin HTTP client to `cmd/storaged` — round-trips manifest Parquets |

**Acceptance D3:** register a dataset, see it in `/catalog/list`, restart catalogd, `/catalog/list` still shows it. Re-register same name + same fingerprint → 200, same `dataset_id`. Different fingerprint → 409 Conflict.
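
The D3.3 idempotency rule can be sketched as pure registry logic. `Registry` and `Register` here are illustrative stand-ins, not the real catalogd types, and the derived id is an assumption; the (name, fingerprint) decision table is the one the acceptance above tests.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

type Manifest struct {
	DatasetID         string
	Name              string
	SchemaFingerprint string
}

type Registry struct {
	mu     sync.Mutex
	byName map[string]Manifest
}

// Register is idempotent by (name, fingerprint): the same pair returns
// the existing dataset_id with ok=true; a different fingerprint for an
// existing name is a conflict (surfaced as HTTP 409 at the route layer).
func (r *Registry) Register(name, fingerprint string) (Manifest, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if m, exists := r.byName[name]; exists {
		if m.SchemaFingerprint == fingerprint {
			return m, true // 200, same dataset_id
		}
		return Manifest{}, false // 409 Conflict
	}
	id := sha256.Sum256([]byte(name + ":" + fingerprint))
	m := Manifest{DatasetID: hex.EncodeToString(id[:8]), Name: name, SchemaFingerprint: fingerprint}
	r.byName[name] = m
	return m, true
}

func main() {
	reg := &Registry{byName: map[string]Manifest{}}
	a, _ := reg.Register("workers_500k", "fp-v1")
	b, _ := reg.Register("workers_500k", "fp-v1")  // idempotent re-register
	_, ok := reg.Register("workers_500k", "fp-v2") // drifted schema
	fmt.Println(a.DatasetID == b.DatasetID, ok)    // true false
}
```

The mutex is the same serialize-registers choice the §Risks table makes for the ADR-020 write-lock pattern.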

---

## Day 4 — ingestd: CSV → Parquet → catalog

**Goal:** `POST /ingest` with a CSV file produces a Parquet in storaged + a manifest in catalogd.

| # | File | What |
|---|---|---|
| 4.1 | `internal/ingestd/schema.go` | infer Arrow schema from CSV header + first-N-row sampling. ADR-010 default-to-string on ambiguity |
| 4.2 | `internal/ingestd/csv.go` | stream CSV → `array.RecordBatch` → Parquet (arrow-go pqarrow writer) |
| 4.3 | `cmd/ingestd/main.go` | route `POST /ingest` — multipart form file → schema infer → write Parquet → call catalogd to register |

**Acceptance D4:** `curl -F file=@workers_500k.csv :3203/ingest?name=workers_500k` returns 200 with the registered manifest; `aws s3 ls` (or `mc ls`) shows the Parquet under `primary://datasets/workers_500k/`; `curl :3202/catalog/manifest/workers_500k` returns the manifest with `row_count=500000`.
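
The D4.1 / ADR-010 rule can be sketched per column. `inferColumnType` is an illustrative name and the string type names stand in for the Arrow types the real code builds; the rule itself (narrowest type that parses every sampled value, string on any ambiguity) is the one stated above.

```go
package main

import (
	"fmt"
	"strconv"
)

// inferColumnType inspects a sample of cell values from one CSV column
// and returns the narrowest type that parses all of them, defaulting
// to string per ADR-010.
func inferColumnType(sample []string) string {
	allInt, allFloat := true, true
	for _, v := range sample {
		if _, err := strconv.ParseInt(v, 10, 64); err != nil {
			allInt = false
		}
		if _, err := strconv.ParseFloat(v, 64); err != nil {
			allFloat = false
		}
	}
	switch {
	case allInt:
		return "int64"
	case allFloat:
		return "float64"
	default:
		return "string" // ADR-010: default to string on ambiguity
	}
}

func main() {
	fmt.Println(inferColumnType([]string{"1", "2", "3"}))   // int64
	fmt.Println(inferColumnType([]string{"1.5", "2", "3"})) // float64
	fmt.Println(inferColumnType([]string{"1", "two", "3"})) // string
}
```

Per the §Risks mitigation, the real code would also log whenever the string default fires.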

---

## Day 5 — queryd: DuckDB SELECT

**Goal:** SQL queries over Parquet datasets.

| # | File | What |
|---|---|---|
| 5.1 | `internal/queryd/db.go` | `database/sql` connection to `github.com/duckdb/duckdb-go/v2` (cgo). Ensures DuckDB extensions Parquet + httpfs are loaded; on connection open, executes `CREATE SECRET` (TYPE S3) populated from `internal/secrets/provider.go` so `read_parquet('s3://...')` against MinIO authenticates per session |
| 5.2 | `internal/queryd/registrar.go` | reads catalogd `/catalog/list`, registers each dataset as a DuckDB view: `CREATE VIEW workers_500k AS SELECT * FROM read_parquet('s3://...')` |
| 5.3 | `cmd/queryd/main.go` | route `POST /sql` (JSON body `{"sql": "..."}`). View refresh strategy: cache views with a TTL (default 30s) + invalidate on `If-None-Match` against catalogd's manifest etag. **Don't** re-CREATE on every request — the Opus review flagged that as the perf cliff during D6 timing capture |

**Acceptance D5:** after Day 4 ingestion, `curl -X POST -d '{"sql":"SELECT count(*) FROM workers_500k"}' :3204/sql` returns `[{"count_star()":500000}]`. A `SELECT role, count(*) FROM workers_500k WHERE state='IL' GROUP BY role` returns the expected rows.

---

## Day 6 — Gate day: end-to-end via gateway

**Goal:** the closing G0 acceptance gate passes.

| # | What |
|---|---|
| 6.1 | Promote `cmd/gateway/main.go` `/v1/ingest` + `/v1/sql` from D1.10 stubs (501) to real reverse-proxies. `httputil.NewSingleHostReverseProxy` preserves the inbound path by default, so the proxy must use a custom `Director` (or `Rewrite`) that strips the `/v1` prefix before forwarding — otherwise the call lands on `ingestd:3203/v1/ingest`, which doesn't exist (ingestd serves `/ingest`, queryd serves `/sql`). Multipart forwarding for `/v1/ingest` is the riskiest hop — verify form parts pass through with the file body intact |
| 6.2 | Smoke script `scripts/g0_smoke.sh`: spin up MinIO + 5 services, ingest, query, assert row count |
| 6.3 | Run smoke against `workers_500k.csv` end-to-end |
| 6.4 | Capture timing — total ingest + query latency, file size, peak memory |

**Closing G0 acceptance:** `scripts/g0_smoke.sh` exits 0. Numbers recorded in `docs/G0_BASELINE.md` for future regression comparison.

---

## Day 7 — Cleanup + retro

| # | What |
|---|---|
| 7.1 | Update SPEC §4 G0 with what actually shipped vs planned (deviations, surprises) |
| 7.2 | Write `docs/G0_BASELINE.md` — measured perf numbers + comparison hooks for G1+ |
| 7.3 | **Finalize** ADRs that were stubbed *before* their decisions landed — ADR stubs go in at the start of D4 (arrow-go version pin, schema inference policy) and D5 (DuckDB extension load order, S3 secret provisioning, view-refresh TTL) so reviewers can object in-flight; D7 just commits them after running real code against the calls |
| 7.4 | Tag the commit `phase-g0-complete` |
| 7.5 | Open follow-up issues for anything punted (error journal, multi-bucket, profile system, two-phase-write orphan GC, shared-server.go refactor for cgo-handle services) |

---

## Risks tracked across the week

| Risk | Where | Mitigation |
|---|---|---|
| cgo build fails on the dev box | D5 | D0.3 verifies `gcc` present; if cgo specifically breaks, fall back to DuckDB external process (SPEC §3.1 option B) |
| arrow-go pqarrow schema mismatch with CSV inference | D4 | sample 1k rows for type inference, default to String per ADR-010, log when defaulting |
| DuckDB can't read S3 Parquet directly | D5 | load the `httpfs` extension explicitly; if it fails, copy the Parquet to a local temp file before query (slow but correct) |
| `/catalog/register` race between ingestd writer and catalogd reader | D3-D4 | same write-lock-across-storage-write pattern as Rust ADR-020 — serialize registers; throughput is OK at low ingest QPS |
| `workers_500k.csv` schema drifts vs Rust era | D4 | plan calls for inferring fresh, not porting the Rust schema. If staffer-domain features break in G3+, revisit |

---

## Out of scope for G0 (deferred to later phases)

- Vector indexing — Phase G1
- Multi-bucket / federation — Phase G2
- Profile system — Phase G2
- Hot-swap atomicity — Phase G2
- Pathway memory — Phase G3
- Distillation pipeline — Phase G3
- MCP server / observer / auditor — Phase G4
- HTMX UI — Phase G5
- TLS, auth — explicit non-goal until G2 (single-bucket no-auth dev)

---

## Open questions before Day 1

1. **MinIO instance** — reuse the existing one at `localhost:9000` that lakehouse uses (shared dev box), or stand up a fresh one with a separate bucket prefix?
2. **`/etc/lakehouse/secrets.toml`** — share the lakehouse repo's secrets file, or create `/etc/golangLAKEHOUSE/secrets.toml`?
3. **Workers CSV source** — derive from `workers_500k.parquet` (round-trip back to CSV), or use `workers_500k_v9.csv` if it exists?

These are ops calls, not architecture. Answer them when D0 is being executed.

---

## Self-review — independent pass via gateway overseer

Reviewer: `opencode/claude-opus-4-7` via `localhost:3100/v1/chat` (the same path the production overseer correction loop uses post-G0 in the Rust era). Run on the original draft before any of the inline fixes above were applied. Findings dispositioned below.

### BLOCK — both real, both fixed inline

| # | Finding | Disposition | Fix location |
|---|---|---|---|
| B1 | `apt install build-essential` alone won't satisfy the cgo link step for `duckdb-go/v2` | **Fixed** — D0.6 gained a cgo smoke on D0 to flush platform issues there, not on D5 (the smoke's original `go install` form was itself later corrected by K1 below) | D0.6 |
| B2 | DuckDB session needs S3 credentials (`CREATE SECRET`) plumbed from SecretsProvider; "load httpfs" alone leaves auth unwired | **Fixed** — D5.1 now calls `CREATE SECRET (TYPE S3, ...)` on connection open, populated from `internal/secrets/provider.go` | D5.1 |

### WARN — 4 of 5 fixed inline; 1 deferred

| # | Finding | Disposition | Fix location |
|---|---|---|---|
| W1 | Two-phase write (storaged → catalogd register) leaves orphan Parquets on partial failure; no GC story | **Deferred** — punted to G2 alongside multi-bucket + error journal; tracked in §Risks and the D7.5 follow-up | D7.5 |
| W2 | "Refresh views on each `/sql` call" will be the D6 perf cliff | **Fixed** — D5.3 now uses TTL-cached views with etag invalidation against catalogd | D5.3 |
| W3 | Shared `internal/shared/server.go` factory across heterogeneous binaries (HTTP ingress vs cgo-DB-holder) couples graceful-shutdown semantics that will need unwinding later | **Accepted with note** — G0 keeps the simple shared factory; the refactor is explicitly listed as a G1+ follow-up | D7.5 |
| W4 | storaged PUT/GET on a TCP port with no auth + no body cap is a footgun | **Fixed** — D2.4 now binds `127.0.0.1` only and applies a 2 GiB `MaxBytesReader` cap | D2.4 |
| W5 | Gateway reverse-proxy introduced cold on D6 gate day compresses risk into the deadline | **Fixed** — D1.10 now stubs the routes returning 501; D6.1 just promotes them to real proxies | D1.10 + D6.1 |

### INFO — both fixed inline

| # | Finding | Disposition | Fix location |
|---|---|---|---|
| I1 | `go mod tidy` before any imports is a trivially-true verification | **Fixed** — D0.6 re-purposed for the cgo smoke; the tidy verification moved to D1 acceptance | D0.6 + D1 acceptance |
| I2 | Filing ADRs *after* the work is done inverts the usual pattern | **Fixed** — D7.3 reframed: ADR stubs go in at the start of D4/D5 so reviewers can object in-flight; D7.3 just finalizes them | D7.3 |

### Net change (Opus pass)

7 of 9 findings produced inline plan edits; 2 were deferred to post-G0 follow-up issues (W1 orphan GC, W3 shared-server refactor) with the deferral itself documented. No findings were dismissed as confabulation.

---

## Self-review — second pass via Kimi K2.6 (cross-lineage)

Reviewer: `opencode/kimi-k2.6` via the same gateway path. Run on the post-Opus-fix doc to surface what a different model lineage catches. Per the Rust-era cross-lineage rotation pattern, Kimi tends to ground on textual specifics where Opus surfaces architectural shape — both lenses are useful.

Kimi's output was discursive (pre-format thinking rather than the requested BLOCK/WARN/INFO bullets), but two concrete catches landed — both BLOCKs that Opus missed and that the Opus-pass fixes themselves introduced.

### BLOCK — both real, both fixed inline

| # | Finding | Disposition | Fix location |
|---|---|---|---|
| K1 | Opus's BLOCK B1 fix used `go install github.com/duckdb/duckdb-go/v2@latest` to verify cgo, but `go install pkg@version` requires a `main` package — duckdb-go/v2 is a library, so the command fails with "not a main package" *before* the cgo linker chain is exercised. The verification could therefore pass on a broken-cgo box. | **Fixed** — D0.6 now creates a temporary module + a 5-line `main.go` that imports duckdb-go/v2 and calls `sql.Open("duckdb","")`, then `go run`s it — actually compiling + executing with cgo | D0.6 |
| K2 | Path mismatch: D1.10 stubs `/v1/ingest` and `/v1/sql` on the gateway, but D4 has ingestd serving `/ingest` and D5 has queryd serving `/sql`. D6.1's `httputil.NewSingleHostReverseProxy` preserves the inbound path by default, so the proxy would forward to `ingestd:3203/v1/ingest`, which doesn't exist. The smoke would fail on D6 with a 404 from the backend, not the gateway. | **Fixed** — D6.1 now specifies a custom `Director` that strips the `/v1` prefix before forwarding | D6.1 |

### WARN — none extracted (Kimi response truncated at this section)

The remaining response stream considered D2.4 `MaxBytesReader` semantics and D3 manifest-registry concurrency, but cut off before producing structured findings. A third pass could be run if more lineage coverage is wanted — for now, the two fixed BLOCKs are the cross-lineage delta.

### Net change (Kimi pass)

2 BLOCKs landed, both fixed inline. Both were *introduced by Opus-pass fixes* — illustrating exactly why cross-lineage rotation matters: one model's review of the original is not the same as a different model's review of the post-fix version. The Rust auditor's Kimi/Haiku/Opus rotation captures this dynamic; today's two-pass doc review reproduces it on a much smaller scale.