
Phase G0 Kickoff Plan

Goal: smallest end-to-end ingest+query path in Go. Upload a CSV via POST /ingest, query it via POST /sql, get rows back. No vector, no profile, no UI yet.

Estimated duration: 1 engineer-week (5 working days + gate day + cleanup). Plan is calibrated for solo work; cut by ~40% with two engineers in parallel on storaged/catalogd vs ingestd/queryd.

Cutoff for G0: the closing acceptance gate (Day 6) passes end-to-end against workers_500k.csv as the test fixture.


Day 0 — One-time setup

Done by an operator with sudo on the dev box. ~15 minutes.

# Step Verify
0.1 Install Go 1.23+: curl -L https://go.dev/dl/go1.23.linux-amd64.tar.gz | sudo tar -C /usr/local -xz go version shows 1.23+
0.2 Add /usr/local/go/bin to PATH (in ~/.bashrc) new shell sees go
0.3 Install cgo toolchain: apt install build-essential gcc --version works
0.4 Clone repo: git clone https://git.agentview.dev/profit/golangLAKEHOUSE.git cd golangLAKEHOUSE && go version from inside
0.5 Bring up MinIO locally (or point at existing) mc ls local/ lists buckets, or whatever the dev S3 is
0.6 Verify DuckDB cgo path with a real compile-and-run smoke (NOT go install pkg@latest — that requires a main package and would fail on the duckdb library before exercising cgo). Steps: mkdir /tmp/duckdb-smoke && cd /tmp/duckdb-smoke && go mod init smoke && go get github.com/duckdb/duckdb-go/v2 && cat > main.go <<<'package main; import (_ "github.com/duckdb/duckdb-go/v2"; "database/sql"); func main(){db,_:=sql.Open("duckdb","");db.Ping();db.Close()}' && go run main.go — proves cgo linker chain + static-linked duckdb-go-bindings work on this platform exits 0, no link errors

Day 0 acceptance: go version shows 1.23+, gcc --version works, MinIO reachable on localhost:9000, the cgo compile-and-run smoke above succeeded. (go mod tidy is intentionally NOT run here — no imports yet; verification moves to D1.)
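
For reference, the D0.6 one-liner written out as a normal file. This is a minimal sketch: the module path and the "duckdb" driver name are taken from the table above and should be confirmed against the library's README before relying on them.

```go
// /tmp/duckdb-smoke/main.go: the D0.6 smoke, expanded from the one-liner.
// Assumes the github.com/duckdb/duckdb-go/v2 module registers a "duckdb"
// database/sql driver name.
package main

import (
    "database/sql"
    "fmt"
    "log"

    _ "github.com/duckdb/duckdb-go/v2" // importing the driver is what exercises the cgo linker chain
)

func main() {
    db, err := sql.Open("duckdb", "") // empty DSN = in-memory database
    if err != nil {
        log.Fatalf("open: %v", err)
    }
    defer db.Close()

    if err := db.Ping(); err != nil { // forces a real connection, i.e. real cgo calls
        log.Fatalf("ping: %v", err)
    }
    fmt.Println("cgo + duckdb link OK")
}
```

Run it from a scratch directory with go mod init smoke && go get github.com/duckdb/duckdb-go/v2 && go run main.go, exactly as in step 0.6.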


Day 1 — Skeleton + chi + /health × 5 binaries

Goal: five binaries build, each binds to its port, /health returns {"status":"ok","service":"<name>"}.

# File What
1.1 internal/shared/server.go chi router factory, slog setup, graceful shutdown via signal.NotifyContext
1.2 internal/shared/config.go TOML loader using pelletier/go-toml/v2, default + override pattern
1.3 cmd/gateway/main.go port 3110, /health
1.4 cmd/storaged/main.go port 3211, /health
1.5 cmd/catalogd/main.go port 3212, /health
1.6 cmd/ingestd/main.go port 3213, /health
1.7 cmd/queryd/main.go port 3214, /health

Port shift note: the original SPEC said 3100/3201-3204; D1 runtime caught a collision — the live Rust lakehouse owns 3100. Go dev ports shifted to 3110+ (gateway 3110, services 3211-3214) so both systems can run concurrently during the migration. G5 cutover flips the gateway back to 3100 when Rust retires.

1.8 lakehouse.toml bind addresses, log level — sample committed
1.9 Makefile build, run-gateway, etc. — convenience
1.10 cmd/gateway/main.go adds STUB routes POST /v1/ingest and POST /v1/sql returning 501 Not Implemented with an X-Lakehouse-Stub: g0 header. Real reverse-proxy wiring lands on Day 6, but the routes exist from D1 so D6 is just a behavior change, not new endpoints. Verify: curl -X POST :3110/v1/ingest returns 501 with the stub header

Acceptance D1: go mod tidy populates go.sum cleanly; go build ./cmd/... exits 0; with each binary running in a separate terminal, curl :3110/health through :3214/health all return 200 OK with the expected JSON; gateway's stub /v1/* routes return 501.

Dependencies pulled: go-chi/chi/v5, pelletier/go-toml/v2.
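
A minimal sketch of the D1.1 shape: chi router factory, /health, and graceful shutdown via signal.NotifyContext. NewRouter and Run are illustrative names, not the shipped internal/shared/server.go API; the synchronous net.Listen anticipates the D1 scrum C1 fix recorded later in this doc.

```go
// internal/shared/server.go (sketch): illustrative names, not the shipped code.
package shared

import (
    "context"
    "log/slog"
    "net"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"

    "github.com/go-chi/chi/v5"
)

// NewRouter returns a chi router with the /health probe every G0 binary exposes.
func NewRouter(service string) *chi.Mux {
    r := chi.NewRouter()
    r.Get("/health", func(w http.ResponseWriter, _ *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        w.Write([]byte(`{"status":"ok","service":"` + service + `"}`))
    })
    return r
}

// Run serves until SIGINT/SIGTERM, then shuts down gracefully.
func Run(addr string, h http.Handler, logger *slog.Logger) error {
    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
    defer stop()

    ln, err := net.Listen("tcp", addr) // synchronous: bind errors surface immediately
    if err != nil {
        return err
    }
    srv := &http.Server{Handler: h}
    errCh := make(chan error, 1)
    go func() { errCh <- srv.Serve(ln) }()
    logger.Info("listening", "addr", addr)

    select {
    case err := <-errCh:
        return err
    case <-ctx.Done():
        shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
        defer cancel()
        return srv.Shutdown(shutdownCtx)
    }
}
```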


Day 2 — storaged: S3 GET/PUT/LIST

Goal: put a file, get it back, list it.

# File What
2.1 internal/storaged/bucket.go aws-sdk-go-v2/service/s3 wrapper — Get, Put, List, Delete
2.2 internal/storaged/registry.go BucketRegistry skeleton (per Rust ADR-017) — single bucket only in G0; multi-bucket lands in G2
2.3 internal/secrets/provider.go SecretsProvider interface + FileSecretsProvider reading /etc/lakehouse/secrets.toml
2.4 cmd/storaged/main.go wire routes — GET /storage/get/{key}, PUT /storage/put/{key}, GET /storage/list?prefix=.... Bind to 127.0.0.1:3211 only (G0 is dev-only, no auth). Apply http.MaxBytesReader with a 256 MiB per-request cap (reduced from 2 GiB per Qwen3-coder Q1 — 2 GiB × N concurrent = blow the box) + a buffered semaphore capping concurrent in-flight PUTs at 4. PUTs exceeding the cap → 413; PUTs blocked on the semaphore → 503 with Retry-After: 5

Acceptance D2: curl -T sample.csv 127.0.0.1:3211/storage/put/test/sample.csv returns 200; curl 127.0.0.1:3211/storage/get/test/sample.csv echoes the file bytes; curl 127.0.0.1:3211/storage/list?prefix=test/ lists sample.csv; a PUT exceeding the 256 MiB cap returns 413 Payload Too Large.
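
A sketch of the D2.4 PUT guards in handler form. putHandler, putFn, and the route shape are illustrative assumptions, not the shipped cmd/storaged/main.go; the sequencing (Content-Length check, non-blocking semaphore, MaxBytesReader) follows the plan text above.

```go
// cmd/storaged/main.go (sketch): up-front Content-Length 413, MaxBytesReader
// body cap, and a non-blocking 4-slot semaphore. putFn stands in for the
// bucket wrapper's Put; names are illustrative.
package main

import (
    "io"
    "net/http"

    "github.com/go-chi/chi/v5"
)

const maxPutBytes = 256 << 20 // 256 MiB

func putHandler(putSem chan struct{}, putFn func(key string, body io.Reader) error) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        // Honest clients declare Content-Length; reject oversize before reading a byte.
        if r.ContentLength > maxPutBytes {
            http.Error(w, "payload too large", http.StatusRequestEntityTooLarge)
            return
        }
        // Non-blocking try-acquire: the 5th concurrent PUT gets 503 + Retry-After: 5.
        select {
        case putSem <- struct{}{}:
            defer func() { <-putSem }()
        default:
            w.Header().Set("Retry-After", "5")
            http.Error(w, "too many concurrent uploads", http.StatusServiceUnavailable)
            return
        }
        // Hard cap for chunked or lying-Content-Length bodies.
        body := http.MaxBytesReader(w, r.Body, maxPutBytes)
        defer body.Close()

        key := chi.URLParam(r, "*") // route registered as PUT /storage/put/*
        if err := putFn(key, body); err != nil {
            // Real code maps the oversize case to 413 rather than 500 (see the D2 scrum's C2 finding).
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        w.Header().Set("Content-Type", "application/json")
        w.Write([]byte(`{"ok":true}`))
    }
}
```

Wired as something like r.Put("/storage/put/*", putHandler(make(chan struct{}, 4), bucket.Put)); the 413-vs-500 mapping on the Put error path is covered under the D2 code scrum below.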

Open question: error journal (Rust ADR-018 append-log pattern) — defer to G2 with multi-bucket federation, or wire it now? Plan says defer; revisit if errors surface during D3-D5.


Day 3 — catalogd: Parquet manifests

Goal: register a dataset, persist to storaged, restart, manifest still visible.

# File What
3.1 internal/catalogd/manifest.go Parquet read/write using arrow-go/v18/parquet/pqarrow. Schema: dataset_id, name, schema_fingerprint, objects, created_at, updated_at, row_count
3.2 internal/catalogd/registry.go In-memory index (map[name]Manifest), rehydrated on startup from primary://_catalog/manifests/*.parquet
3.3 cmd/catalogd/main.go wire routes — POST /catalog/register (idempotent by name + fingerprint per Rust ADR-020), GET /catalog/manifest/{name}, GET /catalog/list
3.4 internal/catalogd/store_client.go thin HTTP client to cmd/storaged — round-trips manifest Parquets

Acceptance D3: register a dataset, see it in /catalog/list, restart catalogd, /catalog/list still shows it. Re-register same name + same fingerprint → 200, same dataset_id. Different fingerprint → 409 Conflict.
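
A sketch of the D3.3 idempotency rule in isolation. Types, field names, and the conflict error are illustrative; the shipped registry presumably also persists the manifest Parquet through storaged before mutating the in-memory index, per the write-lock pattern noted in the risk table.

```go
// internal/catalogd/registry.go (sketch): same name + same fingerprint returns
// the existing dataset_id; a different fingerprint is a conflict (HTTP 409).
package catalogd

import (
    "errors"
    "sync"
)

var ErrFingerprintConflict = errors.New("dataset exists with a different schema fingerprint")

type Manifest struct {
    DatasetID         string
    Name              string
    SchemaFingerprint string
    RowCount          int64
}

type Registry struct {
    mu     sync.Mutex
    byName map[string]Manifest
}

func NewRegistry() *Registry { return &Registry{byName: make(map[string]Manifest)} }

// Register is idempotent by (name, fingerprint); the HTTP layer maps
// ErrFingerprintConflict to 409 Conflict.
func (r *Registry) Register(m Manifest) (Manifest, error) {
    r.mu.Lock()
    defer r.mu.Unlock()
    if existing, ok := r.byName[m.Name]; ok {
        if existing.SchemaFingerprint == m.SchemaFingerprint {
            return existing, nil // 200, same dataset_id
        }
        return Manifest{}, ErrFingerprintConflict // 409
    }
    r.byName[m.Name] = m
    return m, nil
}
```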


Day 4 — ingestd: CSV → Parquet → catalog

Goal: POST /ingest with a CSV file produces a Parquet in storaged + a manifest in catalogd.

# File What
4.1 internal/ingestd/schema.go infer Arrow schema from CSV header + first-N-row sampling. ADR-010 default-to-string on ambiguity
4.2 internal/ingestd/csv.go stream CSV → array.RecordBatch → Parquet (arrow-go pqarrow writer)
4.3 cmd/ingestd/main.go route POST /ingest — multipart form file → schema infer → write Parquet → call catalogd to register

Acceptance D4: curl -F file=@workers_500k.csv :3213/ingest?name=workers_500k returns 200 with the registered manifest; aws s3 ls (or mc ls) shows the Parquet under primary://datasets/workers_500k/; curl :3212/catalog/manifest/workers_500k returns the manifest with row_count=500000.
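
A sketch of the D4.1 inference rule: sample rows, try narrow numeric types, and fall back to string on ambiguity per ADR-010. The arrow-go/v18 import path follows the plan's pin; the sampling shape and type ladder are assumptions, not the shipped schema.go.

```go
// internal/ingestd/schema.go (sketch): default-to-string inference over a sample.
package ingestd

import (
    "strconv"

    "github.com/apache/arrow-go/v18/arrow"
)

// inferSchema widens each column Int64 -> Float64 -> String based on sampled
// values; any value that fails both numeric parses demotes the column to string.
func inferSchema(header []string, sample [][]string) *arrow.Schema {
    fields := make([]arrow.Field, len(header))
    for col, name := range header {
        var dt arrow.DataType = arrow.PrimitiveTypes.Int64
        for _, row := range sample {
            if col >= len(row) {
                continue
            }
            v := row[col]
            if _, err := strconv.ParseInt(v, 10, 64); err == nil {
                continue // still fits the current numeric type
            }
            if _, err := strconv.ParseFloat(v, 64); err == nil {
                dt = arrow.PrimitiveTypes.Float64
                continue
            }
            dt = arrow.BinaryTypes.String // ambiguous: default to string per ADR-010
            break
        }
        fields[col] = arrow.Field{Name: name, Type: dt, Nullable: true}
    }
    return arrow.NewSchema(fields, nil)
}
```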


Day 5 — queryd: DuckDB SELECT

Goal: SQL queries over Parquet datasets.

# File What
5.1 internal/queryd/db.go database/sql connection to github.com/duckdb/duckdb-go/v2 (cgo). Ensures DuckDB extensions Parquet + httpfs are loaded; on connection open, executes CREATE SECRET (TYPE S3) populated from internal/secrets/provider.go so read_parquet('s3://...') against MinIO authenticates per session
5.2 internal/queryd/registrar.go reads catalogd /catalog/list, registers each dataset as a DuckDB view: CREATE VIEW workers_500k AS SELECT * FROM read_parquet('s3://...')
5.3 cmd/queryd/main.go route POST /sql (JSON body {"sql": "..."}). View refresh strategy: cache views with a TTL (default 30s) + invalidate on If-None-Match against catalogd's manifest etag. Don't re-CREATE on every request — Opus review flagged that as the perf cliff during D6 timing capture

Acceptance D5: after Day 4 ingestion, curl -X POST -d '{"sql":"SELECT count(*) FROM workers_500k"}' :3214/sql returns [{"count_star()":500000}]. A SELECT role, count(*) FROM workers_500k WHERE state='IL' GROUP BY role returns the expected rows.
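
A sketch of the D5.1/D5.2 wiring. The CREATE SECRET option names used for MinIO (ENDPOINT, URL_STYLE, USE_SSL) and the Creds shape are assumptions to check against the DuckDB docs and internal/secrets/provider.go.

```go
// internal/queryd/db.go (sketch): load httpfs, create the S3 secret from the
// provider's credentials, and register catalog datasets as DuckDB views.
package queryd

import (
    "database/sql"
    "fmt"

    _ "github.com/duckdb/duckdb-go/v2"
)

// Creds is an illustrative stand-in for what internal/secrets/provider.go returns.
type Creds struct{ KeyID, Secret, Endpoint, Region string }

func Open(c Creds) (*sql.DB, error) {
    db, err := sql.Open("duckdb", "") // in-memory; the data lives in S3 Parquet
    if err != nil {
        return nil, err
    }
    stmts := []string{
        "INSTALL httpfs",
        "LOAD httpfs",
        fmt.Sprintf(
            "CREATE SECRET lakehouse (TYPE S3, KEY_ID '%s', SECRET '%s', REGION '%s', ENDPOINT '%s', URL_STYLE 'path', USE_SSL false)",
            c.KeyID, c.Secret, c.Region, c.Endpoint),
    }
    for _, s := range stmts {
        if _, err := db.Exec(s); err != nil {
            db.Close()
            return nil, fmt.Errorf("duckdb init: %w", err)
        }
    }
    return db, nil
}

// RegisterView maps a catalog dataset onto a view over its Parquet objects,
// e.g. s3://lakehouse-go-primary/datasets/workers_500k/*.parquet.
func RegisterView(db *sql.DB, name, s3Glob string) error {
    _, err := db.Exec(fmt.Sprintf(
        "CREATE OR REPLACE VIEW %s AS SELECT * FROM read_parquet('%s')", name, s3Glob))
    return err
}
```

Note that database/sql pools connections, so the shipped code presumably applies the INSTALL/LOAD/CREATE SECRET setup per connection (or pins the pool to one connection); the sketch also glosses over quoting/escaping of the credential strings.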


Day 6 — Gate day: end-to-end via gateway

Goal: the closing G0 acceptance gate passes.

# What
6.1 Promote cmd/gateway/main.go /v1/ingest + /v1/sql from D1.10 stubs (501) to real reverse-proxies. httputil.NewSingleHostReverseProxy preserves the inbound path by default, so the proxy must use a custom Director (or Rewrite) that strips the /v1 prefix before forwarding — otherwise the call lands on ingestd:3213/v1/ingest which doesn't exist (ingestd serves /ingest, queryd serves /sql). Multipart forwarding for /v1/ingest is the riskiest hop — verify form parts pass through with the file body intact
6.2 Smoke script scripts/g0_smoke.sh: spin up MinIO + 5 services, ingest, query, assert row count
6.3 Run smoke against workers_500k.csv end-to-end
6.4 Capture timing — total ingest + query latency, file size, peak memory

Closing G0 acceptance: scripts/g0_smoke.sh exits 0. Numbers recorded in docs/G0_BASELINE.md for future regression comparison.
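
A sketch of the D6.1 prefix strip using the Rewrite hook (Go 1.20+; a classic Director works the same way). Backend addresses use the shifted dev ports and are assumptions, not the shipped gateway config.

```go
// cmd/gateway/main.go (sketch): D6.1's prefix-stripping reverse proxy.
package main

import (
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
    "strings"
)

// stripPrefixProxy forwards to target with prefix removed from the inbound path,
// so POST /v1/ingest on the gateway lands on ingestd's /ingest.
func stripPrefixProxy(target *url.URL, prefix string) *httputil.ReverseProxy {
    return &httputil.ReverseProxy{
        Rewrite: func(pr *httputil.ProxyRequest) {
            pr.SetURL(target) // scheme + host; the path is handled explicitly below
            pr.Out.URL.Path = strings.TrimPrefix(pr.In.URL.Path, prefix)
            pr.SetXForwarded()
        },
    }
}

func main() {
    ingestd, _ := url.Parse("http://127.0.0.1:3213") // shifted dev ports (assumed)
    queryd, _ := url.Parse("http://127.0.0.1:3214")

    mux := http.NewServeMux()
    mux.Handle("/v1/ingest", stripPrefixProxy(ingestd, "/v1"))
    mux.Handle("/v1/sql", stripPrefixProxy(queryd, "/v1"))
    log.Fatal(http.ListenAndServe("127.0.0.1:3110", mux))
}
```

Multipart bodies for /v1/ingest stream through the proxy unchanged, but the D6.1 caution about verifying form parts end-to-end still applies.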


Day 7 — Cleanup + retro

# What
7.1 Update SPEC §4 G0 with what actually shipped vs planned (deviations, surprises)
7.2 Write docs/G0_BASELINE.md — measured perf numbers + comparison hooks for G1+
7.3 Finalize ADRs that were stubbed before their decisions landed — ADR stubs go in at the start of D4 (arrow-go version pin, schema inference policy) and D5 (DuckDB extension load order, S3 secret provisioning, view-refresh TTL) so reviewers can object in-flight; D7 just commits them after running real code against the calls
7.4 Tag the commit phase-g0-complete
7.5 Open follow-up issues for anything punted (error journal, multi-bucket, profile system, two-phase-write orphan GC, shared-server.go refactor for cgo-handle services)

Risks tracked across the week

Risk Where Mitigation
cgo build fails on the dev box D5 D0.3 verifies gcc present; if cgo specifically breaks, fall back to DuckDB external process (SPEC §3.1 option B)
arrow-go pqarrow schema mismatch with CSV inference D4 Sample 1k rows for type inference, default to String per ADR-010, log when defaulting
DuckDB can't read S3 Parquet directly D5 Load httpfs extension explicitly; if it fails, copy Parquet to a local temp file before query (slow but correct)
/catalog/register race between ingestd writer and catalogd reader D3-D4 Same write-lock-across-storage-write pattern as Rust ADR-020 — serialize registers; throughput is OK at low ingest QPS
workers_500k.csv schema drifts vs Rust era D4 Plan calls for inferring fresh, not porting Rust schema. If staffer-domain features break in G3+, revisit

Out of scope for G0 (deferred to later phases)

  • Vector indexing — Phase G1
  • Multi-bucket / federation — Phase G2
  • Profile system — Phase G2
  • Hot-swap atomicity — Phase G2
  • Pathway memory — Phase G3
  • Distillation pipeline — Phase G3
  • MCP server / observer / auditor — Phase G4
  • HTMX UI — Phase G5
  • TLS, auth — explicit non-goal until G2 (single-bucket no-auth dev)

Open questions before Day 1

  1. MinIO instance — reuse the existing one at localhost:9000 that lakehouse uses (shared dev box) or stand up a fresh one with a separate bucket prefix?
  2. /etc/lakehouse/secrets.toml — share the lakehouse repo's secrets file or create /etc/golangLAKEHOUSE/secrets.toml?
  3. Workers CSV source — derive from workers_500k.parquet (round-trip back to CSV) or use workers_500k_v9.csv if it exists?

These are ops calls, not architecture. Answer when D0 is being executed.


Self-review — independent pass via gateway overseer

Reviewer: opencode/claude-opus-4-7 via localhost:3100/v1/chat (the same path the production overseer correction loop uses post-G0 in the Rust era). Run on the original draft before any of the inline fixes above were applied. Findings dispositioned below.

BLOCK — both real, both fixed inline

# Finding Disposition Fix location
B1 apt install build-essential alone won't satisfy the cgo link step for duckdb-go/v2 Fixed — D0.6 now runs a smoke go install against an empty module on D0 to flush platform issues here, not on D5 D0.6
B2 DuckDB session needs S3 credentials (CREATE SECRET) plumbed from SecretsProvider; "load httpfs" alone leaves auth unwired Fixed — D5.1 now calls CREATE SECRET (TYPE S3, ...) on connection open, populated from internal/secrets/provider.go D5.1

WARN — 3 of 5 fixed inline; 1 deferred, 1 accepted with note

# Finding Disposition Fix location
W1 Two-phase write (storaged → catalogd register) leaves orphan Parquets on partial failure; no GC story Deferred — punted to G2 alongside multi-bucket + error journal; tracked in §Risks and D7.5 follow-up D7.5
W2 "Refresh views on each /sql call" will be the D6 perf cliff Fixed — D5.3 now uses TTL-cached views with etag invalidation against catalogd D5.3
W3 Shared internal/shared/server.go factory across heterogeneous binaries (HTTP ingress vs cgo-DB-holder) couples graceful-shutdown semantics that will need unwinding later Accepted with note — G0 keeps the simple shared factory; refactor explicitly listed as a G1+ follow-up D7.5
W4 storaged PUT/GET on a TCP port with no auth + no body cap is a footgun Fixed — D2.4 now binds 127.0.0.1 only and applies a 2 GiB MaxBytesReader cap D2.4
W5 Gateway reverse-proxy introduced cold on D6 gate day compresses risk into the deadline Fixed — D1.10 now stubs the routes returning 501; D6.1 just promotes them to real proxies D1.10 + D6.1

INFO — both fixed inline

# Finding Disposition Fix location
I1 go mod tidy before any imports is a trivial-true verification Fixed — D0.6 re-purposed for the cgo smoke; tidy verification moved to D1 acceptance D0.6 + D1 acceptance
I2 Filing ADRs after the work is done inverts the usual pattern Fixed — D7.3 reframed: ADR stubs go in at the start of D4/D5 so reviewers can object in-flight; D7.3 just finalizes them D7.3

Net change (Opus pass)

7 of 9 findings produced inline plan edits; 2 deferred to post-G0 follow-up issues (W1 orphan GC, W3 shared-server refactor) with the deferral itself documented. No findings dismissed as confabulation.


Self-review — second pass via Kimi K2.6 (cross-lineage)

Reviewer: opencode/kimi-k2.6 via the same gateway path. Run on the post-Opus-fix doc to surface what a different model lineage catches. Per the Rust-era cross-lineage rotation pattern, Kimi tends to ground on textual specifics where Opus surfaces architectural shape — both lenses are useful.

Kimi's output was discursive (pre-format thinking rather than the requested BLOCK/WARN/INFO bullets), but two concrete catches landed — both BLOCKs that Opus missed and that the Opus-pass fixes themselves introduced.

BLOCK — both real, both fixed inline

# Finding Disposition Fix location
K1 Opus's BLOCK B1 fix used go install github.com/duckdb/duckdb-go/v2@latest to verify cgo, but go install pkg@version requires a main package — duckdb-go/v2 is a library, so the command fails with "not a main package" before the cgo linker chain is exercised. The verification can pass on a broken-cgo box. Fixed — D0.6 now creates a temporary module + a 5-line main.go that imports duckdb-go/v2 and calls sql.Open("duckdb",""), then go runs it — actually compiles + executes with cgo D0.6
K2 Path mismatch: D1.10 stubs /v1/ingest and /v1/sql on the gateway, but D4 has ingestd serving /ingest and D5 has queryd serving /sql. D6.1's httputil.NewSingleHostReverseProxy preserves the inbound path by default, so the proxy would forward to ingestd:3203/v1/ingest which doesn't exist. Smoke would fail on D6 with a 404 on the backend, not the gateway. Fixed — D6.1 now specifies a custom Director that strips the /v1 prefix before forwarding D6.1

WARN — none extracted (Kimi response truncated at this section)

The remaining response stream considered D2.4 MaxBytesReader semantics and D3 manifest registry concurrency, but cut off before producing structured findings. A third pass could be run if more lineage coverage is wanted — for now, two BLOCKs fixed is the cross-lineage delta.

Net change (Kimi pass)

2 BLOCKs landed, both fixed inline. Both were introduced by Opus-pass fixes — illustrating exactly why cross-lineage rotation matters: one model's review of the original is not the same as a different model's review of the post-fix version. The Rust auditor's Kimi/Haiku/Opus rotation captures this dynamic; today's two-pass doc review reproduces it on a much smaller scale.


Self-review — third pass via Qwen3-coder:480b (cross-lineage)

Reviewer: qwen3-coder:480b via ollama_cloud (Ollama Pro). Largest single-model open-weights coder in the fleet. Run on the post-Kimi-fix doc.

Output style note

Qwen3-coder's response was 80% approval of prior fixes (citing B1/B2/K1/K2/I2 as "NEW BLOCKs/INFOs" — actually a misuse of the NEW label, but useful as triangulation: three independent lineages now agree the prior fixes are correct). Only 2 genuinely new WARNs.

WARN — 1 fixed inline, 1 deferred

# Finding Disposition Fix location
Q1 D2.4 MaxBytesReader at 2 GiB still permits memory exhaustion under concurrent uploads — N concurrent × 2 GiB blows the box Fixed — D2.4 cap reduced to 256 MiB per request, plus a 4-slot semaphore on concurrent in-flight PUTs (503 + Retry-After when full) D2.4
Q2 D5.3 view refresh has TTL + etag invalidation but no batching, so bursty queries against many datasets can re-issue catalog reads redundantly Deferred — minor under G0's single-tenant load; revisit in G2 alongside the multi-bucket / profile work that creates more catalog churn D7.5

Net change (Qwen pass)

1 fixed inline, 1 deferred. The mislabeled "approval" output was net positive — it's the closest thing to a triangulation signal we get without a fourth model. With Opus + Kimi + Qwen all confirming the prior fixes hold, the plan is unusually well-cross-checked for a greenfield kickoff.

Cumulative disposition across all 3 lineages

Pass Reviewer New findings Fixed Deferred
1 opencode/claude-opus-4-7 9 7 2
2 opencode/kimi-k2.6 2 2 0
3 qwen3-coder:480b (ollama_cloud) 2 1 1
4 runtime smoke (D1 actual launch) 1 1 0
Total 14 11 3

The 4th pass — runtime smoke — caught the only thing three doc-review lineages couldn't: port 3100 was already occupied by the live Rust lakehouse. Documented and dispositioned same pattern as the model findings.

Three deferrals, all to G2: orphan GC on two-phase write (W1), shared-server refactor for cgo-handle services (W3), batched view refresh (Q2). Tracked in D7.5.


D1 — actual run results (2026-04-28)

Phase G0 Day 1 executed end-to-end. Output of scripts/d1_smoke.sh:

[d1-smoke] building...
[d1-smoke] launching...
[d1-smoke] /health probes:
  ✓ gateway (:3110) → {"status":"ok","service":"gateway"}
  ✓ storaged (:3211) → {"status":"ok","service":"storaged"}
  ✓ catalogd (:3212) → {"status":"ok","service":"catalogd"}
  ✓ ingestd (:3213) → {"status":"ok","service":"ingestd"}
  ✓ queryd (:3214) → {"status":"ok","service":"queryd"}
[d1-smoke] gateway 501 stub probes:
  ✓ POST /v1/ingest → 501 + X-Lakehouse-Stub: g0
  ✓ POST /v1/sql → 501 + X-Lakehouse-Stub: g0
[d1-smoke] D1 acceptance gate: PASSED

What landed:

  • internal/shared/server.go — chi factory, slog JSON logging, /health, graceful shutdown via signal.NotifyContext
  • internal/shared/config.go — TOML loader with DefaultConfig() and env-override-via-flag pattern (-config flag)
  • cmd/{gateway,storaged,catalogd,ingestd,queryd}/main.go — five binaries, each ~30 lines, all using the shared factory
  • lakehouse.toml — G0 dev config with the shifted 3110+ ports
  • scripts/d1_smoke.sh — repeatable smoke test, exits 0 on PASS
  • bin/{gateway,storaged,catalogd,ingestd,queryd} — built binaries (~9.7 MB each, no debug stripping)

What G0 didn't need but I added anyway (intentional):

  • Gateway already has the D1.10 stub routes wired; D6.1 is just swap-the-handler.
  • TOML config supports an S3 section even though storaged doesn't consume it until D2 — saves a config schema bump on D2.

Next: D2 — storaged's actual S3 GET/PUT/LIST routes against MinIO.


D1 — code scrum review (3-lineage parallel pass)

After D1 shipped, the actual code (not docs) was reviewed by all three lineages running in parallel — the cross-lineage discipline applied to the implementation, not just the plan.

Pass Reviewer Latency Findings
1 opencode/claude-opus-4-7 24.9s 2 BLOCK + 4 WARN + 2 INFO = 8
2 opencode/kimi-k2.6 18.3s discursive output; 1 effective WARN extracted
3 qwen3-coder:480b (ollama_cloud) 62.0s 2 BLOCK + 2 WARN + 1 INFO = 5

Convergent findings (caught by ≥2 reviewers — high confidence)

# Severity Finding Reviewers Disposition
C1 BLOCK Run() errCh/select race: fast bind error can be silently dropped Opus + Qwen Fixed — net.Listen() is now called synchronously before the goroutine; bind errors return as Run's return value before any select
C2 BLOCK sleep 0.5 in d1_smoke.sh races bind on cold-start boxes Opus + Qwen Fixed — replaced with poll_health() loop, 5s budget per service, 50ms poll interval
C3 WARN LoadConfig silent fallback when file missing Opus + Qwen Fixed — emits slog.Warn with path + hint when path was given but file absent
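
A sketch of what the C3 disposition implies for LoadConfig, folding in the S2 errors.Is idiom from the next table. The Config fields and function names are illustrative, not the shipped internal/shared/config.go.

```go
// internal/shared/config.go (sketch): warn instead of silently falling back
// when an explicit -config path does not exist.
package shared

import (
    "errors"
    "io/fs"
    "log/slog"
    "os"

    "github.com/pelletier/go-toml/v2"
)

// Config fields here are illustrative, not the shipped schema.
type Config struct {
    BindAddr string `toml:"bind_addr"`
    LogLevel string `toml:"log_level"`
}

func DefaultConfig() Config {
    return Config{BindAddr: "127.0.0.1:3110", LogLevel: "info"}
}

// LoadConfig returns defaults when path is empty; if a path was given but the
// file is absent, it warns (per C3) rather than failing or staying silent.
func LoadConfig(path string) (Config, error) {
    cfg := DefaultConfig()
    if path == "" {
        return cfg, nil
    }
    data, err := os.ReadFile(path)
    if errors.Is(err, fs.ErrNotExist) { // S2: errors.Is rather than os.IsNotExist
        slog.Warn("config file not found, using defaults", "path", path)
        return cfg, nil
    }
    if err != nil {
        return cfg, err
    }
    if err := toml.Unmarshal(data, &cfg); err != nil {
        return cfg, err
    }
    return cfg, nil
}
```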

Single-reviewer findings (lineage-specific catches)

# Severity Finding Reviewer Disposition
S1 WARN slog.SetDefault mutates global state from a library function Kimi Fixed — Run() no longer calls SetDefault; logger is constructed locally and passed to middleware
S2 WARN os.IsNotExist → errors.Is(err, fs.ErrNotExist) idiom Opus Fixed
S3 WARN secrets in lakehouse.toml — no .gitignore / env path Opus Already planned — D2.3 introduces SecretsProvider; D1 fields are empty strings
S4 INFO 5x near-identical cmd/*/main.go — could share a Main() helper Opus Accepted — defer until D2 surfaces the real per-service config consumption pattern
S5 INFO /health logs at Info — k8s liveness probes will dominate volume Opus Accepted — defer post-G0; G0 isn't on k8s yet
S6 WARN smoke double-curl per route doubles load + creates state window Opus Fixed — single curl -i parsed once for both code + headers

Second-pass review (post-fix code, Opus only)

After applying the first-round fixes, a second-pass Opus review caught 1 self-downgraded BLOCK + 4 WARN + 2 INFO. Of those, only one was a real new actionable: head -1 on curl -i is fragile against 1xx interim lines. Fixed — switched to awk '/^HTTP\//{code=$2} END{print code}' which picks the last status line (robust to 100 Continue etc).

The remaining second-pass findings were:

  • Theoretical (clean-exit-without-Shutdown path that doesn't trigger today)
  • Aspirational (defensive defer ln.Close() for hypothetical future code)
  • Caught downstream (poll_health not identity-checking — but the followup probe IS identity-checked)
  • Stylistic (config.go's slog.Warn lands on the global default — but config loads BEFORE the service-level logger exists; accepting)
  • Aspirational (newListener "for testability" indirection isn't needed yet)

Cumulative D1 disposition

  • Pass 1 (3 lineages parallel): 14 distinct findings (3 convergent + 11 single-reviewer; some overlap with prior doc-review findings)
  • Pass 2 (Opus second-pass on post-fix code): 7 findings, 1 actionable fixed, 6 dispositioned
  • Total fixed in D1 code review: 7 · accepted-with-rationale: 5 · deferred: 0
  • Build clean (go build ./cmd/... exit 0), go vet ./... clean, smoke PASS after every fix round.

D1 ships harder than it would have without the scrum: bind-error handling is now race-free, smoke is deterministic, log volume on /health is acknowledged, secrets handling has a flagged path forward.


D2 — actual run results (2026-04-28 evening)

Phase G0 Day 2 executed end-to-end. Output of scripts/d2_smoke.sh:

[d2-smoke] PUT round-trip:
  ✓ PUT d2-smoke/<ts>.bin → 200
[d2-smoke] GET echoes bytes:
  ✓ GET d2-smoke/<ts>.bin → bytes match
[d2-smoke] LIST includes key:
  ✓ LIST prefix=d2-smoke/ → contains d2-smoke/<ts>.bin
[d2-smoke] DELETE then GET → 404:
  ✓ DELETE then GET → 404
[d2-smoke] 256 MiB cap → 413:
  ✓ PUT 257 MiB → 413
[d2-smoke] semaphore: 5th concurrent PUT → 503 + Retry-After:5
  ✓ 5th concurrent PUT → 503 + Retry-After: 5
[d2-smoke] D2 acceptance gate: PASSED

What landed:

  • internal/secrets/provider.go — Provider interface, FileProvider reading /etc/lakehouse/secrets-go.toml with inline-fallback for G0 dev convenience, StaticProvider test helper
  • internal/storaged/bucket.go — aws-sdk-go-v2/service/s3 wrapper: Get/Put/List/Delete; manager.Uploader for multipart-streaming PUT; MaxListResults=10_000 cap with ...truncated... sentinel
  • internal/storaged/registry.go — BucketRegistry (single bucket in G0; G2 multi-bucket federation extends this)
  • cmd/storaged/main.go — verb-paths GET/PUT/LIST/DELETE, strict validateKey (rejects empty, >1024B, NUL, leading-/, .., CR/LF/tab), Content-Length up-front 413, MaxBytesReader 256 MiB body cap, 4-slot non-blocking semaphore (503+Retry-After:5)
  • scripts/d2_smoke.sh — 6 acceptance probes; uses curl -T --limit-rate for true streaming uploads (the first attempt with --data-binary @- buffered client-side, so the semaphore never engaged)
  • Per-package unit tests for provider + registry + validateKey
  • New bucket lakehouse-go-primary on the existing MinIO at :9000, isolated from the Rust lakehouse bucket during coexistence
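  • go.mod bumped 1.23 → 1.24 (required by aws-sdk-go-v2)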

Out of spec but added: DELETE was exposed at the HTTP layer (the kickoff D2.4 listed only GET/PUT/LIST in routes; bucket.Delete was in the wrapper). J approved option A (DELETE exposed) for symmetry + smoke ergonomics.

validateKey policy: J approved the strict stance — leading /, .. components, and CR/LF/tab control characters all rejected at the HTTP boundary. Costs ~5 lines, propagates safety to every downstream consumer.
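
A sketch of the approved policy as a single function; the shipped validateKey may differ in error detail, but the rejected classes are the ones listed above.

```go
// internal/storaged (sketch): strict key validation at the HTTP boundary.
package storaged

import (
    "errors"
    "strings"
)

var ErrBadKey = errors.New("invalid object key")

func validateKey(key string) error {
    switch {
    case key == "":
        return ErrBadKey
    case len(key) > 1024: // byte-length cap
        return ErrBadKey
    case strings.ContainsAny(key, "\x00\r\n\t"): // NUL + CR/LF/tab control characters
        return ErrBadKey
    case strings.HasPrefix(key, "/"): // no leading slash
        return ErrBadKey
    }
    for _, part := range strings.Split(key, "/") {
        if part == ".." { // reject path-traversal components
            return ErrBadKey
        }
    }
    return nil
}
```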

Next: D3 — catalogd Parquet manifests with idempotent register.


D2 — code scrum review (3-lineage parallel pass)

After D2 shipped, the actual code went through the same 3-lineage parallel scrum as D1.

Pass Reviewer Latency Findings
1 opencode/claude-opus-4-7 ~30s 1 BLOCK + 3 WARN + 3 INFO = 7
2 openrouter/qwen/qwen3-coder ~32s 2 BLOCK + 1 WARN + 1 INFO = 4
3a opencode/kimi-k2.6 (max_tokens=4096) 18s discursive — finish_reason=length, no structured output (model spent budget thinking, never reached BLOCK/WARN format). opencode rejected max_tokens>4096 without stream=true
3b kimi/kimi-k2-turbo (direct adapter) 124s empty content, finish_reason=length (8192 reasoning tokens, no surfaced output)
3c openrouter/moonshotai/kimi-k2-0905 33s 1 BLOCK + 2 WARN + 1 INFO = 4 (used as the K-lineage representative)

The Kimi route shopping (3a → 3c) was a process finding worth recording: opencode caps non-streaming Kimi calls at 4096 output tokens, the direct kimi.com adapter consumed 8192 tokens of reasoning but surfaced empty content, and openrouter's moonshotai/kimi-k2-0905 route delivered structured output in ~33s. For future scrums on Kimi, default to openrouter/moonshotai/kimi-k2-0905.

Convergent findings (≥2 reviewers — high confidence)

# Severity Finding Reviewers Disposition
C1 BLOCK buildRegistry defer cancel() cancels the ctx that the AWS SDK was loaded with — fine today (static creds, sync call) but breaks future credential refresh chains (EC2 IMDS, SSO, AssumeRole) Opus + Kimi Fixed — switched to context.Background() for SDK construction. Per-request lifetimes flow through r.Context()
C2 WARN→BLOCK *http.MaxBytesError may not unwrap through manager.Uploader's multipart goroutines for bodies >5 MiB; smoke covers 257 MiB single-PutObject path only — bodies in the multipart range that exceed 256 MiB could surface as 500 instead of 413 Opus (WARN) + Kimi (BLOCK) Fixed — added Content-Length up-front 413 (deterministic for honest clients) + string-suffix fallback ("http: request body too large") on the Put error path for chunked / lying-CL cases
C3 INFO→WARN Bucket.List accumulates unbounded results into a slice; stray prefix="" against a large bucket OOMs the daemon Opus (INFO) + Kimi (WARN) Fixed — MaxListResults=10_000 cap; truncated responses append a sentinel ObjectInfo{Key: "...truncated..."} so callers see they didn't get everything
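
A sketch of the C2 error-mapping shape on the Put path, assuming the handler only sees an error bubbled back through manager.Uploader; function names are illustrative, not the shipped code.

```go
// internal/storaged (sketch): map MaxBytesReader overruns to 413 even when the
// error has been re-wrapped by the uploader's multipart goroutines.
package storaged

import (
    "errors"
    "net/http"
    "strings"
)

// isBodyTooLarge reports whether err means the MaxBytesReader cap was hit.
func isBodyTooLarge(err error) bool {
    var mbe *http.MaxBytesError
    if errors.As(err, &mbe) { // direct single-PutObject path
        return true
    }
    // Fallback for errors re-wrapped across the multipart goroutines, where
    // errors.As can no longer reach the original *http.MaxBytesError.
    return err != nil && strings.HasSuffix(err.Error(), "http: request body too large")
}

func statusForPutError(err error) int {
    if isBodyTooLarge(err) {
        return http.StatusRequestEntityTooLarge // 413, not 500
    }
    return http.StatusInternalServerError
}
```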

Single-reviewer findings (lineage-specific catches)

# Severity Finding Reviewer Disposition
S1 WARN PUT 200 response missing Content-Type: application/json — Go's content-sniffing won't catch the JSON prefix, clients may see text/plain Opus Fixed — explicit header set before WriteHeader
S2 WARN Bucket.Get body close ordering fragile — currently correct but the contract is implicit Opus Accepted — current order is correct (early returns on errors where body is nil); no fix required
S3 INFO extractKey uses chi * wildcard — works correctly, just noting empty-key path is covered by validateKey Opus Accepted — already covered
S4 INFO FileProvider.mu sync.RWMutex is unused given the immutable-after-construction design Opus Accepted — drop the mutex when SIGHUP reload lands in G1 (S6 below); keep it for now as a placeholder
S5 INFO Bucket.Delete doesn't translate not-found to ErrKeyNotFound (Kimi noted, not flagged) Accepted — S3 DeleteObject is idempotent by spec; non-error on missing key is the correct behavior
S6 INFO FileProvider never reloads (no SIGHUP handler) Kimi Deferred to G1 — reload-on-SIGHUP is on the G1 list per internal/shared/server.go opening comment
S7 INFO Error messages on no-creds-found could specify which source (file vs fallback) was missing Qwen Deferred — minor debug ergonomics, no production impact
F1 BLOCK "Body close happens after semaphore release" Qwen Dismissed — false. Defer order is LIFO; r.Body.Close() was registered AFTER <-h.putSem, so it fires FIRST. Body closes before slot frees
F2 BLOCK "ObjectInfo fields can be nil-dereferenced in List" Qwen Dismissed — false. bucket.go:147-160 checks if o.Key != nil etc. before dereferencing every field
F3 WARN "MaxBytesReader not applied to semaphore-protected path" Qwen Dismissed — false. Semaphore is non-blocking try-acquire (select { default }); there is no waiting state where pre-cap MaxBytesReader matters

Cumulative D2 disposition

  • 3-lineage parallel pass: 15 distinct findings (3 convergent + 7 single-reviewer real + 3 false + 2 absorbed by other findings)
  • Fixed: 4 (3 convergent + 1 single-reviewer Opus)
  • Accepted-with-rationale: 4
  • Deferred to G1+: 2
  • Dismissed (false positives): 3 (all from Qwen — its strength on D1 doesn't reproduce on D2 code; the lineage caught real C1/C2 issues there but misread defer order + nil-checks here)
  • Build clean, vet clean, all tests pass, smoke 6/6 PASS after every fix round.

The Kimi/Opus convergence on C1 (ctx cancel) is the highest-signal finding of the round: two completely different lineages flagged the same architectural footgun. C2 (MaxBytesReader unwrap) was the most consequential — the smoke test would have stayed green while production multi-MiB uploads silently returned 500 on oversize. C3 was a latent OOM that's a 5-line fix.

The Qwen lineage delivered three false BLOCKs on D2 — different from its D1 contribution where it caught two real BLOCKs that Opus missed. Lineage rotation is real; on a given PR, one lineage may be the only one finding bugs and another may be confidently wrong. The convergence filter (≥2 reviewers) is the right gate.