Implements the auth posture from ADR-003 (commit 0d18ffa). Two
independent layers — Bearer token (constant-time compare via
crypto/subtle) and IP allowlist (CIDR set) — composed in shared.Run
so every binary inherits the same gate without per-binary wiring.
Together with the bind-gate from commit 6af0520, this mechanically
closes audit risks R-001 + R-007:
- non-loopback bind without auth.token = startup refuse
- non-loopback bind WITH auth.token + override env = allowed
- loopback bind = all gates open (G0 dev unchanged)
internal/shared/auth.go (NEW)
RequireAuth(cfg AuthConfig) returns chi-compatible middleware.
Empty Token + empty AllowedIPs → pass-through (G0 dev mode).
Token-only → 401 Bearer mismatch.
AllowedIPs-only → 403 source IP not in CIDR set.
Both → both gates apply.
/health bypasses both layers (load-balancer / liveness probes
shouldn't carry tokens).
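As a rough sketch of that gating order (a hedged illustration, not the
shipped auth.go: AuthConfig is the config.go type sketched further down,
and tokenOK / ipAllowed / parseAllowlist are assumed helper names filled
in after the next paragraphs):

  package shared

  import "net/http"

  // RequireAuth sketch: gate order is /health bypass, dev pass-through,
  // then token (401), then allowlist (403).
  func RequireAuth(cfg AuthConfig) func(http.Handler) http.Handler {
    allowNets := parseAllowlist(cfg.AllowedIPs) // parsed once, at boot
    expected := []byte("Bearer " + cfg.Token)   // pre-encoded wire format
    return func(next http.Handler) http.Handler {
      return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if r.URL.Path == "/health" { // probes stay public
          next.ServeHTTP(w, r)
          return
        }
        if cfg.Token == "" && len(allowNets) == 0 { // G0 dev: pass-through
          next.ServeHTTP(w, r)
          return
        }
        if cfg.Token != "" && !tokenOK(r, expected) {
          http.Error(w, "unauthorized", http.StatusUnauthorized) // 401
          return
        }
        if len(allowNets) > 0 && !ipAllowed(r, allowNets) {
          http.Error(w, "forbidden", http.StatusForbidden) // 403
          return
        }
        next.ServeHTTP(w, r)
      })
    }
  }
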
CIDR parsing runs once at boot; a bare IP (no /N) is treated as /32 (or
/128 for IPv6). Invalid entries log a warning and are dropped:
fail-loud-but-not-fatal, so a typo doesn't kill the binary.
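Continuing the same hypothetical sketch, that boot-time parse could look
like this:

  package shared

  import (
    "log"
    "net"
    "strings"
  )

  // parseAllowlist sketch: bare IPs widen to /32 (or /128 for IPv6);
  // invalid entries are logged and skipped instead of aborting startup.
  func parseAllowlist(entries []string) []*net.IPNet {
    var nets []*net.IPNet
    for _, e := range entries {
      if !strings.Contains(e, "/") { // bare IP: treat as a host route
        if ip := net.ParseIP(e); ip != nil {
          if ip.To4() != nil {
            e += "/32"
          } else {
            e += "/128"
          }
        }
      }
      _, n, err := net.ParseCIDR(e)
      if err != nil {
        log.Printf("auth: dropping invalid allowlist entry %q: %v", e, err)
        continue
      }
      nets = append(nets, n)
    }
    return nets
  }
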
Token comparison: subtle.ConstantTimeCompare on the full
"Bearer <token>" wire-format string. A length mismatch returns 0
(per the stdlib spec), so wrong-length tokens are rejected without a
timing leak. The pre-encoded comparison slice is stored in the
middleware closure, so the check costs one allocation per request.
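The comparison helper, in the same illustrative sketch:

  package shared

  import (
    "crypto/subtle"
    "net/http"
  )

  // tokenOK sketch: compare the raw Authorization header against the
  // pre-encoded "Bearer <token>" bytes. ConstantTimeCompare returns 0 on
  // a length mismatch, so wrong-length headers fail without branching on
  // content.
  func tokenOK(r *http.Request, expected []byte) bool {
    got := []byte(r.Header.Get("Authorization"))
    return subtle.ConstantTimeCompare(got, expected) == 1
  }
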
Source-IP extraction prefers net.SplitHostPort, falling back to
RemoteAddr as-is for httptest compatibility. X-Forwarded-For
support is a follow-up for when a trusted proxy fronts the gateway
(config knob TBD per ADR-003 §"Future").
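And the allowlist check, with the SplitHostPort fallback described above
(same sketch):

  package shared

  import (
    "net"
    "net/http"
  )

  // ipAllowed sketch: prefer host:port splitting, fall back to the raw
  // RemoteAddr (httptest requests don't always carry a port).
  func ipAllowed(r *http.Request, nets []*net.IPNet) bool {
    host, _, err := net.SplitHostPort(r.RemoteAddr)
    if err != nil {
      host = r.RemoteAddr
    }
    ip := net.ParseIP(host)
    if ip == nil {
      return false
    }
    for _, n := range nets {
      if n.Contains(ip) {
        return true
      }
    }
    return false
  }
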
internal/shared/server.go
Run signature: gained AuthConfig parameter (4th arg).
/health stays mounted on the outer router (public).
Registered routes go inside chi.Group with RequireAuth applied —
empty config = transparent group.
Added requireAuthOnNonLoopback startup check: non-loopback bind
with empty Token = refuse to start (cites R-001 + R-007 by name).
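A hedged sketch of that wiring: the real Run signature's first three
parameters aren't stated in this commit, so the ones below are
placeholders, and only the auth plumbing is illustrated.

  package shared

  import (
    "fmt"
    "net"
    "net/http"

    "github.com/go-chi/chi/v5"
  )

  // Run sketch: /health on the outer router, registered routes behind
  // RequireAuth, startup refusal on a non-loopback bind with no token.
  func Run(name, bindAddr string, register func(chi.Router), auth AuthConfig) error {
    if err := requireAuthOnNonLoopback(bindAddr, auth); err != nil {
      return err // startup refusal, citing R-001 + R-007
    }
    r := chi.NewRouter()
    r.Get("/health", func(w http.ResponseWriter, _ *http.Request) {
      w.WriteHeader(http.StatusOK) // public: probes carry no token
    })
    r.Group(func(gr chi.Router) {
      gr.Use(RequireAuth(auth)) // empty config = transparent group
      register(gr)
    })
    return http.ListenAndServe(bindAddr, r)
  }

  // Illustrative check; the shipped one may differ in detail.
  func requireAuthOnNonLoopback(bindAddr string, auth AuthConfig) error {
    host, _, err := net.SplitHostPort(bindAddr)
    if err != nil {
      return err
    }
    ip := net.ParseIP(host)
    if (ip == nil || !ip.IsLoopback()) && auth.Token == "" {
      return fmt.Errorf("refusing non-loopback bind %s without auth.token (R-001, R-007)", bindAddr)
    }
    return nil
  }
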
internal/shared/config.go
AuthConfig type added with TOML tags. Fields: Token, AllowedIPs.
Composed into Config under [auth].
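Sketched below; only [auth].token is confirmed by this commit, so the
allowed_ips key name and the surrounding Config fields are assumptions.

  package shared

  // AuthConfig is decoded from the [auth] table of lakehouse.toml, e.g.
  //
  //   [auth]
  //   token       = "s3cr3t"
  //   allowed_ips = ["127.0.0.1", "10.0.0.0/8"]
  type AuthConfig struct {
    Token      string   `toml:"token"`
    AllowedIPs []string `toml:"allowed_ips"`
  }

  type Config struct {
    // ... other sections elided ...
    Auth AuthConfig `toml:"auth"`
  }
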
cmd/<svc>/main.go × 7 (catalogd, embedd, gateway, ingestd, queryd,
storaged, vectord; mcpd is unaffected since stdio doesn't bind a port)
Each call site adds cfg.Auth as the 4th arg to shared.Run. No
other changes — middleware applies via shared.Run uniformly.
internal/shared/auth_test.go (12 test funcs)
Empty config pass-through, missing-token 401, wrong-token 401,
correct-token 200, raw-token-without-Bearer-prefix 401, /health
always public, IP allowlist allow + reject, bare IP /32, both
layers when both configured, invalid CIDR drop-with-warn, RemoteAddr
shape extraction. The constant-time comparison is verified by
inspection (comments in auth.go) plus the existence of the
passthrough test (length-mismatch case).
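For shape, the wrong-token case might look roughly like this
(illustrative only, not the shipped test):

  package shared

  import (
    "net/http"
    "net/http/httptest"
    "testing"
  )

  func TestRequireAuthWrongToken(t *testing.T) {
    h := RequireAuth(AuthConfig{Token: "right"})(http.HandlerFunc(
      func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) }))

    req := httptest.NewRequest(http.MethodGet, "/v1/anything", nil)
    req.Header.Set("Authorization", "Bearer wrong")
    rec := httptest.NewRecorder()
    h.ServeHTTP(rec, req)

    if rec.Code != http.StatusUnauthorized {
      t.Fatalf("want 401, got %d", rec.Code)
    }
  }
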
Verified:
go test -count=1 ./internal/shared/ — all green (was 21, now 33 funcs)
just verify — vet + test + 9 smokes in 33s
just proof contract — 53/0/1 unchanged
Smokes + proof harness keep working without any token configuration:
default Auth is empty struct → middleware is no-op → existing tests
pass unchanged. To exercise the gate, operators set [auth].token in
lakehouse.toml (or, per the "future" note in the ADR, via env var).
Closes audit findings:
R-001 HIGH — fully mechanically closed (was: partial via bind gate)
R-007 MED — fully mechanically closed (was: design-only ADR-003)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the documented 500K-test gap (memory project_golang_lakehouse:
"storaged 256 MiB PUT cap blocks single-file LHV1 persistence above
~150K vectors at d=768"). Vectord persistence under "_vectors/" now
gets a 4 GiB cap; everything else (parquets, manifests, ingest)
keeps the 256 MiB default.
Why per-prefix and not "raise globally":
- 256 MiB cap is a real DoS protection — runaway clients can't
drain the daemon. Raising it for ALL traffic would expand the
attack surface for routine paths that have no need.
- Per-prefix preserves existing protection while opening the one
documented production-scale path.
Why not split LHV1 across multiple keys (the alternative):
- G1P shipped a single-Put framed format SPECIFICALLY to eliminate
the torn-write class (memory: "Single Put eliminates the torn-
write class that the 3-way convergent scrum finding identified").
- Multi-key LHV1 would re-introduce the half-saved-state failure
mode we just paid to fix. Streaming via existing manager.Uploader
is the better architectural answer.
Why not bump the cap operationally via env/config:
- Future operator-driven cap can drop in cleanly via the
maxPutBytesFor function. Started with hardcoded 4 GiB to keep
this commit small; config knob is a follow-up if production
workloads diverge from the documented 500K-vector ceiling.
manager.Uploader is already streaming-multipart on the outbound
S3 side; the inbound MaxBytesReader cap is a safety gate, not a
memory bottleneck. So raising it for vectord just lets the
existing streaming path actually flow, without introducing new
memory pressure (4-slot semaphore × 4 GiB worst case = 16 GiB
only if all slots simultaneously max out — vanishingly unlikely).
Implementation:
cmd/storaged/main.go:
new constant maxPutBytesVectors = 4 GiB (covers >700K vectors @ d=768)
new constant vectorsPrefix = "_vectors/" (synced with vectord.VectorPrefix)
new function maxPutBytesFor(key) → cap-by-prefix
handlePut: ContentLength check + MaxBytesReader use the per-key cap
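Sketched below; maxPutBytesVectors, vectorsPrefix and maxPutBytesFor are
named by this commit, while maxPutBytesDefault and applyPutCap are
illustrative stand-ins for the existing constant and the handlePut change.

  package main

  import (
    "net/http"
    "strings"
  )

  const (
    maxPutBytesDefault = 256 << 20   // 256 MiB, the existing global cap
    maxPutBytesVectors = 4 << 30     // 4 GiB for single-file LHV1 persistence
    vectorsPrefix      = "_vectors/" // must stay in sync with vectord.VectorPrefix
  )

  // maxPutBytesFor picks the cap by key prefix.
  func maxPutBytesFor(key string) int64 {
    if strings.HasPrefix(key, vectorsPrefix) {
      return maxPutBytesVectors
    }
    return maxPutBytesDefault
  }

  // applyPutCap stands in for the handlePut change: the up-front
  // Content-Length check and the MaxBytesReader both use the per-key cap.
  func applyPutCap(w http.ResponseWriter, r *http.Request, key string) bool {
    limit := maxPutBytesFor(key)
    if r.ContentLength > limit {
      http.Error(w, "payload too large", http.StatusRequestEntityTooLarge) // 413
      return false
    }
    r.Body = http.MaxBytesReader(w, r.Body, limit)
    return true
  }
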
cmd/storaged/main_test.go (3 new test funcs):
TestMaxPutBytesFor: 7 cases incl. nested prefix, substring-but-not-
prefix, empty key, parquet/manifest paths.
TestVectorPrefixSyncWithVectord: regression test that asserts
vectorsPrefix == vectord.VectorPrefix. A future rename surfaces
here instead of silently bypassing the larger cap.
TestVectorCapAccommodates500KStaffingTest: bounds the cap above
the documented production workload (~700 MiB conservative).
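The prefix-sync regression test above can be as small as this (the
vectord import path is a guess, labelled as such):

  package main

  import (
    "testing"

    "example.com/lakehouse/internal/vectord" // hypothetical import path
  )

  func TestVectorPrefixSyncWithVectord(t *testing.T) {
    if vectorsPrefix != vectord.VectorPrefix {
      t.Fatalf("vectorsPrefix %q != vectord.VectorPrefix %q: a rename would silently bypass the 4 GiB cap",
        vectorsPrefix, vectord.VectorPrefix)
    }
  }
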
Verified:
go test ./cmd/storaged/ — all green (was 1 func, now 4)
just verify — 9 smokes still pass · 32s wall
just proof contract — 53/0/1 unchanged
Out of scope for this commit (deserves its own):
- Heavy integration smoke: 200K dim=768 synthetic vectors → ~700
MiB LHV1 → kill+restart vectord → recall=1. ~5-10 min wall;
follow-up if you want production-scale persistence verified
end-to-end. Unit tests + existing g1p_smoke cover the wiring.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase G0 Day 2 ships storaged: aws-sdk-go-v2 wrapper + chi routes
binding 127.0.0.1:3211 with 256 MiB MaxBytesReader, Content-Length
up-front 413, and a 4-slot non-blocking semaphore returning 503 +
Retry-After:5 when full. Acceptance smoke (6/6 probes) PASSES against
the dedicated MinIO bucket lakehouse-go-primary, isolated from the
Rust system's lakehouse bucket during coexistence.
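Roughly, the PUT admission path looks like this (a sketch; handler and
constant names are assumptions, and the C2 string-suffix fallback from
the scrum fixes below is omitted):

  package main

  import "net/http"

  // Four concurrent PUTs max; extra requests are shed, not queued.
  var putSlots = make(chan struct{}, 4)

  const maxPutBytes = 256 << 20 // 256 MiB

  func handlePut(w http.ResponseWriter, r *http.Request) {
    select {
    case putSlots <- struct{}{}: // got a slot
      defer func() { <-putSlots }()
    default: // all 4 slots busy
      w.Header().Set("Retry-After", "5")
      http.Error(w, "busy", http.StatusServiceUnavailable) // 503
      return
    }

    if r.ContentLength > maxPutBytes { // up-front reject, before reading
      http.Error(w, "payload too large", http.StatusRequestEntityTooLarge) // 413
      return
    }
    r.Body = http.MaxBytesReader(w, r.Body, maxPutBytes)

    // ... stream the body to S3 via manager.Uploader ...
    w.WriteHeader(http.StatusOK)
  }
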
Cross-lineage scrum on the shipped code:
- Opus 4.7 (opencode): 1 BLOCK + 3 WARN + 3 INFO
- Qwen3-coder (openrouter): 2 BLOCK + 1 WARN + 1 INFO (3 false positives)
- Kimi K2-0905 (openrouter, after route-shopping past opencode's 4k
cap and the direct adapter's empty-content reasoning bug):
1 BLOCK + 2 WARN + 1 INFO
Fixed:
C1 buildRegistry ctx cancel footgun → context.Background()
(Opus + Kimi convergent; future credential refresh chains)
C2 MaxBytesReader unwrap through manager.Uploader multipart
goroutines → Content-Length up-front 413 + string-suffix fallback
(Opus + Kimi convergent; latent 500-instead-of-413 in 5-256 MiB range)
C3 Bucket.List unbounded accumulation → MaxListResults=10_000 cap
(Opus + Kimi convergent; OOM guard)
S1 PUT response Content-Type: application/json (Opus single-reviewer)
Strict validateKey policy (J approved): rejects empty, >1024B, NUL,
leading "/", ".." path components, CR/LF/tab control characters (a
sketch follows this list).
DELETE exposed at HTTP layer (J approved option A) for symmetry +
smoke ergonomics.
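The key policy noted above, sketched (boundaries as listed; error
wording assumed):

  package main

  import (
    "errors"
    "strings"
  )

  func validateKey(key string) error {
    switch {
    case key == "":
      return errors.New("empty key")
    case len(key) > 1024:
      return errors.New("key longer than 1024 bytes")
    case strings.HasPrefix(key, "/"):
      return errors.New("key must not start with '/'")
    }
    for _, r := range key {
      if r == 0 || r == '\r' || r == '\n' || r == '\t' {
        return errors.New("key contains control characters (NUL/CR/LF/tab)")
      }
    }
    for _, part := range strings.Split(key, "/") {
      if part == ".." {
        return errors.New("key contains a '..' path component")
      }
    }
    return nil
  }
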
Build clean, vet clean, all unit tests pass, smoke 6/6 PASS after
every fix round. go.mod 1.23 → 1.24 (required by aws-sdk-go-v2).
Process finding worth recording: opencode caps non-streaming Kimi at
max_tokens=4096; the direct kimi.com adapter consumed 8192 tokens of
reasoning but surfaced empty content; openrouter/moonshotai/kimi-k2-0905
delivered structured output in ~33s. Future Kimi scrums should default
to that route.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>