# Architecture Decision Records: Lakehouse-Go

ADRs from the Go era. Numbered fresh from 001 to start clean lineage. Where a Rust ADR (numbered 001–021 in the Rust repo's `DECISIONS.md`) remains in force, this file references it explicitly. Where a Rust ADR is superseded, the new ADR records why.

---

## ADR-001: Foundational decisions for the Go rewrite

**Date:** 2026-04-28
**Decided by:** J
**Status:** Ratified; Phase G0 unblocked

The six questions that gated Phase G0 (per PRD.md / SPEC.md §8) are all answered.

### Decision 1.1: DuckDB via cgo for the query engine

**Decision:** `queryd` uses `marcboeker/go-duckdb` (cgo bindings to DuckDB). The pure-Go alternative was rejected.

**Rationale:** DuckDB reads Parquet natively, supports the SQL surface DataFusion exposed in the Rust era (CTEs, window functions, hybrid joins), and runs in-process with cgo. The alternatives were:

- Hand-rolling a query planner over arrow-go RecordBatches: a multi-engineer-month research project with a high risk of correctness bugs.
- Running DuckDB as an external process: adds an operational surface and a network hop to every query.

Cgo build complexity is the accepted cost. Single-binary deploy is preserved (the cgo dependency embeds at link time).

**Supersedes Rust ADR-001 (object storage as source of truth)?** No. That ADR remains in force; the change is the *engine* over the storage, not the storage model.

### Decision 1.2: HTMX for the UI

**Decision:** Frontend is `html/template` + HTMX + Alpine.js, server-rendered by `cmd/gateway`. React/Vite in a separate repo is the fallback if UX requirements demand SPA-tier interactivity post-G5.

**Rationale:** The existing Lakehouse UIs (`/lakehouse/` demo + staffer console) are mostly server-rendered HTML with vanilla JS that already fits the HTMX style. Single-binary deploy is preserved (gateway serves templates + static assets). No build chain beyond `go build`.
The React fallback is named explicitly so it's not relitigated unless an actual UX requirement triggers it.

### Decision 1.3: Gitea hosts the new repo

**Decision:** Repo lives at `git.agentview.dev/profit/golangLAKEHOUSE` (same Gitea server that hosts the Rust lakehouse).

**Rationale:** Single source of truth for repo hosting; existing auditor tooling (`lakehouse-auditor` systemd service) already speaks Gitea API; existing credentials work; no new ops surface.

### Decision 1.4: Distillation rebuilt in Go, not ported verbatim

**Decision:** The distillation v1.0.0 substrate (`tag distillation-v1.0.0` at `e7636f2` in the Rust repo) is **not** bit-identical-ported. The Go reimplementation:

- Ports the LOGIC: SFT export pipeline, contamination firewall (the `quality_score` enum + `SFT_NEVER` constant), category mapping rules, audit-baselines append-only pattern.
- Does NOT port the FIXTURES: `tests/fixtures/distillation/acceptance/` is rebuilt from scratch in Go with new ground-truth golden files.
- Does NOT port the bit-identical reproducibility PROPERTY: that was measured against the Rust implementation. The Go implementation establishes its own reproducibility baseline.

**Rationale:** Bit-identical reproducibility was a measured property of a specific implementation, not a portable invariant. Re-establishing it in Go means new fixtures, new gates, new audit-baselines. This is honest about what's transferring (logic) versus what's a Rust-era artifact (the specific bit-identical hashes).

**Risk:** the contamination firewall is the most consequential distillation safety net. The port must be reviewed line-by-line, and the new Go fixtures must include adversarial cases that prove the firewall works in the new implementation. See SPEC §7 acceptance gates.

### Decision 1.5: Pathway memory starts clean; old traces preserved as reference

**Decision:** Go pathway memory begins with zero traces.
The existing 88 Rust traces at `/home/profit/lakehouse/data/_pathway_memory/state.json` are NOT loaded into the Go implementation. They are preserved as a historical record in the Rust repo and documented at `docs/RUST_PATHWAY_MEMORY_NOTE.md`.

**Rationale:** The Rust pathway memory's value compounded over months of scrum cycles. Loading those traces into a Go implementation that hasn't proven its byte-matching contract risks corrupting the new substrate's signal with semantically-mismatched data. Starting clean keeps the Go pathway memory's lineage clean and lets the byte-match correctness be proven on a known input (per SPEC §3.4 G3.4.B). The historical note records the 88 traces' value (11/11 successful replays at the time of freeze) so the Go implementation has a reference baseline to outperform.

### Decision 1.6: Auditor longitudinal signal restarts

**Decision:** The Rust auditor's `audit_baselines.jsonl` (longitudinal drift signal accumulated across PRs #6–#13) is **not** ported to Go. The Go auditor begins a fresh `audit_baselines.jsonl` lineage on its first PR.

**Rationale:** The drift signal is anchored to specific Rust commits, verdict shapes, and Kimi/Haiku/Opus rotation traces. Carrying it into the Go era would be like grafting Rust-PR audit history onto the first Go PR's prologue: confusing more than informative. Restarting gives the Go auditor a clean baseline to measure drift against. The existing Rust `audit_baselines.jsonl` stays in the Rust repo as a historical record.

---

## ADR-002: storaged per-prefix PUT cap (vectord _vectors/ → 4 GiB)

**Date:** 2026-04-29
**Decided by:** J
**Status:** Implemented (commit `423a381`)

`storaged` enforces a 256 MiB per-PUT body cap as DoS protection (`MaxBytesReader` + Content-Length check). Keys under `_vectors/` (vectord LHV1 persistence) get a raised cap of 4 GiB; everything else stays at 256 MiB.
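
The per-prefix cap described above can be illustrated with a minimal sketch. Only the name `maxPutBytesFor(key)`, the `_vectors/` prefix, the two limits, and the `MaxBytesReader` + Content-Length pairing come from this ADR; the constant names, the `limitPutBody` helper, the 413 status, the example keys, and the assumption that keys carry no leading slash are hypothetical and may differ from what `cmd/storaged/main.go` actually does.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

const (
	defaultMaxPutBytes int64 = 256 << 20 // 256 MiB cap for routine traffic
	vectorsMaxPutBytes int64 = 4 << 30   // 4 GiB cap for vectord LHV1 files
	vectorsPrefix            = "_vectors/"
)

// maxPutBytesFor returns the PUT body cap that applies to a storage key.
func maxPutBytesFor(key string) int64 {
	if strings.HasPrefix(key, vectorsPrefix) {
		return vectorsMaxPutBytes
	}
	return defaultMaxPutBytes
}

// limitPutBody (hypothetical helper) applies both layers of the gate:
// an early rejection on a declared Content-Length above the cap, and a
// MaxBytesReader that stops bodies that lie about or omit their length.
func limitPutBody(w http.ResponseWriter, r *http.Request, key string) bool {
	limit := maxPutBytesFor(key)
	if r.ContentLength > limit {
		http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
		return false
	}
	r.Body = http.MaxBytesReader(w, r.Body, limit)
	return true
}

func main() {
	// Example keys are made up for illustration.
	for _, key := range []string{"_vectors/index.lhv1", "datasets/events.parquet"} {
		fmt.Printf("%s -> %d MiB\n", key, maxPutBytesFor(key)>>20)
	}
}
```

Keeping the decision in one function means a later move to operator-driven limits would only change the body of `maxPutBytesFor`, not its callers.
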

**Rationale:** The 500K staffing test surfaced that single-file LHV1 above ~150K vectors at d=768 hits the 256 MiB cap. `manager.Uploader` already streams on the outbound side, so the cap is a safety gate, not a memory bottleneck; raising it for the vector path doesn't introduce new memory pressure. A per-prefix cap preserves the safety gate for routine traffic while opening the documented production path. Splitting LHV1 across multiple keys was rejected because G1P specifically shipped the single-Put framed format to eliminate torn writes, and multi-key would re-introduce that failure mode.

**Follow-up:** If production workloads exceed 4 GiB single-file LHV1, refactor to operator-driven config (env/TOML) rather than bumping the constant. The function-level `maxPutBytesFor(key)` in `cmd/storaged/main.go` keeps that change a clean drop-in.

---

## ADR-003: Inter-service auth posture (Bearer token + IP allowlist)

**Date:** 2026-04-29
**Decided by:** J + Claude
**Status:** Decided; wiring deferred to Sprint 1

**Decision:** When inter-service auth is needed (the moment any binary binds non-loopback or the deployment crosses a trust boundary), the auth model is **a Bearer token loaded from `secrets-go.toml` plus a configurable IP allowlist**. Both layers are required: the token authenticates the caller; the allowlist narrows the network surface.

**Status today (G0):** Zero auth middleware. Every binary binds `127.0.0.1` by default; commit `6af0520` (R-001 partial fix) refuses non-loopback bind unless the per-service `LH__ALLOW_NONLOOPBACK=1` env override is set. The override-and-no-auth combination is the worst case; this ADR locks in what we'll require before any production override fires.

### What gets implemented when auth lands

1. **`secrets-go.toml` adds an `[auth]` section:**

   ```toml
   [auth]
   token = "..."                                 # 32+ random bytes, hex-encoded
   allowed_ips = ["10.0.0.0/8", "127.0.0.1/32"]  # CIDR list
   ```

2. **`internal/shared/auth.go`** ships a single chi middleware:

   ```go
   func RequireAuth(cfg AuthConfig) func(http.Handler) http.Handler
   ```

   - Empty `cfg.Token` → middleware is a no-op (G0 dev mode).
   - Non-empty token → reject with 401 unless the request carries an `Authorization: Bearer ` header whose token matches under constant-time comparison.
   - Non-empty `allowed_ips` → reject with 403 unless `r.RemoteAddr` (or the `X-Forwarded-For` first hop, configurable) is in the CIDR set.
   - `/health` is exempt: load balancers and monitors need it open.

3. **Every `cmd//main.go` adds one line:**

   ```go
   r.Use(shared.RequireAuth(cfg.Auth))
   ```

   Mounted before `register(r)` so it covers every route the binary exposes after `/health`.

4. **`shared.Run` startup gate:** if bind is non-loopback AND `cfg.Auth.Token == ""`, refuse to start. The implicit "localhost is the auth layer" guarantee becomes explicit when crossing the loopback boundary.

### Alternatives considered

| Option | Why rejected |
|---|---|
| **mTLS** | Strongest but heaviest: every binary needs cert provisioning, rotation tooling, and cert-aware client wiring. Overkill for inter-service traffic that already passes through a single gateway. Reconsider when Lakehouse-Go runs across machines. |
| **JWT with short TTL** | Buys nothing over Bearer here: there's no third-party identity provider and no claim hierarchy worth modelling. A plain token has the same security properties at half the wire complexity. |
| **No auth, IP-allowlist only** | One stolen IP allowlist entry → full access. Token + IP is defense in depth; either alone is too weak. |
| **OAuth2 via external IdP** | Rejected for G0–G3 timeline. No external IdP commitment. Revisit if Lakehouse-Go ever serves end-user requests directly (today everything fronts through the staffing co-pilot, which has its own session model). |

### Constant-time comparison + token hygiene

Token comparison must use `crypto/subtle.ConstantTimeCompare`; naive `==` is vulnerable to timing attacks from an attacker who can issue many requests and measure round-trip time. Token rotation is operator-driven via a `secrets-go.toml` edit + restart; G0 doesn't need rotate-without-restart.

### What this ADR does NOT do

- **Does not implement the middleware.** Code lands in Sprint 1.
- **Does not require token in G0 dev.** Empty token → no-op. Smokes + proof harness keep working without setting tokens.
- **Does not address gateway → end-user auth.** Gateway terminates inter-service auth at its inbound; if end-users hit gateway from a browser, that's a different ADR (likely cookie/session, fronted by a reverse proxy that handles user auth).

### How this closes audit findings

- **R-001 (queryd /sql RCE-equivalent off-loopback):** the bind gate prevents accidental exposure today; this ADR specifies the guardrail when intentional exposure is needed.
- **R-007 (zero auth middleware):** answered by the design above; R-007 stays open until the middleware is implemented but is no longer "design TBD."
- **R-010 (no CORS posture):** orthogonal to inter-service auth, but the `RequireAuth` middleware sits at the right layer to add CORS handling later (browsers don't reach inter-service routes in the current design, so CORS is also Sprint 1+ when end-user requests start landing).

---

(Future ADRs from ADR-004 onward will be added as the Go implementation accrues design decisions, e.g. HNSW parameter choices, pathway-memory hash function, auditor model rotation.)