golangLAKEHOUSE/docs/DECISIONS.md
Claw f07668064e docs: seed PRD + SPEC for the Go-direction rewrite
Two documents only — no Go code yet. PRD restates the problem and
preserves the Rust PRD's invariants verbatim, then maps the locked
stack to Go libraries and surfaces four hard problems (DuckDB-via-cgo
for the query engine, Lance dropped, Dioxus → HTMX, arrow-go maturity).
SPEC walks each Rust crate + TS surface and tags the port with library
choice / effort estimate / risk + a 5-phase migration plan from
skeleton (Phase G0) to demo parity (Phase G5).

Six open questions remain that gate Phase G0:
- DuckDB cgo OK?
- HTMX vs React for the UI?
- Repo location?
- Distillation v1.0.0 port verbatim or rebuild?
- Pathway memory data — port 88 traces or start clean?
- Auditor lineage — port audit_baselines.jsonl or restart?

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:35:23 -05:00


Architecture Decision Records — Lakehouse-Go

ADRs from the Go era. Numbered fresh from 001 to start a clean lineage. Where a Rust ADR (numbered 001–021 in the Rust repo's DECISIONS.md) remains in force, this file references it explicitly. Where a Rust ADR is superseded, the new ADR records why.


ADR-001: Foundational decisions for the Go rewrite

Date: 2026-04-28
Decided by: J
Status: Ratified (Phase G0 unblocked)

The six questions that gated Phase G0 (per PRD.md / SPEC.md §8) are all answered.

Decision 1.1 — DuckDB via cgo for the query engine

Decision: queryd uses marcboeker/go-duckdb (cgo bindings to DuckDB). A pure-Go alternative was considered and rejected.

Rationale: DuckDB reads Parquet natively, supports the SQL surface DataFusion exposed in the Rust era (CTEs, window functions, hybrid joins), and runs in-process with cgo. The alternatives were:

  • Hand-rolling a query planner over arrow-go RecordBatches — multi-engineer-month research project; high risk of correctness bugs.
  • Running DuckDB as an external process — adds an operational surface and a network hop to every query.

Cgo build complexity is the accepted cost. Single-binary deployment is preserved, because the cgo dependency is embedded at link time.
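A minimal sketch of how queryd could query Parquet through Go's standard database/sql interface. It uses only the standard library so it runs as written; the helper names (parquetCountSQL, countRows) and the file path are hypothetical. The assumption is that queryd would register the DuckDB driver by blank-importing the go-duckdb package and open a handle with sql.Open("duckdb", ""), after which the *sql.DB below is an in-process DuckDB.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"strings"
)

// parquetCountSQL builds a DuckDB query that counts the rows of a Parquet
// file. read_parquet is a DuckDB table function. Single quotes in the path
// are doubled so the path cannot break out of the SQL string literal.
func parquetCountSQL(path string) string {
	escaped := strings.ReplaceAll(path, "'", "''")
	return fmt.Sprintf("SELECT count(*) FROM read_parquet('%s')", escaped)
}

// countRows runs the query on any database/sql handle. With the DuckDB
// driver registered (an assumption, see the lead-in), the query executes
// in-process with no network hop.
func countRows(ctx context.Context, db *sql.DB, path string) (int64, error) {
	var n int64
	err := db.QueryRowContext(ctx, parquetCountSQL(path)).Scan(&n)
	return n, err
}

func main() {
	// No driver is linked into this sketch, so only the generated SQL is shown.
	fmt.Println(parquetCountSQL("data/events.parquet"))
}
```

Keeping the engine behind database/sql means the rest of queryd depends on the standard interface rather than on DuckDB-specific types.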

This decision does not supersede Rust ADR-001 (object storage as source of truth). That ADR remains in force; what changes is the query engine over the storage, not the storage model.

Decision 1.2 — HTMX for the UI

Decision: Frontend is html/template + HTMX + Alpine.js, server-rendered by cmd/gateway. React/Vite in a separate repo is the fallback if UX requirements demand SPA-tier interactivity post-G5.

Rationale: The existing Lakehouse UIs (/lakehouse/ demo + staffer console) are mostly server-rendered HTML with vanilla JS that already fits the HTMX style. Single-binary deploy is preserved (gateway serves templates + static assets). No build chain beyond go build.

The React fallback is named explicitly so it's not relitigated unless an actual UX requirement triggers it.

Decision 1.3 — Gitea hosts the new repo

Decision: Repo lives at git.agentview.dev/profit/golangLAKEHOUSE (same Gitea server that hosts the Rust lakehouse).

Rationale: Single source of truth for repo hosting; existing auditor tooling (lakehouse-auditor systemd service) already speaks Gitea API; existing credentials work; no new ops surface.

Decision 1.4 — Distillation rebuilt in Go, not ported verbatim

Decision: The distillation v1.0.0 substrate (tag distillation-v1.0.0 at e7636f2 in the Rust repo) is not ported bit-for-bit. The Go reimplementation:

  • Ports the LOGIC: SFT export pipeline, contamination firewall (the quality_score enum + SFT_NEVER constant), category mapping rules, audit-baselines append-only pattern.
  • Does NOT port the FIXTURES: tests/fixtures/distillation/acceptance/ is rebuilt from scratch in Go with new ground-truth golden files.
  • Does NOT port the bit-identical reproducibility PROPERTY: that was measured against the Rust implementation. The Go implementation establishes its own reproducibility baseline.

Rationale: Bit-identical reproducibility was a measured property of a specific implementation, not a portable invariant. Re-establishing it in Go means new fixtures, new gates, new audit-baselines. This is honest about what's transferring (logic) versus what's a Rust-era artifact (the specific bit-identical hashes).
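One way the Go implementation could establish its own reproducibility baseline is to hash a deterministic serialization of the export and compare the digest against a Go-era golden value. This is a minimal sketch under assumptions: the Example record shape and the choice of SHA-256 over JSON lines are hypothetical and are not taken from the Rust implementation or from SPEC.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Example is a hypothetical SFT export record.
type Example struct {
	Prompt   string `json:"prompt"`
	Response string `json:"response"`
}

// exportDigest writes one JSON object per line, in input order, into a
// SHA-256 hash and returns the hex digest. Struct fields marshal in
// declaration order, so equal inputs always produce equal bytes.
func exportDigest(records []Example) (string, error) {
	h := sha256.New()
	for _, r := range records {
		line, err := json.Marshal(r)
		if err != nil {
			return "", err
		}
		h.Write(line)
		h.Write([]byte{'\n'})
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	recs := []Example{{Prompt: "q1", Response: "a1"}, {Prompt: "q2", Response: "a2"}}
	first, err := exportDigest(recs)
	if err != nil {
		panic(err)
	}
	second, err := exportDigest(recs)
	if err != nil {
		panic(err)
	}
	fmt.Println("reproducible:", first == second)
}
```

A digest recorded this way is only meaningful against the Go serializer, which is why the Rust-era hashes are not carried over.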

Risk: the contamination firewall is the most consequential distillation safety net. The port must be reviewed line-by-line, and the new Go fixtures must include adversarial cases that prove the firewall works in the new implementation. See SPEC §7 acceptance gates.
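The firewall's shape can be sketched as a fail-closed check over the quality score. The score values, the contents of the never-export set, and the function names below are hypothetical stand-ins; the real quality_score enum and SFT_NEVER constant live in the Rust repo and must be ported from there, not from this sketch.

```go
package main

import "fmt"

// QualityScore is a hypothetical stand-in for the quality_score enum.
type QualityScore string

const (
	Accepted    QualityScore = "accepted"
	Rejected    QualityScore = "rejected"
	NeedsReview QualityScore = "needs_review"
)

// sftNever is a hypothetical stand-in for SFT_NEVER: scores that must
// never reach an SFT export.
var sftNever = map[QualityScore]bool{Rejected: true, NeedsReview: true}

// sftAllowed is the explicit allow list.
var sftAllowed = map[QualityScore]bool{Accepted: true}

// sftEligible fails closed. A score in sftNever is blocked even if it was
// added to the allow list by mistake, and unknown, empty, or mis-cased
// scores are blocked because they are not on the allow list.
func sftEligible(q QualityScore) bool {
	if sftNever[q] {
		return false
	}
	return sftAllowed[q]
}

func main() {
	// The last two inputs are adversarial: an empty score and a mis-cased one.
	for _, q := range []QualityScore{Accepted, Rejected, "", "ACCEPTED"} {
		fmt.Printf("%q eligible=%v\n", q, sftEligible(q))
	}
}
```

Adversarial fixtures in this style (empty, mis-cased, and unknown scores) are the kind of case the new Go fixtures would need to cover.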

Decision 1.5 — Pathway memory starts clean; old traces preserved as reference

Decision: Go pathway memory begins with zero traces. The existing 88 Rust traces at /home/profit/lakehouse/data/_pathway_memory/state.json are NOT loaded into the Go implementation. They are preserved as a historical record in the Rust repo and documented at docs/RUST_PATHWAY_MEMORY_NOTE.md.

Rationale: The Rust pathway memory's value compounded over months of scrum cycles. Loading those traces into a Go implementation that hasn't proven its byte-matching contract risks corrupting the new substrate's signal with semantically-mismatched data. Starting clean keeps the Go pathway memory's lineage clean and lets the byte-match correctness be proven on a known input (per SPEC §3.4 G3.4.B).

The historical note records the 88 traces' value (11/11 successful replays at the time of freeze) so the Go implementation has a reference baseline to outperform.

Decision 1.6 — Auditor longitudinal signal restarts

Decision: The Rust auditor's audit_baselines.jsonl (longitudinal drift signal accumulated across PRs #6–#13) is not ported to Go. The Go auditor begins a fresh audit_baselines.jsonl lineage on its first PR.

Rationale: The drift signal is anchored to specific Rust commits, verdict shapes, and Kimi/Haiku/Opus rotation traces. Carrying it into the Go era would be like grafting Rust-PR audit history onto the first Go PR's prologue — confusing more than informative. Restarting gives the Go auditor a clean baseline to measure drift against.

The existing Rust audit_baselines.jsonl stays in the Rust repo as a historical record.


(Future ADRs from ADR-002 onward will be added as the Go implementation accrues design decisions — e.g. HNSW parameter choices, pathway-memory hash function, auditor model rotation, etc.)