Two documents only; no Go code yet.

PRD restates the problem and preserves the Rust PRD's invariants verbatim, then maps the locked stack to Go libraries and surfaces four hard problems (DuckDB-via-cgo for the query engine, Lance dropped, Dioxus → HTMX, arrow-go maturity).

SPEC walks each Rust crate + TS surface and tags the port with library choice / effort estimate / risk + a 5-phase migration plan from skeleton (Phase G0) to demo parity (Phase G5).

Six open questions remain that gate Phase G0:

- DuckDB cgo OK?
- HTMX vs React for the UI?
- Repo location?
- Distillation v1.0.0 port verbatim or rebuild?
- Pathway memory data: port 88 traces or start clean?
- Auditor lineage: port audit_baselines.jsonl or restart?

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Architecture Decision Records — Lakehouse-Go

ADRs from the Go era. Numbering restarts at 001 to begin a clean lineage.
Where a Rust ADR (numbered 001–021 in the Rust repo's `DECISIONS.md`)
remains in force, this file references it explicitly. Where a Rust
ADR is superseded, the new ADR records why.

---

## ADR-001: Foundational decisions for the Go rewrite

**Date:** 2026-04-28
**Decided by:** J
**Status:** Ratified; Phase G0 unblocked

The six questions that gated Phase G0 (per PRD.md / SPEC.md §8) are
all answered.

### Decision 1.1 — DuckDB via cgo for the query engine

**Decision:** `queryd` uses `marcboeker/go-duckdb` (cgo bindings to
DuckDB). A pure-Go alternative was rejected.

**Rationale:** DuckDB reads Parquet natively, supports the SQL surface
DataFusion exposed in the Rust era (CTEs, window functions, hybrid
joins), and runs in-process via cgo. The alternatives were:
- Hand-rolling a query planner over arrow-go RecordBatches: a
  multi-engineer-month research project with a high risk of
  correctness bugs.
- Running DuckDB as an external process: adds an operational surface
  and a network hop to every query.

Cgo build complexity is the accepted cost. Single-binary deploy is
preserved (the cgo dependency embeds at link time).

**Does this supersede Rust ADR-001** (object storage as source of
truth)? No. That ADR remains in force; the change is the *engine* over
the storage, not the storage model.

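
As a rough illustration of what in-process querying looks like from Go, the sketch below builds a `read_parquet` query and runs it through `database/sql`. It assumes the driver registers itself under the name `duckdb` once `github.com/marcboeker/go-duckdb` is blank-imported; that import is left as a comment so the sketch compiles with the standard library alone. The helper names and the `data/events.parquet` path are hypothetical, not taken from `queryd`.

```go
package main

import (
	"database/sql"
	"fmt"
	"strings"
	// In a real build the driver would be linked in with a blank import:
	// _ "github.com/marcboeker/go-duckdb"
)

// parquetCountQuery builds a DuckDB query that counts the rows of one
// Parquet file. The path is embedded as a SQL string literal, so single
// quotes are doubled to keep the literal well-formed.
func parquetCountQuery(path string) string {
	escaped := strings.ReplaceAll(path, "'", "''")
	return fmt.Sprintf("SELECT count(*) FROM read_parquet('%s')", escaped)
}

// countRows runs the query on an already-open handle.
func countRows(db *sql.DB, path string) (int64, error) {
	var n int64
	err := db.QueryRow(parquetCountQuery(path)).Scan(&n)
	return n, err
}

func main() {
	fmt.Println(parquetCountQuery("data/events.parquet"))

	// Without the blank import above, sql.Open reports an unknown driver.
	db, err := sql.Open("duckdb", "")
	if err != nil {
		fmt.Println("driver not linked in this sketch:", err)
		return
	}
	defer db.Close()

	n, err := countRows(db, "data/events.parquet")
	fmt.Println(n, err)
}
```

With the driver linked in, the same `countRows` call reads the file directly inside the process, which is the property the rationale above relies on.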
### Decision 1.2 — HTMX for the UI

**Decision:** Frontend is `html/template` + HTMX + Alpine.js,
server-rendered by `cmd/gateway`. React/Vite in a separate repo is the
fallback if UX requirements demand SPA-tier interactivity post-G5.

**Rationale:** The existing Lakehouse UIs (`/lakehouse/` demo + staffer
console) are mostly server-rendered HTML with vanilla JS that already
fits the HTMX style. Single-binary deploy is preserved (gateway serves
templates + static assets). No build chain beyond `go build`.

The React fallback is named explicitly so it's not relitigated unless
an actual UX requirement triggers it.

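
To make the server-rendered style concrete, here is a minimal sketch of one handler that serves a full page to a normal browser request and only an HTML fragment to HTMX. It relies on HTMX setting the `HX-Request` header to `true` on its requests; the `/datasets` route, the template names, and the placeholder data are assumptions for illustration, and Alpine.js is left out.

```go
package main

import (
	"fmt"
	"html/template"
	"net/http"
	"net/http/httptest"
)

// ui holds a full page and the fragment that HTMX swaps in.
// Template names and the /datasets route are hypothetical.
var ui = template.Must(template.New("ui").Parse(`
{{define "rows"}}<ul id="rows">{{range .}}<li>{{.}}</li>{{end}}</ul>{{end}}
{{define "page"}}<!doctype html>
<html><body>
<button hx-get="/datasets" hx-target="#rows" hx-swap="outerHTML">Refresh</button>
{{template "rows" .}}
</body></html>{{end}}
`))

// datasetsHandler renders only the fragment when the request came from
// HTMX (HX-Request: true) and the full page otherwise.
func datasetsHandler(w http.ResponseWriter, r *http.Request) {
	names := []string{"events", "traces"} // placeholder data
	name := "page"
	if r.Header.Get("HX-Request") == "true" {
		name = "rows"
	}
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	if err := ui.ExecuteTemplate(w, name, names); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

// render exercises the handler in-process and returns the response body.
func render(htmx bool) string {
	req := httptest.NewRequest(http.MethodGet, "/datasets", nil)
	if htmx {
		req.Header.Set("HX-Request", "true")
	}
	rec := httptest.NewRecorder()
	datasetsHandler(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Println(render(true))  // fragment only
	fmt.Println(render(false)) // full page
}
```

Both responses come from the same template set compiled into the binary, so nothing beyond `go build` is needed to produce them.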
### Decision 1.3 — Gitea hosts the new repo

**Decision:** Repo lives at `git.agentview.dev/profit/golangLAKEHOUSE`
(same Gitea server that hosts the Rust lakehouse).

**Rationale:** Single source of truth for repo hosting; existing
auditor tooling (`lakehouse-auditor` systemd service) already speaks
the Gitea API; existing credentials work; no new ops surface.

### Decision 1.4 — Distillation rebuilt in Go, not ported verbatim

**Decision:** The distillation v1.0.0 substrate (tag
`distillation-v1.0.0` at `e7636f2` in the Rust repo) is **not**
ported bit-for-bit. The Go reimplementation:
- Ports the LOGIC: SFT export pipeline, contamination firewall (the
  `quality_score` enum + `SFT_NEVER` constant), category mapping
  rules, audit-baselines append-only pattern.
- Does NOT port the FIXTURES: `tests/fixtures/distillation/acceptance/`
  is rebuilt from scratch in Go with new ground-truth golden files.
- Does NOT port the bit-identical reproducibility PROPERTY: that was
  measured against the Rust implementation. The Go implementation
  establishes its own reproducibility baseline.

**Rationale:** Bit-identical reproducibility was a measured property
of a specific implementation, not a portable invariant. Re-establishing
it in Go means new fixtures, new gates, new audit-baselines. This is
honest about what's transferring (logic) versus what's a Rust-era
artifact (the specific bit-identical hashes).

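
One way a Go-side reproducibility baseline could be measured is sketched below: serialize the export in a fixed order, hash the bytes, and compare digests across runs or against a golden value. This is only an illustration under stated assumptions; the `Example` record shape, the sort key, and the choice of SHA-256 are hypothetical and not taken from the Rust implementation.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// Example is a hypothetical exported SFT record.
type Example struct {
	ID       string `json:"id"`
	Category string `json:"category"`
	Text     string `json:"text"`
}

// exportDigest writes the records as JSONL sorted by ID and returns the
// hex SHA-256 of the resulting bytes. Struct fields marshal in declaration
// order, so the same set of records always yields the same bytes.
func exportDigest(records []Example) (string, error) {
	sorted := append([]Example(nil), records...) // leave the caller's slice untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })

	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for _, r := range sorted {
		if err := enc.Encode(r); err != nil {
			return "", err
		}
	}
	sum := sha256.Sum256(buf.Bytes())
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a := []Example{
		{ID: "2", Category: "qa", Text: "b"},
		{ID: "1", Category: "qa", Text: "a"},
	}
	b := []Example{a[1], a[0]} // same records, different input order

	da, _ := exportDigest(a)
	db, _ := exportDigest(b)
	fmt.Println(da == db) // prints true
}
```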
**Risk:** The contamination firewall is the most consequential
distillation safety net. The port must be reviewed line-by-line, and
the new Go fixtures must include adversarial cases that prove the
firewall works in the new implementation. See SPEC §7 acceptance gates.

### Decision 1.5 — Pathway memory starts clean; old traces preserved as reference

**Decision:** Go pathway memory begins with zero traces. The existing
88 Rust traces at
`/home/profit/lakehouse/data/_pathway_memory/state.json` are NOT loaded
into the Go implementation. They are preserved as a historical record
in the Rust repo and documented at `docs/RUST_PATHWAY_MEMORY_NOTE.md`.

**Rationale:** The Rust pathway memory's value compounded over months
of scrum cycles. Loading those traces into a Go implementation that
hasn't proven its byte-matching contract risks corrupting the new
substrate's signal with semantically-mismatched data. Starting clean
keeps the Go pathway memory's lineage clean and lets the byte-match
correctness be proven on a known input (per SPEC §3.4 G3.4.B).

The historical note records the 88 traces' value (11/11 successful
replays at the time of freeze) so the Go implementation has a
reference baseline to outperform.

### Decision 1.6 — Auditor longitudinal signal restarts

**Decision:** The Rust auditor's `audit_baselines.jsonl`
(longitudinal drift signal accumulated across PRs #6–#13) is **not**
ported to Go. The Go auditor begins a fresh `audit_baselines.jsonl`
lineage on its first PR.

**Rationale:** The drift signal is anchored to specific Rust commits,
verdict shapes, and Kimi/Haiku/Opus rotation traces. Carrying it into
the Go era would be like grafting Rust-PR audit history onto the first
Go PR's prologue: more confusing than informative. Restarting gives
the Go auditor a clean baseline to measure drift against.

The existing Rust `audit_baselines.jsonl` stays in the Rust repo as a
historical record.

---

(Future ADRs, from ADR-002 onward, will be added as the Go
implementation accrues design decisions, e.g. HNSW parameter
choices, pathway-memory hash function, auditor model rotation.)