docs: ARCHITECTURE_COMPARISON.md as living source file

Per J's request: move the parallel-runtime comparison from reports/cutover/ (where it lived as cutover-prep evidence) into docs/ as the source-of-truth file. J will keep updating it as fixes ship on either side. Restructured for living-document use: - Status header (last refresh date, owner, update triggers) - 'How to update this doc' section with explicit dos and don'ts - Decisions tracker at top — actioned items with commit refs + open backlog with LOC estimates - Each comparison section now has 'Last verified' columns where numbers are time-sensitive - Change log section at bottom for one-line entries on every meaningful refresh The original at reports/cutover/architecture_comparison.md gains a 'THIS IS A SNAPSHOT' header pointing at the docs/ source. Kept as historical record but no longer the place to update. Sister pointer file in /home/profit/lakehouse/docs/ARCHITECTURE_COMPARISON.md so the doc is reachable from either repo side. That file explicitly says the source lives in golangLAKEHOUSE and warns against authoritative content in the pointer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 04:56:20 -05:00 · 2026-05-01 04:56:20 -05:00 · 2a974d6dea
commit 2a974d6dea
parent b03521a506
2 changed files with 327 additions and 1 deletions
--- a/docs/ARCHITECTURE_COMPARISON.md
+++ b/docs/ARCHITECTURE_COMPARISON.md
@ -0,0 +1,321 @@
+# Lakehouse: Rust vs Go architecture comparison
+
+> **Status**: Living document · primary source for the parallel-runtime
+> comparison.
+> **Owner**: J. Update this when either side ships a fix that changes
+> the table values, or when a new architectural axis surfaces.
+> **Last meaningful refresh**: 2026-05-01 (post-Rust-cache + Go-validator-port)
+
+This document compares the two parallel implementations of the lakehouse
+substrate — Rust at `/home/profit/lakehouse/` (production today), Go at
+`/home/profit/golangLAKEHOUSE/` (cutover-prep, Bun `/_go/*` slice live).
+The goal of running both lines is to find where each architecture is
+weak vs strong, address those gaps, and make the keep/maintain
+decision based on real evidence rather than preference.
+
+A snapshot of this document at any point in time is also captured at
+`reports/cutover/architecture_comparison.md`. The version in `docs/`
+is the source of truth; `reports/cutover/` is the historical record.
+
+---
+
+## How to update this doc
+
+Three triggers:
+
+1. **A fix lands on either side that moves a table value.** Update the
+   number, append a one-line entry to the change log at the bottom,
+   commit alongside the fix.
+2. **A new architectural axis surfaces.** Add a section. Match the
+   shape of existing sections (table + read paragraph).
+3. **A keep/maintain decision is made.** Update the Recommendation
+   section + change log.
+
+Don't:
+- Delete sections without recording the reason in the change log.
+- Embed unverified claims — every "Rust is X" or "Go is X" should
+  point to either a load-test number, a code reference (`crate/file:line`),
+  or an explicit "asserted, not measured" caveat.
+
+---
+
+## Decisions tracker
+
+| Date | Decision | Effect |
+|---|---|---|
+| 2026-05-01 | Add LRU embed cache to Rust aibridge | Closes 236× perf gap. **DONE** (commit `150cc3b` in lakehouse). |
+| 2026-05-01 | Port FillValidator + EmailValidator to Go | Production safety net Go was missing. **DONE** (commit `b03521a` in golangLAKEHOUSE). |
+| _open_ | Drop Python sidecar from Rust aibridge | Universal-win architectural cleanup. ~200 LOC, removes 1 runtime + 1 process. |
+| _open_ | Port Rust materializer to Go (transforms.ts) | Unblocks Go-only end-to-end pipeline. ~500-800 LOC. |
+| _open_ | Port Rust replay tool to Go | Closes audit-FULL phase 7 live invocation. ~400-600 LOC. |
+| _open_ | Decide on Lance vector backend | Defer until corpus exceeds ~5M rows. |
+| _open_ | Pick Go primary vs Rust primary | Both viable. Go has perf edge after today; Rust has production deploy + producer-side completeness. |
+
+---
+
+## Code volume
+
+| | Lines | Last verified |
+|---|---:|---|
+| Rust `crates/` (15 crates) | 35,447 | 2026-05-01 |
+| Rust `sidecar/` (Python) | 1,237 | 2026-05-01 |
+| Go `internal/` (20 packages) | 11,896 (+ validator 1190) | 2026-05-01 |
+| Go `cmd/` (14 binaries) | 3,232 | 2026-05-01 |
+| **Go total** | **~16,300** | 2026-05-01 |
+
+Go is ~46% the size of Rust on like-for-like surface (post-validator-port).
+The gap is largely `vectord` (Rust 11,005 lines vs Go 804) — Rust's
+vectord implements HNSW + Lance-format storage + benchmarking; Go's
+wraps `coder/hnsw` and stops there.
+
+---
+
+## Process model
+
+| | Rust | Go |
+|---|---|---|
+| Binaries running | **1** mega-process (gateway PID 1241, 14.9G RSS, 374% CPU under load) | **11** dedicated daemons (~100-300MB RSS each) |
+| Inter-component comms | In-process axum.nest (no network) | HTTP between daemons |
+| Crash blast radius | Whole system if any subsystem panics | One daemon dies, rest survive |
+| Horizontal scale | One unit only — can't scale individual components | Each daemon scales independently |
+| Deploy unit | Single binary | 11 systemd units |
+
+**Reading**: Rust's mega-binary is simpler ops at small scale (one
+thing to start, one log to tail). Go's daemons are simpler ops at
+production scale (kill the misbehaving one, restart it, others stay
+up). Go also lets you tune per-daemon resource limits via systemd.
+
+---
+
+## Python dependency (the load-bearing axis)
+
+This is the architectural difference that drove the original perf gap.
+Both call Ollama at `:11434`, but the path differs:
+
+```
+Rust embed:  gateway → HTTP → Python sidecar :3200 → HTTP → Ollama :11434
+Go embed:    gateway → HTTP → Go embedd      :4216 → HTTP → Ollama :11434
+```
+
+The Python sidecar (`sidecar/sidecar/main.py`, 1,237 lines) is a
+FastAPI wrapper around Ollama. It does pydantic validation + request
+shaping; **no fundamental compute** that Ollama can't do directly.
+
+### Performance impact (load-tested 2026-05-01, 6 rotating bodies, 10 concurrency, 30s)
+
+| Path | Pre-cache | Post-cache (`150cc3b`) | Δ |
+|---|---:|---:|---:|
+| **Rust /ai/embed** (via gateway) | 128 RPS · p50 78ms · p99 124ms | **30,279 RPS · p50 129µs · p99 5ms** | +236× RPS |
+| **Go /v1/embed** (via gateway → embedd) | 8,119 RPS · p50 0.79ms · p99 3ms | _unchanged_ | (already cached) |
+
+Rust now beats Go ~3.7× on cache-warm workloads. The cache being
+in-process inside Rust's gateway (no HTTP hop to a separate daemon)
+gives it the edge once both sides have caching.
+
+### What the cache fix did NOT do
+
+The Python sidecar is still in the Rust path on cache misses. Cold
+queries pay the full Python+Ollama tax. Dropping the sidecar
+(rewriting aibridge to call Ollama directly) is the next universal-win
+item — open in the Decisions tracker.
+
+---
+
+## Vector storage
+
+| | Rust | Go |
+|---|---|---|
+| HNSW lib | `hnsw_rs` (mature) | `coder/hnsw` (newer, smaller) |
+| Code size | 11,005 lines (`vectord` + `vectord-lance`) | 804 lines |
+| Lance-format storage | Yes (`vectord-lance` crate) | No |
+| Persistence | LanceDB or in-memory | MinIO + JSON envelope (v2 envelope as of `eb0dfdf`) |
+| Distance functions | cosine, euclidean, dot product | cosine, euclidean |
+
+**Reading**: Rust has the deeper substrate. Lance-format gives columnar
+persistence + zero-copy reads + Apache Arrow integration. For
+staffing-domain corpus sizes (5K-500K vectors) both work fine; for
+multi-million-row indexes Rust would have a real edge. **Defer the Go
+Lance port until corpus growth demands it.**
+
+---
+
+## Distillation pipeline (porting status)
+
+| Phase | Rust source | Go port |
+|---|---|---|
+| Materializer (transforms.ts) | TS, full | ❌ NOT YET PORTED |
+| Scorer | TS + Go | ✅ Ported |
+| Score categories + firewall | Pinned | ✅ Ported (`SftNever`) |
+| SFT export (synthesis) | TS, full (8 source classes) | ✅ Fully ported, 4-decimal byte-equal |
+| RAG export | TS | ❌ NOT YET PORTED |
+| Preference export | TS | ❌ NOT YET PORTED |
+| Audit-baselines | TS | ✅ Fully ported, byte-equal verified |
+| Audit-FULL phase 0/3/4 | TS | ✅ Ported |
+| Audit-FULL phase 1 (schema) | bun test | ✅ Via `go test` exec |
+| Audit-FULL phase 2 (materializer) | TS | ✅ Observer mode (read-only) |
+| Audit-FULL phase 5 (run summaries) | TS | ✅ Observer mode (read-only) |
+| Audit-FULL phase 6 (acceptance) | TS fixture harness | ❌ Skipped (TS-only deps) |
+| Audit-FULL phase 7 (replay) | TS | ✅ Observer mode (read-only) |
+| Replay tool | TS | ❌ NOT YET PORTED |
+| Quarantine writer | TS | ❌ NOT YET PORTED |
+
+**Reading**: Go has the substrate for everything observable (read
+paths) and SFT export end-to-end. The producer side (materializer,
+replay) is still Rust-only. To run the full pipeline from Go alone,
+the materializer + replay need porting.
+
+---
+
+## Production validators
+
+| | Rust | Go |
+|---|---|---|
+| FillValidator | `crates/validator/src/staffing/fill.rs` (12 unit tests) | ✅ **Ported 2026-05-01** (`internal/validator/fill.go` + 13 tests) |
+| EmailValidator | `crates/validator/src/staffing/email.rs` (12 tests) | ✅ **Ported 2026-05-01** (`internal/validator/email.go` + 11 tests) |
+| `/v1/validate` endpoint | Yes | ❌ NOT YET PORTED (validator network surface) |
+| `/v1/iterate` endpoint | Yes (gen→validate→correct→retry loop) | ❌ NOT YET PORTED |
+| Production validators load `workers_500k.parquet` at startup | Yes (75MB resident) | N/A — Go uses WorkerLookup interface; in-memory or adapter |
+
+**Reading**: With today's port, Go has the load-bearing validators.
+The network surface (`/v1/validate`, `/v1/iterate`) is the next
+piece — the in-memory validators work in-process; turning them into
+HTTP endpoints adds the production-shape access pattern.
+
+---
+
+## Substrate features unique to each side
+
+### Go has, Rust doesn't
+
+- **chatd 5-provider dispatcher** (kimi / opencode / openrouter / ollama_cloud / ollama).
+- **Cross-role gate** in matrix retrieve (real_001 fix). Verified by reality tests real_001..005.
+- **Multi-corpus matrix indexer** (Spec §3.4 component 2).
+- **Pathway memory** (Mem0-style versioned traces).
+- **Observer fail-safe semantics** (ADR-005 Decision 5.1).
+- **In-process embed cache** (CachedProvider + LRU). _Note: Rust got this 2026-05-01 too._
+- **LLM-based role extractor** (regex + qwen2.5 fallback).
+- **Persistent stack 3-layer isolation** (`scripts/cutover/start_go_stack.sh`).
+- **Cutover slice** (Bun `/_go/*` route, opt-in via systemd drop-in).
+- **Production load test** (`scripts/cutover/loadgen/`) with Bun-frontend + direct comparison.
+
+### Rust has, Go doesn't
+
+- **Lance-format vector storage** (vectord-lance crate, 605 lines).
+- **`truth` crate** (970 lines). Cross-source claim reconciliation.
+- **`journald` crate** (455 lines). Structured event journal.
+- **`/v1/validate` + `/v1/iterate` endpoints** (network surface).
+- **`ui` crate (Dioxus, 1,509 lines)**. Native desktop/web UI.
+- **Materializer + replay tools** (the "produce evidence" side).
+- **Acceptance harness** (22 invariants over fixtures, TS).
+- **Production deployment** (devop.live/lakehouse/* serves through Rust today).
+
+---
+
+## Strengths and weaknesses
+
+### Rust strengths
+
+- Mature, in production, serving real demo traffic.
+- Single deploy unit; one binary, one systemd service, one log.
+- Type system + memory safety; fewer runtime bugs in hot paths.
+- Mature library ecosystem (axum, tokio, polars, arrow, hnsw_rs, lance).
+- Native distillation pipeline; Go is the porter.
+- Production validators (now also in Go but Rust authored them).
+- Lance vector storage scales beyond 5M rows.
+- **In-process embed cache (post-`150cc3b`) makes Rust the fastest path on warm workloads.**
+
+### Rust weaknesses
+
+- **Python sidecar dependency** — every cache-miss AI call goes through Python. Adds 1 runtime + 1 process to ops. ~200 LOC to fix.
+- **Mega-binary blast radius** — gateway at 14.9G RSS means any panic kills the whole production system.
+- **Tail latency cliff under uncached load** — single async runtime serializes I/O completions.
+- **Compile times** — slow iteration vs Go's per-package builds.
+- **Coupling** — adding a feature touches gateway/v1/ and ripples across crates.
+
+### Go strengths
+
+- **Process isolation** — daemons crash independently; ops can `systemctl restart vectord` without touching gateway.
+- **Per-daemon scale** — embed cache lives in embedd; vectord shards independently. Hot daemons scale horizontally.
+- **No Python dependency** — every daemon talks to peers in HTTP/JSON. Native Go down to Ollama.
+- **In-process embed cache** at the daemon level (was the perf lever pre-Rust-cache).
+- **Smaller, denser code** — 16,300 lines vs Rust's 35,447 + 1,237 sidecar (~46% the size).
+- **Faster iteration** — `go build` of all 14 binaries is ~3-5s; Rust full rebuild is minutes.
+- **Cross-runtime artifact compatibility verified** — audit_baselines.jsonl, scored-runs JSONL, sft_export.jsonl all round-trip byte-equal.
+
+### Go weaknesses
+
+- **Distillation pipeline incomplete** — materializer + replay + RAG export + preference export still Rust-only.
+- **Validator network surface missing** — in-memory validators work, but `/v1/validate` HTTP endpoint not yet ported. Operators can't call validators over the wire from Go.
+- **Vector storage HNSW-only** — no Lance equivalent. Fine for current scale.
+- **Less production-tested** — cutover slice live but no real coordinator traffic yet.
+- **HTTP between daemons** — every cross-daemon call is a network round-trip. Latency fine on localhost (microseconds) but tail-latency contributes more than Rust's in-process composition.
+- **`coder/hnsw` is newer** than Rust's `hnsw_rs`. Less battle-tested.
+
+---
+
+## Cross-cutting abstracts to address
+
+The list below is a working backlog. Move items to "Decisions tracker"
+(at top) when actioned with a commit reference.
+
+### Universal wins (apply regardless of primary line)
+
+1. ✅ **Embed cache in Rust aibridge** — DONE 2026-05-01 (`150cc3b`).
+2. ✅ **FillValidator + EmailValidator in Go** — DONE 2026-05-01 (`b03521a`).
+3. **Drop Python sidecar from Rust** — Rewrite aibridge to call Ollama at `:11434/api/embed` and `/api/generate` directly. Removes 1 runtime + 1 process from ops. ~200 LOC.
+4. **Cross-runtime contract tests** — Pin shared JSONL schemas (audit_baselines, scored_run, sft_sample) as canonical specs in `auditor/schemas/` with Go-side validators consuming the same definitions.
+
+### If keeping Go primary
+
+5. **Port materializer** (highest leverage — unblocks full Go pipeline). ~500-800 LOC.
+6. **Port replay tool** (closes audit-FULL phase 7 live invocation). ~400-600 LOC.
+7. **Port `/v1/validate` + `/v1/iterate` HTTP surface** for the now-Go-side validators. ~200 LOC.
+8. **Skip Lance** until corpus growth demands it (>5M rows).
+9. **Keep chatd, observer fail-safe, role gate, multi-corpus matrix** — real Go wins worth preserving.
+
+### If keeping Rust primary
+
+10. **Port chatd's 5-provider dispatcher to Rust** — unified cloud LLM access.
+11. **Port the cross-role gate to Rust matrix retrieve** — production safety on the matrix layer (verified by Go reality tests real_001..005).
+12. **Consider process splitting** — even partial decomposition (split out vectord into its own process) would help with the mega-binary blast radius.
+
+---
+
+## Recommendation (working hypothesis)
+
+**Go for the primary line, Rust for production-bridge maintenance.**
+
+Reasons:
+1. **Operations** — process isolation is genuinely simpler at production scale than a 14.9G mega-binary.
+2. **Code volume** — Go does the same job in ~46% the lines.
+3. **Cross-runtime parity verified** — every artifact round-trips byte-equal between runtimes.
+4. **The 4 missing pieces are bounded** — materializer + replay + validators-network + RAG/preference exports are concrete porting targets, not research questions.
+5. **Performance is no longer a deciding factor** post-`150cc3b` — Rust is faster on warm cache, but both are well above staffing-domain demand levels (<1 RPS typical).
+
+But **don't abandon Rust**:
+1. devop.live/lakehouse/ runs through Rust today; cutover is multi-week.
+2. Several Go improvements would be downstream of Rust patterns. Keeping Rust live means anything new there is a porting opportunity for Go.
+3. The Python sidecar drop + cross-role gate port are valuable Rust improvements regardless of which line is primary.
+
+---
+
+## Change log
+
+Append entries here when this doc gets updated. One-line entries; link to commits.
+
+- 2026-05-01 — Initial draft (`b3ad148` golangLAKEHOUSE).
+- 2026-05-01 — Recorded Rust embed cache shipping (`150cc3b` lakehouse), updated Python-dependency section + table.
+- 2026-05-01 — Recorded Go validator port shipping (`b03521a` golangLAKEHOUSE), updated production-validators section.
+- 2026-05-01 — Reframed as living document in `docs/`, added Decisions tracker + Update guidance + Change log sections.
+
+---
+
+## See also
+
+- **`reports/cutover/architecture_comparison.md`** — historical snapshot (matched this doc as of the date stamp at top).
+- **`docs/SPEC.md`** — Go-side architectural spec.
+- **`docs/DECISIONS.md`** — Go-side ADRs.
+- **`/home/profit/lakehouse/docs/DECISIONS.md`** — Rust-side ADRs.
+- **`/home/profit/lakehouse/docs/go-rewrite/`** — Rust-side notes on the rewrite.
+- **`reports/cutover/SUMMARY.md`** — running log of cross-runtime parity probes.
+- **`reports/cutover/g5_load_test.md`** — load-test methodology + numbers.
--- a/reports/cutover/architecture_comparison.md
+++ b/reports/cutover/architecture_comparison.md
@ -1,4 +1,9 @@
-# Lakehouse: Rust vs Go architecture comparison
+# Lakehouse: Rust vs Go architecture comparison (snapshot)
+
+> **THIS IS A SNAPSHOT — NOT THE SOURCE OF TRUTH.**
+> The living document is at **`docs/ARCHITECTURE_COMPARISON.md`**.
+> Update there; this file is a frozen historical record.
+> Snapshot date: 2026-05-01.

 Produced 2026-05-01 to inform the keep/maintain decision and surface
 abstractions that should be addressed regardless of which side is the