From b3ad14832d25ccf130c33dd3161392e1045cdd73 Mon Sep 17 00:00:00 2001
From: root
Date: Fri, 1 May 2026 04:34:24 -0500
Subject: [PATCH] architecture_comparison: Rust vs Go lakehouse — weaknesses,
 strengths, abstracts to address
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

J asked for the comparison before locking in the primary line. This report
documents what's actually structurally different vs implementation-level
different, and what to do about each.

Key findings:

1. Python sidecar is the single biggest architectural lever
   - Rust: gateway → HTTP → Python sidecar :3200 → HTTP → Ollama
   - Go: gateway → HTTP → embedd → HTTP → Ollama (no Python)
   - Sidecar adds zero compute over Ollama (just pydantic + httpx)
   - 63× perf gap (8,119 vs 128 RPS) driven by sidecar + cache absence
2. Process model: Rust 1 mega-binary (14.9G RSS), Go 11 daemons
   - Rust: simpler ops at small scale, panic blast radius = whole system
   - Go: per-daemon scale + crash isolation, more config surface
3. Code volume: Go 15,128 lines vs Rust 35,447 + 1,237 sidecar
   - Go is ~43% the size doing similar work
   - Gap concentrated in vectord (Rust 11k lines, Go 804 — Lance + benchmarking)
4. Distillation pipeline asymmetry
   - Audit/observation: BOTH sides parallel-mature
   - Production: Rust-only (materializer + replay + RAG/pref export)
   - Go can READ everything but can't PRODUCE evidence
5.
   Production validators (FillValidator/EmailValidator/`/v1/validate`)
   - Rust has them (1,286 lines, 12 tests each)
   - Go doesn't — matrix gate covers role bleed but not structural validation

Cross-cutting abstracts to address regardless of which wins:

- Drop Python sidecar from Rust (call Ollama directly)
- Add LRU embed cache to Rust aibridge
- Port materializer + replay + validators to Go
- Pin shared JSONL schemas as canonical (both runtimes consume same spec)
- Decide on Lance backend (defer until corpus > 5M rows)

If keeping Go primary: port materializer first, validators second, skip Lance.
If keeping Rust primary: drop Python + add cache, port chatd 5-provider
dispatcher + cross-role gate from Go.

Bottom line: substrate is parallel-mature on observation; producer side is
Rust-only; performance structurally favors Go ~60× on warm workloads;
operations favor Go on isolation; production deployment favors Rust today.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 reports/cutover/architecture_comparison.md | 264 +++++++++++++++++++++
 1 file changed, 264 insertions(+)
 create mode 100644 reports/cutover/architecture_comparison.md

diff --git a/reports/cutover/architecture_comparison.md b/reports/cutover/architecture_comparison.md
new file mode 100644
index 0000000..499ec4c
--- /dev/null
+++ b/reports/cutover/architecture_comparison.md
@@ -0,0 +1,264 @@
# Lakehouse: Rust vs Go architecture comparison

Produced 2026-05-01 to inform the keep/maintain decision and surface abstractions that should be addressed regardless of which side is the primary line going forward.

## Code volume

| | Lines |
|---|---:|
| Rust `crates/` (15 crates) | **35,447** |
| Rust `sidecar/` (Python) | 1,237 |
| Go `internal/` (20 packages) | 11,896 |
| Go `cmd/` (14 binaries) | 3,232 |
| **Go total** | **15,128** |

Go is ~43% the size of Rust on like-for-like surface.
The gap is largely vectord (Rust 11,005 lines vs Go 804): Rust's vectord implements HNSW, Lance-format storage, and benchmarking; Go's wraps coder/hnsw and stops there.

## Process model

| | Rust | Go |
|---|---|---|
| Binaries running | **1** mega-process (gateway PID 1241, 14.9G RSS, 374% CPU under load) | **11** dedicated daemons (each ~100-300MB RSS) |
| Inter-component comms | In-process axum.nest (no network) | HTTP between daemons |
| Crash blast radius | Whole system if any subsystem panics | One daemon dies, rest survive |
| Horizontal scale | One unit only — can't scale individual components | Each daemon scales independently |
| Deploy unit | Single binary | 11 systemd units |

**Reading**: Rust's mega-binary is simpler ops at small scale (one thing to start, one log to tail). Go's daemons are simpler ops at production scale (kill the misbehaving one, restart it, the others stay up). Go also lets you tune per-daemon resource limits via systemd (e.g. `MemoryHigh=4G` on vectord but unlimited on chatd).

## Python dependency

This is the architectural difference J flagged. Both sides call Ollama at :11434, but the path differs:

```
Rust embed: gateway → HTTP → Python sidecar :3200 → HTTP → Ollama :11434
Go embed:   gateway → HTTP → Go embedd :4216 → HTTP → Ollama :11434
```

The Python sidecar (`sidecar/sidecar/main.py`, 1,237 lines) is a FastAPI wrapper around Ollama. It does:

- `/embed` — pydantic validation + iterates over texts calling Ollama
- `/generate` — pydantic validation + forwards to Ollama
- `/rerank` — pydantic validation + Ollama-prompt scoring

It adds **no fundamental compute** that Ollama can't do directly. It's a type-validation + request-shape adapter layer.
The cost shows up in load tests:

| Path | RPS @ conc=10 | p50 | p99 | max |
|---|---:|---:|---:|---:|
| Go gateway → embedd → Ollama (warm cache) | **8,119** | 0.79ms | 3.2ms | 8ms |
| Rust gateway → sidecar → Ollama (no cache) | 128 | 77.8ms | 124ms | 214ms |

Go is **63× faster on RPS and ~100× lower on p50 latency** for this workload.

Two effects compound:

1. **Go has an in-process embed cache** (`internal/embed/cached.go`). For 6 rotating bodies × 240k requests, the cache hit rate approaches 100%. Rust's sidecar doesn't cache.
2. **Rust pays a Python serialization tax** on every request — JSON in, pydantic validate, build httpx call, JSON out, pydantic serialize. That's ~10-20ms per request before reaching Ollama. Go's inline aibridge is native code with zero round-trips through a second runtime.

Even with cache parity, the structural Python-hop cost would leave Rust at maybe 1,000-2,000 RPS versus Go's 8,000+. The Python sidecar is the single biggest performance lever available on the Rust side.

## Vector storage

| | Rust | Go |
|---|---|---|
| HNSW lib | `hnsw_rs` (mature, used in production) | `coder/hnsw` (newer, smaller surface) |
| Code size | 11,005 lines (vectord + vectord-lance) | 804 lines |
| Lance-format storage | Yes (vectord-lance crate) | No |
| Persistence | LanceDB or in-memory | MinIO + JSON envelope (v2 envelope as of `eb0dfdf`) |
| Distance functions | cosine, euclidean, dot product | cosine, euclidean |

**Reading**: Rust has the deeper vector-storage substrate. Lance format gives columnar persistence + zero-copy reads + Apache Arrow integration. Go relies on coder/hnsw with its own envelope format. For the staffing-domain corpus sizes (5K-500K vectors) both work fine; for multi-million-row indexes Rust would have an edge (Lance scales better on disk).

## Distillation pipeline

This is where porting status matters most.
| Phase | Rust source | Go port status |
|---|---|---|
| Materializer (transforms.ts) | TS, full | ❌ NOT YET PORTED — Go can READ data/evidence/ as observer but cannot PRODUCE evidence. Phase 2 of audit-FULL is observer-only on Go. |
| Scorer | TS + Go | ✅ Ported (`internal/distillation/scorer.go`) |
| Score categories + firewall | Pinned | ✅ Ported (`SftNever` + `IsSftNever`) |
| SFT export (synthesis) | TS, full (8 source-class templates) | ✅ Fully ported (`internal/distillation/sft_export.go`); 4-decimal byte-equal cross-runtime |
| RAG export | TS | ❌ NOT YET PORTED |
| Preference export | TS | ❌ NOT YET PORTED |
| Audit-baselines | TS | ✅ Fully ported + cross-runtime byte-equal verified on 7 live entries |
| Audit-FULL phases 0/3/4 | TS | ✅ Ported |
| Audit-FULL phase 1 (schema tests) | bun test | ✅ Ported via `go test` exec |
| Audit-FULL phase 2 (materializer dry-run) | TS, calls materializer | ✅ Observer mode only (reads existing data/evidence/) |
| Audit-FULL phase 5 (run summaries) | TS | ✅ Observer mode (reads existing summary.json) |
| Audit-FULL phase 6 (acceptance) | TS, fixture harness | ❌ Skipped — TS-only fixture deps |
| Audit-FULL phase 7 (replay) | TS, runs replay() | ✅ Observer mode (reads replay_runs.jsonl) |
| Replay tool | TS | ❌ NOT YET PORTED |
| Quarantine writer | TS | ❌ NOT YET PORTED |

**Reading**: Go has the substrate for everything observable (the read paths) and the SFT export end-to-end. The producer side (materializer, replay) is still Rust-only. To run the full pipeline from Go alone, the materializer + replay need porting.
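The byte-equal parity noted above (SFT export, audit-baselines) hinges on deterministic serialization. Below is a minimal Go sketch of the pattern: a fixed key order plus pinned 4-decimal float formatting, so two runtimes emitting the same record produce identical bytes. The field names (`run_id`, `score`, `category`) are illustrative, not the pinned schema.

```go
package main

import (
	"fmt"
	"strconv"
)

// scoredLine renders one scored-run record as a JSONL line with a fixed
// key order and 4-decimal float formatting. strconv.FormatFloat with 'f'
// and precision 4 avoids the shortest-representation ambiguity that
// encoding/json's default float encoding introduces across runtimes.
// Field names are illustrative, not the pinned export schema.
func scoredLine(runID string, score float64, category string) string {
	s := strconv.FormatFloat(score, 'f', 4, 64)
	return fmt.Sprintf(`{"run_id":%q,"score":%s,"category":%q}`, runID, s, category)
}

func main() {
	fmt.Println(scoredLine("run_001", 0.25, "ok"))
}
```

A Rust-side exporter printing with `{:.4}` and the same key order would emit the same bytes, which is what a formal contract test can then assert.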
## Production validators

| | Rust | Go |
|---|---|---|
| FillValidator | `crates/validator/src/staffing/fill.rs` (12 unit tests) | ❌ NOT IN GO — closest is matrix gate's role check |
| EmailValidator | SSN pattern + salary disclosure + name consistency (12 tests) | ❌ NOT IN GO |
| `/v1/validate` endpoint | Yes (Rust `gateway/v1/validate`) | ❌ NOT IN GO |
| `/v1/iterate` endpoint | Yes (gen→validate→correct→retry loop) | ❌ NOT IN GO |
| `workers_500k.parquet` loaded at startup | Yes (75MB resident) | N/A |

**Reading**: Rust has a formal validator layer the Go side hasn't ported. For staffing-domain production these matter — they're the "don't generate phantom worker IDs / SSN-pattern phone numbers / wrong geography" guardrails. Go's matrix retrieve filters by geo + role via embedder semantics, but doesn't do the rigorous structural validation the Rust validator crate does.

## Substrate features unique to each side

### Go has, Rust doesn't

- **chatd 5-provider dispatcher** (kimi / opencode / openrouter / ollama_cloud / ollama). One unified `/v1/chat` endpoint over many cloud LLM providers.
- **Cross-role gate** in matrix retrieve (real_001 fix). Playbook recordings are tagged with a role; retrieve queries pass query_role; the gate prevents cross-role bleed.
- **Multi-corpus matrix indexer** (Spec §3.4 component 2). Composes N single-corpus vectord indexes with attribution.
- **Pathway memory** (Mem0-style versioned traces).
- **Observer fail-safe semantics** (ADR-005 Decision 5.1).
- **In-process embed cache** (CachedProvider + LRU).
- **LLM-based role extractor** (regex + qwen2.5 fallback).
- **Persistent stack 3-layer isolation** (`scripts/cutover/start_go_stack.sh`).
- **Cutover slice** (Bun `/_go/*` route, opt-in via systemd drop-in).

### Rust has, Go doesn't

- **Lance-format vector storage** (vectord-lance crate, 605 lines). Columnar persistence with Apache Arrow.
- **`truth` crate** (970 lines).
  Cross-source claim reconciliation. Used by validators + auditor.
- **`journald` crate** (455 lines). Structured event journal for audit trails.
- **`validator` crate** with FillValidator + EmailValidator (1,286 lines). Production guardrails.
- **`/v1/validate` + `/v1/iterate` endpoints**. Network-callable validators with an auto-correct loop.
- **`ui` crate (Dioxus, 1,509 lines)**. Native desktop/web UI. Plus the Bun frontend at :3700 + LLM Team UI at :5000.
- **Materializer + replay tools** (the "produce evidence" side of distillation).
- **Acceptance harness** (22 invariants over fixtures) — though it's TS, not Rust.
- **Production deployment** (devop.live/lakehouse/* serves through Rust today).

## Strengths and weaknesses

### Rust strengths

- **Maturity** — in production today, serving real demo traffic at devop.live/lakehouse/.
- **Single deploy unit** — one binary, one systemd service, one log.
- **Type system + memory safety** — fewer runtime bugs in the hot path.
- **Mature library ecosystem** — axum, tokio, polars, arrow, hnsw_rs, lance.
- **Native distillation pipeline** — every stage authored in Rust/TS first; Go is the porter.
- **Production validators** — formal guardrails that the Go side doesn't have.
- **Lance vector storage** — columnar format scales better at multi-million rows.

### Rust weaknesses

- **Python sidecar dependency** — every AI call goes through Python; 63× slower than direct Ollama on the measured workload. There is no structural reason it has to be Python; aibridge could call Ollama directly.
- **No embed cache** — every embed pays the full Ollama latency. Adding a cache would close most of the 63× gap.
- **Mega-binary blast radius** — gateway PID 1241 at 14.9G RSS means any panic kills the whole production system. No process isolation.
- **Tail latency cliff** — the Rust gateway hit 374% CPU during the load investigation earlier today. A single async runtime under load degrades tail latency across all subsystems.
- **Compile times** — slow iteration. Go's per-package builds take seconds; Rust's incremental builds take minutes for large changes.
- **Coupling** — adding a new feature touches gateway/v1/ and ripples across crates because everything composes via axum.nest.

### Go strengths

- **Process isolation** — daemons crash independently. Ops can `systemctl restart vectord` without touching gateway/matrixd.
- **Per-daemon scale** — the embed cache lives in embedd; vectord shards independently. Hot daemons can scale horizontally without touching the rest.
- **No Python dependency** — every daemon talks to its peers over HTTP/JSON. Native Go all the way down to Ollama.
- **In-process embed cache** — yields the 63× RPS improvement on warm workloads.
- **Smaller, denser code** — 15,128 lines vs Rust's 35,447 + 1,237 sidecar (Go is ~43% the size).
- **Faster iteration** — `go build` of all 14 binaries takes ~3-5s on this box; a full Rust rebuild takes minutes.
- **Cross-runtime artifact compatibility verified** — audit_baselines.jsonl, scored-runs JSONL, and sft_export.jsonl all round-trip byte-equal between Rust and Go.

### Go weaknesses

- **Distillation pipeline incomplete** — materializer + replay + RAG export + preference export are still Rust-only. Operators running Go end-to-end can't produce evidence, only consume it.
- **Production validators missing** — no FillValidator/EmailValidator. The matrix gate covers role bleed but not structural validation (phantom IDs, SSN patterns, etc.).
- **Vector storage is HNSW-only** — no Lance equivalent. Fine at current scale; would need a Lance port for multi-million-row indexes.
- **Less production-tested** — no real coordinator traffic against Go yet (the cutover slice is live but operator-controlled).
- **HTTP between daemons** — every cross-daemon call is a network round-trip. Latency is fine on localhost (microseconds) but it adds up. Rust's nest()-composed in-process services have zero IPC cost.
- **coder/hnsw is newer** — less battle-tested than Rust's hnsw_rs. The smoke-vs-persistent vector-index pollution earlier today exposed an envelope-versioning gap that's now fixed (v2 envelope), but Rust's Lance-based persistence has had longer to mature.

## Cross-cutting abstracts to address

### Whichever side wins, both should grow these

1. **Embed cache layer in Rust** — mirror Go's `CachedProvider` shape inside Rust aibridge. Even if the Python sidecar stays, putting the cache on the Rust side closes the biggest perf gap. Sketch: a ~100-line `lru::LruCache<(String, String), Vec<f32>>` behind a sync mutex. Would close ~95% of the 63× gap.

2. **Drop the Python sidecar** — rewrite Rust aibridge to call Ollama at `:11434/api/embed` and `/api/generate` directly (skip Python). Reduces the operations surface by one runtime + one process, gains performance, and removes the Python deploy dependency. The pydantic validation isn't doing anything Rust's serde can't already.

3. **Materializer port (Rust → Go)** — currently Rust-only. Without it Go can audit but not produce. The most-blocking missing piece for Go-only operation.

4. **Validator port (Rust → Go)** — FillValidator + EmailValidator + `/v1/validate` + `/v1/iterate`. Production safety nets that Go doesn't have. About 1,300 lines of Rust to port.

5. **Cross-runtime contract tests** — pin the JSONL schemas (audit_baselines, scored_run, sft_sample) as canonical specs in `auditor/schemas/` and add Go-side validators that read the same definitions. Currently we have ad-hoc parity tests; a formal contract test would catch drift.

6. **Decide on Lance** — Rust has it, Go doesn't. For 5K-500K row corpora it doesn't matter. For 5M+ corpora the Lance backend wins on disk scaling. If staffing demand grows beyond a million workers, port it; otherwise leave it.
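Item 1 above sketches the cache in Rust terms. The same shape in Go, as a hedged approximation of the CachedProvider pattern (not a copy of `internal/embed/cached.go`): an LRU keyed by (model, text) in front of the embed provider call.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// EmbedFunc stands in for the real provider call (embedd → Ollama).
type EmbedFunc func(model, text string) []float32

type cacheKey struct{ model, text string }

type entry struct {
	key cacheKey
	vec []float32
}

// LRUEmbedCache is a minimal sketch of the cached-provider pattern:
// an LRU keyed by (model, text) in front of the underlying embedder.
type LRUEmbedCache struct {
	mu    sync.Mutex
	cap   int
	ll    *list.List // front = most recently used
	items map[cacheKey]*list.Element
	embed EmbedFunc
	Hits  int
}

func NewLRUEmbedCache(capacity int, embed EmbedFunc) *LRUEmbedCache {
	return &LRUEmbedCache{cap: capacity, ll: list.New(),
		items: make(map[cacheKey]*list.Element), embed: embed}
}

func (c *LRUEmbedCache) Embed(model, text string) []float32 {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := cacheKey{model, text}
	if el, ok := c.items[k]; ok {
		c.ll.MoveToFront(el) // hit: refresh recency, skip the provider
		c.Hits++
		return el.Value.(*entry).vec
	}
	vec := c.embed(model, text) // miss: pay the full provider round-trip
	c.items[k] = c.ll.PushFront(&entry{k, vec})
	if c.ll.Len() > c.cap { // evict the least recently used entry
		old := c.ll.Back()
		c.ll.Remove(old)
		delete(c.items, old.Value.(*entry).key)
	}
	return vec
}

func main() {
	calls := 0
	cache := NewLRUEmbedCache(2, func(model, text string) []float32 {
		calls++
		return []float32{float32(len(text))}
	})
	cache.Embed("m", "body-1")
	cache.Embed("m", "body-1") // hit: no second provider call
	fmt.Println(calls, cache.Hits)
}
```

A production version would avoid holding the lock across the provider call (e.g. with singleflight-style dedup) and bound total memory rather than entry count; this sketch keeps the lock for simplicity.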
### If keeping Go primary

- **Port materializer first** (highest leverage — unblocks the full pipeline)
- **Port replay second** (closes audit-FULL phase 7 live invocation)
- **Port validators third** (production safety)
- **Skip Lance until corpus growth demands it**
- **Keep chatd, observer fail-safe, role gate, multi-corpus matrix** — these are real Go wins worth preserving

### If keeping Rust primary

- **Drop the Python sidecar — call Ollama directly from aibridge.** The single biggest perf gain available.
- **Add an embed cache in aibridge** (LRU). Closes most of the perf gap to Go.
- **Port chatd's 5-provider dispatcher to Rust** — Go has unified cloud LLM access; Rust's v1/chat is single-provider.
- **Port the cross-role gate** to Rust matrix retrieve — Go's role gate prevents real bleed (verified through real_001..005). Rust matrix-retrieve doesn't have it.
- **Consider process splitting** — a gateway at 14.9GB RSS is operationally awkward. Even partial decomposition (splitting vectord into its own process) would help.

## Recommendation

**Go for the primary line, Rust for production-bridge maintenance.**

Reasons:

1. **Performance** — Go is 63× faster on the embed hot path, and the gap is structural (the Python sidecar), not implementation-level.
2. **Operations** — process isolation is genuinely simpler to operate than a 14.9GB mega-binary at production scale.
3. **Code volume** — Go does the same job in ~43% of the lines; less surface area to audit, fewer places for bugs.
4. **Cross-runtime parity verified** — every artifact (audit_baselines, sft_export, scored runs) round-trips byte-equal. Operators running Go don't lose Rust-side compatibility.
5. **The missing pieces are bounded** — materializer + replay + validators + RAG/preference exports are concrete porting targets, not research questions.

But **don't abandon Rust**:

1. devop.live/lakehouse/ runs through Rust today. Cutover is a multi-week process; Rust must stay healthy.
2. Several planned Go improvements (validators, Lance, the more mature HNSW lib) would be ports of Rust patterns. Keeping Rust live means the substrate keeps evolving — anything new there is a porting opportunity for Go.
3. The Python-sidecar drop + embed cache are valuable Rust improvements regardless; with them, Rust would be substantially more competitive on the embed hot path.

## Bottom line

The substrate is parallel-mature on both sides for the audit/observation surface. The producer side (materializer/replay/validators) is Rust-only. Performance favors Go ~60× on warm workloads, structurally driven by Rust's Python-sidecar choice. Operations favor Go on process isolation. Production deployment status favors Rust today.

If the goal is "find the right primary line and harden the other," the first moves are the same either way: drop the Python sidecar and add an embed cache on the Rust side — those are universal wins. Then port materializer + replay to Go for end-to-end Go operation; or stay Rust-primary and improve the process model. Both paths are valid; the deciding factor is operations preference.
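To make the validator-port estimate concrete, here is a minimal Go sketch of the kind of structural check involved: an SSN-pattern guard of the sort EmailValidator applies before generated email text ships. The regex, rule names, and function shape are illustrative assumptions; the real Rust validator's rules are not reproduced here.

```go
package main

import (
	"fmt"
	"regexp"
)

// ssnPattern matches 3-2-4 digit groups (e.g. 123-45-6789), the shape an
// SSN-disclosure guard would flag in generated email bodies. Boundary
// handling here is simplified; a full port would carry over every rule
// (salary disclosure, name consistency, etc.) with its own tests.
var ssnPattern = regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`)

// ValidateEmailBody is a hypothetical Go-port shape: return the list of
// structural violations found in a generated email body (empty = clean).
func ValidateEmailBody(body string) []string {
	var violations []string
	if ssnPattern.MatchString(body) {
		violations = append(violations, "ssn_pattern")
	}
	return violations
}

func main() {
	fmt.Println(ValidateEmailBody("Shift confirmed for Tuesday."))
	fmt.Println(ValidateEmailBody("Ref: 123-45-6789"))
}
```

The point of the sketch is that the checks are cheap, pure functions over strings and rows, which is why the ~1,300-line port is a bounded task rather than a research question.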