From b3ad14832d25ccf130c33dd3161392e1045cdd73 Mon Sep 17 00:00:00 2001
From: root
Date: Fri, 1 May 2026 04:34:24 -0500
Subject: [PATCH] architecture_comparison: Rust vs Go lakehouse — weaknesses,
 strengths, abstracts to address
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

J asked for the comparison before locking in the primary line. This report
documents what's actually structurally different vs implementation-level
different, and what to do about each.

Key findings:

1. Python sidecar is the single biggest architectural lever
   - Rust: gateway → HTTP → Python sidecar :3200 → HTTP → Ollama
   - Go: gateway → HTTP → embedd → HTTP → Ollama (no Python)
   - Sidecar adds zero compute over Ollama (just pydantic + httpx)
   - 63× perf gap (8,119 vs 128 RPS) driven by sidecar + cache absence
2. Process model: Rust 1 mega-binary (14.9G RSS), Go 11 daemons
   - Rust: simpler ops at small scale, panic blast radius = whole system
   - Go: per-daemon scale + crash isolation, more config surface
3. Code volume: Go 15,128 lines vs Rust 35,447 + 1,237 sidecar
   - Go is ~43% the size doing similar work
   - Gap concentrated in vectord (Rust 11k lines, Go 804 — Lance + benchmarking)
4. Distillation pipeline asymmetry
   - Audit/observation: BOTH sides parallel-mature
   - Production: Rust-only (materializer + replay + RAG/pref export)
   - Go can READ everything but can't PRODUCE evidence
5.
   Production validators (FillValidator/EmailValidator/`/v1/validate`)
   - Rust has them (1,286 lines, 12 tests each)
   - Go doesn't — matrix gate covers role bleed but not structural validation

Cross-cutting abstracts to address regardless of which wins:

- Drop Python sidecar from Rust (call Ollama directly)
- Add LRU embed cache to Rust aibridge
- Port materializer + replay + validators to Go
- Pin shared JSONL schemas as canonical (both runtimes consume same spec)
- Decide on Lance backend (defer until corpus > 5M rows)

If keeping Go primary: port materializer first, validators second, skip Lance.
If keeping Rust primary: drop Python + add cache, port chatd 5-provider
dispatcher + cross-role gate from Go.

Bottom line: substrate is parallel-mature on observation; producer side is
Rust-only; performance structurally favors Go ~60× on warm workloads;
operations favor Go on isolation; production deployment favors Rust today.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 reports/cutover/architecture_comparison.md | 264 +++++++++++++++++++++
 1 file changed, 264 insertions(+)
 create mode 100644 reports/cutover/architecture_comparison.md

diff --git a/reports/cutover/architecture_comparison.md b/reports/cutover/architecture_comparison.md
new file mode 100644
index 0000000..499ec4c
--- /dev/null
+++ b/reports/cutover/architecture_comparison.md
@@ -0,0 +1,264 @@
# Lakehouse: Rust vs Go architecture comparison

Produced 2026-05-01 to inform the keep/maintain decision and surface abstractions that should be addressed regardless of which side is the primary line going forward.

## Code volume

| | Lines |
|---|---:|
| Rust `crates/` (15 crates) | **35,447** |
| Rust `sidecar/` (Python) | 1,237 |
| Go `internal/` (20 packages) | 11,896 |
| Go `cmd/` (14 binaries) | 3,232 |
| **Go total** | **15,128** |

Go is ~43% the size of Rust on like-for-like surface.
The gap is largely vectord (Rust 11,005 lines vs Go 804): Rust's vectord implements HNSW, Lance-format storage, and benchmarking; Go's wraps coder/hnsw and stops there.

## Process model

| | Rust | Go |
|---|---|---|
| Binaries running | **1** mega-process (gateway PID 1241, 14.9G RSS, 374% CPU under load) | **11** dedicated daemons (each ~100-300MB RSS) |
| Inter-component comms | In-process axum.nest (no network) | HTTP between daemons |
| Crash blast radius | Whole system if any subsystem panics | One daemon dies, rest survive |
| Horizontal scale | One unit only — can't scale individual components | Each daemon scales independently |
| Deploy unit | Single binary | 11 systemd units |

**Reading**: Rust's mega-binary is simpler ops at small scale (one thing to start, one log to tail). Go's daemons are simpler ops at production scale (kill the misbehaving one, restart it, the others stay up). Go also lets you tune per-daemon resource limits via systemd (e.g. `MemoryHigh=4G` on vectord but unlimited on chatd).

## Python dependency

This is the architectural difference J flagged. Both sides call Ollama at :11434, but the path differs:

```
Rust embed: gateway → HTTP → Python sidecar :3200 → HTTP → Ollama :11434
Go embed:   gateway → HTTP → Go embedd :4216 → HTTP → Ollama :11434
```

The Python sidecar (`sidecar/sidecar/main.py`, 1,237 lines) is a FastAPI wrapper around Ollama. It does:

- `/embed` — pydantic validation + iterates over texts calling Ollama
- `/generate` — pydantic validation + forwards to Ollama
- `/rerank` — pydantic validation + Ollama-prompt scoring

It adds **no fundamental compute** that Ollama can't do directly. It's a type-validation + request-shape adapter layer.
The cost shows up in load tests:

| Path | RPS @ conc=10 | p50 | p99 | max |
|---|---:|---:|---:|---:|
| Go gateway → embedd → Ollama (warm cache) | **8,119** | 0.79ms | 3.2ms | 8ms |
| Rust gateway → sidecar → Ollama (no cache) | 128 | 77.8ms | 124ms | 214ms |

Go is **63× faster on RPS and ~100× lower on p50 latency** for this workload.

Two effects compound:

1. **Go has an in-process embed cache** (`internal/embed/cached.go`). For 6 rotating bodies × 240k requests, the cache hit rate approaches 100%. Rust's sidecar doesn't cache.
2. **Rust pays a Python serialization tax** on every request — JSON in, pydantic validate, build httpx call, JSON out, pydantic serialize. That's ~10-20ms per request before reaching Ollama. Go's inline aibridge is native code with zero round-trips through a second runtime.

Even with cache parity, the structural Python-hop cost would leave Rust at maybe 1,000-2,000 RPS versus Go's 8,000+. The Python sidecar is the single biggest performance lever available on the Rust side.

## Vector storage

| | Rust | Go |
|---|---|---|
| HNSW lib | `hnsw_rs` (mature, used in production) | `coder/hnsw` (newer, smaller surface) |
| Code size | 11,005 lines (vectord + vectord-lance) | 804 lines |
| Lance-format storage | Yes (vectord-lance crate) | No |
| Persistence | LanceDB or in-memory | MinIO + JSON envelope (v2 envelope as of `eb0dfdf`) |
| Distance functions | cosine, euclidean, dot product | cosine, euclidean |

**Reading**: Rust has the deeper vector-storage substrate. Lance format gives columnar persistence + zero-copy reads + Apache Arrow integration. Go relies on coder/hnsw with its own envelope format. For the staffing-domain corpus sizes (5K-500K vectors) both work fine; for multi-million-row indexes Rust would have an edge (Lance scales better on disk).

## Distillation pipeline

This is where porting status matters most.
| Phase | Rust source | Go port status |
|---|---|---|
| Materializer (transforms.ts) | TS, full | ❌ NOT YET PORTED — Go can READ data/evidence/ as observer but cannot PRODUCE evidence. Phase 2 of audit-FULL is observer-only on Go. |
| Scorer | TS + Go | ✅ Ported (`internal/distillation/scorer.go`) |
| Score categories + firewall | Pinned | ✅ Ported (`SftNever` + `IsSftNever`) |
| SFT export (synthesis) | TS, full (8 source-class templates) | ✅ Fully ported (`internal/distillation/sft_export.go`); 4-decimal byte-equal cross-runtime |
| RAG export | TS | ❌ NOT YET PORTED |
| Preference export | TS | ❌ NOT YET PORTED |
| Audit-baselines | TS | ✅ Fully ported + cross-runtime byte-equal verified on 7 live entries |
| Audit-FULL phases 0/3/4 | TS | ✅ Ported |
| Audit-FULL phase 1 (schema tests) | bun test | ✅ Ported via `go test` exec |
| Audit-FULL phase 2 (materializer dry-run) | TS, calls materializer | ✅ Observer mode only (reads existing data/evidence/) |
| Audit-FULL phase 5 (run summaries) | TS | ✅ Observer mode (reads existing summary.json) |
| Audit-FULL phase 6 (acceptance) | TS, fixture harness | ❌ Skipped — TS-only fixture deps |
| Audit-FULL phase 7 (replay) | TS, runs replay() | ✅ Observer mode (reads replay_runs.jsonl) |
| Replay tool | TS | ❌ NOT YET PORTED |
| Quarantine writer | TS | ❌ NOT YET PORTED |

**Reading**: Go has the substrate for everything observable (the read paths) and the SFT export end-to-end. The producer side (materializer, replay) is still Rust-only. To run the full pipeline from Go alone, the materializer + replay need porting.
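The byte-equal parity noted above (SFT export, audit-baselines) hinges on deterministic serialization. Below is a minimal Go sketch of the pattern: a fixed key order plus pinned 4-decimal float formatting, so two runtimes emitting the same record produce identical bytes. The field names (`run_id`, `score`, `category`) are illustrative, not the pinned schema.

```go
package main

import (
	"fmt"
	"strconv"
)

// scoredLine renders one scored-run record as a JSONL line with a fixed
// key order and 4-decimal float formatting. strconv.FormatFloat with 'f'
// and precision 4 avoids the shortest-representation ambiguity that
// encoding/json's default float encoding introduces across runtimes.
// Field names are illustrative, not the pinned export schema.
func scoredLine(runID string, score float64, category string) string {
	s := strconv.FormatFloat(score, 'f', 4, 64)
	return fmt.Sprintf(`{"run_id":%q,"score":%s,"category":%q}`, runID, s, category)
}

func main() {
	fmt.Println(scoredLine("run_001", 0.25, "ok"))
}
```

A Rust-side exporter printing with `{:.4}` and the same key order would emit the same bytes, which is what a formal contract test can then assert.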
## Production validators

| | Rust | Go |
|---|---|---|
| FillValidator | `crates/validator/src/staffing/fill.rs` (12 unit tests) | ❌ NOT IN GO — closest is matrix gate's role check |
| EmailValidator | SSN pattern + salary disclosure + name consistency (12 tests) | ❌ NOT IN GO |
| `/v1/validate` endpoint | Yes (Rust `gateway/v1/validate`) | ❌ NOT IN GO |
| `/v1/iterate` endpoint | Yes (gen→validate→correct→retry loop) | ❌ NOT IN GO |
| `workers_500k.parquet` loaded at startup | Yes (75MB resident) | N/A |

**Reading**: Rust has a formal validator layer the Go side hasn't ported. For staffing-domain production these matter — they're the "don't generate phantom worker IDs / SSN-pattern phone numbers / wrong geography" guardrails. Go's matrix retrieve filters by geo + role via embedder semantics, but doesn't do the rigorous structural validation the Rust validator crate does.

## Substrate features unique to each side

### Go has, Rust doesn't

- **chatd 5-provider dispatcher** (kimi / opencode / openrouter / ollama_cloud / ollama). One unified `/v1/chat` endpoint over many cloud LLM providers.
- **Cross-role gate** in matrix retrieve (real_001 fix). Playbook recordings are tagged with a role; retrieve queries pass query_role; the gate prevents cross-role bleed.
- **Multi-corpus matrix indexer** (Spec §3.4 component 2). Composes N single-corpus vectord indexes with attribution.
- **Pathway memory** (Mem0-style versioned traces).
- **Observer fail-safe semantics** (ADR-005 Decision 5.1).
- **In-process embed cache** (CachedProvider + LRU).
- **LLM-based role extractor** (regex + qwen2.5 fallback).
- **Persistent stack 3-layer isolation** (`scripts/cutover/start_go_stack.sh`).
- **Cutover slice** (Bun `/_go/*` route, opt-in via systemd drop-in).

### Rust has, Go doesn't

- **Lance-format vector storage** (vectord-lance crate, 605 lines). Columnar persistence with Apache Arrow.
- **`truth` crate** (970 lines).
  Cross-source claim reconciliation. Used by validators + auditor.
- **`journald` crate** (455 lines). Structured event journal for audit trails.
- **`validator` crate** with FillValidator + EmailValidator (1,286 lines). Production guardrails.
- **`/v1/validate` + `/v1/iterate` endpoints**. Network-callable validators with an auto-correct loop.
- **`ui` crate (Dioxus, 1,509 lines)**. Native desktop/web UI. Plus the Bun frontend at :3700 + LLM Team UI at :5000.
- **Materializer + replay tools** (the "produce evidence" side of distillation).
- **Acceptance harness** (22 invariants over fixtures) — though it's TS, not Rust.
- **Production deployment** (devop.live/lakehouse/* serves through Rust today).

## Strengths and weaknesses

### Rust strengths

- **Maturity** — in production today, serving real demo traffic at devop.live/lakehouse/.
- **Single deploy unit** — one binary, one systemd service, one log.
- **Type system + memory safety** — fewer runtime bugs in the hot path.
- **Mature library ecosystem** — axum, tokio, polars, arrow, hnsw_rs, lance.
- **Native distillation pipeline** — every stage authored in Rust/TS first; Go is the porter.
- **Production validators** — formal guardrails that the Go side doesn't have.
- **Lance vector storage** — columnar format scales better at multi-million rows.

### Rust weaknesses

- **Python sidecar dependency** — every AI call goes through Python; 63× slower than direct Ollama on the measured workload. There is no structural reason it has to be Python; aibridge could call Ollama directly.
- **No embed cache** — every embed pays the full Ollama latency. Adding a cache would close most of the 63× gap.
- **Mega-binary blast radius** — gateway PID 1241 at 14.9G RSS means any panic kills the whole production system. No process isolation.
- **Tail latency cliff** — the Rust gateway hit 374% CPU during the load investigation earlier today. A single async runtime under load degrades tail latency across all subsystems.
- **Compile times** — slow iteration. Go's per-package builds take seconds; Rust's incremental builds take minutes for large changes.
- **Coupling** — adding a new feature touches gateway/v1/ and ripples across crates because everything composes via axum.nest.

### Go strengths

- **Process isolation** — daemons crash independently. Ops can `systemctl restart vectord` without touching gateway/matrixd.
- **Per-daemon scale** — the embed cache lives in embedd; vectord shards independently. Hot daemons can scale horizontally without touching the rest.
- **No Python dependency** — every daemon talks to its peers over HTTP/JSON. Native Go all the way down to Ollama.
- **In-process embed cache** — yields the 63× RPS improvement on warm workloads.
- **Smaller, denser code** — 15,128 lines vs Rust's 35,447 + 1,237 sidecar (Go is ~43% the size).
- **Faster iteration** — `go build` of all 14 binaries takes ~3-5s on this box; a full Rust rebuild takes minutes.
- **Cross-runtime artifact compatibility verified** — audit_baselines.jsonl, scored-runs JSONL, and sft_export.jsonl all round-trip byte-equal between Rust and Go.

### Go weaknesses

- **Distillation pipeline incomplete** — materializer + replay + RAG export + preference export are still Rust-only. Operators running Go end-to-end can't produce evidence, only consume it.
- **Production validators missing** — no FillValidator/EmailValidator. The matrix gate covers role bleed but not structural validation (phantom IDs, SSN patterns, etc.).
- **Vector storage is HNSW-only** — no Lance equivalent. Fine at current scale; would need a Lance port for multi-million-row indexes.
- **Less production-tested** — no real coordinator traffic against Go yet (the cutover slice is live but operator-controlled).
- **HTTP between daemons** — every cross-daemon call is a network round-trip. Latency is fine on localhost (microseconds) but it adds up. Rust's nest()-composed in-process services have zero IPC cost.
- **coder/hnsw is newer** — less battle-tested than Rust's hnsw_rs. The smoke-vs-persistent vector-index pollution earlier today exposed an envelope-versioning gap that's now fixed (v2 envelope), but Rust's Lance-based persistence has had longer to mature.

## Cross-cutting abstracts to address

### Whichever side wins, both should grow these

1. **Embed cache layer in Rust** — mirror Go's `CachedProvider` shape inside Rust aibridge. Even if the Python sidecar stays, putting the cache on the Rust side closes the biggest perf gap. Sketch: a ~100-line `lru::LruCache<(String, String), Vec<f32>>` behind a sync mutex. Would close ~95% of the 63× gap.

2. **Drop the Python sidecar** — rewrite Rust aibridge to call Ollama at `:11434/api/embed` and `/api/generate` directly (skip Python). Reduces the operations surface by one runtime + one process, gains performance, and removes the Python deploy dependency. The pydantic validation isn't doing anything Rust's serde can't already.

3. **Materializer port (Rust → Go)** — currently Rust-only. Without it Go can audit but not produce. The most-blocking missing piece for Go-only operation.

4. **Validator port (Rust → Go)** — FillValidator + EmailValidator + `/v1/validate` + `/v1/iterate`. Production safety nets that Go doesn't have. About 1,300 lines of Rust to port.

5. **Cross-runtime contract tests** — pin the JSONL schemas (audit_baselines, scored_run, sft_sample) as canonical specs in `auditor/schemas/` and add Go-side validators that read the same definitions. Currently we have ad-hoc parity tests; a formal contract test would catch drift.

6. **Decide on Lance** — Rust has it, Go doesn't. For 5K-500K row corpora it doesn't matter. For 5M+ corpora the Lance backend wins on disk scaling. If staffing demand grows beyond a million workers, port it; otherwise leave it.
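Item 1 above sketches the cache in Rust terms. The same shape in Go, as a hedged approximation of the CachedProvider pattern (not a copy of `internal/embed/cached.go`): an LRU keyed by (model, text) in front of the embed provider call.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// EmbedFunc stands in for the real provider call (embedd → Ollama).
type EmbedFunc func(model, text string) []float32

type cacheKey struct{ model, text string }

type entry struct {
	key cacheKey
	vec []float32
}

// LRUEmbedCache is a minimal sketch of the cached-provider pattern:
// an LRU keyed by (model, text) in front of the underlying embedder.
type LRUEmbedCache struct {
	mu    sync.Mutex
	cap   int
	ll    *list.List // front = most recently used
	items map[cacheKey]*list.Element
	embed EmbedFunc
	Hits  int
}

func NewLRUEmbedCache(capacity int, embed EmbedFunc) *LRUEmbedCache {
	return &LRUEmbedCache{cap: capacity, ll: list.New(),
		items: make(map[cacheKey]*list.Element), embed: embed}
}

func (c *LRUEmbedCache) Embed(model, text string) []float32 {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := cacheKey{model, text}
	if el, ok := c.items[k]; ok {
		c.ll.MoveToFront(el) // hit: refresh recency, skip the provider
		c.Hits++
		return el.Value.(*entry).vec
	}
	vec := c.embed(model, text) // miss: pay the full provider round-trip
	c.items[k] = c.ll.PushFront(&entry{k, vec})
	if c.ll.Len() > c.cap { // evict the least recently used entry
		old := c.ll.Back()
		c.ll.Remove(old)
		delete(c.items, old.Value.(*entry).key)
	}
	return vec
}

func main() {
	calls := 0
	cache := NewLRUEmbedCache(2, func(model, text string) []float32 {
		calls++
		return []float32{float32(len(text))}
	})
	cache.Embed("m", "body-1")
	cache.Embed("m", "body-1") // hit: no second provider call
	fmt.Println(calls, cache.Hits)
}
```

A production version would avoid holding the lock across the provider call (e.g. with singleflight-style dedup) and bound total memory rather than entry count; this sketch keeps the lock for simplicity.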
### If keeping Go primary

- **Port materializer first** (highest leverage — unblocks the full pipeline)
- **Port replay second** (closes audit-FULL phase 7 live invocation)
- **Port validators third** (production safety)
- **Skip Lance until corpus growth demands it**
- **Keep chatd, observer fail-safe, role gate, multi-corpus matrix** — these are real Go wins worth preserving

### If keeping Rust primary

- **Drop the Python sidecar — call Ollama directly from aibridge.** The single biggest perf gain available.
- **Add an embed cache in aibridge** (LRU). Closes most of the perf gap to Go.
- **Port chatd's 5-provider dispatcher to Rust** — Go has unified cloud LLM access; Rust's v1/chat is single-provider.
- **Port the cross-role gate** to Rust matrix retrieve — Go's role gate prevents real bleed (verified through real_001..005). Rust matrix-retrieve doesn't have it.
- **Consider process splitting** — a gateway at 14.9GB RSS is operationally awkward. Even partial decomposition (splitting vectord into its own process) would help.

## Recommendation

**Go for the primary line, Rust for production-bridge maintenance.**

Reasons:

1. **Performance** — Go is 63× faster on the embed hot path, and the gap is structural (the Python sidecar), not implementation-level.
2. **Operations** — process isolation is genuinely simpler to operate than a 14.9GB mega-binary at production scale.
3. **Code volume** — Go does the same job in ~43% of the lines; less surface area to audit, fewer places for bugs.
4. **Cross-runtime parity verified** — every artifact (audit_baselines, sft_export, scored runs) round-trips byte-equal. Operators running Go don't lose Rust-side compatibility.
5. **The missing pieces are bounded** — materializer + replay + validators + RAG/preference exports are concrete porting targets, not research questions.

But **don't abandon Rust**:

1. devop.live/lakehouse/ runs through Rust today. Cutover is a multi-week process; Rust must stay healthy.
2. Several planned Go improvements (validators, Lance, the more mature HNSW lib) would be ports of Rust patterns. Keeping Rust live means the substrate keeps evolving — anything new there is a porting opportunity for Go.
3. The Python-sidecar drop + embed cache are valuable Rust improvements regardless; with them, Rust would be substantially more competitive on the embed hot path.

## Bottom line

The substrate is parallel-mature on both sides for the audit/observation surface. The producer side (materializer/replay/validators) is Rust-only. Performance favors Go ~60× on warm workloads, structurally driven by Rust's Python-sidecar choice. Operations favor Go on process isolation. Production deployment status favors Rust today.

If the goal is "find the right primary line and harden the other," the first moves are the same either way: drop the Python sidecar and add an embed cache on the Rust side — those are universal wins. Then port materializer + replay to Go for end-to-end Go operation; or stay Rust-primary and improve the process model. Both paths are valid; the deciding factor is operations preference.
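To make the validator-port estimate concrete, here is a minimal Go sketch of the kind of structural check involved: an SSN-pattern guard of the sort EmailValidator applies before generated email text ships. The regex, rule names, and function shape are illustrative assumptions; the real Rust validator's rules are not reproduced here.

```go
package main

import (
	"fmt"
	"regexp"
)

// ssnPattern matches 3-2-4 digit groups (e.g. 123-45-6789), the shape an
// SSN-disclosure guard would flag in generated email bodies. Boundary
// handling here is simplified; a full port would carry over every rule
// (salary disclosure, name consistency, etc.) with its own tests.
var ssnPattern = regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`)

// ValidateEmailBody is a hypothetical Go-port shape: return the list of
// structural violations found in a generated email body (empty = clean).
func ValidateEmailBody(body string) []string {
	var violations []string
	if ssnPattern.MatchString(body) {
		violations = append(violations, "ssn_pattern")
	}
	return violations
}

func main() {
	fmt.Println(ValidateEmailBody("Shift confirmed for Tuesday."))
	fmt.Println(ValidateEmailBody("Ref: 123-45-6789"))
}
```

The point of the sketch is that the checks are cheap, pure functions over strings and rows, which is why the ~1,300-line port is a bounded task rather than a research question.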