# Lakehouse: Rust vs Go architecture comparison
Produced 2026-05-01 to inform the keep/maintain decision and surface abstractions that should be addressed regardless of which side is the primary line going forward.
## Code volume

| Component | Lines |
|---|---|
| Rust crates/ (15 crates) | 35,447 |
| Rust sidecar/ (Python) | 1,237 |
| Go internal/ (20 packages) | 11,896 |
| Go cmd/ (14 binaries) | 3,232 |
| Go total | 15,128 |
Go is ~43% the size of Rust on like-for-like surface. The gap is largely vectord (Rust 11,005 lines vs Go 804) — Rust's vectord implements HNSW, Lance-format storage, and benchmarking; Go's wraps coder/hnsw and stops there.
## Process model

| | Rust | Go |
|---|---|---|
| Binaries running | 1 mega-process (gateway PID 1241, 14.9G RSS, 374% CPU under load) | 11 dedicated daemons (each ~100-300MB RSS) |
| Inter-component comms | In-process axum.nest (no network) | HTTP between daemons |
| Crash blast radius | Whole system if any subsystem panics | One daemon dies, rest survive |
| Horizontal scale | One unit only — can't scale individual components | Each daemon scales independently |
| Deploy unit | Single binary | 11 systemd units |
Reading: Rust's mega-binary is simpler ops at small scale (one thing
to start, one log to tail). Go's daemons are simpler ops at production
scale (kill the misbehaving one, restart it, others stay up). Go also
lets you tune per-daemon resource limits via systemd (e.g.
MemoryHigh=4G on vectord but unlimited on chatd).
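For concreteness, a per-daemon limit of the kind described could be a systemd drop-in like the following. This is a hypothetical sketch: the unit name and value come from the example above, everything else is illustrative.

```ini
# /etc/systemd/system/vectord.service.d/override.conf -- hypothetical drop-in.
# MemoryHigh throttles vectord before the kernel OOM-kills it; other daemons
# keep their own limits (or none), which a mega-binary can't express.
[Service]
MemoryHigh=4G
```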
## Python dependency
This is the architectural difference J flagged. Both call Ollama at :11434, but the path is different:
- Rust embed: gateway → HTTP → Python sidecar :3200 → HTTP → Ollama :11434
- Go embed: gateway → HTTP → Go embedd :4216 → HTTP → Ollama :11434
The Python sidecar (sidecar/sidecar/main.py, 1,237 lines) is a FastAPI wrapper around Ollama. It does:

- `/embed` — pydantic validation + iterates over texts calling Ollama
- `/generate` — pydantic validation + forwards to Ollama
- `/rerank` — pydantic validation + Ollama-prompt scoring
It adds no fundamental compute that Ollama can't do directly. It's a type-validation + request-shape adapter layer.
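A minimal sketch of what the direct path could look like on the Rust side, assuming the reqwest (with the `json` feature) and serde crates. The function and model name are illustrative, not aibridge's actual API; it targets Ollama's `/api/embed` endpoint, which takes a model plus a list of inputs and returns one embedding per input.

```rust
// Hypothetical sketch; not aibridge's real API. "nomic-embed-text" is a
// placeholder model name.
use serde::{Deserialize, Serialize};

#[derive(Serialize)]
struct EmbedRequest<'a> {
    model: &'a str,
    input: &'a [String],
}

#[derive(Deserialize)]
struct EmbedResponse {
    embeddings: Vec<Vec<f32>>,
}

/// Embed a batch of texts by calling Ollama directly: the same request
/// shape the sidecar forwards today, minus the Python hop.
async fn embed_direct(
    client: &reqwest::Client,
    texts: &[String],
) -> Result<Vec<Vec<f32>>, reqwest::Error> {
    let resp: EmbedResponse = client
        .post("http://127.0.0.1:11434/api/embed")
        .json(&EmbedRequest { model: "nomic-embed-text", input: texts })
        .send()
        .await?
        .error_for_status()?
        .json()
        .await?;
    Ok(resp.embeddings)
}
```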
The cost shows up in load tests:
| Path | RPS @ conc=10 | p50 | p99 | max |
|---|---|---|---|---|
| Go gateway → embedd → Ollama (warm cache) | 8,119 | 0.79ms | 3.2ms | 8ms |
| Rust gateway → sidecar → Ollama (no cache) | 128 | 77.8ms | 124ms | 214ms |
Go is 63× faster on RPS and ~100× lower on p50 latency for this workload.
Two effects compound:
- Go has an in-process embed cache (`internal/embed/cached.go`). For 6 rotating bodies × 240k requests, the cache hit rate approaches 100%. Rust's sidecar doesn't cache.
- Rust pays a Python serialization tax on every request — JSON in, pydantic validate, build httpx call, JSON out, pydantic serialize. ~10-20ms per request before reaching Ollama. Go's inline aibridge is native code with zero round-trips through a second runtime.
Even with cache parity, the structural Python-hop cost would leave Rust at maybe 1,000-2,000 RPS versus Go's 8,000+. The Python sidecar is the single biggest performance lever available on the Rust side.
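The cross-cutting list below sketches the cache fix as a ~100-line `lru::LruCache` behind a sync mutex. A minimal version of that shape, assuming the `lru` crate (type and field names illustrative):

```rust
// Minimal sketch of mirroring Go's cache shape inside Rust aibridge.
use std::num::NonZeroUsize;
use std::sync::Mutex;
use lru::LruCache;

pub struct CachedEmbedder {
    cache: Mutex<LruCache<(String, String), Vec<f32>>>,
}

impl CachedEmbedder {
    pub fn new(capacity: usize) -> Self {
        Self {
            cache: Mutex::new(LruCache::new(NonZeroUsize::new(capacity).unwrap())),
        }
    }

    /// Return the cached vector, or compute it via `embed` and store it.
    pub fn get_or_embed<F>(&self, model: &str, text: &str, embed: F) -> Vec<f32>
    where
        F: FnOnce() -> Vec<f32>,
    {
        let key = (model.to_string(), text.to_string());
        if let Some(hit) = self.cache.lock().unwrap().get(&key) {
            return hit.clone(); // warm path: no Ollama round-trip at all
        }
        let vec = embed(); // cold path: pay the full Ollama latency once
        self.cache.lock().unwrap().put(key, vec.clone());
        vec
    }
}
```

For a rotating-body workload like the load test above, almost every request after warm-up takes the first branch, which is where the 8,119-vs-128 RPS gap comes from.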
## Vector storage

| | Rust | Go |
|---|---|---|
| HNSW lib | hnsw_rs (mature, used in production) | coder/hnsw (newer, smaller surface) |
| Code size | 11,005 lines (vectord + vectord-lance) | 804 lines |
| Lance-format storage | Yes (vectord-lance crate) | No |
| Persistence | LanceDB or in-memory | MinIO + JSON envelope (v2 envelope as of eb0dfdf) |
| Distance functions | cosine, euclidean, dot product | cosine, euclidean |
Reading: Rust has the deeper vector-storage substrate. Lance-format gives columnar persistence + zero-copy reads + Apache-Arrow integration. Go relies on coder/hnsw with its own envelope format. For the staffing-domain corpus sizes (5K-500K vectors) both work fine; for multi-million-row indexes Rust would have an edge (Lance scales better on disk).
## Distillation pipeline
This is where porting status matters most.
| Phase | Rust source | Go port status |
|---|---|---|
| Materializer (transforms.ts) | TS, full | ❌ NOT YET PORTED — Go can READ data/evidence/ as observer but cannot PRODUCE evidence. Phase 2 of audit-FULL is observer-only on Go. |
| Scorer | TS + Go | ✅ Ported (internal/distillation/scorer.go) |
| Score categories + firewall | Pinned | ✅ Ported (SftNever + IsSftNever) |
| SFT export (synthesis) | TS, full (8 source-class templates) | ✅ Fully ported (internal/distillation/sft_export.go); 4-decimal byte-equal cross-runtime |
| RAG export | TS | ❌ NOT YET PORTED |
| Preference export | TS | ❌ NOT YET PORTED |
| Audit-baselines | TS | ✅ Fully ported + cross-runtime byte-equal verified on 7 live entries |
| Audit-FULL phases 0/3/4 | TS | ✅ Ported |
| Audit-FULL phase 1 (schema tests) | bun test | ✅ Ported via go test exec |
| Audit-FULL phase 2 (materializer dry-run) | TS, calls materializer | ✅ Observer mode only (reads existing data/evidence/) |
| Audit-FULL phase 5 (run summaries) | TS | ✅ Observer mode (reads existing summary.json) |
| Audit-FULL phase 6 (acceptance) | TS, fixture harness | ❌ Skipped — TS-only fixture deps |
| Audit-FULL phase 7 (replay) | TS, runs replay() | ✅ Observer mode (reads replay_runs.jsonl) |
| Replay tool | TS | ❌ NOT YET PORTED |
| Quarantine writer | TS | ❌ NOT YET PORTED |
Reading: Go has the substrate for everything observable (read paths) and the SFT export end-to-end. The producer side (materializer, replay) is still Rust-only. To run the full pipeline from Go alone, the materializer + replay need porting.
## Production validators

| | Rust | Go |
|---|---|---|
| FillValidator | `crates/validator/src/staffing/fill.rs` (12 unit tests) | ❌ NOT IN GO — closest is matrix gate's role check |
| EmailValidator | SSN pattern + salary disclosure + name consistency (12 tests) | ❌ NOT IN GO |
| `/v1/validate` endpoint | Yes (Rust gateway `/v1/validate`) | ❌ NOT IN GO |
| `/v1/iterate` endpoint | Yes (gen→validate→correct→retry loop) | ❌ NOT IN GO |
| Loads workers_500k.parquet at startup | Yes (75MB resident) | N/A |
Reading: Rust has a formal validator layer the Go side hasn't ported. For staffing-domain production, these matter — they're the "don't generate phantom worker IDs / SSN-pattern phone numbers / wrong geography" guardrails. Go's matrix retrieve filters by geo + role via embedder semantics, but doesn't do the rigorous structural validation the Rust validator crate does.
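To make "structural validation" concrete, here is an illustrative sketch of the two guardrails named above. This is not the actual validator crate's API; the regex and function names are assumptions, and it assumes the `regex` crate.

```rust
// Illustrative only: the shape of the checks described above, not the
// real FillValidator/EmailValidator code.
use std::collections::HashSet;
use regex::Regex;

/// A phone field that matches the SSN pattern (###-##-####) is suspect.
/// (A real validator would compile the regex once, not per call.)
fn looks_like_ssn(phone: &str) -> bool {
    Regex::new(r"^\d{3}-\d{2}-\d{4}$").unwrap().is_match(phone)
}

/// Phantom-ID guardrail: a generated worker ID must exist in the loaded
/// corpus (the Rust side loads workers_500k.parquet at startup for this).
fn validate_worker_id(id: &str, known_ids: &HashSet<String>) -> Result<(), String> {
    if known_ids.contains(id) {
        Ok(())
    } else {
        Err(format!("phantom worker id: {id}"))
    }
}
```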
## Substrate features unique to each side

### Go has, Rust doesn't

- chatd 5-provider dispatcher (kimi / opencode / openrouter / ollama_cloud / ollama). One unified `/v1/chat` endpoint over many cloud LLM providers.
- Cross-role gate in matrix retrieve (real_001 fix). Playbook recordings are tagged with a role; retrieve queries pass query_role; the gate prevents cross-role bleed (see the sketch after this list).
- Multi-corpus matrix indexer (Spec §3.4 component 2). Composes N single-corpus vectord indexes with attribution.
- Pathway memory (Mem0-style versioned traces).
- Observer fail-safe semantics (ADR-005 Decision 5.1).
- In-process embed cache (CachedProvider + LRU).
- LLM-based role extractor (regex + qwen2.5 fallback).
- Persistent stack 3-layer isolation (`scripts/cutover/start_go_stack.sh`).
- Cutover slice (Bun `/_go/*` route, opt-in via systemd drop-in).
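A hedged sketch of the role gate's shape as it might port to the Rust side (the "If keeping Rust primary" list below recommends this port). Field names and the untagged-query fallback are illustrative, not the Go implementation's actual code.

```rust
// Illustrative shape only; the Go gate's real types may differ.
struct Recording {
    role: String,
    // ... payload fields elided
}

/// Drop retrieve hits whose role doesn't match the query's role, so a
/// query tagged for one role never surfaces another role's recordings.
fn apply_role_gate(hits: Vec<Recording>, query_role: Option<&str>) -> Vec<Recording> {
    match query_role {
        // Untagged query: pass through unchanged here; the real pass-vs-drop
        // choice is governed by the ADR-005-style fail-safe semantics.
        None => hits,
        Some(role) => hits.into_iter().filter(|h| h.role == role).collect(),
    }
}
```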
### Rust has, Go doesn't

- Lance-format vector storage (vectord-lance crate, 605 lines). Columnar persistence with Apache Arrow.
- `truth` crate (970 lines). Cross-source claim reconciliation. Used by validators + auditor.
- `journald` crate (455 lines). Structured event journal for audit trails.
- `validator` crate with FillValidator + EmailValidator (1,286 lines). Production guardrails.
- `/v1/validate` + `/v1/iterate` endpoints. Network-callable validators with auto-correct loop.
- `ui` crate (Dioxus, 1,509 lines). Native desktop/web UI. Plus Bun frontend at :3700 + LLM Team UI at :5000.
- Materializer + replay tools (the "produce evidence" side of distillation).
- Acceptance harness (22 invariants over fixtures) — though it's TS, not Rust.
- Production deployment (devop.live/lakehouse/* serves through Rust today).
## Strengths and weaknesses

### Rust strengths
- Maturity — production today, serving real demo traffic at devop.live/lakehouse/.
- Single deploy unit — one binary, one systemd service, one log.
- Type system + memory safety — fewer runtime bugs in the hot path.
- Mature library ecosystem — axum, tokio, polars, arrow, hnsw_rs, lance.
- Native distillation pipeline — every stage authored in Rust/TS first; Go is the porter.
- Production validators — formal guardrails that the Go side doesn't have.
- Lance vector storage — columnar format scales better at multi-million rows.
### Rust weaknesses
- Python sidecar dependency — every AI call goes through Python. 63× slower than direct Ollama. No structural reason it has to be Python; aibridge could call Ollama directly.
- No embed cache — every embed pays the full Ollama latency. Adding a cache would close most of the 63× gap.
- Mega-binary blast radius — gateway PID 1241 at 14.9G RSS means any panic kills the whole production system. No process isolation.
- Tail latency cliff — Rust gateway hit 374% CPU during the load investigation earlier today. Single async runtime under load = tail-latency degradation across all subsystems.
- Compile times — slow iteration. Go's per-package builds are seconds; Rust's incremental builds are minutes for large changes.
- Coupling — adding a new feature touches gateway/v1/ and ripples across crates because everything composes via axum.nest.
### Go strengths
- Process isolation — daemons crash independently. Ops can `systemctl restart vectord` without touching gateway/matrixd.
- Per-daemon scale — the embed cache lives in embedd; vectord shards independently. Hot daemons can scale horizontally without touching the rest.
- No Python dependency — every daemon talks to its peers in HTTP/JSON. Native Go all the way down to Ollama.
- In-process embed cache — yields 63× RPS improvement on warm workloads.
- Smaller, denser code — 15,128 lines vs Rust's 35,447 + 1,237 sidecar (Go is 43% the size).
- Faster iteration — `go build` of all 14 binaries takes ~3-5s on this box; a full Rust rebuild takes minutes.
- Cross-runtime artifact compatibility verified — audit_baselines.jsonl, scored-runs JSONL, and sft_export.jsonl all round-trip byte-equal between Rust and Go.
### Go weaknesses
- Distillation pipeline incomplete — materializer + replay + RAG export + preference export still Rust-only. Operators running Go end-to-end can't produce evidence; only consume it.
- Production validators missing — no FillValidator/EmailValidator. Matrix gate covers role bleed but not structural validation (phantom IDs, SSN patterns, etc.).
- Vector storage is HNSW-only — no Lance equivalent. Fine for current scale; would need Lance port for multi-million-row indexes.
- Less production-tested — no real coordinator traffic against Go yet (cutover slice live but operator-controlled).
- HTTP between daemons — every cross-daemon call is a network round-trip. Latency is fine on localhost (microseconds) but adds up. Rust's nest()-composed in-process services have zero IPC cost.
- coder/hnsw is newer — less battle-tested than Rust's hnsw_rs. The smoke-vs-persistent vector-index pollution earlier today exposed an envelope-versioning gap that is now fixed (v2 envelope), but Rust's Lance-based persistence has had longer to mature.
## Cross-cutting abstracts to address

Whichever side wins, both should grow these:

- Embed cache layer in Rust — mirror Go's `CachedProvider` shape inside Rust aibridge. Even if you keep the Python sidecar, putting the cache on the Rust side closes the biggest perf gap. Sketch: a ~100-line `lru::LruCache<(String, String), Vec<f32>>` behind a sync mutex (see the cache sketch in the Python dependency section above). Would close ~95% of the 63× gap.
- Drop the Python sidecar — rewrite Rust aibridge to call Ollama at `:11434` (`/api/embed` and `/api/generate`) directly, skipping Python (as sketched in the Python dependency section above). Reduces the operations surface by one runtime + one process, gains performance, and removes the Python deploy dependency. The pydantic validation isn't doing anything Rust's serde can't already.
- Materializer port (Rust → Go) — currently Rust-only. Without it, Go can audit but not produce. The most-blocking missing piece for Go-only operation.
- Validator port (Rust → Go) — FillValidator + EmailValidator + `/v1/validate` + `/v1/iterate`. Production safety nets that Go doesn't have. About 1,300 lines of Rust to port.
- Cross-runtime contract tests — pin the JSONL schemas (audit_baselines, scored_run, sft_sample) as canonical specs in `auditor/schemas/` and add Go-side validators that read the same definitions. Currently we have ad-hoc parity tests; a formal contract test would catch drift (see the sketch after this list).
- Decide on Lance — Rust has it, Go doesn't. For 5K-500K row corpora it doesn't matter. For 5M+ corpora the Lance backend wins on disk scaling. If staffing demand grows beyond a million workers, port it; otherwise leave it.
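A sketch of what one side of such a contract test could look like in Rust, assuming serde_json. The field names are placeholders, not the real pinned schemas; a real test would read the shared spec from `auditor/schemas/` rather than a hardcoded list.

```rust
// Hypothetical contract-test shape: pin required fields for one JSONL
// record type and fail if a line drifts. Field names are illustrative.
use serde_json::Value;

const SCORED_RUN_REQUIRED: &[&str] = &["run_id", "score", "category"];

fn check_line(line: &str, required: &[&str]) -> Result<(), String> {
    let v: Value = serde_json::from_str(line).map_err(|e| e.to_string())?;
    let obj = v
        .as_object()
        .ok_or_else(|| "record is not a JSON object".to_string())?;
    for field in required {
        if !obj.contains_key(*field) {
            return Err(format!("missing required field: {field}"));
        }
    }
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn scored_run_lines_carry_pinned_fields() {
        let line = r#"{"run_id":"r1","score":0.5,"category":"ok"}"#;
        assert!(check_line(line, SCORED_RUN_REQUIRED).is_ok());
    }
}
```

Running the same pinned spec through both runtimes turns today's ad-hoc byte-equal checks into a drift alarm that fires in CI.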
## If keeping Go primary
- Port materializer first (highest leverage — unblocks full pipeline)
- Port replay second (closes audit-FULL phase 7 live invocation)
- Port validators third (production safety)
- Skip Lance until corpus growth demands it
- Keep chatd, observer fail-safe, role gate, multi-corpus matrix — these are real Go wins worth preserving
## If keeping Rust primary
- Drop Python sidecar — call Ollama directly from aibridge. Single biggest perf gain available.
- Add embed cache in aibridge (LRU). Closes most of the perf gap to Go.
- Port chatd's 5-provider dispatcher to Rust — Go has unified cloud LLM access; Rust's v1/chat is single-provider (a dispatch sketch follows this list).
- Port the cross-role gate to Rust matrix retrieve — Go's role gate prevents real bleed (verified through real_001..005). Rust matrix-retrieve doesn't have it.
- Consider process splitting — gateway at 14.9GB RSS is operationally awkward. Even partial decomposition (split out vectord into its own process) would help.
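For the dispatcher port, a minimal Rust sketch of the routing shape. The provider names come from the chatd list above; the enum and the local-Ollama default are illustrative, not chatd's actual design.

```rust
// Illustrative dispatch shape for a multi-provider Rust /v1/chat.
#[derive(Debug, Clone, Copy)]
enum Provider {
    Kimi,
    Opencode,
    Openrouter,
    OllamaCloud,
    Ollama,
}

/// Route a request's provider field to a backend, defaulting to local
/// Ollama when the field is absent or unrecognized.
fn select_provider(requested: Option<&str>) -> Provider {
    match requested {
        Some("kimi") => Provider::Kimi,
        Some("opencode") => Provider::Opencode,
        Some("openrouter") => Provider::Openrouter,
        Some("ollama_cloud") => Provider::OllamaCloud,
        _ => Provider::Ollama,
    }
}
```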
## Recommendation
Go for the primary line, Rust for production-bridge maintenance.
Reasons:
- Performance — Go is 63× faster on the embed hot path, and the gap is structural (Python sidecar) not implementation-level.
- Operations — process isolation is genuinely operationally simpler than a 14.9GB mega-binary at production scale.
- Code volume — Go does the same job in 43% the lines; less surface area to audit, fewer places for bugs.
- Cross-runtime parity verified — every artifact (audit_baselines, sft_export, scored runs) round-trips byte-equal. Operators running Go don't lose Rust-side compatibility.
- The 4 missing pieces are bounded — materializer + replay + validators + RAG/preference exports are concrete porting targets, not research questions.
But don't abandon Rust:
- devop.live/lakehouse/ runs through Rust today. Cutover is a multi-week process; Rust must stay healthy.
- Several Go improvements (validators, Lance, mature HNSW lib) would be downstream of Rust patterns. Keeping Rust live means the substrate keeps evolving — anything new there is a porting opportunity for Go.
- The Python sidecar drop + embed cache are valuable Rust improvements regardless. With both in place, Rust would be 2-3× more competitive.
## Bottom line
The substrate is parallel-mature on both sides for the audit/observation surface. The producer side (materializer/replay/validators) is Rust-only. Performance favors Go ~60× on warm workloads, structurally driven by the Python-sidecar architectural choice on Rust. Operations favor Go on process isolation. Production deployment status favors Rust today.
If the goal is "find the right primary line and harden the other," both should drop the Python sidecar and add embed caches first — those are universal wins. Then port materializer + replay to Go for end-to-end Go operation; or stay Rust-primary and improve the process model. Both paths are valid; the deciding factor is operations preference.