
Lakehouse: Rust vs Go architecture comparison (snapshot)

THIS IS A SNAPSHOT — NOT THE SOURCE OF TRUTH. The living document is at docs/ARCHITECTURE_COMPARISON.md. Update there; this file is a frozen historical record. Snapshot date: 2026-05-01.

Produced 2026-05-01 to inform the keep/maintain decision and surface abstractions that should be addressed regardless of which side is the primary line going forward.

Code volume

| Component | Lines |
|---|---|
| Rust crates/ (15 crates) | 35,447 |
| Rust sidecar/ (Python) | 1,237 |
| Go internal/ (20 packages) | 11,896 |
| Go cmd/ (14 binaries) | 3,232 |
| Go total | 15,128 |

Go is ~43% the size of Rust on like-for-like surface. The gap is largely vectord (Rust 11,005 lines vs Go 804): Rust's vectord implements HNSW, Lance-format storage, and benchmarking, while Go's wraps coder/hnsw and stops there.

Process model

| | Rust | Go |
|---|---|---|
| Binaries running | 1 mega-process (gateway PID 1241, 14.9G RSS, 374% CPU under load) | 11 dedicated daemons (each ~100-300MB RSS) |
| Inter-component comms | In-process axum.nest (no network) | HTTP between daemons |
| Crash blast radius | Whole system if any subsystem panics | One daemon dies, rest survive |
| Horizontal scale | One unit only — can't scale individual components | Each daemon scales independently |
| Deploy unit | Single binary | 11 systemd units |

Reading: Rust's mega-binary is simpler ops at small scale (one thing to start, one log to tail). Go's daemons are simpler ops at production scale (kill the misbehaving one, restart it, others stay up). Go also lets you tune per-daemon resource limits via systemd (e.g. MemoryHigh=4G on vectord but unlimited on chatd).

Python dependency

This is the architectural difference J flagged. Both call Ollama at :11434, but the path is different:

Rust embed:  gateway → HTTP → Python sidecar :3200 → HTTP → Ollama :11434
Go embed:    gateway → HTTP → Go embedd      :4216 → HTTP → Ollama :11434

The Python sidecar (sidecar/sidecar/main.py, 1,237 lines) is a FastAPI wrapper around Ollama. It does:

  • /embed — pydantic validation + iterates over texts calling Ollama
  • /generate — pydantic validation + forwards to Ollama
  • /rerank — pydantic validation + Ollama-prompt scoring

It adds no fundamental compute that Ollama can't do directly. It's a type-validation + request-shape adapter layer.

The cost shows up in load tests:

| Path | RPS @ conc=10 | p50 | p99 | max |
|---|---|---|---|---|
| Go gateway → embedd → Ollama (warm cache) | 8,119 | 0.79ms | 3.2ms | 8ms |
| Rust gateway → sidecar → Ollama (no cache) | 128 | 77.8ms | 124ms | 214ms |

Go delivers 63× the RPS and roughly 100× lower p50 latency on this workload.

Two effects compound:

  1. Go has an in-process embed cache (internal/embed/cached.go). For 6 rotating bodies × 240k requests, cache hit rate approaches 100%. Rust's sidecar doesn't cache.
  2. Rust pays a Python serialization tax on every request — JSON in, pydantic validate, build httpx call, JSON out, pydantic serialize. ~10-20ms per request before reaching Ollama. Go's inline aibridge is native code with zero round-trips through a second runtime.

Even with cache parity, the structural Python-hop cost would leave Rust at maybe 1,000-2,000 RPS versus Go's 8,000+. The Python sidecar is the single biggest performance lever available on the Rust side.
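
To make the caching point concrete, here is a minimal sketch of that shape: an LRU keyed on (model, text) wrapped around a provider that calls Ollama's /api/embed directly. The EmbedProvider interface name, the provider struct, and the exact request/response fields are assumptions for illustration; the real code is internal/embed/cached.go and will differ in detail.

```go
// Sketch of an in-process embed cache in front of Ollama. Hypothetical
// names (EmbedProvider, ollamaProvider); the real implementation is
// internal/embed/cached.go and differs in detail.
package embedcache

import (
	"bytes"
	"container/list"
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
)

// EmbedProvider is the shape shared by the direct and cached providers.
type EmbedProvider interface {
	Embed(model, text string) ([]float32, error)
}

// ollamaProvider calls Ollama's /api/embed directly (no Python hop).
type ollamaProvider struct{ baseURL string } // e.g. "http://127.0.0.1:11434"

func (p *ollamaProvider) Embed(model, text string) ([]float32, error) {
	body, _ := json.Marshal(map[string]any{"model": model, "input": text})
	resp, err := http.Post(p.baseURL+"/api/embed", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out struct {
		Embeddings [][]float32 `json:"embeddings"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	if len(out.Embeddings) == 0 {
		return nil, fmt.Errorf("empty embedding response")
	}
	return out.Embeddings[0], nil
}

// CachedProvider wraps any EmbedProvider with an LRU keyed on (model, text).
type CachedProvider struct {
	inner EmbedProvider
	cap   int
	mu    sync.Mutex
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> entry in order
}

type entry struct {
	key string
	vec []float32
}

func NewCachedProvider(inner EmbedProvider, capacity int) *CachedProvider {
	return &CachedProvider{inner: inner, cap: capacity, order: list.New(), items: map[string]*list.Element{}}
}

func (c *CachedProvider) Embed(model, text string) ([]float32, error) {
	key := model + "\x00" + text
	c.mu.Lock()
	if el, ok := c.items[key]; ok { // hit: no Ollama round-trip
		c.order.MoveToFront(el)
		vec := el.Value.(*entry).vec
		c.mu.Unlock()
		return vec, nil
	}
	c.mu.Unlock()

	vec, err := c.inner.Embed(model, text) // miss: pay full Ollama latency once
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = c.order.PushFront(&entry{key: key, vec: vec})
	if c.order.Len() > c.cap { // evict least recently used
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	return vec, nil
}
```

On the rotating-body workload above, nearly every call after warm-up returns from the map lookup, which is where the 63× RPS gap comes from.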

Vector storage

| | Rust | Go |
|---|---|---|
| HNSW lib | hnsw_rs (mature, used in production) | coder/hnsw (newer, smaller surface) |
| Code size | 11,005 lines (vectord + vectord-lance) | 804 lines |
| Lance-format storage | Yes (vectord-lance crate) | No |
| Persistence | LanceDB or in-memory | MinIO + JSON envelope (v2 envelope as of eb0dfdf) |
| Distance functions | cosine, euclidean, dot product | cosine, euclidean |

Reading: Rust has the deeper vector-storage substrate. Lance-format gives columnar persistence + zero-copy reads + Apache-Arrow integration. Go relies on coder/hnsw with its own envelope format. For the staffing-domain corpus sizes (5K-500K vectors) both work fine; for multi-million-row indexes Rust would have an edge (Lance scales better on disk).
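
On the persistence row: the point of the v2 envelope is that the loader can refuse snapshots it should not trust (the smoke-vs-persistent mix-up described later). Below is a hypothetical sketch of what version-tagged envelope loading looks like; the field names and layout are illustrative only, not the actual vectord format.

```go
// Hypothetical shape of a version-tagged vector-index envelope. Illustrative
// only; the actual Go vectord v2 envelope format is not reproduced here.
package envelope

import (
	"encoding/json"
	"fmt"
)

const currentVersion = 2

// Envelope wraps a persisted index payload with enough metadata for the
// loader to reject stale or foreign snapshots (e.g. a smoke-test index
// landing in the persistent stack's bucket).
type Envelope struct {
	Version   int      `json:"version"`
	Index     string   `json:"index"`     // which logical index this snapshot belongs to
	Dimension int      `json:"dimension"` // embedding width
	Records   []Record `json:"records"`
}

type Record struct {
	ID     string    `json:"id"`
	Vector []float32 `json:"vector"`
}

// Load fails closed on anything that is not a current-version envelope for
// the expected index, the class of mix-up that version tagging guards against.
func Load(raw []byte, wantIndex string) (*Envelope, error) {
	var env Envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return nil, err
	}
	if env.Version != currentVersion {
		return nil, fmt.Errorf("envelope version %d, want %d", env.Version, currentVersion)
	}
	if env.Index != wantIndex {
		return nil, fmt.Errorf("envelope is for index %q, want %q", env.Index, wantIndex)
	}
	return &env, nil
}
```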

Distillation pipeline

This is where porting status matters most.

| Phase | Rust source | Go port status |
|---|---|---|
| Materializer (transforms.ts) | TS, full | NOT YET PORTED — Go can READ data/evidence/ as observer but cannot PRODUCE evidence. Phase 2 of audit-FULL is observer-only on Go. |
| Scorer | TS + Go | Ported (internal/distillation/scorer.go) |
| Score categories + firewall | Pinned | Ported (SftNever + IsSftNever) |
| SFT export (synthesis) | TS, full (8 source-class templates) | Fully ported (internal/distillation/sft_export.go); 4-decimal byte-equal cross-runtime |
| RAG export | TS | NOT YET PORTED |
| Preference export | TS | NOT YET PORTED |
| Audit-baselines | TS | Fully ported + cross-runtime byte-equal verified on 7 live entries |
| Audit-FULL phases 0/3/4 | TS | Ported |
| Audit-FULL phase 1 (schema tests) | bun test | Ported via go test exec |
| Audit-FULL phase 2 (materializer dry-run) | TS, calls materializer | Observer mode only (reads existing data/evidence/) |
| Audit-FULL phase 5 (run summaries) | TS | Observer mode (reads existing summary.json) |
| Audit-FULL phase 6 (acceptance) | TS, fixture harness | Skipped — TS-only fixture deps |
| Audit-FULL phase 7 (replay) | TS, runs replay() | Observer mode (reads replay_runs.jsonl) |
| Replay tool | TS | NOT YET PORTED |
| Quarantine writer | TS | NOT YET PORTED |

Reading: Go has the substrate for everything observable (read paths) and the SFT export end-to-end. The producer side (materializer, replay) is still Rust-only. To run the full pipeline from Go alone, the materializer + replay need porting.
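
A note on the "4-decimal byte-equal" row: byte equality across runtimes only holds if both sides serialize numbers identically. Here is a minimal Go illustration of fixed-precision formatting; the field name and line shape are hypothetical, not the actual sft_export.go encoding.

```go
// Illustration of fixed-precision formatting for cross-runtime byte-equal
// JSONL output. Field names are hypothetical.
package main

import (
	"fmt"
	"strconv"
)

// formatScore renders a score with exactly four decimal places so the Rust/TS
// and Go exporters emit identical bytes for the same value (0.5 -> "0.5000").
func formatScore(s float64) string {
	return strconv.FormatFloat(s, 'f', 4, 64)
}

func main() {
	for _, s := range []float64{0.5, 0.123456, 1} {
		// Building the line by hand keeps key order and number formatting
		// deterministic, unlike encoding/json's default float output.
		fmt.Printf(`{"score":%s}`+"\n", formatScore(s))
	}
}
```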

Production validators

| | Rust | Go |
|---|---|---|
| FillValidator | crates/validator/src/staffing/fill.rs (12 unit tests) | NOT IN GO — closest is matrix gate's role check |
| EmailValidator | SSN pattern + salary disclosure + name consistency (12 tests) | NOT IN GO |
| /v1/validate endpoint | Yes (Rust gateway/v1/validate) | NOT IN GO |
| /v1/iterate endpoint | Yes (gen→validate→correct→retry loop) | NOT IN GO |
| Production validators load workers_500k.parquet at startup | Yes (75MB resident) | N/A |

Reading: Rust has a formal validator layer the Go side hasn't ported. For staffing-domain production, these matter — they're the "don't generate phantom worker IDs / SSN-pattern phone numbers / wrong geography" guardrails. Go's matrix retrieve filters by geo + role via embedder semantics, but doesn't do the rigorous structural validation the Rust validator crate does.
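
To give a sense of what porting that layer involves, here is a hypothetical Go sketch of the kind of structural checks described above (phantom worker IDs, SSN-shaped values, out-of-geography fills). Rule names and fields are illustrative; this is not the crates/validator logic.

```go
// Hypothetical sketch of FillValidator-style structural checks as a Go port
// target. Fields and rules are illustrative, not the crates/validator logic.
package fillcheck

import (
	"fmt"
	"regexp"
)

// Fill stands in for a generated staffing fill proposal.
type Fill struct {
	WorkerID string
	Phone    string
	State    string
}

// ssnLike catches 9-digit SSN-shaped strings leaking into phone fields.
var ssnLike = regexp.MustCompile(`^\d{3}-?\d{2}-?\d{4}$`)

// Validate returns every violated rule rather than stopping at the first,
// so an iterate loop (gen -> validate -> correct -> retry) can fix them all.
func Validate(f Fill, knownWorkers, allowedStates map[string]bool) []string {
	var violations []string
	if !knownWorkers[f.WorkerID] {
		violations = append(violations, fmt.Sprintf("phantom worker ID %q", f.WorkerID))
	}
	if ssnLike.MatchString(f.Phone) {
		violations = append(violations, "phone field matches an SSN pattern")
	}
	if !allowedStates[f.State] {
		violations = append(violations, fmt.Sprintf("geography %q outside allowed set", f.State))
	}
	return violations
}
```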

Substrate features unique to each side

Go has, Rust doesn't

  • chatd 5-provider dispatcher (kimi / opencode / openrouter / ollama_cloud / ollama). One unified /v1/chat endpoint over many cloud LLM providers.
  • Cross-role gate in matrix retrieve (real_001 fix). Playbook recordings tagged with role; retrieve queries pass query_role; gate prevents cross-role bleed. A minimal sketch follows this list.
  • Multi-corpus matrix indexer (Spec §3.4 component 2). Compose N single-corpus vectord indexes with attribution.
  • Pathway memory (Mem0-style versioned traces).
  • Observer fail-safe semantics (ADR-005 Decision 5.1).
  • In-process embed cache (CachedProvider + LRU).
  • LLM-based role extractor (regex + qwen2.5 fallback).
  • Persistent stack 3-layer isolation (scripts/cutover/start_go_stack.sh).
  • Cutover slice (Bun /_go/* route, opt-in via systemd drop-in).

Rust has, Go doesn't

  • Lance-format vector storage (vectord-lance crate, 605 lines). Columnar persistence with Apache Arrow.
  • truth crate (970 lines). Cross-source claim reconciliation. Used by validators + auditor.
  • journald crate (455 lines). Structured event journal for audit trails.
  • validator crate with FillValidator + EmailValidator (1,286 lines). Production guardrails.
  • /v1/validate + /v1/iterate endpoints. Network-callable validators with auto-correct loop.
  • ui crate (Dioxus, 1,509 lines). Native desktop/web UI. Plus Bun-frontend at :3700 + LLM Team UI at :5000.
  • Materializer + replay tools (the "produce evidence" side of distillation).
  • Acceptance harness (22 invariants over fixtures) — though it's TS not Rust.
  • Production deployment (devop.live/lakehouse/* serves through Rust today).

Strengths and weaknesses

Rust strengths

  • Maturity — production today, serving real demo traffic at devop.live/lakehouse/.
  • Single deploy unit — one binary, one systemd service, one log.
  • Type system + memory safety — fewer runtime bugs in the hot path.
  • Mature library ecosystem — axum, tokio, polars, arrow, hnsw_rs, lance.
  • Native distillation pipeline — every stage authored in Rust/TS first; Go is the porter.
  • Production validators — formal guardrails that the Go side doesn't have.
  • Lance vector storage — columnar format scales better at multi-million rows.

Rust weaknesses

  • Python sidecar dependency — every AI call goes through Python. 63× slower than direct Ollama. No structural reason it has to be Python; aibridge could call Ollama directly.
  • No embed cache — every embed pays the full Ollama latency. Adding a cache would close most of the 63× gap.
  • Mega-binary blast radius — gateway PID 1241 at 14.9G RSS means any panic kills the whole production system. No process isolation.
  • Tail latency cliff — Rust gateway hit 374% CPU during the load investigation earlier today. Single async runtime under load = tail-latency degradation across all subsystems.
  • Compile times — slow iteration. Go's per-package builds are seconds; Rust's incremental builds are minutes for large changes.
  • Coupling — adding a new feature touches gateway/v1/ and ripples across crates because everything composes via axum.nest.

Go strengths

  • Process isolation — daemons crash independently. ops can systemctl restart vectord without touching gateway/matrixd.
  • Per-daemon scale — embed cache lives in embedd; vectord shards independently. Hot daemons can scale horizontally without touching the rest.
  • No Python dependency — every daemon talks to its peers over HTTP/JSON. Native Go all the way down to Ollama.
  • In-process embed cache — yields 63× RPS improvement on warm workloads.
  • Smaller, denser code — 15,128 lines vs Rust's 35,447 + 1,237 sidecar (Go is 43% the size).
  • Faster iteration — go build of all 14 binaries is ~3-5s on this box; a full Rust rebuild is minutes.
  • Cross-runtime artifact compatibility verified — audit_baselines.jsonl, scored-runs JSONL, sft_export.jsonl all round-trip byte-equal between Rust and Go.

Go weaknesses

  • Distillation pipeline incomplete — materializer + replay + RAG export + preference export still Rust-only. Operators running Go end-to-end can't produce evidence; only consume it.
  • Production validators missing — no FillValidator/EmailValidator. Matrix gate covers role bleed but not structural validation (phantom IDs, SSN patterns, etc.).
  • Vector storage is HNSW-only — no Lance equivalent. Fine for current scale; would need Lance port for multi-million-row indexes.
  • Less production-tested — no real coordinator traffic against Go yet (cutover slice live but operator-controlled).
  • HTTP between daemons — every cross-daemon call is a network round-trip. Latency is fine on localhost (microseconds) but adds up. Rust's nest()-composed in-process services have zero IPC cost.
  • coder/hnsw is newer — less battle-tested than Rust's hnsw_rs. The smoke-vs-persistent vector-index pollution earlier today exposed an envelope versioning gap that's now fixed (v2 envelope), but Rust's Lance-based persistence has had longer to mature.

Cross-cutting abstractions to address

Whichever side wins, both should grow these

  1. Embed cache layer in Rust — Mirror Go's CachedProvider shape inside Rust aibridge. Even if you keep the Python sidecar, putting the cache on the Rust side closes the biggest perf gap. Sketch: 100-line lru::LruCache<(String, String), Vec<f32>> with a sync mutex. Would close ~95% of the 63× gap.

  2. Drop the Python sidecar — Rewrite Rust aibridge to call Ollama at :11434/api/embed and /api/generate directly (skip Python). Reduces operations surface by one runtime + one process, gains performance, removes Python deploy dependency. The pydantic validation isn't doing anything Rust's serde can't already.

  3. Materializer port (Rust → Go) — Currently Rust-only. Without it Go can audit but not produce. Most-blocking missing piece for Go-only operation.

  4. Validator port (Rust → Go) — FillValidator + EmailValidator + /v1/validate + /v1/iterate. Production safety nets that Go doesn't have. About 1,300 lines of Rust to port.

  5. Cross-runtime contract tests — pin the JSONL schemas (audit_baselines, scored_run, sft_sample) as canonical specs in auditor/schemas/ and add Go-side validators that read the same definitions. Currently we have ad-hoc parity tests; a formal contract test would catch drift. A sketch of one possible shape follows this list.

  6. Decide on Lance — Rust has it, Go doesn't. For 5K-500K row corpora it doesn't matter. For 5M+ corpora the Lance backend wins on disk scaling. If staffing demand grows beyond a million workers, port it; otherwise leave it.
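
For item 5, one possible shape, sketched under assumptions (spec location, artifact paths, and required fields are hypothetical): a Go test that walks each artifact JSONL and checks every line against a pinned required-field list.

```go
// Hypothetical contract test: every line of each artifact JSONL must carry
// the fields the pinned spec requires. Paths and field names are
// illustrative; the real spec would be pinned under auditor/schemas/.
package contract

import (
	"bufio"
	"encoding/json"
	"os"
	"testing"
)

// requiredFields stands in for a spec file shared by both runtimes.
var requiredFields = map[string][]string{
	"audit_baselines.jsonl": {"id", "ts", "score"},
	"sft_export.jsonl":      {"prompt", "completion", "source_class"},
}

func TestArtifactsMatchContract(t *testing.T) {
	for path, fields := range requiredFields {
		t.Run(path, func(t *testing.T) {
			f, err := os.Open(path)
			if err != nil {
				t.Skipf("artifact not present: %v", err)
			}
			defer f.Close()
			scanner := bufio.NewScanner(f)
			for line := 1; scanner.Scan(); line++ {
				var rec map[string]any
				if err := json.Unmarshal(scanner.Bytes(), &rec); err != nil {
					t.Fatalf("line %d: invalid JSON: %v", line, err)
				}
				for _, field := range fields {
					if _, ok := rec[field]; !ok {
						t.Errorf("line %d: missing required field %q", line, field)
					}
				}
			}
		})
	}
}
```

A real version would read the field lists from the pinned spec files rather than hard-coding them, so Rust, TS, and Go all validate against the same source.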

If keeping Go primary

  • Port materializer first (highest leverage — unblocks full pipeline)
  • Port replay second (closes audit-FULL phase 7 live invocation)
  • Port validators third (production safety)
  • Skip Lance until corpus growth demands it
  • Keep chatd, observer fail-safe, role gate, multi-corpus matrix — these are real Go wins worth preserving

If keeping Rust primary

  • Drop Python sidecar — call Ollama directly from aibridge. Single biggest perf gain available.
  • Add embed cache in aibridge (LRU). Closes most of the perf gap to Go.
  • Port chatd's 5-provider dispatcher to Rust — Go has unified cloud LLM access; Rust's v1/chat is single-provider.
  • Port the cross-role gate to Rust matrix retrieve — Go's role gate prevents real bleed (verified through real_001..005). Rust matrix-retrieve doesn't have it.
  • Consider process splitting — gateway at 14.9GB RSS is operationally awkward. Even partial decomposition (split out vectord into its own process) would help.

Recommendation

Go for the primary line, Rust for production-bridge maintenance.

Reasons:

  1. Performance — Go is 63× faster on the embed hot path, and the gap is structural (Python sidecar) not implementation-level.
  2. Operations — process isolation is genuinely operationally simpler than a 14.9GB mega-binary at production scale.
  3. Code volume — Go does the same job in ~43% of the lines; less surface area to audit, fewer places for bugs.
  4. Cross-runtime parity verified — every artifact (audit_baselines, sft_export, scored runs) round-trips byte-equal. Operators running Go don't lose Rust-side compatibility.
  5. The 4 missing pieces are bounded — materializer + replay + validators + RAG/preference exports are concrete porting targets, not research questions.

But don't abandon Rust:

  1. devop.live/lakehouse/ runs through Rust today. Cutover is a multi-week process; Rust must stay healthy.
  2. Several Go improvements (validators, Lance, mature HNSW lib) would be downstream of Rust patterns. Keeping Rust live means the substrate keeps evolving — anything new there is a porting opportunity for Go.
  3. The Python sidecar drop + embed cache are valuable Rust improvements regardless. With those changes, Rust would be 2-3× as competitive.

Bottom line

The substrate is parallel-mature on both sides for the audit/observation surface. The producer side (materializer/replay/validators) is Rust-only. Performance favors Go ~60× on warm workloads, structurally driven by the Python-sidecar architectural choice on Rust. Operations favor Go on process isolation. Production deployment status favors Rust today.

If the goal is "find the right primary line and harden the other," both should drop the Python sidecar and add embed caches first — those are universal wins. Then port materializer + replay to Go for end-to-end Go operation; or stay Rust-primary and improve the process model. Both paths are valid; the deciding factor is operations preference.