Per J's request: move the parallel-runtime comparison from reports/cutover/ (where it lived as cutover-prep evidence) into docs/ as the source-of-truth file. J will keep updating it as fixes ship on either side.

Restructured for living-document use:

- Status header (last refresh date, owner, update triggers)
- 'How to update this doc' section with explicit dos and don'ts
- Decisions tracker at top — actioned items with commit refs + open backlog with LOC estimates
- Each comparison section now has 'Last verified' columns where numbers are time-sensitive
- Change log section at bottom for one-line entries on every meaningful refresh

The original at reports/cutover/architecture_comparison.md gains a 'THIS IS A SNAPSHOT' header pointing at the docs/ source. Kept as a historical record but no longer the place to update.

Sister pointer file in /home/profit/lakehouse/docs/ARCHITECTURE_COMPARISON.md so the doc is reachable from either repo side. That file explicitly says the source lives in golangLAKEHOUSE and warns against putting authoritative content in the pointer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Lakehouse: Rust vs Go architecture comparison (snapshot)

> **THIS IS A SNAPSHOT — NOT THE SOURCE OF TRUTH.**
> The living document is at **`docs/ARCHITECTURE_COMPARISON.md`**.
> Update there; this file is a frozen historical record.
> Snapshot date: 2026-05-01.

Produced 2026-05-01 to inform the keep/maintain decision and surface
abstractions that should be addressed regardless of which side is the
primary line going forward.

## Code volume

| | Lines |
|---|---:|
| Rust `crates/` (15 crates) | **35,447** |
| Rust `sidecar/` (Python) | 1,237 |
| Go `internal/` (20 packages) | 11,896 |
| Go `cmd/` (14 binaries) | 3,232 |
| **Go total** | **15,128** |

Go is ~43% the size of Rust on like-for-like surface. The gap is
largely vectord (Rust 11,005 lines vs Go 804) — Rust's vectord
implements HNSW, Lance-format storage, and benchmarking; Go's vectord
wraps coder/hnsw and stops there.

## Process model

| | Rust | Go |
|---|---|---|
| Binaries running | **1** mega-process (gateway PID 1241, 14.9G RSS, 374% CPU under load) | **11** dedicated daemons (each ~100-300MB RSS) |
| Inter-component comms | In-process axum `.nest()` (no network) | HTTP between daemons |
| Crash blast radius | Whole system if any subsystem panics | One daemon dies, the rest survive |
| Horizontal scale | One unit only — can't scale individual components | Each daemon scales independently |
| Deploy unit | Single binary | 11 systemd units |

**Reading**: Rust's mega-binary is simpler ops at small scale (one thing
to start, one log to tail). Go's daemons are simpler ops at production
scale (kill the misbehaving one, restart it, the others stay up). Go also
lets you tune per-daemon resource limits via systemd (e.g.
`MemoryHigh=4G` on vectord but unlimited on chatd).

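The per-daemon limits mentioned above are ordinary systemd drop-ins. A hypothetical sketch — the unit name matches the daemon, but the limit values here are illustrative, not taken from the repo:

```ini
# /etc/systemd/system/vectord.service.d/limits.conf
[Service]
# Soft memory ceiling: kernel starts reclaiming above this.
MemoryHigh=4G
# Hard ceiling: OOM-kill this daemon only, not the rest of the stack.
MemoryMax=6G
Restart=on-failure
RestartSec=2
```

Because each daemon is its own unit, a limit breach kills and restarts one process while the other ten keep serving.
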
## Python dependency

This is the architectural difference J flagged. Both runtimes call Ollama at
:11434, but the path is different:

```
Rust embed: gateway → HTTP → Python sidecar :3200 → HTTP → Ollama :11434
Go embed:   gateway → HTTP → Go embedd :4216   → HTTP → Ollama :11434
```

The Python sidecar (`sidecar/sidecar/main.py`, 1,237 lines) is a
FastAPI wrapper around Ollama. It does:

- `/embed` — pydantic validation + iterates over texts calling Ollama
- `/generate` — pydantic validation + forwards to Ollama
- `/rerank` — pydantic validation + Ollama-prompt scoring

It adds **no fundamental compute** that Ollama can't do directly. It's a
type-validation + request-shape adapter layer.

The cost shows up in load tests:

| Path | RPS @ conc=10 | p50 | p99 | max |
|---|---:|---:|---:|---:|
| Go gateway → embedd → Ollama (warm cache) | **8,119** | 0.79ms | 3.2ms | 8ms |
| Rust gateway → sidecar → Ollama (no cache) | 128 | 77.8ms | 124ms | 214ms |

Go is **63× faster on RPS** with **~100× lower p50 latency** on this workload.

Two effects compound:

1. **Go has an in-process embed cache** (`internal/embed/cached.go`).
   For 6 rotating bodies × 240k requests, the cache hit rate approaches
   100%. Rust's sidecar doesn't cache.
2. **Rust pays a Python serialization tax** on every request — JSON
   in, pydantic validate, build httpx call, JSON out, pydantic
   serialize. ~10-20ms per request before reaching Ollama. Go's
   inline aibridge is native code with zero round-trips through a
   second runtime.

Even with cache parity, the structural Python-hop cost would leave
Rust at maybe 1,000-2,000 RPS versus Go's 8,000+. The Python sidecar
is the single biggest performance lever available on the Rust side.

## Vector storage

| | Rust | Go |
|---|---|---|
| HNSW lib | `hnsw_rs` (mature, used in production) | `coder/hnsw` (newer, smaller surface) |
| Code size | 11,005 lines (vectord + vectord-lance) | 804 lines |
| Lance-format storage | Yes (vectord-lance crate) | No |
| Persistence | LanceDB or in-memory | MinIO + JSON envelope (v2 envelope as of `eb0dfdf`) |
| Distance functions | cosine, euclidean, dot product | cosine, euclidean |

**Reading**: Rust has the deeper vector-storage substrate. The Lance format
gives columnar persistence + zero-copy reads + Apache Arrow integration.
Go relies on coder/hnsw with its own envelope format. For the
staffing-domain corpus sizes (5K-500K vectors) both work fine; for
multi-million-row indexes Rust would have an edge (Lance scales
better on disk).

## Distillation pipeline

This is where porting status matters most.

| Phase | Rust source | Go port status |
|---|---|---|
| Materializer (transforms.ts) | TS, full | ❌ NOT YET PORTED — Go can READ data/evidence/ as an observer but cannot PRODUCE evidence. Phase 2 of audit-FULL is observer-only on Go. |
| Scorer | TS + Go | ✅ Ported (`internal/distillation/scorer.go`) |
| Score categories + firewall | Pinned | ✅ Ported (`SftNever` + `IsSftNever`) |
| SFT export (synthesis) | TS, full (8 source-class templates) | ✅ Fully ported (`internal/distillation/sft_export.go`); 4-decimal byte-equal cross-runtime |
| RAG export | TS | ❌ NOT YET PORTED |
| Preference export | TS | ❌ NOT YET PORTED |
| Audit-baselines | TS | ✅ Fully ported + cross-runtime byte-equal verified on 7 live entries |
| Audit-FULL phases 0/3/4 | TS | ✅ Ported |
| Audit-FULL phase 1 (schema tests) | bun test | ✅ Ported via `go test` exec |
| Audit-FULL phase 2 (materializer dry-run) | TS, calls materializer | ✅ Observer mode only (reads existing data/evidence/) |
| Audit-FULL phase 5 (run summaries) | TS | ✅ Observer mode (reads existing summary.json) |
| Audit-FULL phase 6 (acceptance) | TS, fixture harness | ❌ Skipped — TS-only fixture deps |
| Audit-FULL phase 7 (replay) | TS, runs replay() | ✅ Observer mode (reads replay_runs.jsonl) |
| Replay tool | TS | ❌ NOT YET PORTED |
| Quarantine writer | TS | ❌ NOT YET PORTED |

**Reading**: Go has the substrate for everything observable (read paths)
and the SFT export end-to-end. The producer side (materializer, replay)
is still Rust-only. To run the full pipeline from Go alone, the
materializer + replay need porting.

## Production validators

| | Rust | Go |
|---|---|---|
| FillValidator | `crates/validator/src/staffing/fill.rs` (12 unit tests) | ❌ NOT IN GO — closest is the matrix gate's role check |
| EmailValidator | SSN pattern + salary disclosure + name consistency (12 tests) | ❌ NOT IN GO |
| `/v1/validate` endpoint | Yes (Rust `gateway/v1/validate`) | ❌ NOT IN GO |
| `/v1/iterate` endpoint | Yes (gen→validate→correct→retry loop) | ❌ NOT IN GO |
| Loads `workers_500k.parquet` at startup | Yes (75MB resident) | N/A |

**Reading**: Rust has a formal validator layer the Go side hasn't
ported. For staffing-domain production, these matter — they're the
"don't generate phantom worker IDs / SSN-pattern phone numbers / wrong
geography" guardrails. Go's matrix retrieve filters by geo + role via
embedder semantics, but doesn't do the rigorous structural validation
the Rust validator crate does.

## Substrate features unique to each side

### Go has, Rust doesn't

- **chatd 5-provider dispatcher** (kimi / opencode / openrouter / ollama_cloud / ollama). One unified `/v1/chat` endpoint over many cloud LLM providers.
- **Cross-role gate** in matrix retrieve (real_001 fix). Playbook recordings are tagged with a role; retrieve queries pass query_role; the gate prevents cross-role bleed.
- **Multi-corpus matrix indexer** (Spec §3.4 component 2). Composes N single-corpus vectord indexes with attribution.
- **Pathway memory** (Mem0-style versioned traces).
- **Observer fail-safe semantics** (ADR-005 Decision 5.1).
- **In-process embed cache** (CachedProvider + LRU).
- **LLM-based role extractor** (regex + qwen2.5 fallback).
- **Persistent stack 3-layer isolation** (`scripts/cutover/start_go_stack.sh`).
- **Cutover slice** (Bun `/_go/*` route, opt-in via systemd drop-in).

### Rust has, Go doesn't

- **Lance-format vector storage** (vectord-lance crate, 605 lines). Columnar persistence with Apache Arrow.
- **`truth` crate** (970 lines). Cross-source claim reconciliation. Used by validators + auditor.
- **`journald` crate** (455 lines). Structured event journal for audit trails.
- **`validator` crate** with FillValidator + EmailValidator (1,286 lines). Production guardrails.
- **`/v1/validate` + `/v1/iterate` endpoints**. Network-callable validators with an auto-correct loop.
- **`ui` crate (Dioxus, 1,509 lines)**. Native desktop/web UI. Plus the Bun frontend at :3700 + LLM Team UI at :5000.
- **Materializer + replay tools** (the "produce evidence" side of distillation).
- **Acceptance harness** (22 invariants over fixtures) — though it's TS, not Rust.
- **Production deployment** (devop.live/lakehouse/* serves through Rust today).

## Strengths and weaknesses

### Rust strengths

- **Maturity** — in production today, serving real demo traffic at devop.live/lakehouse/.
- **Single deploy unit** — one binary, one systemd service, one log.
- **Type system + memory safety** — fewer runtime bugs in the hot path.
- **Mature library ecosystem** — axum, tokio, polars, arrow, hnsw_rs, lance.
- **Native distillation pipeline** — every stage authored in Rust/TS first; Go is the porter.
- **Production validators** — formal guardrails that the Go side doesn't have.
- **Lance vector storage** — columnar format scales better at multi-million rows.

### Rust weaknesses

- **Python sidecar dependency** — every AI call goes through Python, 63× slower than the direct Go path on the measured workload. No structural reason it has to be Python; aibridge could call Ollama directly.
- **No embed cache** — every embed pays the full Ollama latency. Adding a cache would close most of the 63× gap.
- **Mega-binary blast radius** — gateway PID 1241 at 14.9G RSS means any panic kills the whole production system. No process isolation.
- **Tail latency cliff** — the Rust gateway hit 374% CPU during the load investigation earlier today. A single async runtime under load means tail-latency degradation across all subsystems.
- **Compile times** — slow iteration. Go's per-package builds take seconds; Rust's incremental builds take minutes for large changes.
- **Coupling** — adding a new feature touches gateway/v1/ and ripples across crates because everything composes via axum `.nest()`.

### Go strengths

- **Process isolation** — daemons crash independently. Ops can `systemctl restart vectord` without touching gateway/matrixd.
- **Per-daemon scale** — the embed cache lives in embedd; vectord shards independently. Hot daemons can scale horizontally without touching the rest.
- **No Python dependency** — every daemon talks to its peers over HTTP/JSON. Native Go all the way down to Ollama.
- **In-process embed cache** — yields the 63× RPS improvement on warm workloads.
- **Smaller, denser code** — 15,128 lines vs Rust's 35,447 + 1,237 sidecar (Go is ~43% the size).
- **Faster iteration** — `go build` of all 14 binaries takes ~3-5s on this box; a full Rust rebuild takes minutes.
- **Cross-runtime artifact compatibility verified** — audit_baselines.jsonl, scored-runs JSONL, and sft_export.jsonl all round-trip byte-equal between Rust and Go.

### Go weaknesses

- **Distillation pipeline incomplete** — materializer + replay + RAG export + preference export are still Rust-only. Operators running Go end-to-end can't produce evidence, only consume it.
- **Production validators missing** — no FillValidator/EmailValidator. The matrix gate covers role bleed but not structural validation (phantom IDs, SSN patterns, etc.).
- **Vector storage is HNSW-only** — no Lance equivalent. Fine for current scale; would need a Lance port for multi-million-row indexes.
- **Less production-tested** — no real coordinator traffic against Go yet (cutover slice is live but operator-controlled).
- **HTTP between daemons** — every cross-daemon call is a network round-trip. Latency is fine on localhost (microseconds) but adds up. Rust's `.nest()`-composed in-process services have zero IPC cost.
- **`coder/hnsw` is newer** — less battle-tested than Rust's hnsw_rs. The smoke-vs-persistent vector index pollution earlier today exposed an envelope versioning gap that's now fixed (v2 envelope), but Rust's Lance-based persistence has had longer to mature.

## Cross-cutting abstractions to address

### Whichever side wins, both should grow these

1. **Embed cache layer in Rust** — mirror Go's `CachedProvider` shape inside Rust aibridge. Even if you keep the Python sidecar, putting the cache on the Rust side closes the biggest perf gap. Sketch: a ~100-line `lru::LruCache<(String, String), Vec<f32>>` behind a sync mutex. Would close ~95% of the 63× gap.
2. **Drop the Python sidecar** — rewrite Rust aibridge to call Ollama at `:11434/api/embed` and `/api/generate` directly (skip Python). Reduces the operations surface by one runtime + one process, gains performance, and removes the Python deploy dependency. The pydantic validation isn't doing anything Rust's serde can't already.
3. **Materializer port (Rust → Go)** — currently Rust-only. Without it Go can audit but not produce. The most-blocking missing piece for Go-only operation.
4. **Validator port (Rust → Go)** — FillValidator + EmailValidator + `/v1/validate` + `/v1/iterate`. Production safety nets that Go doesn't have. About 1,300 lines of Rust to port.
5. **Cross-runtime contract tests** — pin the JSONL schemas (audit_baselines, scored_run, sft_sample) as canonical specs in `auditor/schemas/` and add Go-side validators that read the same definitions. Currently we have ad-hoc parity tests; a formal contract test would catch drift.
6. **Decide on Lance** — Rust has it, Go doesn't. For 5K-500K row corpora it doesn't matter. For 5M+ corpora the Lance backend wins on disk scaling. If staffing demand grows beyond a million workers, port it; otherwise leave it.

### If keeping Go primary

- **Port the materializer first** (highest leverage — unblocks the full pipeline)
- **Port replay second** (closes audit-FULL phase 7 live invocation)
- **Port the validators third** (production safety)
- **Skip Lance until corpus growth demands it**
- **Keep chatd, observer fail-safe, role gate, multi-corpus matrix** — these are real Go wins worth preserving

### If keeping Rust primary

- **Drop the Python sidecar — call Ollama directly from aibridge.** The single biggest perf gain available.
- **Add an embed cache in aibridge** (LRU). Closes most of the perf gap to Go.
- **Port chatd's 5-provider dispatcher to Rust** — Go has unified cloud LLM access; Rust's v1/chat is single-provider.
- **Port the cross-role gate to Rust matrix retrieve** — Go's role gate prevents real bleed (verified through real_001..005). Rust matrix-retrieve doesn't have it.
- **Consider process splitting** — a gateway at 14.9GB RSS is operationally awkward. Even partial decomposition (splitting vectord into its own process) would help.

## Recommendation

**Go for the primary line, Rust for production-bridge maintenance.**

Reasons:

1. **Performance** — Go is 63× faster on the embed hot path, and the gap is structural (the Python sidecar), not implementation-level.
2. **Operations** — process isolation is genuinely simpler to operate than a 14.9GB mega-binary at production scale.
3. **Code volume** — Go does the same job in ~43% of the lines; less surface area to audit, fewer places for bugs.
4. **Cross-runtime parity verified** — every artifact (audit_baselines, sft_export, scored runs) round-trips byte-equal. Operators running Go don't lose Rust-side compatibility.
5. **The 4 missing pieces are bounded** — materializer + replay + validators + RAG/preference exports are concrete porting targets, not research questions.

But **don't abandon Rust**:

1. devop.live/lakehouse/ runs through Rust today. Cutover is a multi-week process; Rust must stay healthy.
2. Several Go improvements (validators, Lance, the more mature HNSW lib) would be downstream of Rust patterns. Keeping Rust live means the substrate keeps evolving — anything new there is a porting opportunity for Go.
3. The Python-sidecar drop + embed cache are valuable Rust improvements regardless; with them, Rust would be 2-3× as competitive.

## Bottom line

The substrate is parallel-mature on both sides for the audit/observation surface. The producer side (materializer/replay/validators) is Rust-only. Performance favors Go ~60× on warm workloads, structurally driven by the Python-sidecar architectural choice on the Rust side. Operations favor Go on process isolation. Production deployment status favors Rust today.

If the goal is "find the right primary line and harden the other,"
the first moves are dropping the Python sidecar and adding an embed
cache on the Rust side — those are wins under either outcome. Then
port materializer + replay to Go for end-to-end Go operation; or stay
Rust-primary and improve the process model. Both paths are valid; the
deciding factor is operations preference.