Per J's request: move the parallel-runtime comparison from reports/cutover/ (where it lived as cutover-prep evidence) into docs/ as the source-of-truth file. J will keep updating it as fixes ship on either side.

Restructured for living-document use:

- Status header (last refresh date, owner, update triggers)
- 'How to update this doc' section with explicit dos and don'ts
- Decisions tracker at top — actioned items with commit refs + open backlog with LOC estimates
- Each comparison section now has 'Last verified' columns where numbers are time-sensitive
- Change log section at bottom for one-line entries on every meaningful refresh

The original at reports/cutover/architecture_comparison.md gains a 'THIS IS A SNAPSHOT' header pointing at the docs/ source. Kept as a historical record but no longer the place to update.

Sister pointer file in /home/profit/lakehouse/docs/ARCHITECTURE_COMPARISON.md so the doc is reachable from either repo side. That file explicitly says the source lives in golangLAKEHOUSE and warns against putting authoritative content in the pointer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Lakehouse: Rust vs Go architecture comparison (snapshot)

> **THIS IS A SNAPSHOT — NOT THE SOURCE OF TRUTH.**
> The living document is at **`docs/ARCHITECTURE_COMPARISON.md`**.
> Update there; this file is a frozen historical record.
> Snapshot date: 2026-05-01.

Produced 2026-05-01 to inform the keep/maintain decision and surface
abstractions that should be addressed regardless of which side is the
primary line going forward.

## Code volume

| | Lines |
|---|---:|
| Rust `crates/` (15 crates) | **35,447** |
| Rust `sidecar/` (Python) | 1,237 |
| Go `internal/` (20 packages) | 11,896 |
| Go `cmd/` (14 binaries) | 3,232 |
| **Go total** | **15,128** |

Go is ~43% the size of Rust on like-for-like surface. The gap is
largely vectord (Rust 11,005 lines vs Go 804) — Rust's vectord
implements HNSW, Lance-format storage, and benchmarking; Go's vectord
wraps coder/hnsw and stops there.

## Process model

| | Rust | Go |
|---|---|---|
| Binaries running | **1** mega-process (gateway PID 1241, 14.9G RSS, 374% CPU under load) | **11** dedicated daemons (each ~100-300MB RSS) |
| Inter-component comms | In-process axum `.nest()` (no network) | HTTP between daemons |
| Crash blast radius | Whole system if any subsystem panics | One daemon dies, the rest survive |
| Horizontal scale | One unit only — can't scale individual components | Each daemon scales independently |
| Deploy unit | Single binary | 11 systemd units |

**Reading**: Rust's mega-binary is simpler ops at small scale (one thing
to start, one log to tail). Go's daemons are simpler ops at production
scale (kill the misbehaving one, restart it, the others stay up). Go also
lets you tune per-daemon resource limits via systemd (e.g.
`MemoryHigh=4G` on vectord but unlimited on chatd).

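The per-daemon limits mentioned above are ordinary systemd drop-ins. A hypothetical sketch — the unit name matches the daemon, but the limit values here are illustrative, not taken from the repo:

```ini
# /etc/systemd/system/vectord.service.d/limits.conf
[Service]
# Soft memory ceiling: kernel starts reclaiming above this.
MemoryHigh=4G
# Hard ceiling: OOM-kill this daemon only, not the rest of the stack.
MemoryMax=6G
Restart=on-failure
RestartSec=2
```

Because each daemon is its own unit, a limit breach kills and restarts one process while the other ten keep serving.
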
## Python dependency

This is the architectural difference J flagged. Both runtimes call Ollama at
:11434, but the path is different:

```
Rust embed: gateway → HTTP → Python sidecar :3200 → HTTP → Ollama :11434
Go embed:   gateway → HTTP → Go embedd :4216   → HTTP → Ollama :11434
```

The Python sidecar (`sidecar/sidecar/main.py`, 1,237 lines) is a
FastAPI wrapper around Ollama. It does:

- `/embed` — pydantic validation + iterates over texts calling Ollama
- `/generate` — pydantic validation + forwards to Ollama
- `/rerank` — pydantic validation + Ollama-prompt scoring

It adds **no fundamental compute** that Ollama can't do directly. It's a
type-validation + request-shape adapter layer.

The cost shows up in load tests:

| Path | RPS @ conc=10 | p50 | p99 | max |
|---|---:|---:|---:|---:|
| Go gateway → embedd → Ollama (warm cache) | **8,119** | 0.79ms | 3.2ms | 8ms |
| Rust gateway → sidecar → Ollama (no cache) | 128 | 77.8ms | 124ms | 214ms |

Go is **63× faster on RPS** with **~100× lower p50 latency** on this workload.

Two effects compound:

1. **Go has an in-process embed cache** (`internal/embed/cached.go`).
   For 6 rotating bodies × 240k requests, the cache hit rate approaches
   100%. Rust's sidecar doesn't cache.
2. **Rust pays a Python serialization tax** on every request — JSON
   in, pydantic validate, build httpx call, JSON out, pydantic
   serialize. ~10-20ms per request before reaching Ollama. Go's
   inline aibridge is native code with zero round-trips through a
   second runtime.

Even with cache parity, the structural Python-hop cost would leave
Rust at maybe 1,000-2,000 RPS versus Go's 8,000+. The Python sidecar
is the single biggest performance lever available on the Rust side.

## Vector storage

| | Rust | Go |
|---|---|---|
| HNSW lib | `hnsw_rs` (mature, used in production) | `coder/hnsw` (newer, smaller surface) |
| Code size | 11,005 lines (vectord + vectord-lance) | 804 lines |
| Lance-format storage | Yes (vectord-lance crate) | No |
| Persistence | LanceDB or in-memory | MinIO + JSON envelope (v2 envelope as of `eb0dfdf`) |
| Distance functions | cosine, euclidean, dot product | cosine, euclidean |

**Reading**: Rust has the deeper vector-storage substrate. The Lance format
gives columnar persistence + zero-copy reads + Apache Arrow integration.
Go relies on coder/hnsw with its own envelope format. For the
staffing-domain corpus sizes (5K-500K vectors) both work fine; for
multi-million-row indexes Rust would have an edge (Lance scales
better on disk).

## Distillation pipeline

This is where porting status matters most.

| Phase | Rust source | Go port status |
|---|---|---|
| Materializer (transforms.ts) | TS, full | ❌ NOT YET PORTED — Go can READ data/evidence/ as an observer but cannot PRODUCE evidence. Phase 2 of audit-FULL is observer-only on Go. |
| Scorer | TS + Go | ✅ Ported (`internal/distillation/scorer.go`) |
| Score categories + firewall | Pinned | ✅ Ported (`SftNever` + `IsSftNever`) |
| SFT export (synthesis) | TS, full (8 source-class templates) | ✅ Fully ported (`internal/distillation/sft_export.go`); 4-decimal byte-equal cross-runtime |
| RAG export | TS | ❌ NOT YET PORTED |
| Preference export | TS | ❌ NOT YET PORTED |
| Audit-baselines | TS | ✅ Fully ported + cross-runtime byte-equal verified on 7 live entries |
| Audit-FULL phases 0/3/4 | TS | ✅ Ported |
| Audit-FULL phase 1 (schema tests) | bun test | ✅ Ported via `go test` exec |
| Audit-FULL phase 2 (materializer dry-run) | TS, calls materializer | ✅ Observer mode only (reads existing data/evidence/) |
| Audit-FULL phase 5 (run summaries) | TS | ✅ Observer mode (reads existing summary.json) |
| Audit-FULL phase 6 (acceptance) | TS, fixture harness | ❌ Skipped — TS-only fixture deps |
| Audit-FULL phase 7 (replay) | TS, runs replay() | ✅ Observer mode (reads replay_runs.jsonl) |
| Replay tool | TS | ❌ NOT YET PORTED |
| Quarantine writer | TS | ❌ NOT YET PORTED |

**Reading**: Go has the substrate for everything observable (read paths)
and the SFT export end-to-end. The producer side (materializer, replay)
is still Rust-only. To run the full pipeline from Go alone, the
materializer + replay need porting.

## Production validators

| | Rust | Go |
|---|---|---|
| FillValidator | `crates/validator/src/staffing/fill.rs` (12 unit tests) | ❌ NOT IN GO — closest is the matrix gate's role check |
| EmailValidator | SSN pattern + salary disclosure + name consistency (12 tests) | ❌ NOT IN GO |
| `/v1/validate` endpoint | Yes (Rust `gateway/v1/validate`) | ❌ NOT IN GO |
| `/v1/iterate` endpoint | Yes (gen→validate→correct→retry loop) | ❌ NOT IN GO |
| Loads `workers_500k.parquet` at startup | Yes (75MB resident) | N/A |

**Reading**: Rust has a formal validator layer the Go side hasn't
ported. For staffing-domain production, these matter — they're the
"don't generate phantom worker IDs / SSN-pattern phone numbers / wrong
geography" guardrails. Go's matrix retrieve filters by geo + role via
embedder semantics, but doesn't do the rigorous structural validation
the Rust validator crate does.

## Substrate features unique to each side

### Go has, Rust doesn't

- **chatd 5-provider dispatcher** (kimi / opencode / openrouter / ollama_cloud / ollama). One unified `/v1/chat` endpoint over many cloud LLM providers.
- **Cross-role gate** in matrix retrieve (real_001 fix). Playbook recordings are tagged with a role; retrieve queries pass query_role; the gate prevents cross-role bleed.
- **Multi-corpus matrix indexer** (Spec §3.4 component 2). Composes N single-corpus vectord indexes with attribution.
- **Pathway memory** (Mem0-style versioned traces).
- **Observer fail-safe semantics** (ADR-005 Decision 5.1).
- **In-process embed cache** (CachedProvider + LRU).
- **LLM-based role extractor** (regex + qwen2.5 fallback).
- **Persistent stack 3-layer isolation** (`scripts/cutover/start_go_stack.sh`).
- **Cutover slice** (Bun `/_go/*` route, opt-in via systemd drop-in).

### Rust has, Go doesn't

- **Lance-format vector storage** (vectord-lance crate, 605 lines). Columnar persistence with Apache Arrow.
- **`truth` crate** (970 lines). Cross-source claim reconciliation. Used by validators + auditor.
- **`journald` crate** (455 lines). Structured event journal for audit trails.
- **`validator` crate** with FillValidator + EmailValidator (1,286 lines). Production guardrails.
- **`/v1/validate` + `/v1/iterate` endpoints**. Network-callable validators with an auto-correct loop.
- **`ui` crate (Dioxus, 1,509 lines)**. Native desktop/web UI. Plus the Bun frontend at :3700 + LLM Team UI at :5000.
- **Materializer + replay tools** (the "produce evidence" side of distillation).
- **Acceptance harness** (22 invariants over fixtures) — though it's TS, not Rust.
- **Production deployment** (devop.live/lakehouse/* serves through Rust today).

## Strengths and weaknesses

### Rust strengths

- **Maturity** — in production today, serving real demo traffic at devop.live/lakehouse/.
- **Single deploy unit** — one binary, one systemd service, one log.
- **Type system + memory safety** — fewer runtime bugs in the hot path.
- **Mature library ecosystem** — axum, tokio, polars, arrow, hnsw_rs, lance.
- **Native distillation pipeline** — every stage authored in Rust/TS first; Go is the porter.
- **Production validators** — formal guardrails that the Go side doesn't have.
- **Lance vector storage** — columnar format scales better at multi-million rows.

### Rust weaknesses

- **Python sidecar dependency** — every AI call goes through Python, 63× slower than the direct Go path on the measured workload. No structural reason it has to be Python; aibridge could call Ollama directly.
- **No embed cache** — every embed pays the full Ollama latency. Adding a cache would close most of the 63× gap.
- **Mega-binary blast radius** — gateway PID 1241 at 14.9G RSS means any panic kills the whole production system. No process isolation.
- **Tail latency cliff** — the Rust gateway hit 374% CPU during the load investigation earlier today. A single async runtime under load means tail-latency degradation across all subsystems.
- **Compile times** — slow iteration. Go's per-package builds take seconds; Rust's incremental builds take minutes for large changes.
- **Coupling** — adding a new feature touches gateway/v1/ and ripples across crates because everything composes via axum `.nest()`.

### Go strengths

- **Process isolation** — daemons crash independently. Ops can `systemctl restart vectord` without touching gateway/matrixd.
- **Per-daemon scale** — the embed cache lives in embedd; vectord shards independently. Hot daemons can scale horizontally without touching the rest.
- **No Python dependency** — every daemon talks to its peers over HTTP/JSON. Native Go all the way down to Ollama.
- **In-process embed cache** — yields the 63× RPS improvement on warm workloads.
- **Smaller, denser code** — 15,128 lines vs Rust's 35,447 + 1,237 sidecar (Go is ~43% the size).
- **Faster iteration** — `go build` of all 14 binaries takes ~3-5s on this box; a full Rust rebuild takes minutes.
- **Cross-runtime artifact compatibility verified** — audit_baselines.jsonl, scored-runs JSONL, and sft_export.jsonl all round-trip byte-equal between Rust and Go.

### Go weaknesses

- **Distillation pipeline incomplete** — materializer + replay + RAG export + preference export are still Rust-only. Operators running Go end-to-end can't produce evidence, only consume it.
- **Production validators missing** — no FillValidator/EmailValidator. The matrix gate covers role bleed but not structural validation (phantom IDs, SSN patterns, etc.).
- **Vector storage is HNSW-only** — no Lance equivalent. Fine for current scale; would need a Lance port for multi-million-row indexes.
- **Less production-tested** — no real coordinator traffic against Go yet (cutover slice is live but operator-controlled).
- **HTTP between daemons** — every cross-daemon call is a network round-trip. Latency is fine on localhost (microseconds) but adds up. Rust's `.nest()`-composed in-process services have zero IPC cost.
- **`coder/hnsw` is newer** — less battle-tested than Rust's hnsw_rs. The smoke-vs-persistent vector index pollution earlier today exposed an envelope versioning gap that's now fixed (v2 envelope), but Rust's Lance-based persistence has had longer to mature.

## Cross-cutting abstractions to address

### Whichever side wins, both should grow these

1. **Embed cache layer in Rust** — mirror Go's `CachedProvider` shape inside Rust aibridge. Even if you keep the Python sidecar, putting the cache on the Rust side closes the biggest perf gap. Sketch: a ~100-line `lru::LruCache<(String, String), Vec<f32>>` behind a sync mutex. Would close ~95% of the 63× gap.
2. **Drop the Python sidecar** — rewrite Rust aibridge to call Ollama at `:11434/api/embed` and `/api/generate` directly (skip Python). Reduces the operations surface by one runtime + one process, gains performance, and removes the Python deploy dependency. The pydantic validation isn't doing anything Rust's serde can't already.
3. **Materializer port (Rust → Go)** — currently Rust-only. Without it Go can audit but not produce. The most-blocking missing piece for Go-only operation.
4. **Validator port (Rust → Go)** — FillValidator + EmailValidator + `/v1/validate` + `/v1/iterate`. Production safety nets that Go doesn't have. About 1,300 lines of Rust to port.
5. **Cross-runtime contract tests** — pin the JSONL schemas (audit_baselines, scored_run, sft_sample) as canonical specs in `auditor/schemas/` and add Go-side validators that read the same definitions. Currently we have ad-hoc parity tests; a formal contract test would catch drift.
6. **Decide on Lance** — Rust has it, Go doesn't. For 5K-500K row corpora it doesn't matter. For 5M+ corpora the Lance backend wins on disk scaling. If staffing demand grows beyond a million workers, port it; otherwise leave it.

### If keeping Go primary

- **Port the materializer first** (highest leverage — unblocks the full pipeline)
- **Port replay second** (closes audit-FULL phase 7 live invocation)
- **Port the validators third** (production safety)
- **Skip Lance until corpus growth demands it**
- **Keep chatd, observer fail-safe, role gate, multi-corpus matrix** — these are real Go wins worth preserving

### If keeping Rust primary

- **Drop the Python sidecar — call Ollama directly from aibridge.** The single biggest perf gain available.
- **Add an embed cache in aibridge** (LRU). Closes most of the perf gap to Go.
- **Port chatd's 5-provider dispatcher to Rust** — Go has unified cloud LLM access; Rust's v1/chat is single-provider.
- **Port the cross-role gate to Rust matrix retrieve** — Go's role gate prevents real bleed (verified through real_001..005). Rust matrix-retrieve doesn't have it.
- **Consider process splitting** — a gateway at 14.9GB RSS is operationally awkward. Even partial decomposition (splitting vectord into its own process) would help.

## Recommendation

**Go for the primary line, Rust for production-bridge maintenance.**

Reasons:

1. **Performance** — Go is 63× faster on the embed hot path, and the gap is structural (the Python sidecar), not implementation-level.
2. **Operations** — process isolation is genuinely simpler to operate than a 14.9GB mega-binary at production scale.
3. **Code volume** — Go does the same job in ~43% of the lines; less surface area to audit, fewer places for bugs.
4. **Cross-runtime parity verified** — every artifact (audit_baselines, sft_export, scored runs) round-trips byte-equal. Operators running Go don't lose Rust-side compatibility.
5. **The 4 missing pieces are bounded** — materializer + replay + validators + RAG/preference exports are concrete porting targets, not research questions.

But **don't abandon Rust**:

1. devop.live/lakehouse/ runs through Rust today. Cutover is a multi-week process; Rust must stay healthy.
2. Several Go improvements (validators, Lance, the more mature HNSW lib) would be downstream of Rust patterns. Keeping Rust live means the substrate keeps evolving — anything new there is a porting opportunity for Go.
3. The Python-sidecar drop + embed cache are valuable Rust improvements regardless; with them, Rust would be 2-3× as competitive.

## Bottom line

The substrate is parallel-mature on both sides for the audit/observation surface. The producer side (materializer/replay/validators) is Rust-only. Performance favors Go ~60× on warm workloads, structurally driven by the Python-sidecar architectural choice on the Rust side. Operations favor Go on process isolation. Production deployment status favors Rust today.

If the goal is "find the right primary line and harden the other,"
the first moves are dropping the Python sidecar and adding an embed
cache on the Rust side — those are wins under either outcome. Then
port materializer + replay to Go for end-to-end Go operation; or stay
Rust-primary and improve the process model. Both paths are valid; the
deciding factor is operations preference.