J asked for this comparison before locking in the primary line. This
report documents what is actually structurally different versus merely
implementation-level different, and what to do about each.
Key findings:
1. Python sidecar is the single biggest architectural lever
- Rust: gateway → HTTP → Python sidecar :3200 → HTTP → Ollama
- Go: gateway → HTTP → embedd → HTTP → Ollama (no Python)
- Sidecar adds no compute of its own over Ollama (just pydantic
  validation + httpx forwarding)
- 63× perf gap (8,119 vs 128 RPS) driven by the extra sidecar hop
  plus the missing embed cache
2. Process model: Rust 1 mega-binary (14.9G RSS), Go 11 daemons
- Rust: simpler ops at small scale, panic blast radius = whole system
- Go: per-daemon scale + crash isolation, more config surface
3. Code volume: Go 15,128 lines vs Rust 35,447 + 1,237 sidecar
- Go is 43% the size doing similar work
- Gap concentrated in vectord (Rust 11k lines, Go 804 — Lance + benchmarking)
4. Distillation pipeline asymmetry
- Audit/observation: BOTH sides parallel-mature
- Production: Rust-only (materializer + replay + RAG/pref export)
- Go can READ everything but can't PRODUCE evidence
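The producer gap is concrete: a materializer is, at its core, a single pass over observation JSONL that emits export records. A minimal Python sketch of that shape — all field names here ("ok", "prompt", "response") are illustrative, not the actual Rust schema:

```python
import json

def materialize(jsonl_lines):
    """Turn raw observation records into RAG-export pairs.

    Field names are illustrative only; the real Rust materializer's
    schema and filtering rules are not reproduced here.
    """
    out = []
    for line in jsonl_lines:
        rec = json.loads(line)
        if not rec.get("ok"):  # skip failed interactions
            continue
        out.append({"query": rec["prompt"], "passage": rec["response"]})
    return out
```

Porting this shape to Go is mechanical once the JSONL schema is pinned; the replay and preference-export stages layer on top of the same pass.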
5. Production validators (FillValidator/EmailValidator/'/v1/validate')
- Rust has them (1,286 lines, 12 tests each)
- Go doesn't — matrix gate covers role bleed but not structural validation
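To make the gap concrete, here is a Python sketch of the two validators' *shape* only — the real Rust versions total 1,286 lines with 12 tests each, and the rules below (one regex, one placeholder set) are stand-in assumptions, not the actual logic:

```python
import re

class EmailValidator:
    """Structural check on an email field (illustrative rule only)."""
    PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

    def validate(self, value: str) -> bool:
        return bool(self.PATTERN.match(value))

class FillValidator:
    """Reject empty/placeholder fills before export (illustrative set)."""
    PLACEHOLDERS = {"", "n/a", "todo", "unknown"}

    def validate(self, value: str) -> bool:
        return value.strip().lower() not in self.PLACEHOLDERS
```

This is the class of check the matrix gate does not cover: role bleed is about *who* produced a field, structural validation is about *what* is in it.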
Cross-cutting abstracts to address regardless of which wins:
- Drop Python sidecar from Rust (call Ollama directly)
- Add LRU embed cache to Rust aibridge
- Port materializer + replay + validators to Go
- Pin shared JSONL schemas as canonical (both runtimes consume same spec)
- Decide on Lance backend (defer until corpus > 5M rows)
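Of the fixes above, the embed cache is the cheapest win. A minimal sketch of the shape, in Python for brevity — the Rust aibridge version would use an LRU crate rather than this class, and the capacity/key choices here are assumptions:

```python
from collections import OrderedDict

class EmbedCache:
    """Tiny LRU keyed on (model, text), sitting in front of Ollama."""

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get_or_compute(self, model, text, compute):
        key = (model, text)
        if key in self._entries:
            self._entries.move_to_end(key)   # mark as recently used
            return self._entries[key]
        vec = compute(model, text)           # cache miss: hit Ollama
        self._entries[key] = vec
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return vec
```

Since embed inputs repeat heavily on warm workloads, even a modest capacity converts most of the per-request Ollama round trips into dictionary lookups.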
If keeping Go primary: port materializer first, validators second,
skip Lance. If keeping Rust primary: drop Python + add cache,
port chatd 5-provider dispatcher + cross-role gate from Go.
Bottom line: the substrate is parallel-mature on observation; the
producer side is Rust-only; performance structurally favors Go (~60×
on warm workloads); operations favor Go on crash isolation; production
deployment favors Rust today.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>