golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md
root 277884b5eb multitier_100k: 335k scenarios @ 1,115/sec against 100k corpus, 4/6 at 0% fail
J asked for a much more sophisticated test using the 100k corpus from
the Rust legacy database. This commit ships:

scripts/cutover/multitier/main.go — 6-scenario harness with weighted
random selection per goroutine. Mixes search, email/SMS/fill
validators (in-process via internal/validator), profile swap with
ExcludeIDs, repeat-cache exercise, and playbook record/replay.

Scenarios + weights (cumulative scenario fractions):
  35% cold_search_email      — search + email outreach + EmailValidator
  15% surge_fill_validate    — search + fill proposal + FillValidator + record
  15% profile_swap           — original search + ExcludeIDs swap + no-overlap check
  15% repeat_cache           — same query × 5 (cache effectiveness)
  10% sms_validate           — SMS draft (≤160 chars, phone for SSN-FP guard)
  10% playbook_record_replay — cold → record → warm w/ use_playbook=true

Test results (5-min sustained, conc=50, 100k workers indexed):
  TOTAL 335,257 scenarios @ 1,115/sec
  cold_search_email     117k @ 0.0% fail · p50 2.2ms · p99 8.6ms
  surge_fill_validate    50k @ 98.8% fail (substrate bug below)
  profile_swap           50k @ 0.0% fail · p50 4.5ms · ExcludeIDs verified
  repeat_cache           50k × 5 = 252k searches @ 0.0% fail · p50 11.7ms
  sms_validate           33k @ 0.0% fail · phone-pattern guard works
  playbook_record_replay 33k @ 96.8% fail (substrate bug below)
  Total successful workflows: ~250k+

Validator integration verified at load:
  150,930 EmailValidator passes across cold_search_email + sms_validate
  35 successful FillValidator passes + 1,061 successful playbook
    records (where the bug didn't fire)
  zero false positives on the SSN-pattern guard against phone numbers

Resource footprint at 100k:
  vectord 1.23GB RSS (linear with 100k vectors)
  matrixd 26MB, 75% CPU (1-core saturated at conc=50)
  Total across 11 daemons: 1.7GB
  Compare to Rust at 14.9GB — ~10× less even at 100k.

SUBSTRATE BUG SURFACED: coder/hnsw v0.6.1 nil-deref in
layerNode.search at graph.go:95. Triggers on /v1/matrix/playbooks/record
under sustained writes to the small playbook_memory index. Both Add
and Search paths can panic.

Workaround applied (this commit) in internal/vectord/index.go
BatchAdd: recover() guard converts panic to error; daemon stays up
instead of crashing the request handler.
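A minimal sketch of that recover-guard pattern, in the commit's spirit. Names here are illustrative of the technique, not the exact code shipped in internal/vectord/index.go:

```go
package main

import "fmt"

// batchAdd runs the library's insert callback behind a recover() guard,
// converting a panic into an ordinary error so the request handler (and
// the daemon behind it) survives a misbehaving dependency.
func batchAdd(insert func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("hnsw add panicked: %v", r)
		}
	}()
	insert()
	return nil
}

func main() {
	err := batchAdd(func() { panic("nil deref in layerNode.search") })
	fmt.Println(err) // the panic surfaces as an error; the daemon keeps serving
}
```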

Operator recovery procedure (also documented in the report):
  curl -X DELETE http://localhost:4215/vectors/index/playbook_memory
Next record recreates the index fresh.

Real fix DEFERRED — open in docs/ARCHITECTURE_COMPARISON.md
Decisions tracker. Three options:
  a) upstream patch to coder/hnsw
  b) custom small-index Add path that always rebuilds when len < threshold
  c) alternate store for playbook_memory (Lance? in-memory map?)

Evidence: reports/cutover/multitier_100k.md (full methodology +
results + repro + bug analysis). docs/ARCHITECTURE_COMPARISON.md
Decisions tracker updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 06:28:50 -05:00


Lakehouse: Rust vs Go architecture comparison

Status: Living document · primary source for the parallel-runtime comparison. Owner: J. Update this when either side ships a fix that changes the table values, or when a new architectural axis surfaces. Last meaningful refresh: 2026-05-01 (post-Rust-cache + Go-validator-port)

This document compares the two parallel implementations of the lakehouse substrate — Rust at /home/profit/lakehouse/ (production today), Go at /home/profit/golangLAKEHOUSE/ (cutover-prep, Bun /_go/* slice live). The goal of running both lines is to find where each architecture is weak vs strong, address those gaps, and make the keep/maintain decision based on real evidence rather than preference.

A snapshot of this document at any point in time is also captured at reports/cutover/architecture_comparison.md. The version in docs/ is the source of truth; reports/cutover/ is the historical record.


How to update this doc

Three triggers:

  1. A fix lands on either side that moves a table value. Update the number, append a one-line entry to the change log at the bottom, commit alongside the fix.
  2. A new architectural axis surfaces. Add a section. Match the shape of existing sections (table + read paragraph).
  3. A keep/maintain decision is made. Update the Recommendation section + change log.

Don't:

  • Delete sections without recording the reason in the change log.
  • Embed unverified claims — every "Rust is X" or "Go is X" should point to either a load-test number, a code reference (crate/file:line), or an explicit "asserted, not measured" caveat.

Decisions tracker

| Date | Decision | Effect |
|---|---|---|
| 2026-05-01 | Add LRU embed cache to Rust aibridge | Closes 236× perf gap. DONE (commit 150cc3b in lakehouse). |
| 2026-05-01 | Port FillValidator + EmailValidator to Go | Production safety net Go was missing. DONE (commit b03521a in golangLAKEHOUSE). |
| 2026-05-01 | Multi-tier load test against 100k corpus | 335k scenarios in 5min, 4/6 at 0% fail. Surfaced coder/hnsw v0.6.1 bug. Recover guard added. DONE (multitier_100k.md). |
| open | coder/hnsw v0.6.1 small-index panic | Surfaced by multi-tier test. Operator recovery: DELETE + recreate playbook_memory. Real fix: upstream patch OR custom small-index Add path OR alternate store for playbook_memory. |
| open | Drop Python sidecar from Rust aibridge | Universal-win architectural cleanup. ~200 LOC, removes 1 runtime + 1 process. |
| open | Port Rust materializer to Go (transforms.ts) | Unblocks Go-only end-to-end pipeline. ~500-800 LOC. |
| open | Port Rust replay tool to Go | Closes audit-FULL phase 7 live invocation. ~400-600 LOC. |
| open | Decide on Lance vector backend | Defer until corpus exceeds ~5M rows. |
| open | Pick Go primary vs Rust primary | Both viable. Go has perf edge after today; Rust has production deploy + producer-side completeness. |

Code volume

| | Lines | Last verified |
|---|---|---|
| Rust crates/ (15 crates) | 35,447 | 2026-05-01 |
| Rust sidecar/ (Python) | 1,237 | 2026-05-01 |
| Go internal/ (20 packages) | 11,896 (+ validator 1,190) | 2026-05-01 |
| Go cmd/ (14 binaries) | 3,232 | 2026-05-01 |
| Go total | ~16,300 | 2026-05-01 |

Go is ~46% the size of Rust on like-for-like surface (post-validator-port). The gap is largely vectord (Rust 11,005 lines vs Go 804) — Rust's vectord implements HNSW + Lance-format storage + benchmarking; Go's wraps coder/hnsw and stops there.


Process model

| | Rust | Go |
|---|---|---|
| Binaries running | 1 mega-process (gateway PID 1241, 14.9G RSS, 374% CPU under load) | 11 dedicated daemons (~100-300MB RSS each) |
| Inter-component comms | In-process axum.nest (no network) | HTTP between daemons |
| Crash blast radius | Whole system if any subsystem panics | One daemon dies, rest survive |
| Horizontal scale | One unit only — can't scale individual components | Each daemon scales independently |
| Deploy unit | Single binary | 11 systemd units |

Reading: Rust's mega-binary is simpler ops at small scale (one thing to start, one log to tail). Go's daemons are simpler ops at production scale (kill the misbehaving one, restart it, others stay up). Go also lets you tune per-daemon resource limits via systemd.


Python dependency (the load-bearing axis)

This is the architectural difference that drove the original perf gap. Both call Ollama at :11434, but the path differs:

Rust embed:  gateway → HTTP → Python sidecar :3200 → HTTP → Ollama :11434
Go embed:    gateway → HTTP → Go embedd      :4216 → HTTP → Ollama :11434

The Python sidecar (sidecar/sidecar/main.py, 1,237 lines) is a FastAPI wrapper around Ollama. It does pydantic validation + request shaping; no fundamental compute that Ollama can't do directly.
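For scale, the direct path is small. A sketch of a no-sidecar embed call straight to Ollama's /api/embed; the model name and exact response shape here are assumptions for illustration, not the shipped embedd code:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// embedReq/embedResp mirror Ollama's /api/embed JSON shape as assumed here.
type embedReq struct {
	Model string `json:"model"`
	Input string `json:"input"`
}

type embedResp struct {
	Embeddings [][]float64 `json:"embeddings"`
}

// embed makes one HTTP call to Ollama: no sidecar, no extra runtime.
func embed(base, model, text string) ([]float64, error) {
	body, _ := json.Marshal(embedReq{Model: model, Input: text})
	resp, err := http.Post(base+"/api/embed", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out embedResp
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	if len(out.Embeddings) == 0 {
		return nil, fmt.Errorf("empty embedding response")
	}
	return out.Embeddings[0], nil
}

func main() {
	// Assumes a local Ollama at :11434 with an embedding model pulled.
	vec, err := embed("http://localhost:11434", "nomic-embed-text", "hello")
	fmt.Println(len(vec) > 0, err)
}
```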

Performance impact (load-tested 2026-05-01, 6 rotating bodies, 10 concurrency, 30s)

| Path | Pre-cache | Post-cache (150cc3b) | Δ |
|---|---|---|---|
| Rust /ai/embed (via gateway) | 128 RPS · p50 78ms · p99 124ms | 30,279 RPS · p50 129µs · p99 5ms | +236× RPS |
| Go /v1/embed (via gateway → embedd) | 8,119 RPS · p50 0.79ms · p99 3ms | unchanged (already cached) | n/a |

Rust now beats Go ~3.7× on cache-warm workloads. The cache being in-process inside Rust's gateway (no HTTP hop to a separate daemon) gives it the edge once both sides have caching.

What the cache fix did NOT do

The Python sidecar is still in the Rust path on cache misses. Cold queries pay the full Python+Ollama tax. Dropping the sidecar (rewriting aibridge to call Ollama directly) is the next universal-win item — open in the Decisions tracker.
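The cache pattern both sides converged on is ordinary. A sketch of an LRU embed cache keyed on input text, illustrative of the CachedProvider idea rather than its actual code:

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// embedCache is a minimal mutex-guarded LRU: map for O(1) lookup,
// doubly linked list for recency order. Capacity is illustrative.
type embedCache struct {
	mu    sync.Mutex
	cap   int
	ll    *list.List
	items map[string]*list.Element
}

type entry struct {
	key string
	vec []float32
}

func newEmbedCache(capacity int) *embedCache {
	return &embedCache{cap: capacity, ll: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the cached vector or computes it via embed on a miss,
// evicting the least recently used entry when over capacity.
func (c *embedCache) Get(text string, embed func(string) []float32) []float32 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[text]; ok {
		c.ll.MoveToFront(el) // hit: microseconds, no model round-trip
		return el.Value.(*entry).vec
	}
	vec := embed(text) // miss: pays the full embed cost
	c.items[text] = c.ll.PushFront(&entry{text, vec})
	if c.ll.Len() > c.cap {
		old := c.ll.Back()
		c.ll.Remove(old)
		delete(c.items, old.Value.(*entry).key)
	}
	return vec
}

func main() {
	calls := 0
	cache := newEmbedCache(2)
	embed := func(s string) []float32 { calls++; return []float32{float32(len(s))} }
	cache.Get("warm", embed)
	cache.Get("warm", embed) // second lookup served from cache
	fmt.Println(calls)       // 1
}
```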


Vector storage

| | Rust | Go |
|---|---|---|
| HNSW lib | hnsw_rs (mature) | coder/hnsw (newer, smaller) |
| Code size | 11,005 lines (vectord + vectord-lance) | 804 lines |
| Lance-format storage | Yes (vectord-lance crate) | No |
| Persistence | LanceDB or in-memory | MinIO + JSON envelope (v2 envelope as of eb0dfdf) |
| Distance functions | cosine, euclidean, dot product | cosine, euclidean |

Reading: Rust has the deeper substrate. Lance-format gives columnar persistence + zero-copy reads + Apache Arrow integration. For staffing-domain corpus sizes (5K-500K vectors) both work fine; for multi-million-row indexes Rust would have a real edge. Defer the Go Lance port until corpus growth demands it.


Distillation pipeline (porting status)

| Phase | Rust source | Go port |
|---|---|---|
| Materializer (transforms.ts) | TS, full | NOT YET PORTED |
| Scorer | TS + Go | Ported |
| Score categories + firewall | Pinned | Ported (SftNever) |
| SFT export (synthesis) | TS, full (8 source classes) | Fully ported, 4-decimal byte-equal |
| RAG export | TS | NOT YET PORTED |
| Preference export | TS | NOT YET PORTED |
| Audit-baselines | TS | Fully ported, byte-equal verified |
| Audit-FULL phase 0/3/4 | TS | Ported |
| Audit-FULL phase 1 (schema) | bun test | Via go test exec |
| Audit-FULL phase 2 (materializer) | TS | Observer mode (read-only) |
| Audit-FULL phase 5 (run summaries) | TS | Observer mode (read-only) |
| Audit-FULL phase 6 (acceptance) | TS fixture harness | Skipped (TS-only deps) |
| Audit-FULL phase 7 (replay) | TS | Observer mode (read-only) |
| Replay tool | TS | NOT YET PORTED |
| Quarantine writer | TS | NOT YET PORTED |

Reading: Go has the substrate for everything observable (read paths) and SFT export end-to-end. The producer side (materializer, replay) is still Rust-only. To run the full pipeline from Go alone, the materializer + replay need porting.


Production validators

| | Rust | Go |
|---|---|---|
| FillValidator | crates/validator/src/staffing/fill.rs (12 unit tests) | Ported 2026-05-01 (internal/validator/fill.go + 13 tests) |
| EmailValidator | crates/validator/src/staffing/email.rs (12 tests) | Ported 2026-05-01 (internal/validator/email.go + 11 tests) |
| /v1/validate endpoint | Yes | NOT YET PORTED (validator network surface) |
| /v1/iterate endpoint | Yes (gen→validate→correct→retry loop) | NOT YET PORTED |
| Validators load workers_500k.parquet at startup | Yes (75MB resident) | N/A — Go uses WorkerLookup interface; in-memory or adapter |

Reading: With today's port, Go has the load-bearing validators. The network surface (/v1/validate, /v1/iterate) is the next piece — the in-memory validators work in-process; turning them into HTTP endpoints adds the production-shape access pattern.


Substrate features unique to each side

Go has, Rust doesn't

  • chatd 5-provider dispatcher (kimi / opencode / openrouter / ollama_cloud / ollama).
  • Cross-role gate in matrix retrieve (real_001 fix). Verified by reality tests real_001..005.
  • Multi-corpus matrix indexer (Spec §3.4 component 2).
  • Pathway memory (Mem0-style versioned traces).
  • Observer fail-safe semantics (ADR-005 Decision 5.1).
  • In-process embed cache (CachedProvider + LRU). Note: Rust got this 2026-05-01 too.
  • LLM-based role extractor (regex + qwen2.5 fallback).
  • Persistent stack 3-layer isolation (scripts/cutover/start_go_stack.sh).
  • Cutover slice (Bun /_go/* route, opt-in via systemd drop-in).
  • Production load test (scripts/cutover/loadgen/) with Bun-frontend + direct comparison.

Rust has, Go doesn't

  • Lance-format vector storage (vectord-lance crate, 605 lines).
  • truth crate (970 lines). Cross-source claim reconciliation.
  • journald crate (455 lines). Structured event journal.
  • /v1/validate + /v1/iterate endpoints (network surface).
  • ui crate (Dioxus, 1,509 lines). Native desktop/web UI.
  • Materializer + replay tools (the "produce evidence" side).
  • Acceptance harness (22 invariants over fixtures, TS).
  • Production deployment (devop.live/lakehouse/* serves through Rust today).

Strengths and weaknesses

Rust strengths

  • Mature, in production, serving real demo traffic.
  • Single deploy unit; one binary, one systemd service, one log.
  • Type system + memory safety; fewer runtime bugs in hot paths.
  • Mature library ecosystem (axum, tokio, polars, arrow, hnsw_rs, lance).
  • Native distillation pipeline; Go is the porter.
  • Production validators (now also in Go but Rust authored them).
  • Lance vector storage scales beyond 5M rows.
  • In-process embed cache (post-150cc3b) makes Rust the fastest path on warm workloads.

Rust weaknesses

  • Python sidecar dependency — every cache-miss AI call goes through Python. Adds 1 runtime + 1 process to ops. ~200 LOC to fix.
  • Mega-binary blast radius — gateway at 14.9G RSS means any panic kills the whole production system.
  • Tail latency cliff under uncached load — single async runtime serializes I/O completions.
  • Compile times — slow iteration vs Go's per-package builds.
  • Coupling — adding a feature touches gateway/v1/ and ripples across crates.

Go strengths

  • Process isolation — daemons crash independently; ops can systemctl restart vectord without touching gateway.
  • Per-daemon scale — embed cache lives in embedd; vectord shards independently. Hot daemons scale horizontally.
  • No Python dependency — every daemon talks to its peers over HTTP/JSON; native Go all the way down to Ollama.
  • In-process embed cache at the daemon level (was the perf lever pre-Rust-cache).
  • Smaller, denser code — 16,300 lines vs Rust's 35,447 + 1,237 sidecar (~46% the size).
  • Faster iteration — go build of all 14 binaries is ~3-5s; a full Rust rebuild takes minutes.
  • Cross-runtime artifact compatibility verified — audit_baselines.jsonl, scored-runs JSONL, sft_export.jsonl all round-trip byte-equal.

Go weaknesses

  • Distillation pipeline incomplete — materializer + replay + RAG export + preference export still Rust-only.
  • Validator network surface missing — in-memory validators work, but /v1/validate HTTP endpoint not yet ported. Operators can't call validators over the wire from Go.
  • Vector storage HNSW-only — no Lance equivalent. Fine for current scale.
  • Less production-tested — cutover slice live but no real coordinator traffic yet.
  • HTTP between daemons — every cross-daemon call is a network round-trip. Latency is fine on localhost (microseconds), but it adds more tail latency than Rust's in-process composition.
  • coder/hnsw is newer than Rust's hnsw_rs. Less battle-tested.

Cross-cutting abstracts to address

The list below is a working backlog. Move items to "Decisions tracker" (at top) when actioned with a commit reference.

Universal wins (apply regardless of primary line)

  1. Embed cache in Rust aibridge — DONE 2026-05-01 (150cc3b).
  2. FillValidator + EmailValidator in Go — DONE 2026-05-01 (b03521a).
  3. Drop Python sidecar from Rust — Rewrite aibridge to call Ollama at :11434/api/embed and /api/generate directly. Removes 1 runtime + 1 process from ops. ~200 LOC.
  4. Cross-runtime contract tests — Pin shared JSONL schemas (audit_baselines, scored_run, sft_sample) as canonical specs in auditor/schemas/ with Go-side validators consuming the same definitions.
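Item 4 could start as small as a per-line key check over a shared JSONL artifact. A sketch, with illustrative field names rather than the real audit_baselines schema:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// checkJSONL asserts that every line of a JSONL artifact decodes and
// carries the pinned keys — the minimal cross-runtime contract.
func checkJSONL(data string, required []string) error {
	sc := bufio.NewScanner(strings.NewReader(data))
	for n := 1; sc.Scan(); n++ {
		var row map[string]any
		if err := json.Unmarshal(sc.Bytes(), &row); err != nil {
			return fmt.Errorf("line %d: %w", n, err)
		}
		for _, k := range required {
			if _, ok := row[k]; !ok {
				return fmt.Errorf("line %d: missing %q", n, k)
			}
		}
	}
	return sc.Err()
}

func main() {
	// Hypothetical two-row artifact; both runtimes would emit the same keys.
	data := `{"run_id":"r1","score":0.93}
{"run_id":"r2","score":0.88}`
	fmt.Println(checkJSONL(data, []string{"run_id", "score"})) // <nil>
}
```

The same checker can run in CI against both the Rust-emitted and Go-emitted files before the stricter byte-equality comparison.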

If keeping Go primary

  1. Port materializer (highest leverage — unblocks full Go pipeline). ~500-800 LOC.
  2. Port replay tool (closes audit-FULL phase 7 live invocation). ~400-600 LOC.
  3. Port /v1/validate + /v1/iterate HTTP surface for the now-Go-side validators. ~200 LOC.
  4. Skip Lance until corpus growth demands it (>5M rows).
  5. Keep chatd, observer fail-safe, role gate, multi-corpus matrix — real Go wins worth preserving.

If keeping Rust primary

  1. Port chatd's 5-provider dispatcher to Rust — unified cloud LLM access.
  2. Port the cross-role gate to Rust matrix retrieve — production safety on the matrix layer (verified by Go reality tests real_001..005).
  3. Consider process splitting — even partial decomposition (split out vectord into its own process) would help with the mega-binary blast radius.

Recommendation (working hypothesis)

Go for the primary line, Rust for production-bridge maintenance.

Reasons:

  1. Operations — process isolation is genuinely simpler at production scale than a 14.9G mega-binary.
  2. Code volume — Go does the same job in ~46% the lines.
  3. Cross-runtime parity verified — every artifact round-trips byte-equal between runtimes.
  4. The 4 missing pieces are bounded — materializer + replay + validators-network + RAG/preference exports are concrete porting targets, not research questions.
  5. Performance is no longer a deciding factor post-150cc3b — Rust is faster on warm cache, but both are well above staffing-domain demand levels (<1 RPS typical).

But don't abandon Rust:

  1. devop.live/lakehouse/ runs through Rust today; cutover is multi-week.
  2. Several Go improvements would be downstream of Rust patterns. Keeping Rust live means anything new there is a porting opportunity for Go.
  3. The Python sidecar drop + cross-role gate port are valuable Rust improvements regardless of which line is primary.

Change log

Append entries here when this doc gets updated. One-line entries; link to commits.

  • 2026-05-01 — Initial draft (b3ad148 golangLAKEHOUSE).
  • 2026-05-01 — Recorded Rust embed cache shipping (150cc3b lakehouse), updated Python-dependency section + table.
  • 2026-05-01 — Recorded Go validator port shipping (b03521a golangLAKEHOUSE), updated production-validators section.
  • 2026-05-01 — Reframed as living document in docs/, added Decisions tracker + Update guidance + Change log sections.
  • 2026-05-01 — Multi-tier 100k load test ran (335k scenarios @ 1,115/sec, 4/6 at 0% fail), surfaced coder/hnsw v0.6.1 nil-deref on small playbook_memory index. Recover guard added; real fix open.

See also

  • reports/cutover/architecture_comparison.md — historical snapshot (matched this doc as of the date stamp at top).
  • docs/SPEC.md — Go-side architectural spec.
  • docs/DECISIONS.md — Go-side ADRs.
  • /home/profit/lakehouse/docs/DECISIONS.md — Rust-side ADRs.
  • /home/profit/lakehouse/docs/go-rewrite/ — Rust-side notes on the rewrite.
  • reports/cutover/SUMMARY.md — running log of cross-runtime parity probes.
  • reports/cutover/g5_load_test.md — load-test methodology + numbers.