golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md
root 857ca4c971 catalogd: HTML-safe escape fix + decisions tracker entry
Per 2026-05-03 step_7_8_retention_and_parity scrum (opus WARN on
parity_subject_audit.rs:canonical_json):

Go's json.Marshal HTML-escapes < > & to \u003c \u003e \u0026 by
default. Rust's serde_json::to_vec keeps them literal. Any audit
row with these chars in any string field would silently produce
different canonical bytes across runtimes → broken HMAC chain.
Latent because no production audit field has carried <>& yet, but
realistic for purpose strings ("error & retry") or trace_id values
("<HTTP-Request-Id>").

Fix: marshalNoEscapeHTML helper wraps json.Encoder.SetEscapeHTML(false)
+ trims trailing newline. Routed through writeCanonical for both
keys and scalar values.

Regression test: TestVerifyChain_HtmlChars_NotEscaped (purpose has &,
trace_id has <>) asserts the canonical bytes contain literal chars,
not escape sequences.

11 unit tests pass including the new one; parity probe still 6/6
byte-identical against live production audit logs.

Decisions tracker: added 2026-05-03 entry for SUBJECT_MANIFESTS_ON_CATALOGD
Steps 1-8 closure + 6th cross-runtime parity probe (was 5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:29:53 -05:00

Lakehouse: Rust vs Go architecture comparison

Status: Living document · primary source for the parallel-runtime comparison. Owner: J. Update this when either side ships a fix that changes the table values, or when a new architectural axis surfaces. Last meaningful refresh: 2026-05-02 (post-Lance-gauntlet + observability parity)

This document compares the two parallel implementations of the lakehouse substrate — Rust at /home/profit/lakehouse/ (production today), Go at /home/profit/golangLAKEHOUSE/ (cutover-prep, Bun /_go/* slice live). The goal of running both lines is to find where each architecture is weak vs strong, address those gaps, and make the keep/maintain decision based on real evidence rather than preference.

A snapshot of this document at any point in time is also captured at reports/cutover/architecture_comparison.md. The version in docs/ is the source of truth; reports/cutover/ is the historical record.


How to update this doc

Three triggers:

  1. A fix lands on either side that moves a table value. Update the number, append a one-line entry to the change log at the bottom, commit alongside the fix.
  2. A new architectural axis surfaces. Add a section. Match the shape of existing sections (table + read paragraph).
  3. A keep/maintain decision is made. Update the Recommendation section + change log.

Don't:

  • Delete sections without recording the reason in the change log.
  • Embed unverified claims — every "Rust is X" or "Go is X" should point to either a load-test number, a code reference (crate/file:line), or an explicit "asserted, not measured" caveat.

Decisions tracker

Date Decision Effect
2026-05-01 Add LRU embed cache to Rust aibridge Closes 236× perf gap. DONE (commit 150cc3b in lakehouse).
2026-05-01 Port FillValidator + EmailValidator to Go Production safety net Go was missing. DONE (commit b03521a in golangLAKEHOUSE).
2026-05-01 Multi-tier load test against 100k corpus 335k scenarios in 5min, 4/6 at 0% fail. Surfaced coder/hnsw v0.6.1 bug. Recover guard added. DONE (multitier_100k.md).
2026-05-01 coder/hnsw v0.6.1 panic — REAL FIX landed Lifted source-of-truth out of coder/hnsw via i.vectors map[string][]float32 side store + safeGraphAdd/safeGraphDelete recover wrappers + warm-path rebuild fallback. Re-run: 0 failures across 19,622 scenarios (was 96-98% on 2/6). DONE. Architecture invariant in STATE_OF_PLAY "DO NOT RELITIGATE".
2026-05-02 Substrate fix verified at original failure scale Re-ran multitier 5min @ conc=50 (the footprint that originally surfaced the bug at 96-98% fail). Result: 132,211 scenarios at 438/sec, 6/6 classes at 0% failure. Throughput dropped 1,115/sec → 438/sec because broken scenarios now do real HNSW Add work. Tails healthy: surge_fill_validate p99=1.53s, playbook_record_replay p99=2.32s. Fix scales — closing the open thread.
2026-05-02 Drop Python sidecar from Rust aibridge — DONE crates/aibridge/src/client.rs rewrite (commit ba928b1 in lakehouse). AiClient now talks Ollama directly: per-text /api/embed loop, /api/generate for chat + rerank-loop + admin (unload/preload), /api/ps + std::process::Command nvidia-smi for vram_snapshot. Public API unchanged — 0 callers updated. Verification: cargo test -p aibridge 32/32 PASS, live smoke /ai/embed returns 768-dim vector + /v1/chat returns "OK". Sidecar's lab_ui.py + pipeline_lab.py (~888 LOC dev-only UIs) keep running; only the hot-path embed/generate/rerank/admin routes are retired. Process count drops from "mega-binary + sidecar" to "mega-binary alone".
2026-05-02 Rust observability parity (trace-id + session JSONL) — DONE Mirrors the Go-side wave (commits d6d2fdf + 1a3a82a in golangLAKEHOUSE). crates/gateway/src/v1/iterate.rs now reads X-Lakehouse-Trace-Id header, forwards to /v1/chat + /v1/validate hops, emits per-attempt Langfuse spans (iterate.attempt[N]), and writes one SessionRecord JSONL row per session via the new crates/gateway/src/v1/session_log.rs writer. New crates/gateway/src/bin/parity_session_log helper enables the cross-runtime parity probe at scripts/cutover/parity/session_log_parity.sh: 4/4 fixtures byte-equal (including unicode prompts) after normalizing the 2 fields that legitimately differ (timestamp, daemon name). Both runtimes can write to the same path and DuckDB queries see them as one stream. 90/90 Rust unit tests PASS, 33 Go packages PASS.
2026-05-02 Cross-runtime parity gap closure — DONE Two follow-ups from the deploy + reality wave: (1) Go validatord didn't honor X-Lakehouse-Trace-Id when Langfuse middleware was a passthrough (no env config) — fixed by reading the header DIRECTLY in the iterate handler before the ctx fallback (Go commit 6847bbc). (2) Rust IterateResponse/IterateFailure didn't echo trace_id back to callers — added the field with skip-serializing-if-none for backward compat (Rust commit 98b6647). Plus enabled [gateway].session_log_path on live Rust pointing at the SAME path Go writes to, so DuckDB queries see one unified longitudinal log. Verified end-to-end: identical /v1/iterate to both runtimes lands in one sessions.jsonl tagged daemon=gateway (Rust) vs daemon=validatord (Go); response bodies both echo the forwarded trace_id. 24/24 parity assertions (validator+extract_json+session_log+materializer) hold post-restart.
2026-05-02 Port Rust materializer to Go (transforms.ts) — DONE internal/materializer + cmd/materializer + materializer_smoke.sh. Ports transforms.ts (12 transforms) + build_evidence_index.ts. Idempotency, day-partition, receipt. 14 tests green; on-wire JSON matches TS so both runtimes interoperate.
2026-05-02 Port Rust replay tool to Go — DONE internal/replay + cmd/replay + replay_smoke.sh. Ports replay.ts retrieve → bundle → /v1/chat → validate → log. Closes audit-FULL phase 7 live invocation on Go side. 14 tests green; same data/_kb/replay_runs.jsonl shape (schema=replay_run.v1) as TS.
2026-05-02 /v1/validate + /v1/iterate HTTP surface — DONE cmd/validatord (port 3221) hosts both endpoints. internal/validator gains PlaybookValidator (3rd kind), JSONL roster loader, and the Iterate orchestrator + ExtractJSON helper. Gateway proxies /v1/validate + /v1/iterate to validatord. Closes the last "Go-primary" backlog item (architecture_comparison.md item #7). 30+ tests + validatord_smoke.sh 5/5 PASS.
2026-05-02 Cross-runtime validator parity probe — surfaced wire-format gap New scripts/cutover/parity/validator_parity.sh runs 6 identical /v1/validate cases against Rust :3100 AND Go :4110, compares status + body. Result: 6/6 status codes match (logic-level equivalence holds), 5/6 body shapes diverge. Rust returns serde-tagged enum {"Schema":{"field":"x","reason":"y"}}; Go returns flat struct {"Kind":"schema","Field":"x","Reason":"y"}. Any caller parsing the error envelope would break in cutover. Open: pick a target shape (Go matching Rust is the cutover-friendly direction) and align via custom MarshalJSON on ValidationError.
2026-05-02 Materializer parity probe — caught + fixed real bug New scripts/cutover/parity/materializer_parity.sh runs Bun + Go materializer on identical synthetic root, diffs output JSONL. Result on first run: 0/2 match — Go's Provenance.LineOffset had json:",omitempty" and stripped the field on first-row records (line_offset=0 is semantically meaningful, not absent). 1-line fix (drop omitempty + comment explaining why). Re-run: 2/2 match. Real cross-runtime gap surfaced + closed in same wave.
2026-05-02 extract_json parity probe — 12/12 match across edge cases New scripts/cutover/parity/extract_json_parity.sh runs identical model-output strings through Rust gateway::v1::iterate::extract_json AND Go validator.ExtractJSON. 12 fixtures: fenced/unfenced blocks, nested objects, unicode, escaped quotes, top-level array, malformed JSON. Substrate gate: cargo test -p gateway extract_json PASS before probe. Result: 12/12 match. Algorithms genuinely equivalent. Rust side gained pub on extract_json + new bin/parity_extract_json (~30 LOC).
2026-05-02 Validator wire-format alignment — DONE Custom MarshalJSON/UnmarshalJSON on Go's validator.ValidationError emits the Rust serde-tagged-enum shape {"Schema":{"field":"x","reason":"y"}}. UnmarshalJSON also accepts the legacy flat shape (migration safety) and rejects unknown variants (drift guard for future Rust enum additions). 4 new pinning tests in types_test.go. Re-run validator parity probe: 6/6 match (was 1/6).
2026-05-02 Lance backend gauntlet (4-pack + root-cause fix) — DONE Lance crate had zero tests + no smoke when audited this morning. Shipped: (a) sanitize_lance_err over all 5 routes (search/doc/index/append/migrate) — missing-index now 404 not 500, no /home/ or /root/.cargo/ paths leaked; (b) 7 unit tests in crates/vectord-lance with synth Parquet helper; (c) 9-probe scripts/lance_smoke.sh against live :3100; (d) 10M re-bench (reports/lance_10m_rebench_2026-05-02.md) — search warm ~20ms, search cold ~46ms median. Bench surfaced doc-fetch p50 ~100ms (300x slower than ADR-019 100K projection); root-caused to lance-bench bypassing IndexMeta → warming auto-build never fired → no doc_id btree. Fix shipped (commit 5d30b3d): lance_migrate HTTP handler now auto-builds the btree inline (1.2s on 10M, +269MB), drops doc-fetch to ~5ms (20x). Live verified 9/9 smoke + post-restart doc-fetch 4-15ms.
2026-05-03 Subject manifests + per-subject HMAC audit log — DONE on Rust + Go Local-first compliance substrate per lakehouse/docs/specs/SUBJECT_MANIFESTS_ON_CATALOGD.md Steps 1-8. Rust shipped: SubjectManifest type + Registry CRUD (crates/catalogd/src/registry.rs), SubjectAuditWriter with HMAC-SHA256 chain + per-subject Mutex serialization + canonical-JSON via BTreeMap (subject_audit.rs), backfill ETL (bin/backfill_subjects), gateway tool dispatch + validator decorator wiring, legal-tier /audit/subject/{id} endpoint with constant-time-eq token + tampering detection, daily bin/retention_sweep (BIPA-aware, idempotent, no auto-mutation). Go shipped: identical internal/catalogd/subject.go reader + VerifyChain over RAW LINE BYTES (avoids time-precision drift), 11 unit tests. 6th cross-runtime parity probe: scripts/cutover/parity/subject_audit_parity.sh — 6/6 byte-identical assertions across known-answer fixture + 5 real production audit logs. Surfaced + closed three drift classes in authoring loop: (1) Go omitempty stripping trace_id:""; (2) time.RFC3339Nano truncating trailing-zero nanoseconds where chrono AutoSi keeps 9 digits; (3) Go json.Marshal HTML-escaping <>& where serde keeps literal — fixed via marshalNoEscapeHTML + raw-bytes canonicalization. Two cross-lineage scrums caught real bugs each round (chain corruption race, schema-evolution HMAC drift, hardcoded "success" classifier, token min length, chain_root from windowed slice, tampering detection, HTML escape divergence).
open Decide Lance vs Parquet+HNSW for primary Lance verified production-ready at 10M (this morning's gauntlet). HNSW at 10M doesn't fit RAM (~60GB for vectors+graph), so the comparison is between Lance and Parquet+HNSW-with-spilling. Decide once we have a 10M ingest scenario where the Parquet path is bottlenecked.
open Pick Go primary vs Rust primary Both viable. Go has perf edge after today; Rust has production deploy + producer-side completeness.

Code volume

| Component | Lines | Last verified |
|---|---|---|
| Rust crates/ (15 crates) | 35,447 | 2026-05-01 |
| Rust sidecar/ (Python) | 1,237 | 2026-05-01 |
| Go internal/ (20 packages) | 11,896 (+ validator 1,190) | 2026-05-01 |
| Go cmd/ (14 binaries) | 3,232 | 2026-05-01 |
| Go total | ~16,300 | 2026-05-01 |

Go is ~46% the size of Rust on like-for-like surface (post-validator-port). The gap is largely vectord (Rust 11,005 lines vs Go 804) — Rust's vectord implements HNSW + Lance-format storage + benchmarking; Go's wraps coder/hnsw and stops there.


Process model

| Axis | Rust | Go |
|---|---|---|
| Binaries running | 1 mega-process (gateway PID 1241, 14.9G RSS, 374% CPU under load) | 11 dedicated daemons (~100-300MB RSS each) |
| Inter-component comms | In-process axum.nest (no network) | HTTP between daemons |
| Crash blast radius | Whole system if any subsystem panics | One daemon dies, rest survive |
| Horizontal scale | One unit only; can't scale individual components | Each daemon scales independently |
| Deploy unit | Single binary | 11 systemd units |

Reading: Rust's mega-binary is simpler ops at small scale (one thing to start, one log to tail). Go's daemons are simpler ops at production scale (kill the misbehaving one, restart it, others stay up). Go also lets you tune per-daemon resource limits via systemd.


Python dependency (the load-bearing axis)

This was the architectural difference that drove the original perf gap.

Pre-2026-05-02:
  Rust embed:  gateway → HTTP → Python sidecar :3200 → HTTP → Ollama :11434
  Go embed:    gateway → HTTP → Go embedd      :4216 → HTTP → Ollama :11434

Post-2026-05-02 (commit ba928b1):
  Rust embed:  gateway → HTTP → Ollama :11434                (sidecar dropped)
  Go embed:    gateway → HTTP → Go embedd :4216 → HTTP → Ollama :11434

The hot-path Python sidecar (~120 LOC across embed.py / generate.py / rerank.py / admin.py) was pure pass-through to Ollama and added no logic — just translation. AiClient was rewritten 2026-05-02 to call Ollama directly. lab_ui.py + pipeline_lab.py (~888 LOC of Streamlit-shape dev UIs) remain as ad-hoc tooling, not on the runtime hot path.
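
For orientation, a direct embed call against Ollama's /api/embed endpoint looks roughly like the following. This is a hedged sketch in Go (to match the other examples here), not the Rust AiClient code: the function names, the model name, and the fakeOllama stand-in server are all hypothetical, and the stand-in exists only so the sketch runs without a live Ollama.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type embedRequest struct {
	Model string `json:"model"`
	Input string `json:"input"`
}

type embedResponse struct {
	Embeddings [][]float32 `json:"embeddings"`
}

// embed posts one text to <baseURL>/api/embed and returns its vector.
func embed(baseURL, model, text string) ([]float32, error) {
	body, err := json.Marshal(embedRequest{Model: model, Input: text})
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(baseURL+"/api/embed", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("embed: unexpected status %s", resp.Status)
	}
	var out embedResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	if len(out.Embeddings) != 1 {
		return nil, fmt.Errorf("embed: expected 1 embedding, got %d", len(out.Embeddings))
	}
	return out.Embeddings[0], nil
}

// fakeOllama is a local stand-in so the sketch runs without a real server.
func fakeOllama() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/api/embed" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, `{"embeddings":[[0.1,0.2,0.3]]}`)
	}))
}

func main() {
	srv := fakeOllama()
	defer srv.Close()
	vec, err := embed(srv.URL, "example-model", "forklift operator")
	fmt.Println(len(vec), err) // 3 <nil>
}
```

With no translation layer in between, the only per-request work left is one JSON encode and one decode, which is the point of removing the pass-through sidecar.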

Performance impact (load-tested 2026-05-01, 6 rotating bodies, 10 concurrency, 30s)

| Path | Pre-cache | Post-cache (150cc3b) | Δ |
|---|---|---|---|
| Rust /ai/embed (via gateway) | 128 RPS · p50 78ms · p99 124ms | 30,279 RPS · p50 129µs · p99 5ms | +236× RPS |
| Go /v1/embed (via gateway → embedd) | 8,119 RPS · p50 0.79ms · p99 3ms | unchanged (already cached) | |

Rust now beats Go ~3.7× on cache-warm workloads. The cache being in-process inside Rust's gateway (no HTTP hop to a separate daemon) gives it the edge once both sides have caching.
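
The caching idea on both sides is an in-process LRU keyed by the input text. A minimal sketch in Go, assuming a plain map plus a doubly linked list; the type and method names are hypothetical and neither the Rust aibridge cache nor Go's CachedProvider is claimed to look like this.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

type entry struct {
	key string
	vec []float32
}

// embedCache is a fixed-capacity LRU keyed by the input text.
type embedCache struct {
	mu       sync.Mutex
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // text -> node in order
}

func newEmbedCache(capacity int) *embedCache {
	return &embedCache{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the cached vector and marks the entry as recently used.
func (c *embedCache) Get(text string) ([]float32, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[text]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).vec, true
}

// Put stores a vector, evicting the least recently used entry when full.
func (c *embedCache) Put(text string, vec []float32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[text]; ok {
		el.Value.(*entry).vec = vec
		c.order.MoveToFront(el)
		return
	}
	c.items[text] = c.order.PushFront(&entry{key: text, vec: vec})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := newEmbedCache(2)
	c.Put("a", []float32{1})
	c.Put("b", []float32{2})
	c.Get("a")               // "a" becomes most recent
	c.Put("c", []float32{3}) // evicts "b"
	_, okA := c.Get("a")
	_, okB := c.Get("b")
	fmt.Println(okA, okB) // true false
}
```

A hit costs one mutex acquisition and a map lookup, which is consistent with the microsecond-scale p50 in the table once no HTTP hop is involved.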

What the cache fix did NOT do

The cache fix (150cc3b) left the Python sidecar in the Rust path on cache misses, so at that point cold queries still paid the full Python+Ollama tax. That gap was closed on 2026-05-02, when ba928b1 rewrote aibridge to call Ollama directly (see the Decisions tracker).


Vector storage

| Axis | Rust | Go |
|---|---|---|
| HNSW lib | hnsw_rs (mature) | coder/hnsw (newer, smaller) |
| Code size | 11,005 lines (vectord + vectord-lance) | 804 lines |
| Lance-format storage | Yes (vectord-lance crate) | No |
| Persistence | LanceDB or in-memory | MinIO + JSON envelope (v2 envelope as of eb0dfdf) |
| Distance functions | cosine, euclidean, dot product | cosine, euclidean |

Reading: Rust has the deeper substrate. Lance-format gives columnar persistence + zero-copy reads + Apache Arrow integration. For staffing-domain corpus sizes (5K-500K vectors) both work fine; for multi-million-row indexes Rust would have a real edge. Defer the Go Lance port until corpus growth demands it.


Distillation pipeline (porting status)

| Phase | Rust source | Go port |
|---|---|---|
| Materializer (transforms.ts) | TS, full | Ported 2026-05-02 (internal/materializer + cmd/materializer) |
| Scorer | TS + Go | Ported |
| Score categories + firewall | Pinned | Ported (SftNever) |
| SFT export (synthesis) | TS, full (8 source classes) | Fully ported, 4-decimal byte-equal |
| RAG export | TS | NOT YET PORTED |
| Preference export | TS | NOT YET PORTED |
| Audit-baselines | TS | Fully ported, byte-equal verified |
| Audit-FULL phase 0/3/4 | TS | Ported |
| Audit-FULL phase 1 (schema) | bun test | Via go test exec |
| Audit-FULL phase 2 (materializer) | TS | Observer mode (read-only) |
| Audit-FULL phase 5 (run summaries) | TS | Observer mode (read-only) |
| Audit-FULL phase 6 (acceptance) | TS fixture harness | Skipped (TS-only deps) |
| Audit-FULL phase 7 (replay) | TS | Observer mode (read-only) |
| Replay tool | TS | Ported 2026-05-02 (internal/replay + cmd/replay) |
| Quarantine writer | TS | NOT YET PORTED |

Reading: Go has the substrate for everything observable (read paths) and SFT export end-to-end. The producer side (materializer, replay) was ported on 2026-05-02 (see the Decisions tracker). To run the full pipeline from Go alone, the RAG export, preference export, and quarantine writer still need porting.


Production validators

| Component | Rust | Go |
|---|---|---|
| FillValidator | crates/validator/src/staffing/fill.rs (12 unit tests) | Ported 2026-05-01 (internal/validator/fill.go + 13 tests) |
| EmailValidator | crates/validator/src/staffing/email.rs (12 tests) | Ported 2026-05-01 (internal/validator/email.go + 11 tests) |
| /v1/validate endpoint | Yes | Ported 2026-05-02 (cmd/validatord, port 3221) |
| /v1/iterate endpoint | Yes (gen→validate→correct→retry loop) | Ported 2026-05-02 (cmd/validatord, port 3221) |
| Production validators load workers_500k.parquet at startup | Yes (75MB resident) | N/A: Go uses WorkerLookup interface; in-memory or adapter |

Reading: With the 2026-05-01 port, Go has the load-bearing validators. The network surface followed on 2026-05-02: cmd/validatord hosts /v1/validate and /v1/iterate and the gateway proxies both, so the in-process validators are now reachable through the production-shape HTTP access pattern.


Substrate features unique to each side

Go has, Rust doesn't

  • chatd 5-provider dispatcher (kimi / opencode / openrouter / ollama_cloud / ollama).
  • Cross-role gate in matrix retrieve (real_001 fix). Verified by reality tests real_001..005.
  • Multi-corpus matrix indexer (Spec §3.4 component 2).
  • Pathway memory (Mem0-style versioned traces).
  • Observer fail-safe semantics (ADR-005 Decision 5.1).
  • In-process embed cache (CachedProvider + LRU). Note: Rust got this 2026-05-01 too.
  • LLM-based role extractor (regex + qwen2.5 fallback).
  • Persistent stack 3-layer isolation (scripts/cutover/start_go_stack.sh).
  • Cutover slice (Bun /_go/* route, opt-in via systemd drop-in).
  • Production load test (scripts/cutover/loadgen/) with Bun-frontend + direct comparison.

Rust has, Go doesn't

  • Lance-format vector storage (vectord-lance crate, 605 lines).
  • truth crate (970 lines). Cross-source claim reconciliation.
  • journald crate (455 lines). Structured event journal.
  • /v1/validate + /v1/iterate endpoints (network surface). Note: Go got these 2026-05-02 (cmd/validatord).
  • ui crate (Dioxus, 1,509 lines). Native desktop/web UI.
  • Materializer + replay tools (the "produce evidence" side). Note: both were ported to Go 2026-05-02.
  • Acceptance harness (22 invariants over fixtures, TS).
  • Production deployment (devop.live/lakehouse/* serves through Rust today).

Strengths and weaknesses

Rust strengths

  • Mature, in production, serving real demo traffic.
  • Single deploy unit; one binary, one systemd service, one log.
  • Type system + memory safety; fewer runtime bugs in hot paths.
  • Mature library ecosystem (axum, tokio, polars, arrow, hnsw_rs, lance).
  • Native distillation pipeline; Go is the porter.
  • Production validators (now also in Go but Rust authored them).
  • Lance vector storage scales beyond 5M rows.
  • In-process embed cache (post-150cc3b) makes Rust the fastest path on warm workloads.

Rust weaknesses

  • Python sidecar dependency: closed 2026-05-02 (ba928b1). Before that, every cache-miss AI call went through Python, adding 1 runtime + 1 process to ops (estimated at ~200 LOC to fix).
  • Mega-binary blast radius — gateway at 14.9G RSS means any panic kills the whole production system.
  • Tail latency cliff under uncached load — single async runtime serializes I/O completions.
  • Compile times — slow iteration vs Go's per-package builds.
  • Coupling — adding a feature touches gateway/v1/ and ripples across crates.

Go strengths

  • Process isolation — daemons crash independently; ops can systemctl restart vectord without touching gateway.
  • Per-daemon scale — embed cache lives in embedd; vectord shards independently. Hot daemons scale horizontally.
  • No Python dependency — every daemon talks to peers in HTTP/JSON. Native Go down to Ollama.
  • In-process embed cache at the daemon level (was the perf lever pre-Rust-cache).
  • Smaller, denser code — 16,300 lines vs Rust's 35,447 + 1,237 sidecar (~46% the size).
  • Faster iteration: go build of all 14 binaries is ~3-5s; Rust full rebuild is minutes.
  • Cross-runtime artifact compatibility verified — audit_baselines.jsonl, scored-runs JSONL, sft_export.jsonl all round-trip byte-equal.

Go weaknesses

  • Distillation pipeline incomplete: RAG export, preference export, and the quarantine writer are still Rust-only. Materializer + replay were ported 2026-05-02.
  • Validator network surface: closed 2026-05-02. cmd/validatord now serves /v1/validate + /v1/iterate, so operators can call validators over the wire from Go.
  • Vector storage HNSW-only — no Lance equivalent. Fine for current scale.
  • Less production-tested — cutover slice live but no real coordinator traffic yet.
  • HTTP between daemons — every cross-daemon call is a network round-trip. Latency fine on localhost (microseconds) but tail-latency contributes more than Rust's in-process composition.
  • coder/hnsw is newer than Rust's hnsw_rs. Less battle-tested.
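
The mitigation recorded in the Decisions tracker for the coder/hnsw panic (a side store as source of truth plus recover wrappers) follows a general Go pattern. A minimal sketch under stated assumptions: the graph interface and panickyGraph type are hypothetical stand-ins, not the coder/hnsw API, and only the i.vectors and safeGraphAdd names are taken from the tracker.

```go
package main

import (
	"errors"
	"fmt"
)

// graph is a hypothetical stand-in for the third-party HNSW index.
type graph interface {
	Add(id string, vec []float32)
}

// index keeps its own copy of every vector so the graph can be rebuilt.
type index struct {
	vectors map[string][]float32 // source of truth
	g       graph
}

// safeGraphAdd records the vector first, then converts a panic inside the
// graph library into an ordinary error instead of crashing the daemon.
func (i *index) safeGraphAdd(id string, vec []float32) (err error) {
	i.vectors[id] = vec
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("graph add panicked: %v", r)
		}
	}()
	i.g.Add(id, vec)
	return nil
}

// panickyGraph simulates the library bug.
type panickyGraph struct{}

func (panickyGraph) Add(string, []float32) { panic(errors.New("nil deref")) }

func main() {
	idx := &index{vectors: map[string][]float32{}, g: panickyGraph{}}
	err := idx.safeGraphAdd("w1", []float32{0.1, 0.2})
	fmt.Println(err != nil, len(idx.vectors)) // true 1
}
```

Because the vector lands in the side store before the graph is touched, a caller that sees the error can rebuild the graph from the map without losing data.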

Cross-cutting abstracts to address

The list below is a working backlog. Move items to "Decisions tracker" (at top) when actioned with a commit reference.

Universal wins (apply regardless of primary line)

  1. Embed cache in Rust aibridge — DONE 2026-05-01 (150cc3b).
  2. FillValidator + EmailValidator in Go — DONE 2026-05-01 (b03521a).
  3. Drop Python sidecar from Rust — DONE 2026-05-02 (ba928b1).
  4. Cross-runtime contract tests — Pin shared JSONL schemas (audit_baselines, scored_run, sft_sample) as canonical specs in auditor/schemas/ with Go-side validators consuming the same definitions.

If keeping Go primary

  1. Port materializer — DONE 2026-05-02 (cmd/materializer).
  2. Port replay tool — DONE 2026-05-02 (cmd/replay).
  3. Port /v1/validate + /v1/iterate HTTP surface — DONE 2026-05-02 (cmd/validatord).
  4. Skip Lance until corpus growth demands it (>5M rows).
  5. Keep chatd, observer fail-safe, role gate, multi-corpus matrix — real Go wins worth preserving.

If keeping Rust primary

  1. Port chatd's 5-provider dispatcher to Rust — unified cloud LLM access.
  2. Port the cross-role gate to Rust matrix retrieve — production safety on the matrix layer (verified by Go reality tests real_001..005).
  3. Consider process splitting — even partial decomposition (split out vectord into its own process) would help with the mega-binary blast radius.

Recommendation (working hypothesis)

Go for the primary line, Rust for production-bridge maintenance.

Reasons:

  1. Operations — process isolation is genuinely simpler at production scale than a 14.9G mega-binary.
  2. Code volume — Go does the same job in ~46% the lines.
  3. Cross-runtime parity verified — every artifact round-trips byte-equal between runtimes.
  4. The missing pieces are bounded: of the four original porting targets, materializer, replay, and the validator network surface shipped 2026-05-02. The RAG/preference exports remain, and they are concrete porting targets, not research questions.
  5. Performance is no longer a deciding factor post-150cc3b — Rust is faster on warm cache, but both are well above staffing-domain demand levels (<1 RPS typical).

But don't abandon Rust:

  1. devop.live/lakehouse/ runs through Rust today; cutover is multi-week.
  2. Several Go improvements would be downstream of Rust patterns. Keeping Rust live means anything new there is a porting opportunity for Go.
  3. The Python sidecar drop (done 2026-05-02, ba928b1) and the cross-role gate port are valuable Rust improvements regardless of which line is primary.

Change log

Append entries here when this doc gets updated. One-line entries; link to commits.

  • 2026-05-01 — Initial draft (b3ad148 golangLAKEHOUSE).
  • 2026-05-01 — Recorded Rust embed cache shipping (150cc3b lakehouse), updated Python-dependency section + table.
  • 2026-05-01 — Recorded Go validator port shipping (b03521a golangLAKEHOUSE), updated production-validators section.
  • 2026-05-01 — Reframed as living document in docs/, added Decisions tracker + Update guidance + Change log sections.
  • 2026-05-01 — Multi-tier 100k load test ran (335k scenarios @ 1,115/sec, 4/6 at 0% fail), surfaced coder/hnsw v0.6.1 nil-deref on small playbook_memory index. Recover guard added; real fix open.
  • 2026-05-01 (later) — coder/hnsw v0.6.1 panic real fix landed: vectord lifts source-of-truth out of coder/hnsw via i.vectors side store + recover wrappers + rebuild fallback. Re-run multitier 60s/conc=50: 0 failures across 19,622 scenarios. STATE_OF_PLAY invariant added to "DO NOT RELITIGATE".
  • 2026-05-02 — Substrate fix verified at original failure-surfacing scale. Multitier 5min @ conc=50: 132,211 scenarios at 438/sec, 6/6 classes at 0% failure (was 4/6 pre-fix). Throughput drop (1,115 → 438/sec) is the honest cost of the formerly-broken scenarios doing real HNSW Add work. STATE_OF_PLAY refreshed to 2026-05-02.
  • 2026-05-02 — Materializer + replay tool ported from Rust legacy to Go (internal/materializer + internal/replay, both with CLI + smoke + tests). Both runtimes now produce the same data/evidence/YYYY/MM/DD/*.jsonl and data/_kb/replay_runs.jsonl shapes; Go side no longer needs Bun for these phases.
  • 2026-05-02 — /v1/validate + /v1/iterate HTTP surface ported as cmd/validatord on :3221. Closes the last "If keeping Go primary" backlog item — Go now owns the entire validator path end-to-end (no Rust dep for staffing safety net). 5/5 smoke probes via gateway :3110.
  • 2026-05-02 — Python sidecar dropped from Rust hot path (ba928b1). AiClient now talks Ollama directly. Process count: mega-binary + sidecar → mega-binary alone. Public API unchanged, 0 callers updated. Cargo tests 32/32 PASS, live /ai/embed + /v1/chat smokes green on test gateway. Lab UI Python remains as ad-hoc dev tooling.

See also

  • reports/cutover/architecture_comparison.md — historical snapshot (matched this doc as of the date stamp at top).
  • docs/SPEC.md — Go-side architectural spec.
  • docs/DECISIONS.md — Go-side ADRs.
  • /home/profit/lakehouse/docs/DECISIONS.md — Rust-side ADRs.
  • /home/profit/lakehouse/docs/go-rewrite/ — Rust-side notes on the rewrite.
  • reports/cutover/SUMMARY.md — running log of cross-runtime parity probes.
  • reports/cutover/g5_load_test.md — load-test methodology + numbers.