Two new cross-runtime parity probes join the validator probe from
the gauntlet wave. Pattern: feed identical input through both runtimes
(the Rust/Bun side and the Go side), then diff the outputs. Each probe
surfaced a different signal.
## Materializer parity probe
scripts/cutover/parity/materializer_parity.sh runs the Bun and Go
materializers against an identical synthetic data/_kb/ root, then diffs
the resulting evidence/ JSONL for byte equivalence (modulo
provenance.recorded_at).
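
The "equal modulo provenance.recorded_at" comparison can be sketched in Go roughly as follows. This is an illustration only, not the probe script: `normalizeRecord` and `recordsMatch` are hypothetical names, and the sketch assumes each JSONL line is an object with an optional `provenance` object. Re-encoding also normalizes key order, so this is a slightly looser check than a raw byte diff.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// normalizeRecord parses one JSONL line, drops provenance.recorded_at
// (a wall-clock value that legitimately differs between runs), and
// re-encodes the record. encoding/json sorts map keys on output, so two
// equal records always produce identical bytes.
func normalizeRecord(line []byte) ([]byte, error) {
	var rec map[string]any
	if err := json.Unmarshal(line, &rec); err != nil {
		return nil, err
	}
	if prov, ok := rec["provenance"].(map[string]any); ok {
		delete(prov, "recorded_at")
	}
	return json.Marshal(rec)
}

// recordsMatch reports whether two JSONL lines are equal once the
// volatile timestamp has been removed.
func recordsMatch(a, b []byte) (bool, error) {
	na, err := normalizeRecord(a)
	if err != nil {
		return false, err
	}
	nb, err := normalizeRecord(b)
	if err != nil {
		return false, err
	}
	return bytes.Equal(na, nb), nil
}

func main() {
	// Hypothetical records: the second one is missing line_offset.
	bunLine := []byte(`{"id":"a","provenance":{"line_offset":0,"recorded_at":"2026-05-02T10:00:00Z"}}`)
	goLine := []byte(`{"id":"a","provenance":{"recorded_at":"2026-05-02T10:00:05Z"}}`)
	ok, err := recordsMatch(bunLine, goLine)
	fmt.Println(ok, err) // false <nil>
}
```
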
**First run: 0/2 match.** Real finding: Go's Provenance.LineOffset
had `json:"line_offset,omitempty"` which strips the field when value
is 0. Line offset 0 is the FIRST ROW of every source file — a real
semantic value, not absent. Bun side always emits it.
Fix: drop `omitempty` on Provenance.LineOffset. Updated comment
explaining why.
**Re-run: 2/2 match.** On-wire JSON parity holds.
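
The `omitempty` behavior behind this bug is easy to reproduce in isolation. The sketch below uses two stand-in structs (the names and the `int` field type are assumptions; the real `Provenance` struct has more fields) to show the field disappearing at offset 0 and reappearing once the option is dropped.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// provenanceBefore mirrors the buggy tag: omitempty drops zero values.
type provenanceBefore struct {
	LineOffset int `json:"line_offset,omitempty"`
}

// provenanceAfter mirrors the fix: line_offset is always emitted,
// because 0 means "first row", not "absent".
type provenanceAfter struct {
	LineOffset int `json:"line_offset"`
}

// encode marshals v and returns the JSON text. These structs cannot
// fail to marshal, so an error is treated as a programming bug.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(provenanceBefore{LineOffset: 0})) // {}
	fmt.Println(encode(provenanceAfter{LineOffset: 0}))  // {"line_offset":0}
}
```
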
## extract_json parity probe
scripts/cutover/parity/extract_json_parity.sh feeds 12 fixture
strings through both runtimes' extract_json:
- fenced ```json``` blocks
- unfenced ``` blocks
- bare braces with prose around
- first-balanced-of-many
- nested objects
- unicode in string values
- escaped quotes
- empty object
- top-level array (both return first inner object)
- no JSON
- depth-balanced but invalid syntax
- trailing garbage
Substrate gate: cargo test -p gateway extract_json PASS before probe.
**Result: 12/12 match.** Algorithms genuinely equivalent.
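
For readers unfamiliar with the technique, a first-balanced-object scan of the kind these fixtures exercise might look like the sketch below. It is not the code of either runtime: `balancedEnd` and `extractFirstObject` are hypothetical names, fences are not treated specially (the brace scan simply skips over them), and skipping candidates that balance but fail to parse is an assumption about behavior the commit does not spell out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// balancedEnd returns the index just past the '}' that closes the '{'
// at s[start], or -1 if the braces never balance. Braces inside JSON
// string literals (including after escaped quotes) are ignored.
func balancedEnd(s string, start int) int {
	depth := 0
	inString := false
	escaped := false
	for i := start; i < len(s); i++ {
		c := s[i]
		if inString {
			switch {
			case escaped:
				escaped = false
			case c == '\\':
				escaped = true
			case c == '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			inString = true
		case '{':
			depth++
		case '}':
			depth--
			if depth == 0 {
				return i + 1
			}
		}
	}
	return -1
}

// extractFirstObject returns the first brace-balanced substring that is
// also valid JSON. Assumption: invalid candidates are skipped rather
// than returned.
func extractFirstObject(s string) (string, bool) {
	for i := 0; i < len(s); i++ {
		if s[i] != '{' {
			continue
		}
		end := balancedEnd(s, i)
		if end < 0 {
			continue
		}
		if candidate := s[i:end]; json.Valid([]byte(candidate)) {
			return candidate, true
		}
	}
	return "", false
}

func main() {
	inputs := []string{
		"```json\n{\"a\": {\"b\": 1}}\n```", // fenced block with nesting
		`[{"x":1},{"y":2}]`,                 // top-level array
		"no json here",
	}
	for _, in := range inputs {
		got, ok := extractFirstObject(in)
		fmt.Printf("%v %q\n", ok, got)
	}
}
```

With the top-level array input, the scan returns the first inner object, which is consistent with the fixture note above.
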
## scripts/cutover/parity/extract_json_helper/main.go
Tiny Go binary that reads stdin, calls validator.ExtractJSON, prints
{matched, value} JSON. Counterpart to the Rust parity_extract_json
binary in golangLAKEHOUSE's sibling lakehouse repo (separate commit).
## Pattern crystallized
Every cross-runtime port should land with a parity probe. Three
probes now exist:
- validator (5/6 wire-format gap captured 2026-05-02)
- materializer (caught + fixed real bug 2026-05-02)
- extract_json (12/12 match 2026-05-02)
The instrument is reusable — each new shared HTTP/CLI surface gets
a probe row added.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Lakehouse: Rust vs Go architecture comparison

> **Status**: Living document · primary source for the parallel-runtime
> comparison.
> **Owner**: J. Update this when either side ships a fix that changes
> the table values, or when a new architectural axis surfaces.
> **Last meaningful refresh**: 2026-05-01 (post-Rust-cache + Go-validator-port)

This document compares the two parallel implementations of the lakehouse
substrate — Rust at `/home/profit/lakehouse/` (production today), Go at
`/home/profit/golangLAKEHOUSE/` (cutover-prep, Bun `/_go/*` slice live).
The goal of running both lines is to find where each architecture is
weak vs strong, address those gaps, and make the keep/maintain
decision based on real evidence rather than preference.

A snapshot of this document at any point in time is also captured at
`reports/cutover/architecture_comparison.md`. The version in `docs/`
is the source of truth; `reports/cutover/` is the historical record.

---

## How to update this doc

Three triggers:

1. **A fix lands on either side that moves a table value.** Update the
   number, append a one-line entry to the change log at the bottom,
   commit alongside the fix.
2. **A new architectural axis surfaces.** Add a section. Match the
   shape of existing sections (table + read paragraph).
3. **A keep/maintain decision is made.** Update the Recommendation
   section + change log.

Don't:
- Delete sections without recording the reason in the change log.
- Embed unverified claims — every "Rust is X" or "Go is X" should
  point to either a load-test number, a code reference (`crate/file:line`),
  or an explicit "asserted, not measured" caveat.

---

## Decisions tracker

| Date | Decision | Effect |
|---|---|---|
| 2026-05-01 | Add LRU embed cache to Rust aibridge | Closes 236× perf gap. **DONE** (commit `150cc3b` in lakehouse). |
| 2026-05-01 | Port FillValidator + EmailValidator to Go | Production safety net Go was missing. **DONE** (commit `b03521a` in golangLAKEHOUSE). |
| 2026-05-01 | Multi-tier load test against 100k corpus | 335k scenarios in 5min, 4/6 at 0% fail. Surfaced coder/hnsw v0.6.1 bug. Recover guard added. **DONE** (multitier_100k.md). |
| 2026-05-01 | **coder/hnsw v0.6.1 panic — REAL FIX landed** | Lifted source-of-truth out of coder/hnsw via `i.vectors map[string][]float32` side store + `safeGraphAdd`/`safeGraphDelete` recover wrappers + warm-path rebuild fallback. Re-run: 0 failures across 19,622 scenarios (was 96-98% on 2/6). **DONE.** Architecture invariant in STATE_OF_PLAY "DO NOT RELITIGATE". |
| 2026-05-02 | **Substrate fix verified at original failure scale** | Re-ran multitier 5min @ conc=50 (the footprint that originally surfaced the bug at 96-98% fail). Result: 132,211 scenarios at 438/sec, **6/6 classes at 0% failure**. Throughput dropped 1,115/sec → 438/sec because broken scenarios now do real HNSW Add work. Tails healthy: surge_fill_validate p99=1.53s, playbook_record_replay p99=2.32s. **Fix scales — closing the open thread.** |
| _open_ | Drop Python sidecar from Rust aibridge | Universal-win architectural cleanup. ~200 LOC, removes 1 runtime + 1 process. |
| 2026-05-02 | **Port Rust materializer to Go (transforms.ts) — DONE** | `internal/materializer` + `cmd/materializer` + `materializer_smoke.sh`. Ports `transforms.ts` (12 transforms) + `build_evidence_index.ts`. Idempotency, day-partition, receipt. 14 tests green; on-wire JSON matches TS so both runtimes interoperate. |
| 2026-05-02 | **Port Rust replay tool to Go — DONE** | `internal/replay` + `cmd/replay` + `replay_smoke.sh`. Ports `replay.ts` retrieve → bundle → /v1/chat → validate → log. Closes audit-FULL phase 7 live invocation on Go side. 14 tests green; same `data/_kb/replay_runs.jsonl` shape (schema=replay_run.v1) as TS. |
| 2026-05-02 | **`/v1/validate` + `/v1/iterate` HTTP surface — DONE** | `cmd/validatord` (port 3221) hosts both endpoints. `internal/validator` gains `PlaybookValidator` (3rd kind), JSONL roster loader, and the `Iterate` orchestrator + `ExtractJSON` helper. Gateway proxies `/v1/validate` + `/v1/iterate` to validatord. Closes the last "Go-primary" backlog item (architecture_comparison.md item #7). 30+ tests + `validatord_smoke.sh` 5/5 PASS. |
| 2026-05-02 | **Cross-runtime validator parity probe — surfaced wire-format gap** | New `scripts/cutover/parity/validator_parity.sh` runs 6 identical /v1/validate cases against Rust :3100 AND Go :4110, compares status + body. Result: **6/6 status codes match (logic-level equivalence holds), 5/6 body shapes diverge.** Rust returns serde-tagged enum `{"Schema":{"field":"x","reason":"y"}}`; Go returns flat struct `{"Kind":"schema","Field":"x","Reason":"y"}`. Any caller parsing the error envelope would break in cutover. **Open**: pick a target shape (Go matching Rust is the cutover-friendly direction) and align via custom `MarshalJSON` on `ValidationError`. |
| 2026-05-02 | **Materializer parity probe — caught + fixed real bug** | New `scripts/cutover/parity/materializer_parity.sh` runs Bun + Go materializer on identical synthetic root, diffs output JSONL. Result on first run: **0/2 match** — Go's `Provenance.LineOffset` had `json:",omitempty"` and stripped the field on first-row records (line_offset=0 is semantically meaningful, not absent). 1-line fix (drop `omitempty` + comment explaining why). Re-run: **2/2 match**. Real cross-runtime gap surfaced + closed in same wave. |
| 2026-05-02 | **extract_json parity probe — 12/12 match across edge cases** | New `scripts/cutover/parity/extract_json_parity.sh` runs identical model-output strings through Rust `gateway::v1::iterate::extract_json` AND Go `validator.ExtractJSON`. 12 fixtures: fenced/unfenced blocks, nested objects, unicode, escaped quotes, top-level array, malformed JSON. Substrate gate: `cargo test -p gateway extract_json` PASS before probe. Result: **12/12 match.** Algorithms genuinely equivalent. Rust side gained `pub` on `extract_json` + new `bin/parity_extract_json` (~30 LOC). |
| _open_ | **Validator wire-format alignment** | Surfaced by 2026-05-02 parity probe. Choose canonical error JSON shape, align both runtimes. ~50 LOC custom `MarshalJSON` either side. |
| _open_ | Decide on Lance vector backend | Defer until corpus exceeds ~5M rows. |
| _open_ | Pick Go primary vs Rust primary | Both viable. Go has perf edge after today; Rust has production deploy + producer-side completeness. |

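
The open wire-format item in the table above proposes a custom `MarshalJSON` on `ValidationError` so that Go emits the shape Rust already produces. A rough sketch of that direction is below. The struct is a stand-in with only the three fields visible in the probe output, `wireJSON` is a hypothetical helper, and deriving the variant name by upper-casing the first letter of `Kind` is an assumption that would need checking against the real Rust enum.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ValidationError is a stand-in for the Go validator's error type. Only
// the three fields visible in the probe output are modeled here.
type ValidationError struct {
	Kind   string
	Field  string
	Reason string
}

// MarshalJSON emits the externally tagged shape the Rust side returns,
// e.g. {"Schema":{"field":"x","reason":"y"}}. Assumption: the variant
// name is Kind with its first letter upper-cased.
func (e ValidationError) MarshalJSON() ([]byte, error) {
	if e.Kind == "" {
		return nil, fmt.Errorf("validation error has empty kind")
	}
	variant := strings.ToUpper(e.Kind[:1]) + e.Kind[1:]
	payload := struct {
		Field  string `json:"field"`
		Reason string `json:"reason"`
	}{e.Field, e.Reason}
	return json.Marshal(map[string]any{variant: payload})
}

// wireJSON is a small helper that returns the encoded error as text.
func wireJSON(e ValidationError) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}

func main() {
	out, err := wireJSON(ValidationError{Kind: "schema", Field: "x", Reason: "y"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"Schema":{"field":"x","reason":"y"}}
}
```
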

---

## Code volume

| | Lines | Last verified |
|---|---:|---|
| Rust `crates/` (15 crates) | 35,447 | 2026-05-01 |
| Rust `sidecar/` (Python) | 1,237 | 2026-05-01 |
| Go `internal/` (20 packages) | 11,896 (+ validator 1190) | 2026-05-01 |
| Go `cmd/` (14 binaries) | 3,232 | 2026-05-01 |
| **Go total** | **~16,300** | 2026-05-01 |

Go is ~46% the size of Rust on like-for-like surface (post-validator-port).
The gap is largely `vectord` (Rust 11,005 lines vs Go 804) — Rust's
vectord implements HNSW + Lance-format storage + benchmarking; Go's
wraps `coder/hnsw` and stops there.

---

## Process model

| | Rust | Go |
|---|---|---|
| Binaries running | **1** mega-process (gateway PID 1241, 14.9G RSS, 374% CPU under load) | **11** dedicated daemons (~100-300MB RSS each) |
| Inter-component comms | In-process axum.nest (no network) | HTTP between daemons |
| Crash blast radius | Whole system if any subsystem panics | One daemon dies, rest survive |
| Horizontal scale | One unit only — can't scale individual components | Each daemon scales independently |
| Deploy unit | Single binary | 11 systemd units |

**Reading**: Rust's mega-binary is simpler ops at small scale (one
thing to start, one log to tail). Go's daemons are simpler ops at
production scale (kill the misbehaving one, restart it, others stay
up). Go also lets you tune per-daemon resource limits via systemd.

---

## Python dependency (the load-bearing axis)

This is the architectural difference that drove the original perf gap.
Both call Ollama at `:11434`, but the path differs:

```
Rust embed:  gateway → HTTP → Python sidecar :3200 → HTTP → Ollama :11434
Go embed:    gateway → HTTP → Go embedd :4216 → HTTP → Ollama :11434
```

The Python sidecar (`sidecar/sidecar/main.py`, 1,237 lines) is a
FastAPI wrapper around Ollama. It does pydantic validation + request
shaping; **no fundamental compute** that Ollama can't do directly.

### Performance impact (load-tested 2026-05-01, 6 rotating bodies, 10 concurrency, 30s)

| Path | Pre-cache | Post-cache (`150cc3b`) | Δ |
|---|---:|---:|---:|
| **Rust /ai/embed** (via gateway) | 128 RPS · p50 78ms · p99 124ms | **30,279 RPS · p50 129µs · p99 5ms** | +236× RPS |
| **Go /v1/embed** (via gateway → embedd) | 8,119 RPS · p50 0.79ms · p99 3ms | _unchanged_ | (already cached) |

Rust now beats Go ~3.7× on cache-warm workloads. The cache being
in-process inside Rust's gateway (no HTTP hop to a separate daemon)
gives it the edge once both sides have caching.

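
Both runtimes rely on the same idea: an LRU cache in front of the slow embed call. The sketch below shows the general shape in Go. It is not the code of either `CachedProvider` or the Rust aibridge cache; `embedFunc`, `cachedEmbedder`, and `newCachedEmbedder` are hypothetical names, and the text itself is used as the cache key for simplicity.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// embedFunc is the slow path, e.g. an HTTP call to the model server.
type embedFunc func(text string) ([]float32, error)

type entry struct {
	key string
	vec []float32
}

// cachedEmbedder memoizes embedFunc results with least-recently-used
// eviction. Returned slices are shared; callers must not mutate them.
type cachedEmbedder struct {
	mu       sync.Mutex
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element holding *entry
	inner    embedFunc
}

func newCachedEmbedder(capacity int, inner embedFunc) *cachedEmbedder {
	return &cachedEmbedder{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
		inner:    inner,
	}
}

// Embed returns the cached vector on a hit and calls the slow path on a miss.
func (c *cachedEmbedder) Embed(text string) ([]float32, error) {
	c.mu.Lock()
	if el, ok := c.items[text]; ok {
		c.order.MoveToFront(el)
		vec := el.Value.(*entry).vec
		c.mu.Unlock()
		return vec, nil
	}
	c.mu.Unlock()

	// Miss: call the slow path without holding the lock.
	vec, err := c.inner(text)
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[text]; ok { // another goroutine filled it meanwhile
		c.order.MoveToFront(el)
		return el.Value.(*entry).vec, nil
	}
	c.items[text] = c.order.PushFront(&entry{key: text, vec: vec})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	return vec, nil
}

func main() {
	calls := 0
	c := newCachedEmbedder(2, func(text string) ([]float32, error) {
		calls++ // counts trips to the slow path
		return []float32{float32(len(text))}, nil
	})
	for _, s := range []string{"a", "bb", "a", "ccc", "bb"} {
		if _, err := c.Embed(s); err != nil {
			panic(err)
		}
	}
	fmt.Println(calls) // 4: the second "a" is a hit; "bb" was evicted by "ccc"
}
```

A load test with only 6 rotating bodies, as in the table above, keeps every request after the first few on the hit path, which is why the warm numbers are dominated by cache lookup cost rather than model latency.
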
### What the cache fix did NOT do

The Python sidecar is still in the Rust path on cache misses. Cold
queries pay the full Python+Ollama tax. Dropping the sidecar
(rewriting aibridge to call Ollama directly) is the next universal-win
item — open in the Decisions tracker.

---

## Vector storage

| | Rust | Go |
|---|---|---|
| HNSW lib | `hnsw_rs` (mature) | `coder/hnsw` (newer, smaller) |
| Code size | 11,005 lines (`vectord` + `vectord-lance`) | 804 lines |
| Lance-format storage | Yes (`vectord-lance` crate) | No |
| Persistence | LanceDB or in-memory | MinIO + JSON envelope (v2 envelope as of `eb0dfdf`) |
| Distance functions | cosine, euclidean, dot product | cosine, euclidean |

**Reading**: Rust has the deeper substrate. Lance-format gives columnar
persistence + zero-copy reads + Apache Arrow integration. For
staffing-domain corpus sizes (5K-500K vectors) both work fine; for
multi-million-row indexes Rust would have a real edge. **Defer the Go
Lance port until corpus growth demands it.**

---

## Distillation pipeline (porting status)

| Phase | Rust source | Go port |
|---|---|---|
| Materializer (transforms.ts) | TS, full | ❌ NOT YET PORTED |
| Scorer | TS + Go | ✅ Ported |
| Score categories + firewall | Pinned | ✅ Ported (`SftNever`) |
| SFT export (synthesis) | TS, full (8 source classes) | ✅ Fully ported, 4-decimal byte-equal |
| RAG export | TS | ❌ NOT YET PORTED |
| Preference export | TS | ❌ NOT YET PORTED |
| Audit-baselines | TS | ✅ Fully ported, byte-equal verified |
| Audit-FULL phase 0/3/4 | TS | ✅ Ported |
| Audit-FULL phase 1 (schema) | bun test | ✅ Via `go test` exec |
| Audit-FULL phase 2 (materializer) | TS | ✅ Observer mode (read-only) |
| Audit-FULL phase 5 (run summaries) | TS | ✅ Observer mode (read-only) |
| Audit-FULL phase 6 (acceptance) | TS fixture harness | ❌ Skipped (TS-only deps) |
| Audit-FULL phase 7 (replay) | TS | ✅ Observer mode (read-only) |
| Replay tool | TS | ❌ NOT YET PORTED |
| Quarantine writer | TS | ❌ NOT YET PORTED |

**Reading**: Go has the substrate for everything observable (read
paths) and SFT export end-to-end. The producer side (materializer,
replay) is still Rust-only. To run the full pipeline from Go alone,
the materializer + replay need porting.

---

## Production validators

| | Rust | Go |
|---|---|---|
| FillValidator | `crates/validator/src/staffing/fill.rs` (12 unit tests) | ✅ **Ported 2026-05-01** (`internal/validator/fill.go` + 13 tests) |
| EmailValidator | `crates/validator/src/staffing/email.rs` (12 tests) | ✅ **Ported 2026-05-01** (`internal/validator/email.go` + 11 tests) |
| `/v1/validate` endpoint | Yes | ❌ NOT YET PORTED (validator network surface) |
| `/v1/iterate` endpoint | Yes (gen→validate→correct→retry loop) | ❌ NOT YET PORTED |
| Production validators load `workers_500k.parquet` at startup | Yes (75MB resident) | N/A — Go uses WorkerLookup interface; in-memory or adapter |

**Reading**: With today's port, Go has the load-bearing validators.
The network surface (`/v1/validate`, `/v1/iterate`) is the next
piece — the in-memory validators work in-process; turning them into
HTTP endpoints adds the production-shape access pattern.

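
The table notes that Go's validators depend on a `WorkerLookup` interface instead of loading `workers_500k.parquet` at startup. The sketch below illustrates that decoupling only in outline: the method set, the `Worker` fields, `memoryLookup`, and `checkFill` are all hypothetical and are not taken from `internal/validator`.

```go
package main

import "fmt"

// Worker holds the few attributes this sketch needs. The real roster
// record is not shown in this document, so these fields are assumed.
type Worker struct {
	ID   string
	Role string
}

// WorkerLookup abstracts where worker data comes from, so a validator
// can run against an in-memory map in tests and an adapter in production.
type WorkerLookup interface {
	Lookup(id string) (Worker, bool)
}

// memoryLookup is the in-memory implementation.
type memoryLookup map[string]Worker

func (m memoryLookup) Lookup(id string) (Worker, bool) {
	w, ok := m[id]
	return w, ok
}

// checkFill is a toy validation rule: the worker must exist and hold
// the role the fill asks for.
func checkFill(l WorkerLookup, workerID, role string) error {
	w, ok := l.Lookup(workerID)
	if !ok {
		return fmt.Errorf("unknown worker %q", workerID)
	}
	if w.Role != role {
		return fmt.Errorf("worker %q has role %q, fill needs %q", workerID, w.Role, role)
	}
	return nil
}

func main() {
	roster := memoryLookup{"w1": {ID: "w1", Role: "welder"}}
	fmt.Println(checkFill(roster, "w1", "welder")) // <nil>
	fmt.Println(checkFill(roster, "w2", "welder")) // unknown worker "w2"
}
```
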

---

## Substrate features unique to each side

### Go has, Rust doesn't

- **chatd 5-provider dispatcher** (kimi / opencode / openrouter / ollama_cloud / ollama).
- **Cross-role gate** in matrix retrieve (real_001 fix). Verified by reality tests real_001..005.
- **Multi-corpus matrix indexer** (Spec §3.4 component 2).
- **Pathway memory** (Mem0-style versioned traces).
- **Observer fail-safe semantics** (ADR-005 Decision 5.1).
- **In-process embed cache** (CachedProvider + LRU). _Note: Rust got this 2026-05-01 too._
- **LLM-based role extractor** (regex + qwen2.5 fallback).
- **Persistent stack 3-layer isolation** (`scripts/cutover/start_go_stack.sh`).
- **Cutover slice** (Bun `/_go/*` route, opt-in via systemd drop-in).
- **Production load test** (`scripts/cutover/loadgen/`) with Bun-frontend + direct comparison.

### Rust has, Go doesn't

- **Lance-format vector storage** (vectord-lance crate, 605 lines).
- **`truth` crate** (970 lines). Cross-source claim reconciliation.
- **`journald` crate** (455 lines). Structured event journal.
- **`/v1/validate` + `/v1/iterate` endpoints** (network surface).
- **`ui` crate (Dioxus, 1,509 lines)**. Native desktop/web UI.
- **Materializer + replay tools** (the "produce evidence" side).
- **Acceptance harness** (22 invariants over fixtures, TS).
- **Production deployment** (devop.live/lakehouse/* serves through Rust today).

---

## Strengths and weaknesses

### Rust strengths

- Mature, in production, serving real demo traffic.
- Single deploy unit; one binary, one systemd service, one log.
- Type system + memory safety; fewer runtime bugs in hot paths.
- Mature library ecosystem (axum, tokio, polars, arrow, hnsw_rs, lance).
- Native distillation pipeline; Go is the porter.
- Production validators (now also in Go but Rust authored them).
- Lance vector storage scales beyond 5M rows.
- **In-process embed cache (post-`150cc3b`) makes Rust the fastest path on warm workloads.**

### Rust weaknesses

- **Python sidecar dependency** — every cache-miss AI call goes through Python. Adds 1 runtime + 1 process to ops. ~200 LOC to fix.
- **Mega-binary blast radius** — gateway at 14.9G RSS means any panic kills the whole production system.
- **Tail latency cliff under uncached load** — single async runtime serializes I/O completions.
- **Compile times** — slow iteration vs Go's per-package builds.
- **Coupling** — adding a feature touches gateway/v1/ and ripples across crates.

### Go strengths

- **Process isolation** — daemons crash independently; ops can `systemctl restart vectord` without touching gateway.
- **Per-daemon scale** — embed cache lives in embedd; vectord shards independently. Hot daemons scale horizontally.
- **No Python dependency** — every daemon talks to peers in HTTP/JSON. Native Go down to Ollama.
- **In-process embed cache** at the daemon level (was the perf lever pre-Rust-cache).
- **Smaller, denser code** — 16,300 lines vs Rust's 35,447 + 1,237 sidecar (~46% the size).
- **Faster iteration** — `go build` of all 14 binaries is ~3-5s; Rust full rebuild is minutes.
- **Cross-runtime artifact compatibility verified** — audit_baselines.jsonl, scored-runs JSONL, sft_export.jsonl all round-trip byte-equal.

### Go weaknesses

- **Distillation pipeline incomplete** — materializer + replay + RAG export + preference export still Rust-only.
- **Validator network surface missing** — in-memory validators work, but `/v1/validate` HTTP endpoint not yet ported. Operators can't call validators over the wire from Go.
- **Vector storage HNSW-only** — no Lance equivalent. Fine for current scale.
- **Less production-tested** — cutover slice live but no real coordinator traffic yet.
- **HTTP between daemons** — every cross-daemon call is a network round-trip. Latency fine on localhost (microseconds) but tail-latency contributes more than Rust's in-process composition.
- **`coder/hnsw` is newer** than Rust's `hnsw_rs`. Less battle-tested.

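
The Decisions tracker describes how the Go side contains the `coder/hnsw` panic: a side store of vectors as the source of truth, recover wrappers around graph mutations, and a rebuild fallback. The sketch below shows that pattern in miniature. The `graph` interface and `fakeGraph` are stand-ins and deliberately not the `coder/hnsw` API, `newIndex` is a hypothetical constructor, and rebuilding immediately on a failed add is an assumption about when the fallback fires.

```go
package main

import "fmt"

// graph is a stand-in for the third-party HNSW index; it is NOT the
// coder/hnsw API, just the minimum needed to show the pattern.
type graph interface {
	Add(id string, vec []float32)
}

// index keeps its own copy of every vector so the graph can be rebuilt
// if the library panics mid-mutation.
type index struct {
	vectors  map[string][]float32 // source of truth
	g        graph
	newGraph func() graph // constructs an empty graph for rebuilds
}

func newIndex(g graph, newGraph func() graph) *index {
	return &index{vectors: make(map[string][]float32), g: g, newGraph: newGraph}
}

// safeGraphAdd converts a panic inside g.Add into an ordinary error.
func safeGraphAdd(g graph, id string, vec []float32) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("graph add panicked: %v", r)
		}
	}()
	g.Add(id, vec)
	return nil
}

// Add records the vector in the side store first, then tries the graph.
// If the graph panics, a fresh graph is rebuilt from the side store.
func (i *index) Add(id string, vec []float32) error {
	i.vectors[id] = vec
	if err := safeGraphAdd(i.g, id, vec); err == nil {
		return nil
	}
	fresh := i.newGraph()
	for k, v := range i.vectors {
		if err := safeGraphAdd(fresh, k, v); err != nil {
			return fmt.Errorf("rebuild failed: %w", err)
		}
	}
	i.g = fresh
	return nil
}

// fakeGraph simulates the library; a poisoned instance panics on Add.
type fakeGraph struct {
	ids      map[string]bool
	poisoned bool
}

func (f *fakeGraph) Add(id string, _ []float32) {
	if f.poisoned {
		panic("nil deref in graph")
	}
	f.ids[id] = true
}

func main() {
	idx := newIndex(
		&fakeGraph{ids: map[string]bool{}, poisoned: true},
		func() graph { return &fakeGraph{ids: map[string]bool{}} },
	)
	err := idx.Add("w1", []float32{0.1, 0.2})
	fmt.Println(err, len(idx.g.(*fakeGraph).ids)) // <nil> 1
}
```
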

---

## Cross-cutting abstracts to address

The list below is a working backlog. Move items to "Decisions tracker"
(at top) when actioned with a commit reference.

### Universal wins (apply regardless of primary line)

1. ✅ **Embed cache in Rust aibridge** — DONE 2026-05-01 (`150cc3b`).
2. ✅ **FillValidator + EmailValidator in Go** — DONE 2026-05-01 (`b03521a`).
3. **Drop Python sidecar from Rust** — Rewrite aibridge to call Ollama at `:11434/api/embed` and `/api/generate` directly. Removes 1 runtime + 1 process from ops. ~200 LOC.
4. **Cross-runtime contract tests** — Pin shared JSONL schemas (audit_baselines, scored_run, sft_sample) as canonical specs in `auditor/schemas/` with Go-side validators consuming the same definitions.

### If keeping Go primary

5. ✅ **Port materializer** — DONE 2026-05-02 (`cmd/materializer`).
6. ✅ **Port replay tool** — DONE 2026-05-02 (`cmd/replay`).
7. ✅ **Port `/v1/validate` + `/v1/iterate` HTTP surface** — DONE 2026-05-02 (`cmd/validatord`).
8. **Skip Lance** until corpus growth demands it (>5M rows).
9. **Keep chatd, observer fail-safe, role gate, multi-corpus matrix** — real Go wins worth preserving.

### If keeping Rust primary

10. **Port chatd's 5-provider dispatcher to Rust** — unified cloud LLM access.
11. **Port the cross-role gate to Rust matrix retrieve** — production safety on the matrix layer (verified by Go reality tests real_001..005).
12. **Consider process splitting** — even partial decomposition (split out vectord into its own process) would help with the mega-binary blast radius.

---

## Recommendation (working hypothesis)

**Go for the primary line, Rust for production-bridge maintenance.**

Reasons:
1. **Operations** — process isolation is genuinely simpler at production scale than a 14.9G mega-binary.
2. **Code volume** — Go does the same job in ~46% the lines.
3. **Cross-runtime parity verified** — every artifact round-trips byte-equal between runtimes.
4. **The 4 missing pieces are bounded** — materializer + replay + validators-network + RAG/preference exports are concrete porting targets, not research questions.
5. **Performance is no longer a deciding factor** post-`150cc3b` — Rust is faster on warm cache, but both are well above staffing-domain demand levels (<1 RPS typical).

But **don't abandon Rust**:
1. devop.live/lakehouse/ runs through Rust today; cutover is multi-week.
2. Several Go improvements would be downstream of Rust patterns. Keeping Rust live means anything new there is a porting opportunity for Go.
3. The Python sidecar drop + cross-role gate port are valuable Rust improvements regardless of which line is primary.

---

## Change log

Append entries here when this doc gets updated. One-line entries; link to commits.

- 2026-05-01 — Initial draft (`b3ad148` golangLAKEHOUSE).
- 2026-05-01 — Recorded Rust embed cache shipping (`150cc3b` lakehouse), updated Python-dependency section + table.
- 2026-05-01 — Recorded Go validator port shipping (`b03521a` golangLAKEHOUSE), updated production-validators section.
- 2026-05-01 — Reframed as living document in `docs/`, added Decisions tracker + Update guidance + Change log sections.
- 2026-05-01 — Multi-tier 100k load test ran (335k scenarios @ 1,115/sec, 4/6 at 0% fail), surfaced coder/hnsw v0.6.1 nil-deref on small playbook_memory index. Recover guard added; real fix open.
- 2026-05-01 (later) — coder/hnsw v0.6.1 panic real fix landed: vectord lifts source-of-truth out of coder/hnsw via `i.vectors` side store + recover wrappers + rebuild fallback. Re-run multitier 60s/conc=50: 0 failures across 19,622 scenarios. STATE_OF_PLAY invariant added to "DO NOT RELITIGATE".
- 2026-05-02 — Substrate fix verified at original failure-surfacing scale. Multitier 5min @ conc=50: 132,211 scenarios at 438/sec, 6/6 classes at 0% failure (was 4/6 pre-fix). Throughput drop (1,115 → 438/sec) is the honest cost of the formerly-broken scenarios doing real HNSW Add work. STATE_OF_PLAY refreshed to 2026-05-02.
- 2026-05-02 — Materializer + replay tool ported from Rust legacy to Go (`internal/materializer` + `internal/replay`, both with CLI + smoke + tests). Both runtimes now produce the same `data/evidence/YYYY/MM/DD/*.jsonl` and `data/_kb/replay_runs.jsonl` shapes; Go side no longer needs Bun for these phases.
- 2026-05-02 — `/v1/validate` + `/v1/iterate` HTTP surface ported as `cmd/validatord` on `:3221`. Closes the last "If keeping Go primary" backlog item — Go now owns the entire validator path end-to-end (no Rust dep for staffing safety net). 5/5 smoke probes via gateway :3110.

---

## See also

- **`reports/cutover/architecture_comparison.md`** — historical snapshot (matched this doc as of the date stamp at top).
- **`docs/SPEC.md`** — Go-side architectural spec.
- **`docs/DECISIONS.md`** — Go-side ADRs.
- **`/home/profit/lakehouse/docs/DECISIONS.md`** — Rust-side ADRs.
- **`/home/profit/lakehouse/docs/go-rewrite/`** — Rust-side notes on the rewrite.
- **`reports/cutover/SUMMARY.md`** — running log of cross-runtime parity probes.
- **`reports/cutover/g5_load_test.md`** — load-test methodology + numbers.