root 277884b5eb multitier_100k: 335k scenarios @ 1,115/sec against 100k corpus, 4/6 at 0% fail
J asked for a much more sophisticated test using the 100k corpus from
the Rust legacy database. This commit ships:

scripts/cutover/multitier/main.go — 6-scenario harness with weighted
random selection per goroutine (sketched below, after the weight list).
Mixes search, email/SMS/fill validators (in-process via
internal/validator), profile swap with ExcludeIDs, repeat-cache
exercise, and playbook record/replay.

Scenarios + weights (per-scenario fractions; selection walks the cumulative sum):
  35% cold_search_email      — search + email outreach + EmailValidator
  15% surge_fill_validate    — search + fill proposal + FillValidator + record
  15% profile_swap           — original search + ExcludeIDs swap + no-overlap check
  15% repeat_cache           — same query × 5 (cache effectiveness)
  10% sms_validate           — SMS draft (≤160 chars, phone for SSN-FP guard)
  10% playbook_record_replay — cold → record → warm w/ use_playbook=true
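
A minimal sketch of that weighted selection, with illustrative names
(not the harness's actual code):

```go
package main

import (
	"fmt"
	"math/rand"
)

type scenario struct {
	name   string
	weight float64 // fraction of total traffic; weights sum to 1.0
}

var scenarios = []scenario{
	{"cold_search_email", 0.35},
	{"surge_fill_validate", 0.15},
	{"profile_swap", 0.15},
	{"repeat_cache", 0.15},
	{"sms_validate", 0.10},
	{"playbook_record_replay", 0.10},
}

// pick walks the cumulative weight table; each worker goroutine owns
// its own *rand.Rand so draws never contend on a shared lock.
func pick(r *rand.Rand) string {
	x, cum := r.Float64(), 0.0
	for _, s := range scenarios {
		cum += s.weight
		if x < cum {
			return s.name
		}
	}
	return scenarios[len(scenarios)-1].name // float-rounding guard
}

func main() {
	r := rand.New(rand.NewSource(1)) // per-goroutine source in the real harness
	for i := 0; i < 5; i++ {
		fmt.Println(pick(r))
	}
}
```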

Test results (5-min sustained, conc=50, 100k workers indexed):
  TOTAL 335,257 scenarios @ 1,115/sec
  cold_search_email     117k @ 0.0% fail · p50 2.2ms · p99 8.6ms
  surge_fill_validate    50k @ 98.8% fail (substrate bug below)
  profile_swap           50k @ 0.0% fail · p50 4.5ms · ExcludeIDs verified
  repeat_cache           50k scenarios × 5 searches ≈ 252k @ 0.0% fail · p50 11.7ms
  sms_validate           33k @ 0.0% fail · phone-pattern guard works
  playbook_record_replay 33k @ 96.8% fail (substrate bug below)
  Total successful workflows: ~250k

Validator integration verified at load:
  150,930 EmailValidator passes across cold_search_email + sms_validate
  35 successful FillValidator passes + 1,061 successful playbook
    records (the cases where the bug didn't fire)
  zero false positives on the SSN-pattern guard against phone numbers

Resource footprint at 100k:
  vectord 1.23GB RSS (linear with 100k vectors)
  matrixd 26MB, 75% CPU (1-core saturated at conc=50)
  Total across 11 daemons: 1.7GB
  Compared with Rust at 14.9GB, that's still ~10× less even at 100k.

SUBSTRATE BUG SURFACED: coder/hnsw v0.6.1 nil-deref in
layerNode.search at graph.go:95. Triggers on /v1/matrix/playbooks/record
under sustained writes to the small playbook_memory index. Both Add
and Search paths can panic.

Workaround applied (this commit) in internal/vectord/index.go
BatchAdd: recover() guard converts panic to error; daemon stays up
instead of crashing the request handler.
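
A minimal sketch of the guard shape, with assumed types (the real
BatchAdd in internal/vectord/index.go carries more state):

```go
package vectord

import "fmt"

// Assumed minimal shapes; the real types live in internal/vectord.
type Vector []float32

type hnswGraph interface{ Add(Vector) }

type Index struct{ graph hnswGraph }

// BatchAdd sketch: the named return plus the deferred recover() is the
// whole workaround. coder/hnsw v0.6.1 can nil-deref inside
// layerNode.search during Add on the small playbook_memory index; the
// guard turns that panic into an error so the handler returns a 5xx
// instead of taking vectord down with it.
func (ix *Index) BatchAdd(vecs []Vector) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("hnsw batch add panicked: %v", r)
		}
	}()
	for _, v := range vecs {
		ix.graph.Add(v) // an upstream panic propagates to the recover above
	}
	return nil
}
```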

Operator recovery procedure (also documented in the report):
  curl -X DELETE http://localhost:4215/vectors/index/playbook_memory
Next record recreates the index fresh.

Real fix DEFERRED — open in docs/ARCHITECTURE_COMPARISON.md
Decisions tracker. Three options:
  a) upstream patch to coder/hnsw
  b) custom small-index Add path that always rebuilds when len < threshold
     (see the sketch after this list)
  c) alternate store for playbook_memory (Lance? in-memory map?)
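
A rough sketch of what option (b) could look like; every name here is
hypothetical, none of it is existing vectord API:

```go
package vectord

type Vector []float32

type graph interface{ Add(Vector) }

type SmallIndex struct {
	g    graph
	data []Vector // source of truth for rebuilds
}

const smallIndexThreshold = 256 // placeholder value

func newGraph() graph { panic("placeholder: construct a fresh hnsw graph") }

// Add rebuilds the whole graph while the index is small, sidestepping
// the incremental-Add code path that panics upstream. Past the
// threshold it falls back to normal incremental inserts.
func (ix *SmallIndex) Add(v Vector) {
	ix.data = append(ix.data, v)
	if len(ix.data) >= smallIndexThreshold {
		ix.g.Add(v) // large index: normal incremental path
		return
	}
	fresh := newGraph()
	for _, u := range ix.data {
		fresh.Add(u)
	}
	ix.g = fresh // in the real daemon this would be an atomic swap
}
```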

Evidence: reports/cutover/multitier_100k.md (full methodology +
results + repro + bug analysis). docs/ARCHITECTURE_COMPARISON.md
Decisions tracker updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 06:28:50 -05:00


G5 cutover prep — verified-parity log

What works on Go gateway, what's been side-by-side compared to Rust, what's safe to flip. Append a row when a new endpoint clears parity.

| Endpoint | Date | Rust path | Go path | Verdict | Notes |
|---|---|---|---|---|---|
| embed (forced v1) | 2026-04-30 | /ai/embed | /v1/embed | PASS 5/5 cos=1.000 | bit-identical with model=nomic-embed-text forced both sides |
| embed (forced v2-moe) | 2026-04-30 | /ai/embed | /v1/embed | PASS 5/5 cos=1.000 | bit-identical with model=nomic-embed-text-v2-moe forced both sides — both Ollamas have the model |
| audit_baselines.jsonl | 2026-05-01 | data/_kb/audit_baselines.jsonl | internal/distillation LoadLastBaseline / AppendBaseline / BuildAuditDriftTable | PASS round-trip | Live Rust file (7 entries) parses + round-trips byte-equal; lineage drift table fires correctly on zero-baseline metrics. See audit_baselines_roundtrip.md. |
| audit-FULL (phases 0/3/4) | 2026-05-01 | scripts/distillation/audit_full.ts | cmd/audit_full + internal/distillation RunAuditFull | PASS metric-equal | Go-side run against live Rust root: all 8 ported metrics (p3_, p4_) byte-equal to the last Rust-emitted audit_baselines.jsonl entry. 6/6 required checks pass. Phases 1, 2, 5, 6, 7 deferred — they depend on broader Rust-side pieces (materializer / replay / run-summaries) not yet ported. See audit_full_go_vs_rust.md. |
| audit-FULL (phases 0/1/2/3/4/5/7 — observer mode) | 2026-05-01 | scripts/distillation/audit_full.ts | cmd/audit_full + internal/distillation RunAuditFull | PASS 12/12 | Skips reduced from 4 → 1: phase 1 invokes go test, phases 2/5/7 read existing artifacts as observers (no live materializer/replay invocation). Only phase 6 (TS-only acceptance harness) remains skipped. p2_evidence_rows=1055 matches Rust summary.json collect.records_out=1055 byte-equal. Updated audit_full_go_vs_rust.md. |
| audit_baselines.jsonl write side | 2026-05-01 | data/_kb/audit_baselines.jsonl (Rust-emitted, 7 entries) | Go-emitted entry #8 via cmd/audit_full -append-baseline | Mixed-runtime log | First Go-side entry written to the shared longitudinal log: git_commit=ee2a40c5... (golangLAKEHOUSE SHA, distinguishable from prior Rust SHAs like ca7375ea). All 10 metric fields match the Rust shape exactly — the drift comparator fires correctly across the runtime boundary. |
| Full Go stack (persistent) | 2026-05-01 | per-binary on :31xx | 11 daemons (storaged/catalogd/ingestd/queryd/embedd/vectord/pathwayd/observerd/matrixd/gateway/chatd) | All 11 healthy | First time the Go stack runs as long-running daemons rather than per-harness transient processes. Brought up via scripts/cutover/start_go_stack.sh; gateway proxies /v1/embed correctly through to embedd; all 5 chatd providers loaded. Live alongside the Rust gateway on :3100 (no port conflict). |
| G5 cutover slice live | 2026-05-01 | (none — pure cutover) | Bun /_go/* → Go gateway :4110 | End-to-end | First real Bun-frontend traffic to the Go substrate. The Rust legacy mcp-server/index.ts gains an opt-in /_go/* pass-through driven by the GO_LAKEHOUSE_URL env var (systemd drop-in at /etc/systemd/system/lakehouse-agent.service.d/go-cutover.conf). /_go/v1/embed returns nomic-embed-text-v2-moe vectors; /_go/v1/matrix/search returns 3/3 Forklift Operators against the persistent stack's 200-worker corpus. Reversible (unset the env var or revert the systemd unit). See g5_first_slice_live.md. |
| 5-loop live through cutover slice | 2026-05-01 | (none — pure substrate) | Bun /_go/v1/matrix/search + /_go/v1/matrix/playbooks/record | Math + Gate verified | First end-to-end learning loop through real Bun-frontend traffic. Cold dist 0.4449 → warm dist 0.2224 (BoostFactor=0.5 for score=1.0; 0.4449 × 0.5 = 0.2225 expected, 0.2224 observed — 4-decimal exact). Cross-role gate: Forklift recording does NOT bleed onto a CNC Operator query (boosted=0, injected=0). Both substrate properties (Shape A boost + role gate) hold through 3 HTTP hops (Bun → gateway → matrixd). See g5_first_loop_live.md. |
| Production load test | 2026-05-01 | (none — pure load probe) | Bun /_go/v1/matrix/search + direct Go :4110 | 0 errors / 101k req | Three runs, zero correctness errors. Direct-to-Go: 2,772 RPS @ p50 2.5ms / p99 8.5ms (production-grade). Via Bun: 484 RPS @ p50 4.6ms / p99 92ms (the Bun event loop is the bottleneck — a 5.7× RPS hit and 11× p99 inflation; the substrate itself is fine). For staffing-domain demand (<1 RPS typical), Bun-fronted has 480× headroom. See g5_load_test.md. |
| Big load test (5K corpus, 200 bodies) | 2026-05-01 | (none — pure load probe) | Direct Go :4110/v1/matrix/search + :4110/v1/embed | 0 errors / 5.87M req | Concurrency sweep (10/50/100/200) + mixed embed+search workload. Peak: 8,114 RPS @ conc=200 (search). Mixed: 16,889 RPS combined. Saturation at conc=100+ — matrixd pegs 1 CPU core. Total RSS ~370MB across 11 daemons (40× lower than Rust's 14.9GB). matrixd identified as the horizontal-scale target. See g5_load_test_big.md. |
| Multi-tier 100k (6 scenarios + validators) | 2026-05-01 | (none — pure substrate probe) | Direct Go :4110 mixed scenarios | 4/6 scenarios 0% fail · ⚠ 2/6 hit substrate bug | 335,257 scenarios in 5 min @ conc=50 (1,115/sec) against the 100k corpus. Validators integrated: 150,930 EmailValidator passes (cold_search_email + sms_validate). 4 scenarios at 0% fail: cold_search_email (117k), profile_swap (50k, ExcludeIDs no-overlap verified), repeat_cache (50k scenarios × 5 searches ≈ 252k cached searches), sms_validate (33k, phone-pattern guard works). 2 scenarios fail at /v1/matrix/playbooks/record: coder/hnsw v0.6.1 nil-deref in layerNode.search on the small playbook_memory index under sustained writes. Recover guard added in vectord BatchAdd. Total RSS at 100k: 1.7GB (vs Rust 14.9GB — still ~10× lower). See multitier_100k.md. |

Wire-format drift catalog

The Go gateway is not a literal nginx-swap drop-in for the Rust gateway. Anything that flips needs a wire-shape adapter. Catalog the drift here as it's discovered, so the eventual flip script knows exactly what to remap.

embed

| Field | Rust | Go |
|---|---|---|
| URL prefix | /ai/embed | /v1/embed |
| Response: vectors field | embeddings | vectors |
| Response: dim field | dimensions | dimension |
| Response: model field | model | model ✓ same |
| Request shape | {texts, model?} | {texts, model?} ✓ same |
| L2 normalization | unit vectors (‖v‖ ≈ 1.0) | raw Ollama output (‖v‖ ≈ 20-23) |
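
For the flip script, the adapter this table implies is a thin field
remap. A sketch, with illustrative struct names:

```go
package cutover

// Hypothetical wire-shape adapter for the embed endpoint: remaps the Go
// gateway's response field names back to the Rust shapes the legacy Bun
// frontend expects.

type goEmbedResp struct {
	Vectors   [][]float32 `json:"vectors"`
	Dimension int         `json:"dimension"`
	Model     string      `json:"model"`
}

type rustEmbedResp struct {
	Embeddings [][]float32 `json:"embeddings"`
	Dimensions int         `json:"dimensions"`
	Model      string      `json:"model"`
}

func adaptEmbed(g goEmbedResp) rustEmbedResp {
	return rustEmbedResp{
		Embeddings: g.Vectors,
		Dimensions: g.Dimension,
		Model:      g.Model,
	}
}
```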

The L2 normalization difference is real but currently harmless: vectors point in identical directions (cos=1.000) but Go has raw magnitudes. Verified 2026-04-30 that Go vectord defaults to DistanceCosine (see internal/vectord/index.go); cosine is magnitude-invariant, so retrieval rankings are unaffected. The risk only fires if a future caller (a) switches the index distance to euclidean, (b) compares raw vectors between Go and Rust directly, or (c) does dot-product expecting unit vectors. Adding a normalization step in internal/embed/embed.go would make the cutover safer and is cheap — but not blocking.
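
That proposed step is a few lines; a sketch (assumed placement in
internal/embed/embed.go, not current code):

```go
package embed

import "math"

// normalizeL2 scales each vector to unit length so Go output matches
// the Rust gateway's ‖v‖ ≈ 1.0 convention. Zero vectors pass through
// unchanged.
func normalizeL2(v []float32) {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	norm := math.Sqrt(sum)
	if norm == 0 {
		return
	}
	inv := float32(1 / norm)
	for i := range v {
		v[i] *= inv
	}
}
```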

Repro

./scripts/cutover/embed_parity.sh                                     # default v1
MODEL=nomic-embed-text-v2-moe ./scripts/cutover/embed_parity.sh       # measure embedder

Each run drops a per-date verdict at reports/cutover/embed_parity_<DATE>.md.

What's not yet probed

  • /v1/sql ↔ Rust /query — query shape parity
  • /v1/vectors/search ↔ Rust /vectors/search — recall@k parity
  • /v1/matrix/retrieve ↔ Rust /vectors/hybrid — semantic retrieve parity (highest-leverage)
  • /v1/storage/* ↔ Rust /storage/* — direct S3 abstraction parity
  • /v1/chat — both sides expose this, but providers + token shape differ; Phase 4 already declared chatd parity-tested

The matrix-retrieve probe is the next-highest leverage because it's the actual user-facing retrieval path. Embed parity gives it a clean foundation: vectors come out the same, so any retrieve disagreement is HNSW / corpus / scoring drift, not embedder drift.