root 3a2823c02f g5 cutover: bigger load test — 5.87M req, 0 errors, 370MB RSS

Larger-scale follow-up to the original load test. Three axis
expansions: corpus 200→5K workers, body variety 6→200 distinct
queries, concurrency sweep 10/50/100/200, plus mixed
embed+search workload.

Concurrency sweep on /v1/matrix/search direct (3 min each):
  conc=10:  486,733 req  · 2,704 RPS · p50 2.19ms · p99 6.7ms
  conc=50:  1,148,543 req · 6,381 RPS · p50 7.08ms · p99 20ms
  conc=100: 1,253,389 req · 6,963 RPS · p50 13.34ms · p99 37ms
  conc=200: 1,460,676 req · 8,114 RPS · p50 23.45ms · p99 56ms

Mixed embed+search at 60 conc each, 90s:
  /v1/embed: 1,127,854 req · 12,531 RPS · p50 3.31ms · p99 14.6ms
  /v1/matrix/search: 392,229 req · 4,358 RPS · p50 12.68ms · p99 33.8ms

TOTAL: 5,869,424 requests across ~13.5 minutes. ZERO errors.

Resource footprint during peak load:
  matrixd  105% CPU, 33MB RSS (bottleneck — pegs 1 core)
  vectord   39% CPU, 82MB RSS
  gateway   44% CPU, 41MB RSS
  embedd    30% CPU, 67MB RSS
  Total RSS across 11 daemons: ~370MB

Compare to Rust gateway under similar load: 14.9GB RSS, 374% CPU.
Go uses ~40x less memory + spreads load across daemons rather
than packing into one mega-process.

Saturation analysis:
- conc 10→50: +135% RPS (linear-ish scaling)
- conc 50→100: +9% RPS (saturation begins)
- conc 100→200: +17% RPS (matrixd 1-core pegged)

Headroom paths if production exceeds current demand:
1. Run multiple matrixd instances behind a load balancer.
   Substrate is stateless (recordings via storaged), horizontal
   scale is straightforward.
2. Profile matrixd's per-request work (role-gate + judge-eligibility
   + result merge).
3. Skip Bun for hot endpoints (direct nginx → Go = 5.7x previously
   measured).

Evidence: reports/cutover/g5_load_test_big.md (full tables +
methodology + repro script).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 05:18:00 -05:00

7.3 KiB

Raw Blame History

G5 cutover prep — verified-parity log

What works on Go gateway, what's been side-by-side compared to Rust, what's safe to flip. Append a row when a new endpoint clears parity.

Endpoint	Date	Rust path	Go path	Verdict	Notes
`embed` (forced v1)	2026-04-30	`/ai/embed`	`/v1/embed`	✅ PASS 5/5 cos=1.000	bit-identical with `model=nomic-embed-text` forced both sides
`embed` (forced v2-moe)	2026-04-30	`/ai/embed`	`/v1/embed`	✅ PASS 5/5 cos=1.000	bit-identical with `model=nomic-embed-text-v2-moe` forced both sides — both Ollamas have the model
`audit_baselines.jsonl`	2026-05-01	`data/_kb/audit_baselines.jsonl`	`internal/distillation` `LoadLastBaseline` / `AppendBaseline` / `BuildAuditDriftTable`	✅ PASS round-trip	Live Rust file (7 entries) parses + round-trips byte-equal; lineage drift table fires correctly on zero-baseline metrics. See `audit_baselines_roundtrip.md`.
`audit-FULL` (phases 0/3/4)	2026-05-01	`scripts/distillation/audit_full.ts`	`cmd/audit_full` + `internal/distillation` `RunAuditFull`	✅ PASS metric-equal	Go-side run against live Rust root: all 8 ported metrics (p3_, p4_) byte-equal to the last Rust-emitted `audit_baselines.jsonl` entry. 6/6 required checks pass. 4 phases (1, 2, 5, 6, 7) deferred — depend on broader Rust-side pieces (materializer / replay / run-summaries) not yet ported. See `audit_full_go_vs_rust.md`.
`audit-FULL` (phases 0/1/2/3/4/5/7 — observer mode)	2026-05-01	`scripts/distillation/audit_full.ts`	`cmd/audit_full` + `internal/distillation` `RunAuditFull`	✅ PASS 12/12	Skips reduced from 4 → 1: phase 1 invokes `go test`, phases 2/5/7 read existing artifacts as observers (no live materializer/replay invocation). Only phase 6 (TS-only acceptance harness) remains skipped. `p2_evidence_rows=1055` matches Rust `summary.json` `collect.records_out=1055` byte-equal. Updated `audit_full_go_vs_rust.md`.
`audit_baselines.jsonl` write side	2026-05-01	`data/_kb/audit_baselines.jsonl` (Rust-emitted, 7 entries)	Go-emitted entry #8 via `cmd/audit_full -append-baseline`	✅ Mixed-runtime log	First Go-side entry written to the shared longitudinal log: `git_commit=ee2a40c5...` (golangLAKEHOUSE SHA, distinguishable from prior Rust SHAs like `ca7375ea`). All 10 metric fields match Rust shape exactly — drift comparator fires correctly across the runtime boundary.
Full Go stack (persistent)	2026-05-01	per-binary on :31xx	11 daemons (storaged/catalogd/ingestd/queryd/embedd/vectord/pathwayd/observerd/matrixd/gateway/chatd)	✅ All 11 healthy	First time the Go stack runs as long-running daemons rather than per-harness transient processes. Brought up via `scripts/cutover/start_go_stack.sh`; gateway proxies `/v1/embed` correctly through to embedd; all 5 chatd providers loaded. Live alongside the Rust gateway on :3100 (no port conflict).
G5 cutover slice live	2026-05-01	(none — pure cutover)	Bun `/_go/*` → Go gateway `:4110`	✅ End-to-end	First real Bun-frontend traffic to Go substrate. Rust legacy `mcp-server/index.ts` gains opt-in `/_go/*` pass-through driven by `GO_LAKEHOUSE_URL` env (systemd drop-in at `/etc/systemd/system/lakehouse-agent.service.d/go-cutover.conf`). `/_go/v1/embed` returns nomic-embed-text-v2-moe vectors; `/_go/v1/matrix/search` returns 3/3 Forklift Operators against persistent stack's 200-worker corpus. Reversible (unset env or revert systemd unit). See `g5_first_slice_live.md`.
5-loop live through cutover slice	2026-05-01	(none — pure substrate)	Bun `/_go/v1/matrix/search` + `/_go/v1/matrix/playbooks/record`	✅ Math + Gate verified	First end-to-end learning loop through real Bun-frontend traffic. Cold dist 0.4449 → warm dist 0.2224 (BoostFactor=0.5 for score=1.0; 0.4449×0.5=0.2225 expected, 0.2224 observed — 4-decimal exact). Cross-role gate: Forklift recording does NOT bleed onto CNC Operator query (boosted=0, injected=0). Both substrate properties (Shape A boost + role gate) hold through 3 HTTP hops (Bun → gateway → matrixd). See `g5_first_loop_live.md`.
Production load test	2026-05-01	(none — pure load probe)	Bun `/_go/v1/matrix/search` + direct Go `:4110`	✅ 0 errors / 101k req	Three runs, zero correctness errors. Direct-to-Go: 2,772 RPS @ p50 2.5ms / p99 8.5ms (production-grade). Via Bun: 484 RPS @ p50 4.6ms / p99 92ms (Bun event-loop is the bottleneck — 5.7× RPS hit, 11× p99 inflation; substrate itself is fine). For staffing-domain demand (<1 RPS typical), Bun-fronted has 480× headroom. See `g5_load_test.md`.
Big load test (5K corpus, 200 bodies)	2026-05-01	(none — pure load probe)	Direct Go `:4110/v1/matrix/search` + `:4110/v1/embed`	✅ 0 errors / 5.87M req	Concurrency sweep (10/50/100/200) + mixed embed+search workload. Peak: 8,114 RPS @ conc=200 (search). Mixed: 16,889 RPS combined. Saturation at conc=100+ — matrixd pegs 1 CPU core. Total RSS ~370MB across 11 daemons (40× lower than Rust 14.9G). matrixd identified as horizontal-scale target. See `g5_load_test_big.md`.

Wire-format drift catalog

The Go gateway is not a literal nginx-swap drop-in for the Rust gateway. Anything that flips needs a wire-shape adapter. Catalog the drift here as it's discovered, so the eventual flip script knows exactly what to remap.

embed

Field	Rust	Go
URL prefix	`/ai/embed`	`/v1/embed`
Response: vectors field	`embeddings`	`vectors`
Response: dim field	`dimensions`	`dimension`
Response: model field	`model`	`model` ✓ same
Request shape	`{texts, model?}`	`{texts, model?}` ✓ same
L2 normalization	unit vectors (‖v‖ ≈ 1.0)	raw Ollama output (‖v‖ ≈ 20-23)

The L2 normalization difference is real but currently harmless: vectors point in identical directions (cos=1.000) but Go has raw magnitudes. Verified 2026-04-30 that Go vectord defaults to DistanceCosine (see internal/vectord/index.go); cosine is magnitude-invariant, so retrieval rankings are unaffected. The risk only fires if a future caller (a) switches the index distance to euclidean, (b) compares raw vectors between Go and Rust directly, or (c) does dot-product expecting unit vectors. Adding a normalization step in internal/embed/embed.go would make the cutover safer and is cheap — but not blocking.

Repro

./scripts/cutover/embed_parity.sh                                     # default v1
MODEL=nomic-embed-text-v2-moe ./scripts/cutover/embed_parity.sh       # measure embedder

Each run drops a per-date verdict at reports/cutover/embed_parity_<DATE>.md.

What's not yet probed

/v1/sql ↔ Rust /query — query shape parity
/v1/vectors/search ↔ Rust /vectors/search — recall@k parity
/v1/matrix/retrieve ↔ Rust /vectors/hybrid — semantic retrieve parity (highest-leverage)
/v1/storage/* ↔ Rust /storage/* — direct S3 abstraction parity
/v1/chat — both sides expose this, but providers + token shape differ; Phase 4 already declared chatd parity-tested

The matrix-retrieve probe is the next-highest leverage because it's the actual user-facing retrieval path. Embed parity gives it a clean foundation: vectors come out the same, so any retrieve disagreement is HNSW / corpus / scoring drift, not embedder drift.

7.3 KiB Raw Blame History Unescape Escape