# G5 cutover slice — production load test

Sustained-traffic load test against the cutover slice. Companion to `g5_first_loop_live.md` (which proved the learning-loop math) — this report proves the substrate holds up under concurrent load.

## Setup

- Persistent Go stack on `:4110+:4211-:4219` (11 daemons)
- Workers corpus: 200 rows, in-memory + persisted to MinIO
- Bun mcp-server on `:3700` with `GO_LAKEHOUSE_URL=http://127.0.0.1:4110`
- Load generator: `scripts/cutover/loadgen/` — Go binary, 6-query rotating body mix (Forklift/CNC/Warehouse/Picker/Loader/Shipping)
- All queries `use_playbook=false` (cold-pass retrieval only — the load test isolates retrieval performance from learning-loop costs)

## Results

| Run | Path | Concurrency | Duration | Requests | RPS | p50 | p95 | p99 | max | errors |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | Bun `/_go/*` → Go | 1 | 10s | 4,085 | 408 | 1.3ms | 3.2ms | 32ms | 215ms | 0 |
| 2 | Bun `/_go/*` → Go | 10 | 30s | 14,527 | 484 | 4.6ms | 76ms | 92ms | 372ms | 0 |
| 3 | Direct → Go (`:4110`) | 10 | 30s | 83,158 | **2,772** | 2.5ms | 7.2ms | 8.5ms | 16ms | 0 |

**Total: 101,770 requests, zero errors.**

## Read

### What the load test confirmed

1. **Zero correctness errors across 101k requests.** Matrix gate + vectord HNSW + embedd cache + gateway proxy all hold under sustained concurrent traffic. No 5xx, no transport errors, no panics. This was the load test's primary question.
2. **Direct-to-Go performance is production-grade.** 2,772 RPS at p50 2.5ms / p99 8.5ms / max 16ms on a single host. The substrate itself has no scaling cliff at concurrency=10.
3. **The substrate's tail latency is well-bounded direct.** p99 8.5ms means 99% of requests complete in under 9ms. For a vector-search workload (which involves embed → HNSW search → metadata join), that's a strong number.
### What the load test exposed

**Bun frontend is the bottleneck.** Adding Bun's reframing layer collapses throughput by 5.7× and inflates p99 by 11×:

| Metric | Direct | Via Bun | Cost |
|---|---:|---:|---|
| RPS | 2,772 | 484 | -82% |
| p50 latency | 2.5ms | 4.6ms | +84% |
| p99 latency | 8.5ms | 92ms | +982% |
| max latency | 16ms | 372ms | +2,225% |

The p99/max cliff (>10× worse via Bun) suggests Bun's single-process JS event loop is queueing under concurrent requests. This is a known characteristic of Node/Bun in proxy mode: the event loop serializes I/O completions, and at concurrency=10 the queue depth during fan-out shows up as tail-latency cliffs.

### What this means for production

**For staffing-domain demand levels** (single-coordinator workflows typically run <1 RPS even at peak), the Bun-fronted 484 RPS path has ~480× headroom. There is no urgency to optimize Bun out of the data path.

**If/when concurrent demand grows by orders of magnitude** (e.g. 100+ simultaneous coordinators, automated pipelines), the optimization path is clear: route nginx → Go directly for `/v1/matrix/search` (or other hot endpoints) and skip Bun for those. The 5.7× throughput gain isn't gated on Go-side optimization — it's gated on taking Bun's reframing layer out of the path.

**The substrate itself is production-ready.** Zero errors, sub-10ms p99 direct, and no concurrency bugs surfaced under sustained load. The null result on correctness is the load test's signal.

## What this load test does NOT cover

- **Embedder hot path**: bodies rotate across 6 queries, so the embed cache hits frequently. Cold-cache RPS would be lower.
- **Larger corpus**: 200 workers is a small index. HNSW search cost scales roughly with `O(log n)`, so 5K or 500K row corpora should add only modest latency, but that experiment hasn't been run.
- **Mixed read/write**: the load is read-only. Concurrent ingest+search hasn't been tested under sustained load.
- **Multi-host cluster**: single-process load on one box. Horizontal scaling characteristics are unknown.
- **Real chatd/observer/pathway calls**: load test bodies set `use_playbook=false` to isolate the matrix→vectord retrieve path. Full 5-loop traffic (with playbook lookup + judge gate) has different RPS characteristics.

## Repro

```bash
# Stack must be up:
./scripts/cutover/start_go_stack.sh
./bin/staffing_workers -limit 200 -gateway http://127.0.0.1:4110 -drop=true

# Build loadgen:
go build -o bin/loadgen ./scripts/cutover/loadgen

# Three runs:
./bin/loadgen -url http://localhost:3700/_go/v1/matrix/search -concurrency 1 -duration 10s
./bin/loadgen -url http://localhost:3700/_go/v1/matrix/search -concurrency 10 -duration 30s
./bin/loadgen -url http://localhost:4110/v1/matrix/search -concurrency 10 -duration 30s
```

The JSON summary on stderr is parseable for CI integration.