g5 cutover: bigger load test — 5.87M req, 0 errors, 370MB RSS
Larger-scale follow-up to the original load test. Expanded along
three axes: corpus 200→5K workers, body variety 6→200 distinct
queries, and a concurrency sweep at 10/50/100/200, plus a mixed
embed+search workload.
Concurrency sweep hitting /v1/matrix/search directly (3 min each):
conc=10: 486,733 req · 2,704 RPS · p50 2.19ms · p99 6.7ms
conc=50: 1,148,543 req · 6,381 RPS · p50 7.08ms · p99 20ms
conc=100: 1,253,389 req · 6,963 RPS · p50 13.34ms · p99 37ms
conc=200: 1,460,676 req · 8,114 RPS · p50 23.45ms · p99 56ms
Mixed embed+search at 60 conc each, 90s:
/v1/embed: 1,127,854 req · 12,531 RPS · p50 3.31ms · p99 14.6ms
/v1/matrix/search: 392,229 req · 4,358 RPS · p50 12.68ms · p99 33.8ms
TOTAL: 5,869,424 requests across ~13.5 minutes. ZERO errors.
Resource footprint during peak load:
matrixd 105% CPU, 33MB RSS (bottleneck — pegs 1 core)
vectord 39% CPU, 82MB RSS
gateway 44% CPU, 41MB RSS
embedd 30% CPU, 67MB RSS
Total RSS across 11 daemons: ~370MB
Compare to Rust gateway under similar load: 14.9GB RSS, 374% CPU.
Go uses ~40x less memory and spreads load across daemons rather
than packing everything into one mega-process.
Saturation analysis:
- conc 10→50: +136% RPS (linear-ish scaling)
- conc 50→100: +9% RPS (saturation begins)
- conc 100→200: +17% RPS (matrixd 1-core pegged)
Headroom paths if production exceeds current demand:
1. Run multiple matrixd instances behind a load balancer.
Substrate is stateless (recordings go through storaged), so
horizontal scaling is straightforward.
2. Profile matrixd's per-request work (role-gate + judge-eligibility
+ result merge).
3. Skip Bun for hot endpoints (a direct nginx → Go path was
previously measured at 5.7x).
Evidence: reports/cutover/g5_load_test_big.md (full tables +
methodology + repro script).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>