Commits

Author: root
SHA1: 1f700e731d
Staffing scale test: full 500K through gateway → embedd → vectord pipeline
scripts/staffing_500k/main.go: a driver that reads workers_500k.csv,
embeds the combined text for each worker via /v1/embed, adds the
result to the vectord index "workers_500k", then runs the canonical
staffing queries against the populated index. A reproducible
end-to-end test of the staffing co-pilot pipeline at production
scale.
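For reference, a minimal sketch of the ingest loop's shape. Hedged:
/v1/embed is the route named above, but the ports, JSON field names,
the vectord add route, and the CSV layout (ID in column 0, no
header) are placeholders, not taken from this commit; the real
main.go may differ.

  // Sketch only: illustrative ports, routes (other than /v1/embed),
  // and JSON shapes.
  package main

  import (
      "bytes"
      "encoding/csv"
      "encoding/json"
      "fmt"
      "io"
      "net/http"
      "os"
      "strings"
  )

  // embed posts one worker's combined text to embedd's /v1/embed.
  func embed(text string) ([]float32, error) {
      body, _ := json.Marshal(map[string]string{"text": text})
      resp, err := http.Post("http://localhost:8080/v1/embed",
          "application/json", bytes.NewReader(body))
      if err != nil {
          return nil, err
      }
      defer resp.Body.Close()
      var out struct {
          Vector []float32 `json:"vector"`
      }
      if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
          return nil, err
      }
      return out.Vector, nil
  }

  // addVector pushes one embedded worker into the vectord index.
  func addVector(index, id string, vec []float32) error {
      payload, _ := json.Marshal(map[string]any{"id": id, "vector": vec})
      resp, err := http.Post("http://localhost:8081/v1/index/"+index+"/add",
          "application/json", bytes.NewReader(payload))
      if err != nil {
          return err
      }
      resp.Body.Close()
      if resp.StatusCode >= 300 {
          return fmt.Errorf("add %s: HTTP %d", id, resp.StatusCode)
      }
      return nil
  }

  func main() {
      f, err := os.Open("workers_500k.csv")
      if err != nil {
          panic(err)
      }
      defer f.Close()
      r := csv.NewReader(f)
      for {
          rec, err := r.Read()
          if err == io.EOF {
              break
          }
          if err != nil {
              panic(err)
          }
          text := strings.Join(rec[1:], " ") // combined text per worker
          vec, err := embed(text)
          if err != nil {
              panic(err)
          }
          if err := addVector("workers_500k", rec[0], vec); err != nil {
              panic(err)
          }
      }
  }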

Run results (2026-04-29 ~02:30):
  500,000 vectors ingested in 35m 36s (~234/sec avg)
  vectord peak RSS 4.5 GB (~9 KB/vector incl. HNSW graph)
  Query latency: embed 40-59ms + search 1-3ms = ~50ms end-to-end
  GPU avg ~65% (Ollama not the bottleneck — vectord Add is)

Semantic recall on canonical queries:
  "electrician with industrial wiring": top 2 are literal Electricians (d=0.30)
  "CNC operator with first article": Assembler / Quality Techs (adjacent, d=0.24)
  "forklift driver OSHA-30": warehouse roles (d=0.33)
  "warehouse picker night shift bilingual": Material Handlers (d=0.31)
  "dental hygienist": Production Workers at d=0.49+ — correctly
    LOW-similarity, signals "no dental hygienists in this manufacturing
    dataset" rather than hallucinating a fake match.

Documented gaps:
  - storaged's 256 MiB PUT cap blocks single-file LHV1 persistence
    above ~150K vectors at d=768. Test ran with persistence disabled.
  - vectord Add is RWMutex-serialized; with the GPU at 65% util this
    is the throughput cap. Concurrent Adds would be 2-3x faster but
    require a careful audit of coder/hnsw thread-safety (the G1
    scrum documented two known quirks). The sketch after this list
    shows the concurrent-ingest shape in question.
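A sketch of that concurrent-ingest shape, with a stand-in index
type; vectord's real Add signature, and whether any of this is safe
before the coder/hnsw audit, are assumptions here.

  package main

  import "sync"

  type vec struct {
      id  string
      val []float32
  }

  // index stands in for vectord's HNSW wrapper. The Lock in Add is
  // the serialization point described above: workers can queue, but
  // only one insert runs at a time, which is why the GPU idles.
  type index struct {
      mu sync.RWMutex
  }

  func (ix *index) Add(id string, v []float32) {
      ix.mu.Lock() // exclusive; the 2-3x win needs finer locking
      defer ix.mu.Unlock()
      _, _ = id, v // real insert into the HNSW graph goes here
  }

  // ingest drains jobs with a worker pool. The pool is ready for
  // concurrent Adds the moment the thread-safety audit allows them.
  func ingest(ix *index, jobs <-chan vec, workers int) {
      var wg sync.WaitGroup
      for i := 0; i < workers; i++ {
          wg.Add(1)
          go func() {
              defer wg.Done()
              for j := range jobs {
                  ix.Add(j.id, j.val)
              }
          }()
      }
      wg.Wait()
  }

  func main() {
      jobs := make(chan vec, 1024)
      go func() {
          defer close(jobs)
          jobs <- vec{id: "w-1", val: make([]float32, 768)}
      }()
      ingest(&index{}, jobs, 8)
  }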

PHASE_G0_KICKOFF.md gains a "Staffing scale test" section with the
full metrics plus the list of gaps surfaced. The architectural
payoff is real: six binaries, one HTTP route, ~50ms from text query
to top-K semantically relevant workers across 500K records.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Date: 2026-04-29 02:31:30 -05:00