root 0d1553ca88 candidates corpus: first deep-field reality test on real staffing data
Lands the second staffing corpus and the first end-to-end reality test
through the full Go pipeline: parquet → corpusingest → embedd →
vectord → matrixd → gateway.

What's new:
  - scripts/staffing_candidates/main.go — parquet Source over
    candidates.parquet (1000 rows, 11 cols), single-chunk arrow-go
    pqarrow read. Embed text: "Candidate skills: <s>. Based in
    <city>, <state>. <years> years experience. Status: <status>.
    <first> <last>." IDs prefixed "c-" so multi-corpus merges
    against workers ("w-") stay unambiguous.
  - scripts/candidates_e2e.sh — first integration smoke that runs
    the full stack (storaged + embedd + vectord + matrixd + gateway),
    ingests via corpusingest, runs a real query through
    /v1/matrix/search, prints results. Ephemeral mode (vectord
    persistence disabled via custom toml) so re-runs don't pollute
    MinIO _vectors/ and break g1p_smoke's "only-one-persisted-index"
    assertion.

Real bug caught + fixed in corpusingest:
  When LogProgress > 0, the progress goroutine's only exit was
  ctx.Done(). With context.Background() in the production driver,
  Run hung forever after the pipeline finished. Added a stopProgress
  channel that close()s after wg.Wait(). Regression test
  TestRun_ProgressLoggerExits bounds Run's wall to 2s with
  LogProgress=50ms.

This is the bug the unit tests didn't catch because every prior test
set LogProgress: 0. Reality test surfaced it on first real-data
run — exactly the hyperfocus-and-find-architectural-weakness
property J framed as the reason for the Go pass.

End-to-end output (1000 candidates, query "Python AWS Docker
engineer in Chicago available now"):

  populate: scanned=1000 embedded=1000 added=1000 wall=3.5s
  matrix returned 5 hits in 26ms

The result quality is the interesting signal: top-5 had ZERO
Chicago candidates, ZERO active-status candidates, and the exact-
skill-match (Python,AWS,Docker) ranked #3 not #1. Pipeline works;
retrieval quality has real architectural limits (no structured
filtering, no relevance gate, semantic-only ranking dominated by
secondary signals like "1 year experience" and "engineer"). This
motivates SPEC §3.4 components 3 (relevance filter) and
eventually structured filtering — exactly the kind of finding the
deep field reality tests are supposed to surface before Enterprise
cutover.

12-smoke regression sweep all green. 9 corpusingest unit tests
including the new regression. vet clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 19:06:27 -05:00
..