Replaced the hard-coded DemandQuery on inbox events with an actual
LLM call: each email/SMS body is parsed by qwen2.5 (format=json,
schema-anchored) into structured {role, count, location, certs,
skills, shift}. The driver then composes a query string from those
fields and runs matrix.search.
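A minimal sketch of what that parse call can look like, assuming the ollama-python client. The prompt wording and helper names here are hypothetical; only the model (qwen2.5), format=json, and the schema fields come from this commit:

```python
import json

# Hypothetical prompt construction for the schema-anchored parse.
# The model call (qwen2.5 with format=json) is wrapped in a function,
# so this file also loads without an Ollama server running.

SCHEMA_HINT = json.dumps({
    "role": "string", "count": "integer (default 1)", "location": "string",
    "certs": ["string"], "skills": ["string"], "shift": "string",
})

def build_parse_prompt(body: str) -> str:
    """Anchor the model to the schema: exact shape, no extra fields."""
    return (
        "Extract the staffing demand from this message as JSON matching "
        f"this shape exactly, with no extra fields: {SCHEMA_HINT}\n\n"
        f"Message:\n{body}"
    )

def parse_demand(body: str) -> dict:
    import ollama  # assumes the ollama-python client and a local server
    resp = ollama.chat(
        model="qwen2.5",
        messages=[{"role": "user", "content": build_parse_prompt(body)}],
        format="json",  # constrains output to valid JSON
    )
    return json.loads(resp["message"]["content"])

print(build_parse_prompt("Need 50 forklift operators in Cleveland OH "
                         "for Monday day shift."))
```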
This is the real-product flow that the Phase 3 stress test was
asking for: real bodies → real LLM parsing → real search. Before
this commit, the DemandQuery was my hand-crafted string, which
made the inbox phase trivial.
Run #007 result vs #006 (same bodies, parser swapped):
All 6 inbox events parsed cleanly — qwen2.5 nailed:
"Need 50 forklift operators in Cleveland OH for Monday day
shift. OSHA-30 + active forklift cert required."
→ {role:"forklift operator", count:50, location:"Cleveland, OH",
certs:["OSHA-30","active forklift cert"], skills:[], shift:"day"}
The other 5 parsed similarly faithfully ("indy" stayed as "indy",
count defaulted to 1 when unspecified, no hallucinated fields).
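The defaulting and composition steps above can be sketched as follows. Helper names are illustrative, not the driver's actual API; the schema fields and the forklift example come straight from this run:

```python
# Schema anchoring and query composition (sketch).

DEFAULTS = {"role": "", "count": 1, "location": "",
            "certs": [], "skills": [], "shift": ""}

def coerce(raw: dict) -> dict:
    """Drop unknown keys, fill defaults; count falls back to 1."""
    parsed = {k: raw[k] if k in raw else (list(d) if isinstance(d, list) else d)
              for k, d in DEFAULTS.items()}
    parsed["count"] = int(parsed["count"] or 1)
    return parsed

def compose_query(parsed: dict) -> str:
    """Flatten the structured fields into the string handed to matrix.search."""
    parts = [parsed["role"], parsed["location"]]
    if parsed["shift"]:
        parts.append(f"{parsed['shift']} shift")
    parts += parsed["certs"] + parsed["skills"]
    return " ".join(p for p in parts if p)

# The forklift example from run #007:
raw = {"role": "forklift operator", "count": 50, "location": "Cleveland, OH",
       "certs": ["OSHA-30", "active forklift cert"], "skills": [], "shift": "day"}
print(compose_query(coerce(raw)))
# forklift operator Cleveland, OH day shift OSHA-30 active forklift cert
```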
LLM-parsed queries produced TIGHTER matches than hard-coded:
Demand                 #006 dist   #007 dist      Δ
Crane Chicago              0.499       0.093   -82%
Drone Chicago              0.707       0.073   -90%
Bilingual safety           0.240       0.048   -80%
Forklift Cleveland         0.330       0.273   -17%
Production Indy            0.260       0.399   +53%
Warehouse Milwaukee        0.458       0.420    -8%
Three matches landed at distance < 0.10, verbatim-replay-tight
territory. Structured queries embed more sharply than conversational
hand-crafted strings.
Other metrics unchanged: diversity 0.000, determinism 1.000,
verbatim handover 4/4, paraphrase handover 4/4.
Tradeoff worth flagging: the drone-Chicago case dropped from
distance 0.71 (a clear "we don't have one") to 0.07 (a confident
match returned). The OOD honesty signal weakens when LLM-parsed
structure makes any nearest neighbor look tight. Future Phase 4
work: have a judge re-rate the top match before surfacing it, so
coordinators see "your demand was for X but the closest match
scored 2/5" rather than just the worker ID + distance.
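A sketch of that Phase 4 gate. The function name, worker ID, and 5-point scale are illustrative assumptions, not an existing API:

```python
# Hypothetical judge gate: a separate rating decides whether a tight
# embedding distance is surfaced as a match or as an honest miss.

def surface(demand: str, match_id: str, distance: float, judge_score: int) -> str:
    """judge_score: 1-5 match-quality rating from a separate judge call."""
    if judge_score >= 4:
        return f"{match_id} (dist {distance:.2f}, judge {judge_score}/5)"
    return (f"your demand was for {demand!r} but the closest match "
            f"scored {judge_score}/5")

# The drone-Chicago case: distance 0.07 looks tight, but a judge
# rating of 2/5 turns it back into an honest "we don't have one".
print(surface("drone operator in Chicago", "w-142", 0.07, 2))
```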
Substrate cost: +6 LLM calls per inbox burst (~9s on qwen2.5).
Production would amortize via a small dedicated parser model.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>