# Reality test real_004 — LLM-based role extractor closes shorthand hole real_003 / real_003b had a documented limitation: the regex extractor can't separate role from city in shorthand-style queries (`{count} {role} {city} {state} ...`) because there's no anchor between role and city. real_004 closes this with an LLM fallback — qwen2.5 format=json, called only when the regex returns empty. ## Architecture `roleExtractor` struct in `scripts/playbook_lift/main.go`: 1. **Regex first** (fast, deterministic) — handles need + client_first + looking patterns from real_003b. 2. **LLM fallback** when regex returns empty AND model is configured — Ollama-shape `/api/chat` with format=json, schema-tight system prompt, temperature 0. 3. **Per-process cache** — paraphrase + rejudge passes hit the same query 4× per run; cache prevents 4× LLM cost per query. 4. **Off-by-default** — `-llm-role-extract` flag (CLI) + `LLM_ROLE_EXTRACT=1` env var (harness) opt in. Default behavior is unchanged from real_003b. ## Verification — what's proven **Unit tests (8 new in `scripts/playbook_lift/main_test.go`):** - `TestRoleExtractor_RegexFirst` — when regex matches, LLM is NOT called (cost discipline preserved on the 75% of queries the regex handles). - `TestRoleExtractor_LLMFallback` — shorthand query goes to LLM, result is used. - `TestRoleExtractor_LLMOffLeavesEmpty` — without model configured, shorthand returns empty (current default). - `TestRoleExtractor_Cache` — 3 calls to same query = 1 LLM hit. - `TestRoleExtractor_NilSafe` — nil receiver runs regex only; matrixSearch + playbookRecord don't need a guard. - `TestExtractRoleViaLLM_HTTPError` + `_BadJSON` — failure paths surface error so caller can fall back cleanly. - `TestRoleExtractor_ClosesCrossRoleShorthandBleed` — the synthetic witness for the real_003 scenario: when both record AND query are shorthand (regex returns "" for both), LLM produces DIFFERENT role tokens for CNC Operator vs Forklift Operator queries → matrix gate's cross-role rejection fires correctly. This is the load-bearing verification — paired with `internal/matrix/playbook_test.go`'s `TestInjectPlaybookMisses_RoleGateRejectsCrossRole` (which uses the exact role tokens this test produces). ## Verification — what real_004 (live harness) shows Same 40-query stress file as real_003. With `LLM_ROLE_EXTRACT=1`: | Run | recordings | shorthand recordings | boosts | Detroit cluster bleed | |---|---:|---:|---:|---| | real_003 (regex only) | 7 | 1 (CNC) | 14 | **YES — w-2404 leaked** | | real_003b (extended regex) | 11 | 1 (Pickers) | 31 | none observed | | real_004 (regex + LLM) | 5 | 0 | 16 | none observed | real_004's shorthand-recordings = 0 is stochastic (HNSW build is non-deterministic across runs, so cold-pass top-1 varies and so does which queries trigger discovery). The LLM is doing work — boosts fired across all 4 styles for the Loaders + Packers + Shipping Clerk clusters, including shorthand queries. But the dataset didn't naturally produce a shorthand recording in this run, so we can't read a clean "with-vs-without" reality-test signal on the bleed. **This is why the unit-test witness exists.** The reality test confirms no regressions; the unit test proves the extraction-layer correctness on the exact failure mode real_003 surfaced. Together they cover the fix. ## Cost note LLM extraction adds ~1-3s per shorthand query on local qwen2.5. Per real_004 timing the harness took ~10 minutes for 40 queries with LLM on (vs ~2 min in real_003b). The cache makes paraphrase + rejudge passes free; first-touch shorthand queries pay the LLM cost once. For production hot paths (inbox parsing, retrieve), the LLM cost is prohibitive. The right architecture there is to extract role at INGEST time (when queries land in the inbox parser) and pass the already-resolved role through the request — same pattern as multi_coord_stress's existing `Demand{Role: ...}` shape. The harness LLM extractor is for reality-test coverage of arbitrary query shapes; production should never need it. ## Repro ```bash # Generate the same 40-query stress file go run scripts/cutover/gen_real_queries.go -limit 10 -styles all > tests/reality/real_coord_queries_v2.txt # With LLM extractor (closes shorthand hole) QUERIES_FILE=tests/reality/real_coord_queries_v2.txt RUN_ID=real_004 \ WITH_PARAPHRASE=0 WITH_REJUDGE=0 LLM_ROLE_EXTRACT=1 \ ./scripts/playbook_lift.sh # Without LLM extractor (regex-only — current default) QUERIES_FILE=tests/reality/real_coord_queries_v2.txt RUN_ID=real_004_regex_only \ WITH_PARAPHRASE=0 WITH_REJUDGE=0 \ ./scripts/playbook_lift.sh ``` Evidence: `reports/reality-tests/playbook_lift_real_004.{json,md}`.