reality_test real_003: 40-query paraphrase stress + extractor extension
Stress-tests the role gate with 40 queries (10 fill_events rows × 4
styles): need, client_first, looking, shorthand. Each row's role +
client + city stays the same; only the surface phrasing changes.
real_003 (original extractor) confirmed the shorthand-vs-shorthand
failure mode: the CNC Operator shorthand recording leaked w-2404 onto
the Forklift Operator shorthand query within the same Beacon Freight
Detroit cluster. Both the recording and the query had an empty role
(the extractor returns "" for shorthand because no separator word sits
between role and city), so the gate was disabled, the distance check
passed, and the bleed fired.
Fix: extended extractRoleFromNeed to handle the client_first
("{client} needs N {role} in...") and looking ("Looking for N
{role} at...") patterns. Shorthand is intentionally left unmatched:
"Forklift Operator Detroit" cannot be told apart by shape from
"Forklift" + "Operator Detroit" without an LLM extractor or a
known-cities lookup.
real_003b (extended extractor) verifies the bleed is closed across all
4 styles for this dataset. Forklift Operator queries keep w-2136 (the
cold-pass-correct match) regardless of the query's style. Same-role
boosts now fire correctly across styles: a CNC Operator recording made
in `looking` style boosts the CNC need-form query.
scripts/cutover/gen_real_queries.go: added a -styles flag with values
need|client_first|looking|shorthand|all (the default, need, preserves
real_001/002 behavior). tests/reality/real_coord_queries_v2.txt is
the 40-query stress file.
scripts/playbook_lift/main_test.go: 10 sub-tests lock in the four
documented patterns, the shorthand limitation, and lift-suite-style
queries (no clean role, so they return empty as expected).
Aggregate metrics:
- real_003 (original): disc=7, lift=7, boost=14, meanΔ=-0.108
- real_003b (extended): disc=11, lift=10, boost=31, meanΔ=-0.202
The growth reflects more LEGITIMATE same-role, same-cluster transfer
firing across styles, not bleed (verified by the per-cluster bleed
table: Forklift Operator queries are unchanged across all 4 styles).
Known limitation, documented in real_003_findings.md: same-cluster
shorthand queries still embed close enough that a shorthand recording
could bleed onto a different-role shorthand query if role extraction
comes back empty on both the record and the query side. Closing this
requires LLM extraction or a known-cities lookup at record + query time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 997527be4d
commit 3263254f1c
@@ -263,6 +263,7 @@ The list is intentionally short. Items move to closed when the work demands them
 | (prep) | G5 cutover prep: `embed_parity` probe — Rust `/ai/embed` ↔ Go `/v1/embed` 5/5 cos=1.000 (both v1 and v2-moe). Verdict + drift catalog in `reports/cutover/SUMMARY.md`. Wire-format remap (`embeddings`/`vectors`, `dimensions`/`dimension`) is the only real cutover work; math is provably equivalent. |
 | (probe) | Reality test real_001: 10 real-shape queries from `fill_events.parquet` through lift harness. 8/10 cold-pass top-1 = judge-best (substrate works on real distribution). Surfaced **same-client+city cross-role bleed** — Shape A boost from Forklift-Operator playbook landed on CNC-Operator query, demoting the cold-pass-correct worker. Findings: `reports/reality-tests/real_001_findings.md`. |
 | (fix) | Cross-role gate: `Role` on `PlaybookEntry`, `QueryRole` on `SearchRequest`, gate fires in both `ApplyPlaybookBoost` + `InjectPlaybookMisses`. `roleEqual` handles case + plural. Backward-compat: empty role on either side = gate disabled (preserves lift suite + free-form callers). 5 new unit tests use exact real_001 distance + role values. Re-run real_002: bleed closed (Q#5 Pickers, Q#10 CNC Operator stay at cold-pass top-1; same-role lifts still fire). Closes OPEN #1. Findings: `reports/reality-tests/real_002_findings.md`. |
+| (probe) | Reality test real_003: 40 queries (10 fill_events rows × 4 styles — `need` / `client_first` / `looking` / `shorthand`). Confirmed shorthand-vs-shorthand bleed is real: CNC Operator shorthand recording leaked `w-2404` onto Forklift Operator shorthand query (both empty role, gate disabled). Extended `extractRoleFromNeed` to handle `client_first` + `looking` patterns; shorthand stays empty (regex can't separate role from city without anchor). Re-run real_003b: bleed closed across all 4 styles in this dataset. 10 new sub-tests in `scripts/playbook_lift/main_test.go` lock the patterns + the documented shorthand limitation. Findings: `reports/reality-tests/real_003_findings.md`. |
 
 Plus on Rust side (`8de94eb`, `3d06868`): qwen2.5 → qwen3.5:latest backport in active defaults; distillation acceptance reports regenerated (run_hash refresh, reproducibility property still holds).
reports/reality-tests/playbook_lift_real_003.md (Normal file, 116 lines)
@@ -0,0 +1,116 @@
# Playbook-Lift Reality Test — Run real_003

**Generated:** 2026-05-01T02:27:31.394679694Z
**Judge:** `qwen2.5:latest` (Ollama, resolved from config [models].local_judge)
**Corpora:** `workers,ethereal_workers`
**Workers limit:** 5000
**Queries:** `tests/reality/real_coord_queries_v2.txt` (40 executed)
**K per pass:** 10
**Paraphrase pass:** disabled
**Re-judge pass:** disabled
**Evidence:** `reports/reality-tests/playbook_lift_real_003.json`

---

## Headline

| Metric | Value |
|---|---:|
| Total queries run | 40 |
| Cold-pass discoveries (judge-best ≠ top-1) | 7 |
| Warm-pass lifts (recorded playbook → top-1) | 7 |
| No change (judge-best already top-1, no playbook needed) | 33 |
| Playbook boosts triggered (warm pass) | 14 |
| Mean Δ top-1 distance (warm − cold) | -0.10771029 |

**Verbatim lift rate:** 7 of 7 discoveries became top-1 after warm pass.

---

## Per-query results

| # | Query | Cold top-1 | Cold judge-best (rank/rating) | Recorded? | Warm top-1 | Judge-best warm rank | Lift |
|---|---|---|---|---|---|---|---|
| 1 | Need 5 Warehouse Associates in Kansas City MO starting at 09 | e-9573 | 0/4 | — | e-9573 | 0 | no |
| 2 | Parallel Machining needs 5 Warehouse Associates in Kansas Ci | e-7538 | 0/4 | — | e-7538 | 0 | no |
| 3 | Looking for 5 Warehouse Associates at Parallel Machining in | e-9573 | 0/4 | — | e-9573 | 0 | no |
| 4 | 5 Warehouse Associates Kansas City MO 09:00 Parallel Machini | e-9573 | 0/4 | — | e-9573 | 0 | no |
| 5 | Need 1 Forklift Operator in Detroit MI starting at 15:00 for | w-2136 | 0/5 | — | w-2136 | 0 | no |
| 6 | Beacon Freight needs 1 Forklift Operator in Detroit MI at 15 | w-2136 | 0/5 | — | w-2136 | 0 | no |
| 7 | Looking for 1 Forklift Operator at Beacon Freight in Detroit | w-2136 | 0/5 | — | w-2136 | 0 | no |
| 8 | 1 Forklift Operator Detroit MI 15:00 Beacon Freight | w-4766 | 0/5 | — | w-2404 | 1 | no |
| 9 | Need 4 Loaders in Indianapolis IN starting at 12:00 for Midw | e-2820 | 4/4 | ✓ e-4769 | e-4769 | 0 | **YES** |
| 10 | Midway Distribution needs 4 Loaders in Indianapolis IN at 12 | e-6419 | 4/4 | ✓ e-4769 | e-4769 | 0 | **YES** |
| 11 | Looking for 4 Loaders at Midway Distribution in Indianapolis | e-2820 | 1/2 | — | e-4769 | 2 | no |
| 12 | 4 Loaders Indianapolis IN 12:00 Midway Distribution | e-2820 | 6/5 | ✓ e-4769 | e-4769 | 0 | **YES** |
| 13 | Need 3 Warehouse Associates in Fort Wayne IN starting at 17: | e-9237 | 0/4 | — | w-565 | 1 | no |
| 14 | Cornerstone Fabrication needs 3 Warehouse Associates in Fort | e-9237 | 4/4 | ✓ w-565 | w-565 | 0 | **YES** |
| 15 | Looking for 3 Warehouse Associates at Cornerstone Fabricatio | e-9237 | 3/4 | ✓ w-565 | w-565 | 0 | **YES** |
| 16 | 3 Warehouse Associates Fort Wayne IN 17:30 Cornerstone Fabri | e-9237 | 0/4 | — | w-565 | 1 | no |
| 17 | Need 4 Pickers in Detroit MI starting at 13:30 for Beacon Fr | w-2136 | 0/2 | — | w-2136 | 0 | no |
| 18 | Beacon Freight needs 4 Pickers in Detroit MI at 13:30 | w-2136 | 0/2 | — | w-2136 | 0 | no |
| 19 | Looking for 4 Pickers at Beacon Freight in Detroit MI for 13 | e-7948 | 0/1 | — | e-7948 | 0 | no |
| 20 | 4 Pickers Detroit MI 13:30 Beacon Freight | e-7948 | 3/2 | — | e-7948 | 3 | no |
| 21 | Need 2 Packers in Joliet IL starting at 09:30 for Parallel M | e-9191 | 0/2 | — | e-9191 | 0 | no |
| 22 | Parallel Machining needs 2 Packers in Joliet IL at 09:30 | e-9191 | 7/3 | — | e-9191 | 7 | no |
| 23 | Looking for 2 Packers at Parallel Machining in Joliet IL for | e-9191 | 0/2 | — | e-9191 | 0 | no |
| 24 | 2 Packers Joliet IL 09:30 Parallel Machining | e-9191 | 6/3 | — | e-9191 | 6 | no |
| 25 | Need 3 Assemblers in Flint MI starting at 08:30 for Heritage | w-2582 | 4/3 | — | w-2582 | 4 | no |
| 26 | Heritage Foods needs 3 Assemblers in Flint MI at 08:30 | w-2582 | 0/2 | — | w-2582 | 0 | no |
| 27 | Looking for 3 Assemblers at Heritage Foods in Flint MI for 0 | w-4817 | 0/2 | — | w-4817 | 0 | no |
| 28 | 3 Assemblers Flint MI 08:30 Heritage Foods | w-4124 | 2/2 | — | w-4124 | 2 | no |
| 29 | Need 3 Packers in Flint MI starting at 12:30 for Parallel Ma | e-6019 | 0/1 | — | e-6019 | 0 | no |
| 30 | Parallel Machining needs 3 Packers in Flint MI at 12:30 | e-6019 | 4/2 | — | e-6019 | 4 | no |
| 31 | Looking for 3 Packers at Parallel Machining in Flint MI for | e-6019 | 0/1 | — | e-6019 | 0 | no |
| 32 | 3 Packers Flint MI 12:30 Parallel Machining | e-6019 | 0/2 | — | e-6019 | 0 | no |
| 33 | Need 1 Shipping Clerk in Flint MI starting at 17:00 for Pion | w-3988 | 1/3 | — | w-122 | 2 | no |
| 34 | Pioneer Assembly needs 1 Shipping Clerk in Flint MI at 17:00 | w-3988 | 1/3 | — | w-122 | 2 | no |
| 35 | Looking for 1 Shipping Clerk at Pioneer Assembly in Flint MI | w-3988 | 2/3 | — | w-122 | 0 | no |
| 36 | 1 Shipping Clerk Flint MI 17:00 Pioneer Assembly | w-2564 | 2/4 | ✓ w-122 | w-122 | 0 | **YES** |
| 37 | Need 1 CNC Operator in Detroit MI starting at 17:30 for Beac | w-2136 | 6/3 | — | w-2404 | 7 | no |
| 38 | Beacon Freight needs 1 CNC Operator in Detroit MI at 17:30 | w-2404 | 0/5 | — | w-2404 | 0 | no |
| 39 | Looking for 1 CNC Operator at Beacon Freight in Detroit MI f | e-9958 | 1/2 | — | w-2404 | 2 | no |
| 40 | 1 CNC Operator Detroit MI 17:30 Beacon Freight | e-5546 | 2/5 | ✓ w-2404 | w-2404 | 0 | **YES** |

---

## Honesty caveats

1. **Judge IS the ground truth proxy.** Without human-labeled relevance, the LLM
judge's verdict is what defines "best." If the judge rates badly,
the lift number is meaningless. To validate the judge itself, sample 5–10
verdicts manually and check agreement.
2. **Score-1.0 boost = distance halved.** Playbook math is
`distance' = distance × (1 - 0.5 × score)`. Lift requires the judge-best
result's pre-boost distance to be ≤ 2× the cold top-1's distance; otherwise
even halving doesn't promote it. Tight clusters → little visible lift.
3. **Verbatim vs paraphrase.** The verbatim lift rate (above) is the cheap
case: same query, recorded playbook, expected boost. The paraphrase
pass (when enabled) tests the actual learning property: similar-but-different
queries hitting a recorded playbook. Compare verbatim and paraphrase
lift rates. Paraphrase should be lower (semantic distance gates some
playbook hits), but a non-zero paraphrase rate is the meaningful signal.
4. **Multi-corpus skew.** Default corpora=`workers,ethereal_workers`. If all judge-best
results land in one corpus, the matrix layer's purpose isn't being tested.
Check the per-corpus distribution in the JSON.
5. **Judge resolution.** This run used `qwen2.5:latest` from
config [models].local_judge.
Bumping the judge for run #N+1 means editing one line in lakehouse.toml.
6. **Paraphrase generation also uses the judge.** The same model that rates
relevance also rephrases queries. A judge that's bad at rating staffing
queries is probably also bad at rephrasing them. Worth sanity-checking
a sample of `paraphrase_query` values in the JSON before trusting the
paraphrase lift number.

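A quick sketch of the caveat-2 arithmetic, assuming the cold top-1 receives no boost itself and results are ranked by ascending distance. `boostedDistance` and `promotes` are illustrative names, not functions from the lift harness.

```go
package main

import "fmt"

// boostedDistance applies the playbook formula from caveat 2:
// distance' = distance × (1 - 0.5 × score). A score of 1.0 halves the distance.
func boostedDistance(distance, score float64) float64 {
	return distance * (1 - 0.5*score)
}

// promotes reports whether a boosted candidate overtakes the cold top-1,
// assuming the top-1 keeps its unboosted distance and lower distance ranks higher.
func promotes(candidateDist, topDist, score float64) bool {
	return boostedDistance(candidateDist, score) < topDist
}

func main() {
	top := 0.30
	fmt.Println(promotes(0.55, top, 1.0)) // 0.55 is under 2 × 0.30, so halving wins
	fmt.Println(promotes(0.65, top, 1.0)) // 0.65 is over 2 × 0.30, so halving is not enough
}
```

At score 1.0 the break-even point is exactly 2× the top-1 distance; an exact tie depends on tie-breaking, which the report does not specify.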
## Next moves

- If lift rate ≥ 50% of discoveries: matrix layer + playbook is doing real
work. Move to paraphrase queries + tag-based boost (currently ignored).
- If lift rate < 20%: investigate why — judge variance, distance gap too
wide, or playbook math too gentle. The score=1.0 / 0.5× formula may need
retuning.
- If discovery rate (cold judge-best ≠ top-1) is itself low: cosine is
already close to optimal on this query distribution. Either the corpus
is too narrow or the queries are too easy.

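The rules above can be applied mechanically to the headline numbers. A sketch only: the rules are not mutually exclusive, so it returns every one that fires. `nextMoves` is a hypothetical helper, and since the report gives no cutoff for a "low" discovery rate, `lowDiscoveryRate` is a caller-supplied assumption.

```go
package main

import "fmt"

// nextMoves applies the "Next moves" rules to headline counts and returns
// every rule that fires. lowDiscoveryRate is an assumed cutoff: the report
// does not define what counts as a "low" discovery rate.
func nextMoves(total, discoveries, lifts int, lowDiscoveryRate float64) []string {
	var moves []string
	if total == 0 {
		return moves
	}
	if discoveries > 0 {
		liftRate := float64(lifts) / float64(discoveries)
		switch {
		case liftRate >= 0.5:
			moves = append(moves, "lift >= 50%: move to paraphrase queries + tag-based boost")
		case liftRate < 0.2:
			moves = append(moves, "lift < 20%: investigate judge variance, distance gap, playbook math")
		}
		// 20% <= liftRate < 50%: the report gives no rule for this band.
	}
	if float64(discoveries)/float64(total) < lowDiscoveryRate {
		moves = append(moves, "low discovery rate: corpus too narrow or queries too easy")
	}
	return moves
}

func main() {
	// Headline numbers: real_003 (40 queries, 7 discoveries, 7 lifts) and
	// real_003b (40 queries, 11 discoveries, 10 lifts); 0.1 is an assumed cutoff.
	fmt.Println(nextMoves(40, 7, 7, 0.1))
	fmt.Println(nextMoves(40, 11, 10, 0.1))
}
```

Both real_003 (7 of 7) and real_003b (10 of 11, about 91%) land in the ≥ 50% bucket; with the assumed 0.1 cutoff neither discovery rate (7/40, 11/40) counts as low.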
reports/reality-tests/playbook_lift_real_003b.md (Normal file, 116 lines)
@@ -0,0 +1,116 @@
# Playbook-Lift Reality Test — Run real_003b

**Generated:** 2026-05-01T02:38:56.283100116Z
**Judge:** `qwen2.5:latest` (Ollama, resolved from config [models].local_judge)
**Corpora:** `workers,ethereal_workers`
**Workers limit:** 5000
**Queries:** `tests/reality/real_coord_queries_v2.txt` (40 executed)
**K per pass:** 10
**Paraphrase pass:** disabled
**Re-judge pass:** disabled
**Evidence:** `reports/reality-tests/playbook_lift_real_003b.json`

---

## Headline

| Metric | Value |
|---|---:|
| Total queries run | 40 |
| Cold-pass discoveries (judge-best ≠ top-1) | 11 |
| Warm-pass lifts (recorded playbook → top-1) | 10 |
| No change (judge-best already top-1, no playbook needed) | 30 |
| Playbook boosts triggered (warm pass) | 31 |
| Mean Δ top-1 distance (warm − cold) | -0.20235376 |

**Verbatim lift rate:** 10 of 11 discoveries became top-1 after warm pass.

---

## Per-query results

| # | Query | Cold top-1 | Cold judge-best (rank/rating) | Recorded? | Warm top-1 | Judge-best warm rank | Lift |
|---|---|---|---|---|---|---|---|
| 1 | Need 5 Warehouse Associates in Kansas City MO starting at 09 | e-7863 | 0/4 | — | e-7863 | 0 | no |
| 2 | Parallel Machining needs 5 Warehouse Associates in Kansas Ci | e-8089 | 1/4 | ✓ e-7863 | e-7863 | 0 | **YES** |
| 3 | Looking for 5 Warehouse Associates at Parallel Machining in | e-7538 | 0/4 | — | e-7863 | 1 | no |
| 4 | 5 Warehouse Associates Kansas City MO 09:00 Parallel Machini | e-7538 | 0/4 | — | e-7863 | 1 | no |
| 5 | Need 1 Forklift Operator in Detroit MI starting at 15:00 for | w-2136 | 0/5 | — | w-2136 | 0 | no |
| 6 | Beacon Freight needs 1 Forklift Operator in Detroit MI at 15 | w-2136 | 0/5 | — | w-2136 | 0 | no |
| 7 | Looking for 1 Forklift Operator at Beacon Freight in Detroit | w-2136 | 0/5 | — | w-2136 | 0 | no |
| 8 | 1 Forklift Operator Detroit MI 15:00 Beacon Freight | w-2136 | 0/5 | — | w-2136 | 0 | no |
| 9 | Need 4 Loaders in Indianapolis IN starting at 12:00 for Midw | w-2742 | 1/4 | ✓ w-4397 | w-4397 | 0 | **YES** |
| 10 | Midway Distribution needs 4 Loaders in Indianapolis IN at 12 | w-2742 | 2/5 | ✓ w-4397 | w-4397 | 0 | **YES** |
| 11 | Looking for 4 Loaders at Midway Distribution in Indianapolis | w-2742 | 2/4 | ✓ w-4397 | w-4397 | 0 | **YES** |
| 12 | 4 Loaders Indianapolis IN 12:00 Midway Distribution | w-2742 | 1/5 | ✓ w-4397 | w-4397 | 0 | **YES** |
| 13 | Need 3 Warehouse Associates in Fort Wayne IN starting at 17: | w-3370 | 0/4 | — | w-1398 | 1 | no |
| 14 | Cornerstone Fabrication needs 3 Warehouse Associates in Fort | w-3370 | 0/4 | — | w-1398 | 1 | no |
| 15 | Looking for 3 Warehouse Associates at Cornerstone Fabricatio | w-1784 | 1/4 | ✓ w-1398 | w-1398 | 0 | **YES** |
| 16 | 3 Warehouse Associates Fort Wayne IN 17:30 Cornerstone Fabri | e-8661 | 0/4 | — | w-1398 | 1 | no |
| 17 | Need 4 Pickers in Detroit MI starting at 13:30 for Beacon Fr | e-7644 | 0/2 | — | w-1367 | 1 | no |
| 18 | Beacon Freight needs 4 Pickers in Detroit MI at 13:30 | e-7644 | 0/2 | — | w-1367 | 1 | no |
| 19 | Looking for 4 Pickers at Beacon Freight in Detroit MI for 13 | e-438 | 2/3 | — | w-1367 | 3 | no |
| 20 | 4 Pickers Detroit MI 13:30 Beacon Freight | e-7644 | 8/4 | ✓ w-1367 | w-1367 | 0 | **YES** |
| 21 | Need 2 Packers in Joliet IL starting at 09:30 for Parallel M | e-846 | 8/3 | — | e-2120 | 0 | no |
| 22 | Parallel Machining needs 2 Packers in Joliet IL at 09:30 | e-846 | 9/4 | ✓ e-2120 | e-2120 | 0 | **YES** |
| 23 | Looking for 2 Packers at Parallel Machining in Joliet IL for | e-846 | 1/2 | — | e-2120 | 2 | no |
| 24 | 2 Packers Joliet IL 09:30 Parallel Machining | e-7105 | 4/3 | — | e-2120 | 0 | no |
| 25 | Need 3 Assemblers in Flint MI starting at 08:30 for Heritage | w-2582 | 0/2 | — | w-2582 | 0 | no |
| 26 | Heritage Foods needs 3 Assemblers in Flint MI at 08:30 | w-2582 | 0/2 | — | w-2582 | 0 | no |
| 27 | Looking for 3 Assemblers at Heritage Foods in Flint MI for 0 | w-4817 | 0/2 | — | w-4817 | 0 | no |
| 28 | 3 Assemblers Flint MI 08:30 Heritage Foods | w-4124 | 1/2 | — | w-4124 | 1 | no |
| 29 | Need 3 Packers in Flint MI starting at 12:30 for Parallel Ma | e-6019 | 0/1 | — | e-2120 | 1 | no |
| 30 | Parallel Machining needs 3 Packers in Flint MI at 12:30 | e-6019 | 0/1 | — | e-2120 | 1 | no |
| 31 | Looking for 3 Packers at Parallel Machining in Flint MI for | e-6019 | 0/1 | — | e-2120 | 1 | no |
| 32 | 3 Packers Flint MI 12:30 Parallel Machining | e-6019 | 0/2 | — | e-2120 | 1 | no |
| 33 | Need 1 Shipping Clerk in Flint MI starting at 17:00 for Pion | w-3988 | 3/4 | ✓ w-1367 | w-122 | 1 | no |
| 34 | Pioneer Assembly needs 1 Shipping Clerk in Flint MI at 17:00 | w-3988 | 1/3 | — | w-122 | 3 | no |
| 35 | Looking for 1 Shipping Clerk at Pioneer Assembly in Flint MI | w-3988 | 2/3 | — | w-122 | 0 | no |
| 36 | 1 Shipping Clerk Flint MI 17:00 Pioneer Assembly | w-2564 | 2/4 | ✓ w-122 | w-122 | 0 | **YES** |
| 37 | Need 1 CNC Operator in Detroit MI starting at 17:30 for Beac | w-2404 | 0/5 | — | e-637 | 1 | no |
| 38 | Beacon Freight needs 1 CNC Operator in Detroit MI at 17:30 | w-2404 | 0/5 | — | e-637 | 1 | no |
| 39 | Looking for 1 CNC Operator at Beacon Freight in Detroit MI f | e-8106 | 1/4 | ✓ e-637 | e-637 | 0 | **YES** |
| 40 | 1 CNC Operator Detroit MI 17:30 Beacon Freight | w-2404 | 0/5 | — | e-637 | 1 | no |

---

## Honesty caveats

1. **Judge IS the ground truth proxy.** Without human-labeled relevance, the LLM
judge's verdict is what defines "best." If the judge rates badly,
the lift number is meaningless. To validate the judge itself, sample 5–10
verdicts manually and check agreement.
2. **Score-1.0 boost = distance halved.** Playbook math is
`distance' = distance × (1 - 0.5 × score)`. Lift requires the judge-best
result's pre-boost distance to be ≤ 2× the cold top-1's distance; otherwise
even halving doesn't promote it. Tight clusters → little visible lift.
3. **Verbatim vs paraphrase.** The verbatim lift rate (above) is the cheap
case: same query, recorded playbook, expected boost. The paraphrase
pass (when enabled) tests the actual learning property: similar-but-different
queries hitting a recorded playbook. Compare verbatim and paraphrase
lift rates. Paraphrase should be lower (semantic distance gates some
playbook hits), but a non-zero paraphrase rate is the meaningful signal.
4. **Multi-corpus skew.** Default corpora=`workers,ethereal_workers`. If all judge-best
results land in one corpus, the matrix layer's purpose isn't being tested.
Check the per-corpus distribution in the JSON.
5. **Judge resolution.** This run used `qwen2.5:latest` from
config [models].local_judge.
Bumping the judge for run #N+1 means editing one line in lakehouse.toml.
6. **Paraphrase generation also uses the judge.** The same model that rates
relevance also rephrases queries. A judge that's bad at rating staffing
queries is probably also bad at rephrasing them. Worth sanity-checking
a sample of `paraphrase_query` values in the JSON before trusting the
paraphrase lift number.

## Next moves

- If lift rate ≥ 50% of discoveries: matrix layer + playbook is doing real
work. Move to paraphrase queries + tag-based boost (currently ignored).
- If lift rate < 20%: investigate why — judge variance, distance gap too
wide, or playbook math too gentle. The score=1.0 / 0.5× formula may need
retuning.
- If discovery rate (cold judge-best ≠ top-1) is itself low: cosine is
already close to optimal on this query distribution. Either the corpus
is too narrow or the queries are too easy.

reports/reality-tests/real_003_findings.md (Normal file, 126 lines)
@@ -0,0 +1,126 @@
# Reality test real_003 / real_003b — paraphrase stress + extractor extension

40 queries (10 fill_events rows × 4 query styles), run twice:

- **real_003**: with the original `extractRoleFromNeed` regex (only
matches `^Need\s+\d+\s+\S+\s+in\s+`, the real_001 form)
- **real_003b**: with the extractor extended to also handle
`client_first` (`{client} needs N {role} in...`) and `looking`
(`Looking for N {role} at...`). Shorthand still falls through to an
empty role.

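For reference, the three recognised shapes can be sketched as Go regular expressions. This is an approximation written for this note, not the code in `scripts/playbook_lift`: the role group is a lazy `(.+?)` rather than the `\S+` quoted above, on the assumption that multi-word roles such as "Forklift Operator" must be captured, and `rolePatterns` / `extractRole` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rolePatterns are hypothetical approximations of the three supported shapes.
var rolePatterns = []*regexp.Regexp{
	// need:         "Need N {role} in ..."
	regexp.MustCompile(`^Need\s+\d+\s+(.+?)\s+in\s+`),
	// client_first: "{client} needs N {role} in ..."
	regexp.MustCompile(`^.+?\s+needs\s+\d+\s+(.+?)\s+in\s+`),
	// looking:      "Looking for N {role} at ..."
	regexp.MustCompile(`^Looking\s+for\s+\d+\s+(.+?)\s+at\s+`),
}

// extractRole returns the role captured by the first matching pattern, or ""
// when none matches. Shorthand ("N {role} {city} {state} ...") has no separator
// word between role and city, so it deliberately falls through to "".
func extractRole(query string) string {
	q := strings.TrimSpace(query)
	for _, re := range rolePatterns {
		if m := re.FindStringSubmatch(q); m != nil {
			return m[1]
		}
	}
	return ""
}

func main() {
	for _, q := range []string{
		"Need 1 Forklift Operator in Detroit MI starting at 15:00 for Beacon Freight",
		"Beacon Freight needs 1 Forklift Operator in Detroit MI at 15:00",
		"Looking for 1 Forklift Operator at Beacon Freight in Detroit MI for 15:00 shift",
		"1 Forklift Operator Detroit MI 15:00 Beacon Freight",
	} {
		fmt.Printf("%q\n", extractRole(q))
	}
}
```

The first three queries yield "Forklift Operator"; the shorthand query yields "", the fall-through behavior described above.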
## Headline

real_003 (original extractor): **shorthand-vs-shorthand bleed
confirmed**. The CNC Operator shorthand recording leaked `w-2404`
onto the Forklift Operator shorthand query within the same Beacon
Freight Detroit cluster: both the recording and the query had an empty
role, so the gate was disabled, the distance check passed, and the bleed fired.

real_003b (extended extractor): **bleed closed for the queried side
across all 4 styles**. Forklift Operator queries keep `w-2136` (the
cold-pass-correct match) regardless of the query's style. Same-role
boosts now fire correctly across styles: a CNC Operator recording made
in `looking` style boosts the CNC need-form query.

## Per-style behavior — real_003 (original extractor)

| Style | Bleed? | Explanation |
|---|---|---|
| `need` | none observed | Role extracted on both record + query side |
| `client_first` | none observed | Role NOT extracted, but no recording in this style happened to surface near a different-role query in real_003 |
| `looking` | none observed | Same as client_first |
| `shorthand` | **`w-2404` from CNC bled onto Forklift Operator** | Both record and query empty role → gate disabled → distance check passed |

The single observed bleed case in real_003:

- Q#40 (`1 CNC Operator Detroit MI 17:30 Beacon Freight`) recorded
`w-2404` with `Role: ""` (the extractor returned empty for shorthand).
- Q#8 (`1 Forklift Operator Detroit MI 15:00 Beacon Freight`) embedded
within ~0.137 cosine distance of Q#40 (same client + city + count + time
tokens dominate the embedding).
- `roleEqual("", "")` returned true (an empty role disables the gate), so
the injection fired and the warm top-1 for Forklift Operator became `w-2404`.

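The gate decision in this trace can be contrasted with the strict alternative listed under the mitigation candidates below. A sketch only: the source says `roleEqual` handles case and plural and treats an empty role on either side as "gate disabled", but the plural rule here (strip one trailing `s`) and the function names are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeRole lowercases, trims, and strips one trailing "s" as a naive
// plural rule (assumption: the real plural handling is not shown).
func normalizeRole(r string) string {
	r = strings.ToLower(strings.TrimSpace(r))
	return strings.TrimSuffix(r, "s")
}

// roleEqualLenient mirrors the documented semantics: an empty role on either
// side disables the gate, so the playbook entry is allowed through.
func roleEqualLenient(entryRole, queryRole string) bool {
	if entryRole == "" || queryRole == "" {
		return true
	}
	return normalizeRole(entryRole) == normalizeRole(queryRole)
}

// roleEqualStrict is the "strict gate" mitigation: an empty role on either side rejects.
func roleEqualStrict(entryRole, queryRole string) bool {
	if entryRole == "" || queryRole == "" {
		return false
	}
	return normalizeRole(entryRole) == normalizeRole(queryRole)
}

func main() {
	// real_003 shorthand case: empty-role CNC recording vs empty-role Forklift query.
	fmt.Println(roleEqualLenient("", ""), roleEqualStrict("", ""))
	// Same role, different casing and plurality.
	fmt.Println(roleEqualLenient("CNC Operators", "cnc operator"))
}
```

Under the lenient rule the empty-role recording passes the role gate and only the distance check remains; the strict rule rejects it outright, at the lift-suite cost described below.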
## Per-style behavior — real_003b (extended extractor)

After adding patterns for `client_first` and `looking`:

| Style | Role extracted? | Cross-role bleed observed? |
|---|---|---|
| `need` | yes | none |
| `client_first` | yes | none |
| `looking` | yes | none |
| `shorthand` | no | none in this dataset |

No bleed observed in real_003b across any style. Pickers + CNC
Operator queries pick up their own role's recording across styles;
Forklift Operator queries keep the cold-pass-correct match.

## Why the shorthand failure mode didn't fire in real_003b

The extended extractor only narrows the bleed on the **query** side. For
`need`, `client_first`, and `looking`, the queryRole is now non-empty, so a
recording carrying a *different* non-empty role is rejected. The gate is
still lenient, though: `roleEqual` returns true whenever either side is
empty, so a non-empty queryRole alone does not block a recording stored
with an empty role. So why was there no bleed?

Two properties of real_003b's data kept one from firing:

1. **The Pickers shorthand recording** has `Role: ""`. Forklift Operator
queries embed > 0.20 cosine distance from "4 Pickers Detroit MI ..." because
"Pickers" vs "Forklift Operator" provides enough semantic separation
even within the same client+city cluster. The distance gate caught
what the role gate let through.
2. **No Forklift Operator recording** existed (the judge said the cold
top-1 was already correct, rating 5, so no playbook was needed). The
most-likely-to-bleed scenario, a Forklift recording in shorthand
leaking onto Pickers/CNC queries, didn't have ammunition.

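How the two gates combine in reason 1 can be sketched as below. The threshold is hypothetical: the findings only bound it between the ~0.137 distance that let the real_003 bleed through and the > 0.20 distance that blocked the Pickers recording. Role comparison is exact for brevity (the real `roleEqual` also handles case and plural), and `playbookApplies` is an illustrative name.

```go
package main

import "fmt"

// maxBoostDistance is a hypothetical cutoff between the two observed distances
// (~0.137 passed, > 0.20 blocked); the real value is not stated in the findings.
const maxBoostDistance = 0.17

// playbookApplies combines the lenient role gate with the distance gate.
// An empty role on either side skips the role check, leaving distance as
// the only defense.
func playbookApplies(entryRole, queryRole string, distance float64) bool {
	if entryRole != "" && queryRole != "" && entryRole != queryRole {
		return false // role gate rejects a cross-role entry
	}
	return distance <= maxBoostDistance
}

func main() {
	// real_003: empty-role CNC shorthand recording, empty-role Forklift query.
	fmt.Println(playbookApplies("", "", 0.137)) // passes: the bleed fires
	// real_003b: empty-role Pickers recording; 0.21 is an illustrative value above 0.20.
	fmt.Println(playbookApplies("", "", 0.21)) // blocked by the distance gate
}
```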
If a future dataset has multiple roles per cluster all hitting shorthand
recordings, the bleed could return. **Mitigation candidates** (none
implemented):

- **LLM-based role extraction** at record + query time. Robust, slow.
- **Known-cities lookup table** to detect the city boundary in shorthand
(the `{role} {city}` separator). 50 US states + ~5000 cities = a small
static table. Fast, brittle on new cities.
- **Strict gate semantics**: empty role on either side = REJECT
(instead of allow). Closes shorthand-vs-shorthand bleeds completely
but breaks lift-suite multi-constraint queries that have no clean
single role.

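A rough sketch of the known-cities candidate, under stated assumptions: the city table holds only the cities in this dataset, the shorthand layout is taken from the generator header (`N {role}{s} {city} {state} {at} {client}`), and `roleFromShorthand` is a hypothetical name. It also shows the brittleness noted above: an unlisted city yields no role.

```go
package main

import (
	"fmt"
	"strings"
)

// knownCities stands in for the proposed static table; here it only lists
// the cities that appear in this dataset (assumption).
var knownCities = []string{"Detroit", "Flint", "Fort Wayne", "Indianapolis", "Joliet", "Kansas City"}

// roleFromShorthand parses "N {role} {city} {state} ..." by finding the first
// token position (after the count and at least one role token) where a known
// city begins. Everything between the count and that position is the role.
// It returns "" when no known city is found.
func roleFromShorthand(query string) string {
	tokens := strings.Fields(query)
	for i := 2; i < len(tokens); i++ {
		rest := strings.Join(tokens[i:], " ")
		for _, city := range knownCities {
			if rest == city || strings.HasPrefix(rest, city+" ") {
				return strings.Join(tokens[1:i], " ")
			}
		}
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", roleFromShorthand("1 Forklift Operator Detroit MI 15:00 Beacon Freight"))
	// Hypothetical query whose city is missing from the table.
	fmt.Printf("%q\n", roleFromShorthand("2 Welders Lansing MI 08:00 Acme"))
}
```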
## Aggregate metrics

| Run | n | discoveries | lifts | boost_total | meanΔ |
|---|---:|---:|---:|---:|---:|
| real_003 (original extractor) | 40 | 7 | 7 | 14 | -0.108 |
| real_003b (extended extractor) | 40 | 11 | 10 | 31 | -0.202 |

real_003b's higher discoveries + boost_total reflect the extractor
catching role on 3 of 4 styles instead of 1 of 4, which means more
recordings land with a usable Role and more queries find them on the
warm pass. The growth is *legitimate same-role same-cluster transfer*,
not bleed.

`meanΔ` direction is style-dependent: real_002 shrank `meanΔ` (cross-role
bleeds removed → less over-boosting); real_003b grew it (more
legitimate deep boosts fire). The metric isn't a clean fingerprint in
either direction; read the per-cluster bleed table for the actual signal.

## Repro

```bash
# Generate 40-query stress file (10 rows × 4 styles)
go run scripts/cutover/gen_real_queries.go -limit 10 -styles all > tests/reality/real_coord_queries_v2.txt

# Run with extended extractor (current main)
QUERIES_FILE=tests/reality/real_coord_queries_v2.txt RUN_ID=real_003b \
  WITH_PARAPHRASE=0 WITH_REJUDGE=0 ./scripts/playbook_lift.sh
```

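The generator calls `parseStyles(*styles)`, but its body is not part of this diff. A plausible sketch, assuming it splits on commas, expands `all` to the four styles in header order, drops duplicates, and exits on unknown names to fit the single-value call site; `parseStylesChecked` is a hypothetical split-out so the validation can be tested without exiting.

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// allStyles lists the four query styles in the order of the generator header.
var allStyles = []string{"need", "client_first", "looking", "shorthand"}

// parseStylesChecked splits a comma-separated -styles value, expands "all",
// drops duplicates, and rejects unknown or empty input.
func parseStylesChecked(s string) ([]string, error) {
	valid := map[string]bool{}
	for _, st := range allStyles {
		valid[st] = true
	}
	var out []string
	seen := map[string]bool{}
	for _, part := range strings.Split(s, ",") {
		p := strings.TrimSpace(part)
		if p == "" {
			continue
		}
		if p == "all" {
			return append([]string(nil), allStyles...), nil
		}
		if !valid[p] {
			return nil, fmt.Errorf("unknown style %q", p)
		}
		if !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}
	if len(out) == 0 {
		return nil, fmt.Errorf("no styles given")
	}
	return out, nil
}

// parseStyles matches the single-value call site in the diff by exiting on bad input.
func parseStyles(s string) []string {
	styles, err := parseStylesChecked(s)
	if err != nil {
		log.Fatalf("-styles: %v", err)
	}
	return styles
}

func main() {
	fmt.Println(parseStyles("all"))
	fmt.Println(parseStyles("need,looking"))
}
```

With the default `need`, this returns a single-element list, which is what keeps real_001/002 query files unchanged.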
Evidence:
- `reports/reality-tests/playbook_lift_real_003.{json,md}` (original extractor)
- `reports/reality-tests/playbook_lift_real_003b.{json,md}` (extended)
- `tests/reality/real_coord_queries_v2.txt` (40 queries: 10 rows × 4 styles)
@ -19,6 +19,7 @@ import (
|
|||||||
"flag"
|
"flag"
|
||||||
"fmt"
|
"fmt"
|
||||||
"log"
|
"log"
|
||||||
|
"strings"
|
||||||
|
|
||||||
"github.com/apache/arrow-go/v18/arrow/memory"
|
"github.com/apache/arrow-go/v18/arrow/memory"
|
||||||
"github.com/apache/arrow-go/v18/parquet/file"
|
"github.com/apache/arrow-go/v18/parquet/file"
|
||||||
@@ -27,7 +28,8 @@ import (
 
 func main() {
 	src := flag.String("src", "/home/profit/lakehouse/data/datasets/fill_events.parquet", "fill_events parquet path")
-	limit := flag.Int("limit", 10, "number of queries to generate")
+	limit := flag.Int("limit", 10, "number of source rows to read")
+	styles := flag.String("styles", "need", "comma-separated styles to emit per row (need|client_first|looking|shorthand|all)")
 	flag.Parse()
 
 	r, err := file.OpenParquetFile(*src, false)
@@ -61,35 +63,98 @@ func main() {
 		n = *limit
 	}
 
+	stylesList := parseStyles(*styles)
 
 	fmt.Println("# Real-shape coordinator queries — generated from fill_events.parquet")
 	fmt.Println("# (real-shape demand data; queries built mechanically from event rows).")
-	fmt.Printf("# Source: %s (%d rows total, %d emitted)\n", *src, tbl.NumRows(), n)
+	fmt.Printf("# Source: %s (%d rows total, %d emitted, styles=%v)\n", *src, tbl.NumRows(), n, stylesList)
 	fmt.Println("#")
-	fmt.Println("# Format: client + count + role + city/state + start time +")
-	fmt.Println("# (optional deadline). Mimics the natural language a coordinator would")
-	fmt.Println("# type into a dispatch tool when triaging the next-up demand.")
+	fmt.Println("# Styles:")
+	fmt.Println("# need: 'Need N {role}{s} in {city} {state} starting at {at} for {client}'")
+	fmt.Println("# — matches scripts/playbook_lift's extractRoleFromNeed regex")
+	fmt.Println("# client_first: '{client} needs N {role}{s} in {city} {state} at {at}'")
+	fmt.Println("# looking: 'Looking for N {role}{s} at {client} in {city} {state} for {at} shift'")
+	fmt.Println("# shorthand: 'N {role}{s} {city} {state} {at} {client}'")
+	fmt.Println("#")
+	fmt.Println("# Only 'need' currently extracts a role. The other three test the")
+	fmt.Println("# substrate's bleed behavior when the role gate is silently disabled.")
 	fmt.Println()
 
 	for i := 0; i < n; i++ {
-		c := client.ValueStr(i)
-		cy := city.ValueStr(i)
-		st := state.ValueStr(i)
-		ro := role.ValueStr(i)
-		ct := count.ValueStr(i)
-		t := at.ValueStr(i)
-		dl := deadline.ValueStr(i)
-		// Phrase one is the urgent ask; phrase two is the natural rephrase
-		// a coordinator might use when typing fast. Different syntax,
-		// same intent — exercises the embedder's paraphrase tolerance.
-		q := fmt.Sprintf("Need %s %s in %s %s starting at %s for %s", ct, pluralize(ro, ct), cy, st, t, c)
-		if dl != "" && dl != "(null)" {
-			q += fmt.Sprintf(", deadline %s", dl)
-		}
-		fmt.Println(q)
+		ev := event{
+			client: client.ValueStr(i),
+			city:   city.ValueStr(i),
+			state:  state.ValueStr(i),
+			role:   role.ValueStr(i),
+			count:  count.ValueStr(i),
+			at:     at.ValueStr(i),
+		}
+		if dl := deadline.ValueStr(i); dl != "" && dl != "(null)" {
+			ev.deadline = dl
+		}
+		for _, s := range stylesList {
+			fmt.Println(formatQuery(ev, s))
+		}
 	}
 }
 
+type event struct {
+	client, city, state, role, count, at, deadline string
+}
+
+func formatQuery(e event, style string) string {
+	r := pluralize(e.role, e.count)
+	switch style {
+	case "client_first":
+		// No "Need ... in" anchor — extractRoleFromNeed returns "" on this.
+		return fmt.Sprintf("%s needs %s %s in %s %s at %s", e.client, e.count, r, e.city, e.state, e.at)
+	case "looking":
+		return fmt.Sprintf("Looking for %s %s at %s in %s %s for %s shift", e.count, r, e.client, e.city, e.state, e.at)
+	case "shorthand":
+		return fmt.Sprintf("%s %s %s %s %s %s", e.count, r, e.city, e.state, e.at, e.client)
+	default:
+		// "need" form — the original real_001 shape, regex-extractor wins.
+		q := fmt.Sprintf("Need %s %s in %s %s starting at %s for %s", e.count, r, e.city, e.state, e.at, e.client)
+		if e.deadline != "" {
+			q += ", deadline " + e.deadline
+		}
+		return q
+	}
+}
+
+// parseStyles unpacks the comma-separated -styles flag, with "all"
+// expanding to every supported style and unknown tokens dropped
+// (with a log line so callers know).
+func parseStyles(csv string) []string {
+	all := []string{"need", "client_first", "looking", "shorthand"}
+	if strings.TrimSpace(csv) == "all" {
+		return all
+	}
+	out := []string{}
+	for _, s := range strings.Split(csv, ",") {
+		s = strings.TrimSpace(s)
+		if s == "" {
+			continue
+		}
+		known := false
+		for _, a := range all {
+			if a == s {
+				known = true
+				break
+			}
+		}
+		if !known {
+			log.Printf("gen_real_queries: unknown style %q — skipping", s)
+			continue
+		}
+		out = append(out, s)
+	}
+	if len(out) == 0 {
+		return []string{"need"}
+	}
+	return out
+}
+
 func pluralize(role, count string) string {
 	if count == "1" {
 		return role
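`parseStyles` has a few edge cases worth knowing before passing `-styles`. The sketch below restates it from the diff above (only the log wording differs) and runs it on several inputs. Two behaviors follow from the code as written. First, `all` is only honored as the entire flag value, so `need,all` logs `all` as unknown and yields `[need]`. Second, duplicates are kept, so `need,need` would emit every need-form query twice.

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// parseStyles mirrors the function added to gen_real_queries.go.
func parseStyles(csv string) []string {
	all := []string{"need", "client_first", "looking", "shorthand"}
	// "all" only expands when it is the whole flag value.
	if strings.TrimSpace(csv) == "all" {
		return all
	}
	out := []string{}
	for _, s := range strings.Split(csv, ",") {
		s = strings.TrimSpace(s)
		if s == "" {
			continue
		}
		known := false
		for _, a := range all {
			if a == s {
				known = true
				break
			}
		}
		if !known {
			log.Printf("gen_real_queries: unknown style %q, skipping", s)
			continue
		}
		// No de-duplication: repeated styles are emitted repeatedly.
		out = append(out, s)
	}
	// Nothing usable falls back to the original real_001 behavior.
	if len(out) == 0 {
		return []string{"need"}
	}
	return out
}

func main() {
	for _, in := range []string{"all", " looking , shorthand ", "need,all", "need,need", "bogus", ""} {
		fmt.Printf("%-24q -> %v\n", in, parseStyles(in))
	}
}
```
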
@@ -631,25 +631,53 @@ func appendNote(existing, add string) string {
 // refactor; harmless for now.
 var _ = sort.Slice
 
-// extractRoleFromNeed pulls the role out of "Need N {role}{s} in {city}"
-// shape queries — the fill_events-derived form used by real_NNN runs.
-// Returns "" for any query that doesn't match (free-form lift-suite
-// queries fall back to empty, leaving the cross-role gate disabled).
+// extractRoleFromNeed pulls the role out of staffing-shape queries.
+// Returns "" for any query that doesn't match a known anchor pattern
+// (free-form lift-suite queries + shorthand-style fall back to empty,
+// leaving the cross-role gate disabled).
 //
-// The pattern is permissive: the count can be any digits, and the
-// role is everything between the count and " in ". This catches
-// "Need 5 Warehouse Associates in Kansas City" → "Warehouse Associates";
-// roleEqual on the matrix side handles plurals + case.
+// Patterns covered (in priority order):
+// need: "Need N {role}{s} in {city} ..."
+// client_first: "{client} needs N {role}{s} in {city} ..."
+// looking: "Looking for N {role}{s} at {client} in {city} ..."
+//
+// Pattern explicitly NOT covered:
+// shorthand: "N {role}{s} {city} {state} {at} {client}"
+// Because there's no separator between role and city in shorthand
+// ("Forklift Operator Detroit" is shape-indistinguishable from
+// "Forklift" + "Operator Detroit"), a regex can't reliably extract
+// role here. real_003 confirmed shorthand-vs-shorthand cross-role
+// bleed: a CNC Operator shorthand recording leaked w-2404 onto a
+// Forklift Operator shorthand query within the same Beacon Freight
+// Detroit cluster. Closing that requires either an LLM extractor at
+// record+query time or a known-cities lookup table.
 //
 // Lives here (not in internal/matrix) because role extraction from
 // free-form text is a caller concern; matrix only consumes the
 // already-resolved Role string. A future LLM-based extractor would
-// replace this regex without changing matrix's gate logic.
+// replace this function without changing matrix's gate logic.
 func extractRoleFromNeed(query string) string {
-	re := regexp.MustCompile(`(?i)^Need\s+\d+\s+(.+?)\s+in\s+`)
-	m := re.FindStringSubmatch(query)
-	if len(m) < 2 {
-		return ""
-	}
-	return strings.TrimSpace(m[1])
+	for _, re := range roleExtractRegexes {
+		if m := re.FindStringSubmatch(query); len(m) >= 2 {
+			return strings.TrimSpace(m[1])
+		}
+	}
+	return ""
 }
+
+// roleExtractRegexes is ordered: more-specific anchors first so a
+// "Looking for ..." query doesn't accidentally land in the "Need"
+// pattern (impossible given the prefix, but guards against future
+// pattern additions). Compiled once at package init via MustCompile.
+var roleExtractRegexes = []*regexp.Regexp{
+	// "Need N {role} in ..." — the original real_001 form.
+	regexp.MustCompile(`(?i)^Need\s+\d+\s+(.+?)\s+in\s+`),
+	// "Looking for N {role} at ..." — the looking style. Anchor on
+	// "at" because the role is followed by client (preceded by "at"),
+	// not by city directly.
+	regexp.MustCompile(`(?i)^Looking\s+for\s+\d+\s+(.+?)\s+at\s+`),
+	// "{client} needs N {role} in ..." — the client_first style.
+	// Lazy (non-greedy) on the client side via .+?, then "needs", then
+	// count, then role, then "in".
+	regexp.MustCompile(`(?i)^.+?\s+needs\s+\d+\s+(.+?)\s+in\s+`),
+}
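To see the generator and the extractor working together, the sketch below feeds all four `formatQuery` templates for one `fill_events` row (the Beacon Freight Detroit Forklift Operator row in the v2 file) through the three anchored patterns. The templates and regexes are copied from the diff. `pluralize` is left out as a simplifying assumption, so callers pass the already-pluralized role. As in the `main_test.go` cases, need, client_first, and looking should resolve to `Forklift Operator`, while shorthand should return an empty string.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// event mirrors the struct added to gen_real_queries.go.
type event struct {
	client, city, state, role, count, at, deadline string
}

// The three anchored patterns, in the same priority order as the diff.
var roleExtractRegexes = []*regexp.Regexp{
	regexp.MustCompile(`(?i)^Need\s+\d+\s+(.+?)\s+in\s+`),
	regexp.MustCompile(`(?i)^Looking\s+for\s+\d+\s+(.+?)\s+at\s+`),
	regexp.MustCompile(`(?i)^.+?\s+needs\s+\d+\s+(.+?)\s+in\s+`),
}

// extractRoleFromNeed returns the first pattern's role capture, or "".
func extractRoleFromNeed(query string) string {
	for _, re := range roleExtractRegexes {
		if m := re.FindStringSubmatch(query); len(m) >= 2 {
			return strings.TrimSpace(m[1])
		}
	}
	return ""
}

// formatQuery reproduces the generator's templates. Assumption: e.role is
// already pluralized, so pluralize is not called here.
func formatQuery(e event, style string) string {
	switch style {
	case "client_first":
		return fmt.Sprintf("%s needs %s %s in %s %s at %s", e.client, e.count, e.role, e.city, e.state, e.at)
	case "looking":
		return fmt.Sprintf("Looking for %s %s at %s in %s %s for %s shift", e.count, e.role, e.client, e.city, e.state, e.at)
	case "shorthand":
		return fmt.Sprintf("%s %s %s %s %s %s", e.count, e.role, e.city, e.state, e.at, e.client)
	default:
		q := fmt.Sprintf("Need %s %s in %s %s starting at %s for %s", e.count, e.role, e.city, e.state, e.at, e.client)
		if e.deadline != "" {
			q += ", deadline " + e.deadline
		}
		return q
	}
}

func main() {
	ev := event{client: "Beacon Freight", city: "Detroit", state: "MI", role: "Forklift Operator", count: "1", at: "15:00", deadline: "2026-05-28"}
	for _, s := range []string{"need", "client_first", "looking", "shorthand"} {
		q := formatQuery(ev, s)
		fmt.Printf("%-12s %q -> %q\n", s, q, extractRoleFromNeed(q))
	}
}
```
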
scripts/playbook_lift/main_test.go (Normal file, +76 lines)
@@ -0,0 +1,76 @@
+package main
+
+import "testing"
+
+// TestExtractRoleFromNeed locks the four query-shape patterns documented
+// in real_003_findings.md so a future change to the regex can't silently
+// drop coverage of any production-shape style. Real_001 used `need`-only;
+// real_003 confirmed `shorthand` cross-role bleed; the extended
+// extractor in real_003b covers `client_first` + `looking` and leaves
+// `shorthand` as a known limitation (no separator between role and city).
+func TestExtractRoleFromNeed(t *testing.T) {
+	cases := []struct {
+		name  string
+		query string
+		want  string
+	}{
+		{
+			"need style — original real_001 form",
+			"Need 1 Forklift Operator in Detroit MI starting at 15:00 for Beacon Freight",
+			"Forklift Operator",
+		},
+		{
+			"need with deadline trailer",
+			"Need 4 Pickers in Detroit MI starting at 13:30 for Beacon Freight, deadline 2026-05-28",
+			"Pickers",
+		},
+		{
+			"client_first style — added in real_003b",
+			"Beacon Freight needs 1 Forklift Operator in Detroit MI at 15:00",
+			"Forklift Operator",
+		},
+		{
+			"client_first with multi-word client",
+			"Parallel Machining needs 5 Warehouse Associates in Kansas City MO at 09:00",
+			"Warehouse Associates",
+		},
+		{
+			"looking style — added in real_003b",
+			"Looking for 1 Forklift Operator at Beacon Freight in Detroit MI for 15:00 shift",
+			"Forklift Operator",
+		},
+		{
+			"looking with multi-word role + 4-digit count",
+			"Looking for 1234 Senior Production Supervisors at Heritage Foods in Flint MI for 08:30 shift",
+			"Senior Production Supervisors",
+		},
+		{
+			"shorthand — known limitation, returns empty",
+			"1 Forklift Operator Detroit MI 15:00 Beacon Freight",
+			"",
+		},
+		{
+			"shorthand multi-word city — also empty",
+			"5 Warehouse Associates Kansas City MO 09:00 Parallel Machining",
+			"",
+		},
+		{
+			"lift-suite multi-constraint — no clean role, returns empty",
+			"Forklift operator with OSHA-30, warehouse experience, day shift availability",
+			"",
+		},
+		{
+			"OOD honesty signal — lift-suite, returns empty",
+			"Dental hygienist with three years experience, Indianapolis area",
+			"",
+		},
+	}
+	for _, c := range cases {
+		t.Run(c.name, func(t *testing.T) {
+			got := extractRoleFromNeed(c.query)
+			if got != c.want {
+				t.Errorf("extractRoleFromNeed(%q) = %q, want %q", c.query, got, c.want)
+			}
+		})
+	}
+}
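The two shorthand cases above are pinned to return empty because nothing separates role from city. The `extractRoleFromNeed` doc comment names a known-cities lookup as one way to close that gap. Below is a minimal sketch of that idea, assuming a city list is available. Here it is hard-coded from the cities that appear in `real_coord_queries_v2.txt`; a real table would have to come from the data or a gazetteer. After the leading count, it cuts the role at the earliest known city that is followed by a two-letter uppercase state code. `extractRoleShorthand`, `knownCities`, and `shorthandHead` are hypothetical names, not part of this commit.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// knownCities is a hypothetical lookup table. Multi-word names such as
// "Kansas City" and "Fort Wayne" are why a plain token split cannot work.
var knownCities = []string{"Detroit", "Kansas City", "Indianapolis", "Fort Wayne", "Joliet", "Flint"}

// shorthandHead captures everything after the leading count.
var shorthandHead = regexp.MustCompile(`^\d+\s+(.+)$`)

// extractRoleShorthand (hypothetical) handles "N {role}{s} {city} {state} ...":
// it finds the earliest known city followed by a two-letter uppercase state
// code and returns the text before it as the role.
func extractRoleShorthand(query string, cities []string) string {
	m := shorthandHead.FindStringSubmatch(strings.TrimSpace(query))
	if m == nil {
		return "" // not shorthand-shaped (no leading count)
	}
	rest := m[1]
	cut := -1
	for _, c := range cities {
		// Compiled per call for brevity; a real version would precompile.
		re := regexp.MustCompile(`\s` + regexp.QuoteMeta(c) + `\s+[A-Z]{2}\b`)
		if loc := re.FindStringIndex(rest); loc != nil && (cut < 0 || loc[0] < cut) {
			cut = loc[0]
		}
	}
	if cut <= 0 {
		return "" // unknown city, or nothing before it
	}
	return strings.TrimSpace(rest[:cut])
}

func main() {
	for _, q := range []string{
		"1 Forklift Operator Detroit MI 15:00 Beacon Freight",
		"5 Warehouse Associates Kansas City MO 09:00 Parallel Machining",
		"1 CNC Operator Detroit MI 17:30 Beacon Freight",
	} {
		fmt.Printf("%q -> %q\n", q, extractRoleShorthand(q, knownCities))
	}
}
```

An unknown city still falls back to an empty role, which keeps the gate disabled exactly as it is today rather than guessing.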
tests/reality/real_coord_queries_v2.txt (Normal file, +54 lines)
@@ -0,0 +1,54 @@
+# Real-shape coordinator queries — generated from fill_events.parquet
+# (real-shape demand data; queries built mechanically from event rows).
+# Source: /home/profit/lakehouse/data/datasets/fill_events.parquet (123 rows total, 10 emitted, styles=[need client_first looking shorthand])
+#
+# Styles:
+# need: 'Need N {role}{s} in {city} {state} starting at {at} for {client}'
+# — matches scripts/playbook_lift's extractRoleFromNeed regex
+# client_first: '{client} needs N {role}{s} in {city} {state} at {at}'
+# looking: 'Looking for N {role}{s} at {client} in {city} {state} for {at} shift'
+# shorthand: 'N {role}{s} {city} {state} {at} {client}'
+#
+# Only 'need' currently extracts a role. The other three test the
+# substrate's bleed behavior when the role gate is silently disabled.
+
+Need 5 Warehouse Associates in Kansas City MO starting at 09:00 for Parallel Machining
+Parallel Machining needs 5 Warehouse Associates in Kansas City MO at 09:00
+Looking for 5 Warehouse Associates at Parallel Machining in Kansas City MO for 09:00 shift
+5 Warehouse Associates Kansas City MO 09:00 Parallel Machining
+Need 1 Forklift Operator in Detroit MI starting at 15:00 for Beacon Freight, deadline 2026-05-28
+Beacon Freight needs 1 Forklift Operator in Detroit MI at 15:00
+Looking for 1 Forklift Operator at Beacon Freight in Detroit MI for 15:00 shift
+1 Forklift Operator Detroit MI 15:00 Beacon Freight
+Need 4 Loaders in Indianapolis IN starting at 12:00 for Midway Distribution
+Midway Distribution needs 4 Loaders in Indianapolis IN at 12:00
+Looking for 4 Loaders at Midway Distribution in Indianapolis IN for 12:00 shift
+4 Loaders Indianapolis IN 12:00 Midway Distribution
+Need 3 Warehouse Associates in Fort Wayne IN starting at 17:30 for Cornerstone Fabrication, deadline 2026-05-17
+Cornerstone Fabrication needs 3 Warehouse Associates in Fort Wayne IN at 17:30
+Looking for 3 Warehouse Associates at Cornerstone Fabrication in Fort Wayne IN for 17:30 shift
+3 Warehouse Associates Fort Wayne IN 17:30 Cornerstone Fabrication
+Need 4 Pickers in Detroit MI starting at 13:30 for Beacon Freight, deadline 2026-05-28
+Beacon Freight needs 4 Pickers in Detroit MI at 13:30
+Looking for 4 Pickers at Beacon Freight in Detroit MI for 13:30 shift
+4 Pickers Detroit MI 13:30 Beacon Freight
+Need 2 Packers in Joliet IL starting at 09:30 for Parallel Machining
+Parallel Machining needs 2 Packers in Joliet IL at 09:30
+Looking for 2 Packers at Parallel Machining in Joliet IL for 09:30 shift
+2 Packers Joliet IL 09:30 Parallel Machining
+Need 3 Assemblers in Flint MI starting at 08:30 for Heritage Foods
+Heritage Foods needs 3 Assemblers in Flint MI at 08:30
+Looking for 3 Assemblers at Heritage Foods in Flint MI for 08:30 shift
+3 Assemblers Flint MI 08:30 Heritage Foods
+Need 3 Packers in Flint MI starting at 12:30 for Parallel Machining
+Parallel Machining needs 3 Packers in Flint MI at 12:30
+Looking for 3 Packers at Parallel Machining in Flint MI for 12:30 shift
+3 Packers Flint MI 12:30 Parallel Machining
+Need 1 Shipping Clerk in Flint MI starting at 17:00 for Pioneer Assembly
+Pioneer Assembly needs 1 Shipping Clerk in Flint MI at 17:00
+Looking for 1 Shipping Clerk at Pioneer Assembly in Flint MI for 17:00 shift
+1 Shipping Clerk Flint MI 17:00 Pioneer Assembly
+Need 1 CNC Operator in Detroit MI starting at 17:30 for Beacon Freight, deadline 2026-05-28
+Beacon Freight needs 1 CNC Operator in Detroit MI at 17:30
+Looking for 1 CNC Operator at Beacon Freight in Detroit MI for 17:30 shift
+1 CNC Operator Detroit MI 17:30 Beacon Freight