# Reality test real_002: cross-role bleed fix verification

Same query set as real_001 (10 queries from `fill_events.parquet`),
re-run after the role-scoped playbook gate landed in
`internal/matrix/playbook.go`. The gate adds a `Role` field to
`PlaybookEntry` and a `QueryRole` field to `SearchRequest`; both
`ApplyPlaybookBoost` (Shape A) and `InjectPlaybookMisses` (Shape B)
reject playbook hits when both sides set a role and the roles differ.

Companion to `playbook_lift_real_002.md` (auto-generated harness
output) and `real_001_findings.md` (the original bleed finding).

## Headline

The bleed is closed. Compare real_001 (no gate) vs real_002 (with gate):

| Query | real_001 warm-top-1 | real_002 warm-top-1 | Status |
|---|---|---|---|
| Forklift Operator @ Beacon Freight Detroit | `e-6193` (lifted) | `w-3098` (different worker) | Same-role lift still works |
| Pickers @ Beacon Freight Detroit | **`e-6193`** (cross-role bleed) | `e-8499` (cold-pass top-1) | **Bleed closed** |
| CNC Operator @ Beacon Freight Detroit | **`e-6193`** (cross-role bleed) | `w-2404` (cold-pass top-1) | **Bleed closed** |

Both previously-bled queries now keep their cold-pass top-1 because the
role-gated playbook entry doesn't fire on different-role queries.

## Aggregate metrics

| Metric | real_001 (no gate) | real_002 (gate) |
|---|---:|---:|
| Discoveries (judge-best ≠ cold top-1) | 2 | 2 |
| Lifts (recorded → top-1) | 2 | 2 |
| Boosted count (Shape A fires) | 2 | 2 |
| Mean Δ top-1 distance | -0.127 | -0.040 |

The `meanΔdist` magnitude shrinking by roughly 3× (0.127 → 0.040) is the
fingerprint of the fix working. Previously boosts were firing too widely
and pulling distances through the floor on cross-role mismatches; now
they only fire on legitimate intra-role matches, so the average shift is
smaller and more honest.

Critically, **discovery + lift counts didn't drop**: the gate doesn't
reject legitimate same-role lifts. Forklift Q#2 still recorded a
playbook and Q#9 (Shipping Clerk) still discovered + lifted.
Cross-role bleed protection doesn't erase same-role learning.

## How the gate fires (concrete trace)

real_002's harness extracts role mechanically from the query prefix:

| Query | Extracted role |
|---|---|
| `Need 1 Forklift Operator in Detroit MI ...` | `Forklift Operator` |
| `Need 4 Pickers in Detroit MI ...` | `Pickers` |
| `Need 1 CNC Operator in Detroit MI ...` | `CNC Operator` |

The recorded playbook entry from Q#2 carries `Role: "Forklift Operator"`.
At retrieve time:

- Q#5 query has `query_role: "Pickers"` → `roleEqual("Pickers", "Forklift Operator")`
  returns false → `InjectPlaybookMisses` rejects the candidate before it
  can fire.
- Q#10 query has `query_role: "CNC Operator"` → same path, same rejection.

`roleEqual` normalization handles case + trailing-s plurals, so the
recorded singular "Forklift Operator" matches a query for "Forklift
Operators" without a separate rule.

## What remains the same

- Free-form queries with no extractable role (lift suite
  multi-constraint queries) → empty role → gate disabled → behavior
  unchanged. This is intentional: the gate is opt-in via
  caller-supplied role, not a global filter.
- Shape A boost on intra-role matches is unaffected. The 2 lifts in
  real_002 are real Shape A wins on same-role same-cluster queries.

## Repro

```bash
# Same query set as real_001
go run scripts/cutover/gen_real_queries.go -limit 10 > tests/reality/real_coord_queries.txt

# Run with role-aware harness
QUERIES_FILE=tests/reality/real_coord_queries.txt RUN_ID=real_002 \
  WITH_PARAPHRASE=0 WITH_REJUDGE=0 ./scripts/playbook_lift.sh
```

Evidence: `reports/reality-tests/playbook_lift_real_002.{json,md}`.
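
To make the role-aware part of the harness run concrete, here is a minimal Go sketch of prefix-based role extraction together with a `roleEqual`-style normalization. It is not the harness or `playbook.go` code: the regular expression, `extractRole`, and `normalizeRole` are assumptions for illustration, inferred only from the three example queries in the trace and the stated case + trailing-s behavior.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rolePrefix matches queries shaped like "Need <count> <role> in <location> ...".
// ASSUMPTION: this pattern is inferred from the example queries in this report;
// the real harness may extract the role differently.
var rolePrefix = regexp.MustCompile(`^Need\s+\d+\s+(.+?)\s+in\s+`)

// extractRole (hypothetical helper) returns the role from the query prefix,
// or "" when the query does not match. An empty role leaves the gate disabled.
func extractRole(query string) string {
	m := rolePrefix.FindStringSubmatch(query)
	if m == nil {
		return ""
	}
	return m[1]
}

// normalizeRole (hypothetical helper) lowercases, trims surrounding
// whitespace, and drops a single trailing "s".
func normalizeRole(role string) string {
	r := strings.ToLower(strings.TrimSpace(role))
	return strings.TrimSuffix(r, "s")
}

// roleEqual is a sketch of the comparison described above: case-insensitive
// and tolerant of trailing-s plurals. The real function may handle more cases.
func roleEqual(a, b string) bool {
	return normalizeRole(a) == normalizeRole(b)
}

func main() {
	recorded := "Forklift Operator" // role carried by the Q#2 playbook entry
	queries := []string{
		"Need 1 Forklift Operator in Detroit MI ...",
		"Need 4 Pickers in Detroit MI ...",
		"Need 1 CNC Operator in Detroit MI ...",
	}
	for _, q := range queries {
		role := extractRole(q)
		fmt.Printf("role=%q matches recorded=%v\n", role, roleEqual(role, recorded))
	}
}
```

A naive trailing-s rule like this one would also strip the final "s" from roles that are not plurals, so the edge cases covered by `TestRoleEqual_PluralAndCase` are worth checking against the real implementation.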

## Test coverage

`internal/matrix/playbook_test.go` adds 5 new tests:

- `TestInjectPlaybookMisses_RoleGateRejectsCrossRole`: Q#10 scenario
- `TestInjectPlaybookMisses_RoleGateAllowsSameRole`: same-role lift
- `TestInjectPlaybookMisses_RoleGateBackwardCompat`: empty Role disables
  gate (both directions)
- `TestApplyPlaybookBoost_RoleGateRejectsCrossRole`: Shape A defense in
  depth
- `TestRoleEqual_PluralAndCase`: normalization edge cases

The cross-role rejection tests use the exact distance + role values
from the real_001 finding (distance 0.135, recorded role "Forklift
Operator", query role "CNC Operator"), so a regression that re-opens
the bleed would fail at unit-test level before any reality test runs.
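
The gate condition these tests pin down (reject only when both sides set a role and the roles differ) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code in `playbook.go`: the struct definitions are stand-ins that keep only the `Role` and `QueryRole` fields named in this report, while `Distance`, `roleGateAllows`, and the body of `roleEqual` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// PlaybookEntry is a stand-in for the real type. Only Role is named in this
// report; Distance is a hypothetical field added to carry the 0.135 value.
type PlaybookEntry struct {
	Role     string
	Distance float64
}

// SearchRequest is a stand-in for the real type; only QueryRole is named here.
type SearchRequest struct {
	QueryRole string
}

// roleEqual is an assumed normalization: case-insensitive, trailing "s" dropped.
func roleEqual(a, b string) bool {
	norm := func(s string) string {
		return strings.TrimSuffix(strings.ToLower(strings.TrimSpace(s)), "s")
	}
	return norm(a) == norm(b)
}

// roleGateAllows (hypothetical name) returns false only when both roles are
// set and differ. An empty role on either side disables the gate, which is
// the backward-compatible behavior described in this report.
func roleGateAllows(e PlaybookEntry, req SearchRequest) bool {
	if e.Role == "" || req.QueryRole == "" {
		return true
	}
	return roleEqual(e.Role, req.QueryRole)
}

func main() {
	// Values from the real_001 finding quoted above.
	entry := PlaybookEntry{Role: "Forklift Operator", Distance: 0.135}

	for _, queryRole := range []string{"CNC Operator", "Forklift Operators", ""} {
		req := SearchRequest{QueryRole: queryRole}
		fmt.Printf("entry=%q dist=%.3f query=%q allowed=%v\n",
			entry.Role, entry.Distance, queryRole, roleGateAllows(entry, req))
	}
}
```

Under these assumptions the "CNC Operator" request is rejected, while the plural same-role request and the empty-role request both pass, mirroring the cross-role, same-role, and backward-compat test cases listed above.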