# Reality test real_002 — cross-role bleed fix verification

Same query set as real_001 (10 queries from `fill_events.parquet`),
re-run after the role-scoped playbook gate landed in
`internal/matrix/playbook.go`. The gate adds a `Role` field to
`PlaybookEntry` + a `QueryRole` field to `SearchRequest`; both
`ApplyPlaybookBoost` (Shape A) and `InjectPlaybookMisses` (Shape B)
reject playbook hits when both sides set Role and they differ.
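
The rejection rule above can be sketched as a small predicate. This is a minimal illustration, not the code in `internal/matrix/playbook.go`: the struct definitions are trimmed stand-ins that show only the role fields, `roleGateRejects` is a hypothetical helper name, and plain string inequality stands in for the `roleEqual` comparison.

```go
package main

import "fmt"

// PlaybookEntry and SearchRequest are trimmed, hypothetical stand-ins for
// the real types; only the role fields described in this report are shown.
type PlaybookEntry struct {
	Role string // role recorded with the playbook; empty means unscoped
}

type SearchRequest struct {
	QueryRole string // caller-supplied role; empty disables the gate
}

// roleGateRejects (hypothetical name) reports whether a playbook hit should
// be dropped: only when both sides set a role and the roles differ.
// Exact string comparison is a simplification of the real role comparison.
func roleGateRejects(entry PlaybookEntry, req SearchRequest) bool {
	if entry.Role == "" || req.QueryRole == "" {
		return false // gate is opt-in: a missing role on either side lets the hit through
	}
	return entry.Role != req.QueryRole
}

func main() {
	entry := PlaybookEntry{Role: "Forklift Operator"}
	for _, role := range []string{"Forklift Operator", "CNC Operator", ""} {
		req := SearchRequest{QueryRole: role}
		fmt.Printf("query_role=%q rejected=%v\n", role, roleGateRejects(entry, req))
	}
}
```

Under these assumptions, both Shape A and Shape B would call the same predicate before letting a playbook hit influence the ranking.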

Companion to `playbook_lift_real_002.md` (auto-generated harness
output) and `real_001_findings.md` (the original bleed finding).

## Headline

The bleed is closed. Compare real_001 (no gate) vs real_002 (with gate):

| Query | real_001 warm-top-1 | real_002 warm-top-1 | Status |
|---|---|---|---|
| Forklift Operator @ Beacon Freight Detroit | `e-6193` (lifted) | `w-3098` (different worker) | Same-role lift still works |
| Pickers @ Beacon Freight Detroit | **`e-6193`** (cross-role bleed) | `e-8499` (cold-pass top-1) | **Bleed closed** |
| CNC Operator @ Beacon Freight Detroit | **`e-6193`** (cross-role bleed) | `w-2404` (cold-pass top-1) | **Bleed closed** |

Both previously-bled queries now keep their cold-pass top-1 because
the role-gated playbook entry doesn't fire on different-role queries.

## Aggregate metrics

| Metric | real_001 (no gate) | real_002 (gate) |
|---|---:|---:|
| Discoveries (judge-best ≠ cold top-1) | 2 | 2 |
| Lifts (recorded → top-1) | 2 | 2 |
| Boosted count (Shape A fires) | 2 | 2 |
| Mean Δ top-1 distance | -0.127 | -0.040 |

The `meanΔdist` magnitude (mean Δ top-1 distance) shrinking by roughly
3× (0.127 → 0.040) is the fingerprint of the fix working. Previously
boosts were firing too widely and dragging top-1 distances far down on
cross-role mismatches; now they only fire on legitimate intra-role
matches, so the average shift is smaller and more honest.

Critically, **discovery + lift counts didn't drop** — the gate
doesn't reject legitimate same-role lifts. Forklift Q#2 still
recorded a playbook and Q#9 (Shipping Clerk) still discovered +
lifted. Cross-role bleed protection doesn't erase same-role learning.

## How the gate fires (concrete trace)

real_002's harness extracts role mechanically from the query prefix:

| Query | Extracted role |
|---|---|
| `Need 1 Forklift Operator in Detroit MI ...` | `Forklift Operator` |
| `Need 4 Pickers in Detroit MI ...` | `Pickers` |
| `Need 1 CNC Operator in Detroit MI ...` | `CNC Operator` |
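
One way to do this kind of prefix extraction is a single anchored regular expression. The sketch below is an assumption about the approach, not the harness's actual code: the pattern is inferred from the three example queries, `extractRole` is a hypothetical helper, and the third sample query is made up to show the no-match case.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// needPrefix matches queries shaped like "Need <count> <role> in <place> ...".
// The pattern is an assumption inferred from the example queries above.
var needPrefix = regexp.MustCompile(`^Need\s+\d+\s+(.+?)\s+in\s`)

// extractRole (hypothetical helper) returns the role named in the query
// prefix, or "" when the query does not follow the "Need N <role> in" shape.
func extractRole(query string) string {
	m := needPrefix.FindStringSubmatch(strings.TrimSpace(query))
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	for _, q := range []string{
		"Need 1 Forklift Operator in Detroit MI",
		"Need 4 Pickers in Detroit MI",
		"night shift, bilingual, near the airport", // made-up free-form query
	} {
		fmt.Printf("%q -> %q\n", q, extractRole(q))
	}
}
```

A query that does not match yields an empty role, which lines up with the opt-in behavior of the gate: no extracted role means no role filtering.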

The recorded playbook entry from Q#2 carries `Role: "Forklift Operator"`.
At retrieve time:

- Q#5 query has `query_role: "Pickers"` → `roleEqual("Pickers",
  "Forklift Operator")` returns false → `InjectPlaybookMisses`
  rejects the candidate before it can fire.
- Q#10 query has `query_role: "CNC Operator"` → same path, same
  rejection.

`roleEqual` normalization handles case + trailing-s plurals, so
the recorded singular "Forklift Operator" matches a query for
"Forklift Operators" without a separate rule.

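The normalization described above could look roughly like the following. This is a sketch under stated assumptions, not the actual `roleEqual` from `internal/matrix/playbook.go`: `normalizeRole` is a hypothetical helper, and stripping a single trailing "s" is the simplest reading of "trailing-s plurals".

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeRole (hypothetical helper) lower-cases the role, trims
// surrounding whitespace, and drops one trailing "s" if present.
func normalizeRole(role string) string {
	r := strings.ToLower(strings.TrimSpace(role))
	return strings.TrimSuffix(r, "s")
}

// roleEqual here is a sketch of the comparison: two roles match when
// their normalized forms are identical.
func roleEqual(a, b string) bool {
	return normalizeRole(a) == normalizeRole(b)
}

func main() {
	fmt.Println(roleEqual("Forklift Operator", "forklift operators"))
	fmt.Println(roleEqual("Pickers", "Forklift Operator"))
}
```

A rule this simple does not cover irregular plurals, and it would also treat two distinct roles that differ only by a final "s" as equal; whether that matters depends on the role vocabulary in the data.
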
## What remains the same

- Free-form queries with no extractable role (the lift suite's
  multi-constraint queries) → empty role → gate disabled → behavior
  unchanged. This is intentional: the gate is opt-in via
  caller-supplied role, not a global filter.
- Shape A boost on intra-role matches is unaffected. The 2 lifts
  in real_002 are real Shape A wins on same-role same-cluster queries.

## Repro

```bash
# Same query set as real_001
go run scripts/cutover/gen_real_queries.go -limit 10 > tests/reality/real_coord_queries.txt

# Run with role-aware harness
QUERIES_FILE=tests/reality/real_coord_queries.txt RUN_ID=real_002 \
  WITH_PARAPHRASE=0 WITH_REJUDGE=0 ./scripts/playbook_lift.sh
```

Evidence: `reports/reality-tests/playbook_lift_real_002.{json,md}`.

## Test coverage

`internal/matrix/playbook_test.go` adds 5 new tests:

- `TestInjectPlaybookMisses_RoleGateRejectsCrossRole` — Q#10 scenario
- `TestInjectPlaybookMisses_RoleGateAllowsSameRole` — same-role lift
- `TestInjectPlaybookMisses_RoleGateBackwardCompat` — empty Role disables gate (both directions)
- `TestApplyPlaybookBoost_RoleGateRejectsCrossRole` — Shape A defense in depth
- `TestRoleEqual_PluralAndCase` — normalization edge cases

The cross-role rejection tests use the exact distance and role values
from the real_001 finding (distance 0.135, recorded role "Forklift
Operator", query role "CNC Operator"), so a regression that re-opens
the bleed would fail at the unit-test level before any reality test runs.