root 997527be4d matrix: cross-role playbook gate — closes real_001 bleed (OPEN #1)
real_001 surfaced same-client+city queries bleeding across roles:
Q#2 (Forklift Operator @ Beacon Freight Detroit) recorded e-6193
in the playbook corpus. Q#5 (Pickers same client+city) and Q#10
(CNC Operator same client+city) embedded within 0.13-0.18 cosine of
Q#2's query — well inside the 0.20 inject threshold — so e-6193
injected on both, demoting the cold-pass-correct workers.

Root cause: the inject distance threshold isn't tight enough on
the same-client+city cluster. Cosine collapses queries that share
city + client + count-token + time-token regardless of role. The
existing judge gate is per-injection at record time and doesn't
fire at retrieve time.

Fix: structural role gate in front of both Shape A boost and
Shape B inject. PlaybookEntry gains Role; SearchRequest gains
QueryRole. When both are non-empty and differ under roleEqual's
case+plural normalization, the entry is rejected before BoostFactor
or judge-gate logic runs.

Backward-compat: empty role on either side disables the gate —
preserves behavior for the lift suite's free-form multi-constraint
queries that have no clean single role. Caller-supplied (not
inferred), so existing recordings unaffected.

Wire-through:
- internal/matrix/playbook.go: Role field, NewPlaybookEntryWithRole,
  roleEqual helper with plural+case normalization
- internal/matrix/retrieve.go: QueryRole on SearchRequest, threaded
  to both ApplyPlaybookBoost + InjectPlaybookMisses
- cmd/matrixd/main.go: role on POST /matrix/playbooks/record + bulk
- scripts/playbook_lift/main.go: extractRoleFromNeed regex pulls
  role from "Need N {role}{s} in" queries (the fill_events shape);
  free-form queries fall back to empty (gate disabled)

Tests (5 new):
- TestInjectPlaybookMisses_RoleGateRejectsCrossRole: exact Q#10
  scenario (distance 0.135, recorded "Forklift Operator", query
  "CNC Operator") — locks the bleed at unit level
- TestInjectPlaybookMisses_RoleGateAllowsSameRole: Forklift Operator
  recording fires on Forklift Operators query (plural normalization)
- TestInjectPlaybookMisses_RoleGateBackwardCompat: empty Role on
  either side = gate disabled, preserves current behavior
- TestApplyPlaybookBoost_RoleGateRejectsCrossRole: Shape A defense
  in depth — boost doesn't fire on cross-role even when answer is
  in cold top-K
- TestRoleEqual_PluralAndCase: case + -s + -es plural normalization

Verification (real_002, same query set as real_001):
- Q#5 Pickers @ Beacon Freight: e-6193 → e-8499 (no bleed)
- Q#10 CNC Operator @ Beacon Freight: e-6193 → w-2404 (no bleed)
- Discoveries + lifts unchanged at 2 each (same-role lift still fires)
- Mean Δdist tightens from -0.127 to -0.040 (boosts no longer
  pulling distances through the floor on cross-role mismatches)

Findings: reports/reality-tests/real_002_findings.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:34:10 -05:00

# Reality test real_002 — cross-role bleed fix verification

Same query set as real_001 (10 queries from `fill_events.parquet`),
re-run after the role-scoped playbook gate landed in
`internal/matrix/playbook.go`. The gate adds a `Role` field to
`PlaybookEntry` + a `QueryRole` field to `SearchRequest`; both
`ApplyPlaybookBoost` (Shape A) and `InjectPlaybookMisses` (Shape B)
reject playbook hits when both sides set Role and they differ.
Companion to `playbook_lift_real_002.md` (auto-generated harness
output) and `real_001_findings.md` (the original bleed finding).
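
The following is a minimal Go sketch of where the gate sits. It is not the code in `internal/matrix/playbook.go`. Only the `Role` and `QueryRole` fields come from the text. `WorkerID`, `Distance`, `sameRole`, `roleGateRejects` and `injectCandidates` are hypothetical names. `sameRole` only folds case, whereas the real `roleEqual` also normalizes plurals. The sketch replays the real_001 Q#10 numbers: a distance of 0.135 against the 0.20 inject threshold. Running it should report 0 candidates for "CNC Operator" and 1 each for "Forklift Operator" and the empty role. Checking the gate before the distance threshold means a cross-role neighbor is dropped no matter how close it embeds.

```go
package main

import (
	"fmt"
	"strings"
)

// PlaybookEntry is a hypothetical, trimmed-down stand-in for the real struct:
// only Role comes from the text, the other fields are illustrative.
type PlaybookEntry struct {
	WorkerID string
	Role     string  // empty means the entry was recorded without a role
	Distance float64 // cosine distance between the recorded and current query
}

// SearchRequest is a hypothetical subset holding only the new QueryRole field.
type SearchRequest struct {
	QueryRole string // empty means the caller supplied no role
}

// sameRole stands in for roleEqual; unlike the real helper it only folds case.
func sameRole(a, b string) bool { return strings.EqualFold(a, b) }

// roleGateRejects drops an entry only when both sides carry a role and the
// roles differ; an empty role on either side disables the gate.
func roleGateRejects(e PlaybookEntry, req SearchRequest) bool {
	if e.Role == "" || req.QueryRole == "" {
		return false
	}
	return !sameRole(e.Role, req.QueryRole)
}

// injectCandidates runs the role gate first, then the distance threshold
// (whether the real check uses < or <= is not stated; <= is assumed here).
func injectCandidates(entries []PlaybookEntry, req SearchRequest, threshold float64) []PlaybookEntry {
	var out []PlaybookEntry
	for _, e := range entries {
		if roleGateRejects(e, req) {
			continue // cross-role: rejected before any threshold or judge logic
		}
		if e.Distance <= threshold {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	// real_001 Q#10 values: recorded "Forklift Operator", distance 0.135, threshold 0.20.
	recorded := []PlaybookEntry{{WorkerID: "e-6193", Role: "Forklift Operator", Distance: 0.135}}
	for _, role := range []string{"CNC Operator", "Forklift Operator", ""} {
		got := injectCandidates(recorded, SearchRequest{QueryRole: role}, 0.20)
		fmt.Printf("query role %q -> %d candidate(s)\n", role, len(got))
	}
}
```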

## Headline

The bleed is closed. Compare real_001 (no gate) vs real_002 (with gate):

| Query | real_001 warm-top-1 | real_002 warm-top-1 | Status |
|---|---|---|---|
| Forklift Operator @ Beacon Freight Detroit | `e-6193` (lifted) | `w-3098` (different worker) | Same-role lift still works |
| Pickers @ Beacon Freight Detroit | **`e-6193`** (cross-role bleed) | `e-8499` (cold-pass top-1) | **Bleed closed** |
| CNC Operator @ Beacon Freight Detroit | **`e-6193`** (cross-role bleed) | `w-2404` (cold-pass top-1) | **Bleed closed** |

Both previously-bled queries now keep their cold-pass top-1 because
the role-gated playbook entry doesn't fire on different-role queries.

## Aggregate metrics

| Metric | real_001 (no gate) | real_002 (gate) |
|---|---:|---:|
| Discoveries (judge-best ≠ cold top-1) | 2 | 2 |
| Lifts (recorded → top-1) | 2 | 2 |
| Boosted count (Shape A fires) | 2 | 2 |
| Mean Δ top-1 distance | -0.127 | -0.040 |

The `meanΔdist` magnitude shrank by roughly 3× (0.127 / 0.040 ≈ 3.2). That shrinkage is the fingerprint of the fix working.

The Shape A boosted count is 2 in both runs, so the extra pull in real_001 did not come from wider boosting. It came from the cross-role Shape B injections (`e-6193` on Q#5 and Q#10), which pulled those queries' top-1 distances through the floor.

With the gate, playbook hits fire only on legitimate intra-role matches. The average shift is therefore smaller and reflects real lifts rather than bleed.

Critically, **discovery + lift counts didn't drop**: the gate
doesn't reject legitimate same-role lifts. Forklift Q#2 still
recorded a playbook, and Q#9 (Shipping Clerk) still discovered and
lifted. Cross-role bleed protection doesn't erase same-role learning.

## How the gate fires (concrete trace)

real_002's harness extracts the role mechanically from the query prefix:

| Query | Extracted role |
|---|---|
| `Need 1 Forklift Operator in Detroit MI ...` | `Forklift Operator` |
| `Need 4 Pickers in Detroit MI ...` | `Pickers` |
| `Need 1 CNC Operator in Detroit MI ...` | `CNC Operator` |
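
The extraction can be sketched with a single regular expression over the `Need N {role} in` shape. The pattern, `needRolePattern` and `extractRole` are hypothetical; the real `extractRoleFromNeed` in `scripts/playbook_lift/main.go` may differ. The role is captured as written, so "Pickers" keeps its plural, matching the table above. Anything that does not start with that prefix returns "", which leaves the gate disabled. The last input below is a made-up free-form query. Expected results are "Forklift Operator", "Pickers", "CNC Operator" and "".

```go
package main

import (
	"fmt"
	"regexp"
)

// needRolePattern is a hypothetical regex for "Need N {role} in ..." queries.
// The lazy (.+?) stops at the first whitespace-delimited lowercase "in".
var needRolePattern = regexp.MustCompile(`^Need\s+\d+\s+(.+?)\s+in\b`)

// extractRole returns the role text as written, or "" for free-form queries.
func extractRole(query string) string {
	m := needRolePattern.FindStringSubmatch(query)
	if m == nil {
		return "" // no "Need N ... in" prefix: gate stays disabled
	}
	return m[1]
}

func main() {
	for _, q := range []string{
		"Need 1 Forklift Operator in Detroit MI",
		"Need 4 Pickers in Detroit MI",
		"Need 1 CNC Operator in Detroit MI",
		"forklift-certified lead, second shift, near Detroit", // illustrative free-form query
	} {
		fmt.Printf("%q -> %q\n", q, extractRole(q))
	}
}
```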

The recorded playbook entry from Q#2 carries `Role: "Forklift Operator"`.
At retrieve time:

- Q#5 query has `query_role: "Pickers"` → `roleEqual("Pickers", "Forklift Operator")`
  returns false → `InjectPlaybookMisses` rejects the candidate before it can fire.
- Q#10 query has `query_role: "CNC Operator"` → same path, same rejection.

`roleEqual` normalization handles case plus trailing -s and -es plurals, so
the recorded singular "Forklift Operator" matches a query for
"Forklift Operators" without a separate rule.
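
The following is a hypothetical re-creation of that normalization. `normalizeRole` and `roleEqualSketch` are illustrative names, and the real helper in `internal/matrix/playbook.go` may work differently. The sketch compares by appending a plural suffix to the shorter role rather than stripping one from the longer. This is an assumed design choice. Stripping "-es" before "-s" would reduce "Nurses" to "nurs" and miss "Nurse", whereas the append check matches both -s and -es forms. The first two pairs below should print true and the last two false.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeRole folds case and trims surrounding whitespace.
func normalizeRole(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

// roleEqualSketch treats two roles as equal if, after normalization, they are
// identical or the longer one is the shorter plus an "s" or "es" suffix.
func roleEqualSketch(a, b string) bool {
	a, b = normalizeRole(a), normalizeRole(b)
	if a == b {
		return true
	}
	if len(a) > len(b) {
		a, b = b, a // a is now the candidate singular form
	}
	return b == a+"s" || b == a+"es"
}

func main() {
	pairs := [][2]string{
		{"Forklift Operator", "Forklift Operators"},
		{"forklift operator", "FORKLIFT OPERATOR"},
		{"Pickers", "Forklift Operator"},
		{"CNC Operator", "Forklift Operator"},
	}
	for _, p := range pairs {
		fmt.Printf("%q vs %q -> %v\n", p[0], p[1], roleEqualSketch(p[0], p[1]))
	}
}
```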

## What remains the same

- Free-form queries with no extractable role (the lift suite's multi-constraint
  queries) → empty role → gate disabled → behavior unchanged. This is
  intentional: the gate is opt-in via a caller-supplied role, not a global
  filter.
- Shape A boost on intra-role matches is unaffected. The 2 lifts
in real_002 are real Shape A wins on same-role same-cluster queries.
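
The opt-in behavior follows from how Go decodes an absent JSON field. A request body without `query_role` leaves the string at its zero value, which the gate treats as "no role". The sketch below assumes `query_role` is the wire name, as written in the trace above. The `query` field, `searchBody`, `decode` and `gateEnabled` are hypothetical. Only the query side is modeled; the recorded entry must also carry a role for the gate to engage. The first body should print `gate enabled=true` and the second `gate enabled=false`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchBody is a hypothetical request shape: query_role matches the trace
// above, while the "query" field name is an assumption.
type searchBody struct {
	Query     string `json:"query"`
	QueryRole string `json:"query_role,omitempty"`
}

// decode parses a request body; a missing query_role stays "" (Go zero value).
func decode(raw string) (searchBody, error) {
	var b searchBody
	err := json.Unmarshal([]byte(raw), &b)
	return b, err
}

// gateEnabled reports whether the query side opts in to the role gate.
// The recorded entry must also carry a role; that side is not modeled here.
func gateEnabled(b searchBody) bool { return b.QueryRole != "" }

func main() {
	for _, raw := range []string{
		`{"query": "Need 4 Pickers in Detroit MI", "query_role": "Pickers"}`,
		`{"query": "free-form multi-constraint request"}`,
	} {
		b, err := decode(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("query_role=%q gate enabled=%v\n", b.QueryRole, gateEnabled(b))
	}
}
```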

## Repro

```bash
# Same query set as real_001
go run scripts/cutover/gen_real_queries.go -limit 10 > tests/reality/real_coord_queries.txt
# Run with role-aware harness
QUERIES_FILE=tests/reality/real_coord_queries.txt RUN_ID=real_002 \
WITH_PARAPHRASE=0 WITH_REJUDGE=0 ./scripts/playbook_lift.sh
```
Evidence: `reports/reality-tests/playbook_lift_real_002.{json,md}`.

## Test coverage

`internal/matrix/playbook_test.go` adds 5 new tests:

- `TestInjectPlaybookMisses_RoleGateRejectsCrossRole` — Q#10 scenario
- `TestInjectPlaybookMisses_RoleGateAllowsSameRole` — same-role lift
- `TestInjectPlaybookMisses_RoleGateBackwardCompat` — empty Role disables gate (both directions)
- `TestApplyPlaybookBoost_RoleGateRejectsCrossRole` — Shape A defense in depth
- `TestRoleEqual_PluralAndCase` — normalization edge cases

The cross-role rejection tests use the exact distance + role values
from the real_001 finding (distance 0.135, recorded role "Forklift
Operator", query role "CNC Operator"), so a regression that re-opens
the bleed would fail at unit-test level before any reality test runs.