real_001 surfaced same-client+city queries bleeding across roles:
Q#2 (Forklift Operator @ Beacon Freight Detroit) recorded e-6193
in the playbook corpus. Q#5 (Pickers same client+city) and Q#10
(CNC Operator same client+city) embedded within 0.13-0.18 cosine of
Q#2's query — well inside the 0.20 inject threshold — so e-6193
injected on both, demoting the cold-pass-correct workers.
Root cause: the inject distance threshold isn't tight enough on
the same-client+city cluster. Cosine collapses queries that share
city + client + count-token + time-token regardless of role. The
existing judge gate is per-injection at record time and doesn't
fire at retrieve time.
Fix: structural role gate in front of both Shape A boost and
Shape B inject. PlaybookEntry gains Role; SearchRequest gains
QueryRole. When both are non-empty and differ under roleEqual's
case+plural normalization, the entry is rejected before BoostFactor
or judge-gate logic runs.
Backward-compat: empty role on either side disables the gate —
preserves behavior for the lift suite's free-form multi-constraint
queries that have no clean single role. Caller-supplied (not
inferred), so existing recordings unaffected.
Wire-through:
- internal/matrix/playbook.go: Role field, NewPlaybookEntryWithRole,
roleEqual helper with plural+case normalization
- internal/matrix/retrieve.go: QueryRole on SearchRequest, threaded
to both ApplyPlaybookBoost + InjectPlaybookMisses
- cmd/matrixd/main.go: role on POST /matrix/playbooks/record + bulk
- scripts/playbook_lift/main.go: extractRoleFromNeed regex pulls
role from "Need N {role}{s} in" queries (the fill_events shape);
free-form queries fall back to empty (gate disabled)
Tests (5 new):
- TestInjectPlaybookMisses_RoleGateRejectsCrossRole: exact Q#10
scenario (distance 0.135, recorded "Forklift Operator", query
"CNC Operator") — locks the bleed at unit level
- TestInjectPlaybookMisses_RoleGateAllowsSameRole: Forklift Operator
recording fires on Forklift Operators query (plural normalization)
- TestInjectPlaybookMisses_RoleGateBackwardCompat: empty Role on
either side = gate disabled, preserves current behavior
- TestApplyPlaybookBoost_RoleGateRejectsCrossRole: Shape A defense
in depth — boost doesn't fire on cross-role even when answer is
in cold top-K
- TestRoleEqual_PluralAndCase: case + -s + -es plural normalization
Verification (real_002, same query set as real_001):
- Q#5 Pickers @ Beacon Freight: e-6193 → e-8499 (no bleed)
- Q#10 CNC Operator @ Beacon Freight: e-6193 → w-2404 (no bleed)
- Discoveries + lifts unchanged at 2 each (same-role lift still fires)
- Mean Δdist tightens from -0.127 to -0.040 (boosts no longer
pulling distances through the floor on cross-role mismatches)
Findings: reports/reality-tests/real_002_findings.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reality test real_002 — cross-role bleed fix verification
Same query set as real_001 (10 queries from fill_events.parquet),
re-run after the role-scoped playbook gate landed in
internal/matrix/playbook.go. The gate adds a Role field to
PlaybookEntry + a QueryRole field to SearchRequest; both
ApplyPlaybookBoost (Shape A) and InjectPlaybookMisses (Shape B)
reject playbook hits when both sides set Role and they differ.
Companion to playbook_lift_real_002.md (auto-generated harness
output) and real_001_findings.md (the original bleed finding).
Headline
The bleed is closed. Compare real_001 (no gate) vs real_002 (with gate):
| Query | real_001 warm-top-1 | real_002 warm-top-1 | Status |
|---|---|---|---|
| Forklift Operator @ Beacon Freight Detroit | e-6193 (lifted) | w-3098 (different worker) | Same-role lift still works |
| Pickers @ Beacon Freight Detroit | e-6193 (cross-role bleed) | e-8499 (cold-pass top-1) | Bleed closed |
| CNC Operator @ Beacon Freight Detroit | e-6193 (cross-role bleed) | w-2404 (cold-pass top-1) | Bleed closed |
Both previously-bled queries now keep their cold-pass top-1 because the role-gated playbook entry doesn't fire on different-role queries.
Aggregate metrics
| Metric | real_001 (no gate) | real_002 (gate) |
|---|---|---|
| Discoveries (judge-best ≠ cold top-1) | 2 | 2 |
| Lifts (recorded → top-1) | 2 | 2 |
| Boosted count (Shape A fires) | 2 | 2 |
| Mean Δ top-1 distance | -0.127 | -0.040 |
The magnitude of the mean Δ top-1 distance shrinking by roughly 3×
(0.127 to 0.040) is the fingerprint of the fix working. Previously
boosts were firing too widely and pulling distances through the floor
on cross-role mismatches; now they only fire on legitimate intra-role
matches, so the average shift is smaller and more honest.
Critically, discovery + lift counts didn't drop — the gate doesn't reject legitimate same-role lifts. Forklift Q#2 still recorded a playbook and Q#9 (Shipping Clerk) still discovered + lifted. Cross-role bleed protection doesn't erase same-role learning.
How the gate fires (concrete trace)
real_002's harness extracts role mechanically from the query prefix:
| Query | Extracted role |
|---|---|
| `Need 1 Forklift Operator in Detroit MI ...` | `Forklift Operator` |
| `Need 4 Pickers in Detroit MI ...` | `Pickers` |
| `Need 1 CNC Operator in Detroit MI ...` | `CNC Operator` |
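A prefix extraction of this kind might look like the sketch below. The pattern and the `extractRoleSketch` name are assumptions for illustration; the actual `extractRoleFromNeed` regex in scripts/playbook_lift/main.go is not shown in this report and may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// needRoleRE is an assumed pattern for the "Need N {role} in ..." shape.
// The lazy group stops at the first " in " after the count.
var needRoleRE = regexp.MustCompile(`^Need\s+\d+\s+(.+?)\s+in\s`)

// extractRoleSketch returns the role text between the count and " in ",
// or "" when the query does not match (which leaves the gate disabled).
func extractRoleSketch(query string) string {
	m := needRoleRE.FindStringSubmatch(query)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	fmt.Printf("%q\n", extractRoleSketch("Need 1 Forklift Operator in Detroit MI"))  // "Forklift Operator"
	fmt.Printf("%q\n", extractRoleSketch("Need 4 Pickers in Detroit MI"))            // "Pickers"
	fmt.Printf("%q\n", extractRoleSketch("someone reliable for night shift, Detroit")) // ""
}
```

The plural is left in place at extraction time ("Pickers" stays "Pickers"), which matches the table above; singular/plural reconciliation is then the comparison step's job.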
The recorded playbook entry from Q#2 carries Role: "Forklift Operator".
At retrieve time:
- Q#5 query has `query_role: "Pickers"` → `roleEqual("Pickers", "Forklift Operator")` returns false → `InjectPlaybookMisses` rejects the candidate before it can fire.
- Q#10 query has `query_role: "CNC Operator"` → same path, same rejection.
roleEqual normalization handles case + trailing-s plurals, so
the recorded singular "Forklift Operator" matches a query for
"Forklift Operators" without a separate rule.
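One way to get this behavior is sketched below. It is an assumption about the approach, not the actual `roleEqual` helper: `roleEqualSketch` and `normRole` are hypothetical names, and comparing with an appended suffix (rather than stripping one) is a choice made here for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// normRole lowercases and trims a role string before comparison.
func normRole(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

// roleEqualSketch treats two roles as equal when they match after case
// folding, or when one is the other plus a trailing "s" or "es".
// Callers are assumed to have handled empty roles before calling this.
func roleEqualSketch(a, b string) bool {
	x, y := normRole(a), normRole(b)
	if x == y {
		return true
	}
	for _, suffix := range []string{"s", "es"} {
		if x+suffix == y || y+suffix == x {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(roleEqualSketch("Forklift Operator", "Forklift Operators")) // true
	fmt.Println(roleEqualSketch("CNC Operator", "cnc operator"))            // true
	fmt.Println(roleEqualSketch("Pickers", "Forklift Operator"))            // false
}
```

Appending the suffix avoids over-stripping words that naturally end in "s" or "e", at the cost of not handling irregular plurals.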
What remains the same
- Free-form queries with no extractable role (lift suite multi-constraint queries) → empty role → gate disabled → behavior unchanged. This is intentional: the gate is opt-in via caller-supplied role, not a global filter.
- Shape A boost on intra-role matches is unaffected. The 2 lifts in real_002 are real Shape A wins on same-role same-cluster queries.
Repro
# Same query set as real_001
go run scripts/cutover/gen_real_queries.go -limit 10 > tests/reality/real_coord_queries.txt
# Run with role-aware harness
QUERIES_FILE=tests/reality/real_coord_queries.txt RUN_ID=real_002 \
WITH_PARAPHRASE=0 WITH_REJUDGE=0 ./scripts/playbook_lift.sh
Evidence: reports/reality-tests/playbook_lift_real_002.{json,md}.
Test coverage
internal/matrix/playbook_test.go adds 5 new tests:
- `TestInjectPlaybookMisses_RoleGateRejectsCrossRole`: Q#10 scenario
- `TestInjectPlaybookMisses_RoleGateAllowsSameRole`: same-role lift
- `TestInjectPlaybookMisses_RoleGateBackwardCompat`: empty Role disables gate (both directions)
- `TestApplyPlaybookBoost_RoleGateRejectsCrossRole`: Shape A defense in depth
- `TestRoleEqual_PluralAndCase`: normalization edge cases
The cross-role rejection tests use the exact distance + role values from the real_001 finding (distance 0.135, recorded role "Forklift Operator", query role "CNC Operator"), so a regression that re-opens the bleed would fail at unit-test level before any reality test runs.