golangLAKEHOUSE/reports/reality-tests/real_002_findings.md
root 997527be4d matrix: cross-role playbook gate — closes real_001 bleed (OPEN #1)
real_001 surfaced same-client+city queries bleeding across roles:
Q#2 (Forklift Operator @ Beacon Freight Detroit) recorded e-6193
in the playbook corpus. Q#5 (Pickers same client+city) and Q#10
(CNC Operator same client+city) embedded within 0.13-0.18 cosine of
Q#2's query — well inside the 0.20 inject threshold — so e-6193
injected on both, demoting the cold-pass-correct workers.

Root cause: the inject distance threshold isn't tight enough on
the same-client+city cluster. Cosine collapses queries that share
city + client + count-token + time-token regardless of role. The
existing judge gate is per-injection at record time and doesn't
fire at retrieve time.

Fix: structural role gate in front of both Shape A boost and
Shape B inject. PlaybookEntry gains Role; SearchRequest gains
QueryRole. When both are non-empty and differ under roleEqual's
case+plural normalization, the entry is rejected before BoostFactor
or judge-gate logic runs.

Backward-compat: empty role on either side disables the gate —
preserves behavior for the lift suite's free-form multi-constraint
queries that have no clean single role. Caller-supplied (not
inferred), so existing recordings unaffected.

Wire-through:
- internal/matrix/playbook.go: Role field, NewPlaybookEntryWithRole,
  roleEqual helper with plural+case normalization
- internal/matrix/retrieve.go: QueryRole on SearchRequest, threaded
  to both ApplyPlaybookBoost + InjectPlaybookMisses
- cmd/matrixd/main.go: role on POST /matrix/playbooks/record + bulk
- scripts/playbook_lift/main.go: extractRoleFromNeed regex pulls
  role from "Need N {role}{s} in" queries (the fill_events shape);
  free-form queries fall back to empty (gate disabled)

Tests (5 new):
- TestInjectPlaybookMisses_RoleGateRejectsCrossRole: exact Q#10
  scenario (distance 0.135, recorded "Forklift Operator", query
  "CNC Operator") — locks the bleed at unit level
- TestInjectPlaybookMisses_RoleGateAllowsSameRole: Forklift Operator
  recording fires on Forklift Operators query (plural normalization)
- TestInjectPlaybookMisses_RoleGateBackwardCompat: empty Role on
  either side = gate disabled, preserves current behavior
- TestApplyPlaybookBoost_RoleGateRejectsCrossRole: Shape A defense
  in depth — boost doesn't fire on cross-role even when answer is
  in cold top-K
- TestRoleEqual_PluralAndCase: case + -s + -es plural normalization

Verification (real_002, same query set as real_001):
- Q#5 Pickers @ Beacon Freight: e-6193 → e-8499 (no bleed)
- Q#10 CNC Operator @ Beacon Freight: e-6193 → w-2404 (no bleed)
- Discoveries + lifts unchanged at 2 each (same-role lift still fires)
- Mean Δdist tightens from -0.127 to -0.040 (boosts no longer
  pulling distances through the floor on cross-role mismatches)

Findings: reports/reality-tests/real_002_findings.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:34:10 -05:00


Reality test real_002 — cross-role bleed fix verification

Same query set as real_001 (10 queries from fill_events.parquet), re-run after the role-scoped playbook gate landed in internal/matrix/playbook.go. The gate adds a Role field to PlaybookEntry + a QueryRole field to SearchRequest; both ApplyPlaybookBoost (Shape A) and InjectPlaybookMisses (Shape B) reject playbook hits when both sides set Role and they differ.
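The gate's placement can be illustrated with a minimal sketch. It is not the code in internal/matrix: the `AnswerID` and `Distance` fields, the `roleGateAllows` and `injectable` helpers, and the `<=` comparison against the 0.20 threshold are assumptions made for illustration, and `strings.EqualFold` stands in for roleEqual (so plural forms are not handled here).

```go
package main

import (
	"fmt"
	"strings"
)

// injectThreshold mirrors the 0.20 cosine-distance inject threshold
// cited in the real_001 finding.
const injectThreshold = 0.20

// PlaybookEntry carries only the fields this sketch needs.
// AnswerID and Distance are hypothetical names, not the real struct layout.
type PlaybookEntry struct {
	AnswerID string
	Role     string
	Distance float64 // distance between the recorded query and the current query
}

// SearchRequest carries the caller-supplied role.
type SearchRequest struct {
	QueryRole string
}

// roleGateAllows reports whether an entry may fire for a request.
// An empty role on either side disables the gate (backward compatibility).
// Assumption: EqualFold is a stand-in for roleEqual, case-insensitive only.
func roleGateAllows(e PlaybookEntry, req SearchRequest) bool {
	if e.Role == "" || req.QueryRole == "" {
		return true
	}
	return strings.EqualFold(e.Role, req.QueryRole)
}

// injectable keeps entries that pass the role gate first and the
// distance threshold second.
func injectable(entries []PlaybookEntry, req SearchRequest) []PlaybookEntry {
	var out []PlaybookEntry
	for _, e := range entries {
		if !roleGateAllows(e, req) {
			continue // rejected before any distance logic runs
		}
		if e.Distance <= injectThreshold {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	// Values from the Q#10 scenario: distance 0.135, recorded role "Forklift Operator".
	entries := []PlaybookEntry{{AnswerID: "e-6193", Role: "Forklift Operator", Distance: 0.135}}

	fmt.Println(len(injectable(entries, SearchRequest{QueryRole: "CNC Operator"})))      // 0: cross-role rejected
	fmt.Println(len(injectable(entries, SearchRequest{QueryRole: "Forklift Operator"}))) // 1: same role fires
	fmt.Println(len(injectable(entries, SearchRequest{})))                               // 1: empty role disables the gate
}
```

Running the role check before the distance check is the point of the sketch: a cross-role entry at distance 0.135 would pass the threshold alone, so only the role comparison keeps it out.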

Companion to playbook_lift_real_002.md (auto-generated harness output) and real_001_findings.md (the original bleed finding).

Headline

The bleed is closed. Compare real_001 (no gate) vs real_002 (with gate):

| Query | real_001 warm-top-1 | real_002 warm-top-1 | Status |
| --- | --- | --- | --- |
| Forklift Operator @ Beacon Freight Detroit | e-6193 (lifted) | w-3098 (different worker) | Same-role lift still works |
| Pickers @ Beacon Freight Detroit | e-6193 (cross-role bleed) | e-8499 (cold-pass top-1) | Bleed closed |
| CNC Operator @ Beacon Freight Detroit | e-6193 (cross-role bleed) | w-2404 (cold-pass top-1) | Bleed closed |

Both previously-bled queries now keep their cold-pass top-1 because the role-gated playbook entry doesn't fire on different-role queries.

Aggregate metrics

| Metric | real_001 (no gate) | real_002 (gate) |
| --- | --- | --- |
| Discoveries (judge-best ≠ cold top-1) | 2 | 2 |
| Lifts (recorded → top-1) | 2 | 2 |
| Boosted count (Shape A fires) | 2 | 2 |
| Mean Δ top-1 distance | -0.127 | -0.040 |

The roughly 3× drop in mean Δdist magnitude (0.127 / 0.040 ≈ 3.2) is the fingerprint of the fix working. Previously, boosts fired too widely and dragged top-1 distances down on cross-role mismatches; now they fire only on legitimate intra-role matches, so the average shift is smaller and more honest.

Critically, discovery + lift counts didn't drop — the gate doesn't reject legitimate same-role lifts. Forklift Q#2 still recorded a playbook and Q#9 (Shipping Clerk) still discovered + lifted. Cross-role bleed protection doesn't erase same-role learning.

How the gate fires (concrete trace)

real_002's harness extracts role mechanically from the query prefix:

| Query | Extracted role |
| --- | --- |
| Need 1 Forklift Operator in Detroit MI ... | Forklift Operator |
| Need 4 Pickers in Detroit MI ... | Pickers |
| Need 1 CNC Operator in Detroit MI ... | CNC Operator |
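A prefix extraction of this kind might look like the sketch below. The function name `extractRoleSketch` and the exact pattern are assumptions; the harness's actual extractRoleFromNeed regex is not reproduced in this report. The sketch returns the role text as written (plural included) and an empty string for queries that do not follow the "Need N {role} in" shape, which leaves the gate disabled.

```go
package main

import (
	"fmt"
	"regexp"
)

// needPattern matches "Need <count> <role> in ..." and captures the role.
// The lazy (.+?) stops at the first " in " after the count.
// Assumption: this pattern is illustrative, not the harness's real regex.
var needPattern = regexp.MustCompile(`^Need\s+\d+\s+(.+?)\s+in\s`)

// extractRoleSketch returns the role from a fill_events-shaped query,
// or "" when the query has no extractable role.
func extractRoleSketch(query string) string {
	m := needPattern.FindStringSubmatch(query)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	queries := []string{
		"Need 1 Forklift Operator in Detroit MI",
		"Need 4 Pickers in Detroit MI",
		"Need 1 CNC Operator in Detroit MI",
		"Warehouse staff for a night shift", // hypothetical free-form query
	}
	for _, q := range queries {
		fmt.Printf("%q -> %q\n", q, extractRoleSketch(q))
	}
}
```

One limitation of this pattern is that a role that itself contains " in " would be cut short at the first occurrence.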

The recorded playbook entry from Q#2 carries Role: "Forklift Operator". At retrieve time:

  • Q#5 query has query_role: "Pickers" → roleEqual("Pickers", "Forklift Operator") returns false → InjectPlaybookMisses rejects the candidate before it can fire.
  • Q#10 query has query_role: "CNC Operator" → same path, same rejection.

roleEqual normalizes case and trailing -s/-es plurals, so the recorded singular "Forklift Operator" matches a query for "Forklift Operators" without a separate rule.
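One way to get this behavior is sketched below. The names `roleEqualSketch` and `isPluralOf` are hypothetical, and the approach (compare lowercased strings, then check whether one side is the other plus "s" or "es") is an assumption about how such normalization could be written, not the roleEqual implementation in internal/matrix/playbook.go.

```go
package main

import (
	"fmt"
	"strings"
)

// isPluralOf reports whether plural is singular with a trailing "s" or "es".
func isPluralOf(plural, singular string) bool {
	return plural == singular+"s" || plural == singular+"es"
}

// roleEqualSketch compares two roles ignoring case, surrounding whitespace,
// and a trailing -s/-es plural on either side.
// It assumes both inputs are non-empty; the empty-role case is handled
// earlier by the gate, which is disabled when either side is empty.
func roleEqualSketch(a, b string) bool {
	a = strings.ToLower(strings.TrimSpace(a))
	b = strings.ToLower(strings.TrimSpace(b))
	if a == b {
		return true
	}
	return isPluralOf(a, b) || isPluralOf(b, a)
}

func main() {
	fmt.Println(roleEqualSketch("Forklift Operator", "Forklift Operators")) // true
	fmt.Println(roleEqualSketch("forklift operator", "Forklift Operator"))  // true
	fmt.Println(roleEqualSketch("Pickers", "Forklift Operator"))            // false
	fmt.Println(roleEqualSketch("CNC Operator", "Forklift Operator"))       // false
}
```

Comparing against the suffixed form, rather than stripping suffixes from both sides, avoids mangling singular roles that already end in "s" or "e".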

What remains the same

  • Free-form queries with no extractable role (lift suite multi-constraint queries) → empty role → gate disabled → behavior unchanged. This is intentional: the gate is opt-in via caller-supplied role, not a global filter.
  • Shape A boost on intra-role matches is unaffected. The 2 lifts in real_002 are real Shape A wins on same-role same-cluster queries.

Repro

# Same query set as real_001
go run scripts/cutover/gen_real_queries.go -limit 10 > tests/reality/real_coord_queries.txt

# Run with role-aware harness
QUERIES_FILE=tests/reality/real_coord_queries.txt RUN_ID=real_002 \
  WITH_PARAPHRASE=0 WITH_REJUDGE=0 ./scripts/playbook_lift.sh

Evidence: reports/reality-tests/playbook_lift_real_002.{json,md}.

Test coverage

internal/matrix/playbook_test.go adds 5 new tests:

  • TestInjectPlaybookMisses_RoleGateRejectsCrossRole — Q#10 scenario
  • TestInjectPlaybookMisses_RoleGateAllowsSameRole — same-role lift
  • TestInjectPlaybookMisses_RoleGateBackwardCompat — empty Role disables gate (both directions)
  • TestApplyPlaybookBoost_RoleGateRejectsCrossRole — Shape A defense in depth
  • TestRoleEqual_PluralAndCase — normalization edge cases

The cross-role rejection tests use the exact distance + role values from the real_001 finding (distance 0.135, recorded role "Forklift Operator", query role "CNC Operator"), so a regression that re-opens the bleed would fail at unit-test level before any reality test runs.