golangLAKEHOUSE

profit/golangLAKEHOUSE


Commit Graph

Author	SHA1	Message	Date


root

eb4308d8fd

real_006 diagnosis: Q43 leak is cross-city, not cross-role

Traced w-279's path through the substrate. The leak source is Q49
(Packers in Indianapolis IN for Midway Distribution), NOT Q18 as
the initial reading suspected.

Q49 recorded w-279 with role=Packers, client=Midway Distribution,
city=Indianapolis. Q43 (Packer in Chicago IL for Midway Distribution)
ran later. roleEqual("Packer","Packers") → both normalize to
"packer" → role gate passes (correctly, by design — they ARE the
same role under plural-strip). Cosine distance between Q49's
recorded query and Q43's query is small enough to fit inside the
0.20 inject threshold because role + client + count + time-token
dominate the embedding (only the city and singular/plural noun
differ). Inject fires, w-279 surfaces at Q43's warm top-1 in
Chicago, judge correctly rates 2/5 — wrong city.
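
The interaction described above can be sketched in Go. This is a minimal illustration, not the repository's code: only the name roleEqual, the plural-strip behavior, and the 0.20 threshold come from the message; normalizeRole, cosineDistance, shouldInject, and the toy 4-dimensional vectors are assumptions made for the example.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// injectThreshold is the 0.20 inject threshold quoted in the message.
const injectThreshold = 0.20

// normalizeRole lower-cases a role and strips one trailing "s".
// Hypothetical stand-in for the plural-strip the message describes.
func normalizeRole(role string) string {
	r := strings.ToLower(strings.TrimSpace(role))
	return strings.TrimSuffix(r, "s")
}

// roleEqual reports whether two roles match after normalization.
func roleEqual(a, b string) bool {
	return normalizeRole(a) == normalizeRole(b)
}

// cosineDistance returns 1 - cos(a, b).
// Assumes equal-length, non-zero vectors.
func cosineDistance(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// shouldInject mirrors the situation in the message: a role gate plus a
// distance threshold, with no city check (assumed shape).
func shouldInject(recRole, qRole string, recVec, qVec []float64) bool {
	return roleEqual(recRole, qRole) &&
		cosineDistance(recVec, qVec) <= injectThreshold
}

func main() {
	// Toy embeddings: role, client, count/time, city. Only the last
	// component differs, so the vectors stay close together.
	q49 := []float64{1, 1, 1, 0.2}
	q43 := []float64{1, 1, 1, -0.2}

	fmt.Println(roleEqual("Packer", "Packers"))              // true
	fmt.Println(shouldInject("Packers", "Packer", q49, q43)) // true
}
```

With nothing in shouldInject looking at the city, two queries that differ only in city pass both checks, which is the hole the diagnosis identifies.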

The role gate IS working. What's missing is a CITY gate. Real_002's
fix targeted cross-role bleed (Forklift → CNC). real_006 surfaced
cross-city bleed within same role + same client — a hole prior
tests structurally couldn't reach because they all sourced from
rows 0-9 where no such pair existed.

Concrete fix surface documented (1 new field, 2 gate checks, 1
regex, ~5 tests). Half a session of work, same shape as real_002.
Not implementing tonight — diagnosis only.
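
One possible shape for that fix surface, as a hedged sketch rather than the planned implementation. The PlaybookEntry struct, the cityRe pattern, and the function names are hypothetical, and the regex assumes queries phrase location as "in <City> <ST>", which the message implies but does not confirm.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// PlaybookEntry is a hypothetical record shape. City stands in for the
// "1 new field" mentioned above.
type PlaybookEntry struct {
	WorkerID string
	Role     string
	Client   string
	City     string
}

// cityRe captures the city in phrases like "in Chicago IL".
// Assumed query format: "... in <City> <two-letter state> ...".
var cityRe = regexp.MustCompile(`\bin ([A-Z][A-Za-z .'-]*?) [A-Z]{2}\b`)

// extractCity returns the lower-cased city from a query, or "" if the
// pattern does not match.
func extractCity(query string) string {
	m := cityRe.FindStringSubmatch(query)
	if m == nil {
		return ""
	}
	return strings.ToLower(strings.TrimSpace(m[1]))
}

// cityGate passes only when both cities are known and equal. Failing
// closed on a missing city is a design choice made for this sketch.
func cityGate(entry PlaybookEntry, query string) bool {
	qc := extractCity(query)
	if qc == "" || entry.City == "" {
		return false
	}
	return strings.EqualFold(entry.City, qc)
}

func main() {
	rec := PlaybookEntry{
		WorkerID: "w-279",
		Role:     "Packers",
		Client:   "Midway Distribution",
		City:     "Indianapolis",
	}
	q43 := "Packer in Chicago IL for Midway Distribution"
	q49 := "Packers in Indianapolis IN for Midway Distribution"

	fmt.Println(cityGate(rec, q43)) // false
	fmt.Println(cityGate(rec, q49)) // true
}
```

The same cityGate call could sit next to each existing role-gate check, which would account for the two gate checks in the estimate.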

The 18 unit-level role-gate tests still pass, confirming the gate
is doing what it was specified to do. The bug is a missing
specification, not a broken implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-05 05:13:19 -05:00

root

95f155b017

real_006: distribution-shift test on rows 10-59 of fill_events

Methodology fix: gen_real_queries.go gains -offset N flag. Every prior
real_NNN test sourced queries from rows 0-9 of fill_events.parquet
(default -limit 10), so the substrate's published "8/10 cold-pass top-1
= judge-best" was measured on a memorized slice, not held-out data.
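
A minimal sketch of how an offset/limit window like this might be wired with Go's standard flag package. The -offset and -limit names and the default limit of 10 come from the message; parseWindow, window, and the 100-row stand-in for the parquet file are assumptions, and the real gen_real_queries.go may be structured differently.

```go
package main

import (
	"flag"
	"fmt"
)

// parseWindow parses -offset and -limit from args (hypothetical helper).
func parseWindow(args []string) (offset, limit int, err error) {
	fs := flag.NewFlagSet("gen_real_queries", flag.ContinueOnError)
	fs.IntVar(&offset, "offset", 0, "rows to skip before sampling")
	fs.IntVar(&limit, "limit", 10, "number of rows to sample")
	err = fs.Parse(args)
	return offset, limit, err
}

// window returns the half-open index range [start, end) selected by
// offset and limit, clamped to n rows.
func window(n, offset, limit int) (start, end int) {
	if offset < 0 {
		offset = 0
	}
	if offset > n {
		offset = n
	}
	if limit < 0 {
		limit = 0
	}
	end = offset + limit
	if end > n {
		end = n
	}
	return offset, end
}

func main() {
	offset, limit, err := parseWindow([]string{"-offset", "10", "-limit", "50"})
	if err != nil {
		panic(err)
	}
	// 100 stands in for the row count of fill_events.parquet.
	start, end := window(100, offset, limit)
	fmt.Println(start, end-1, end-start) // 10 59 50
}
```

With no flags the window falls back to rows 0-9, which is the slice every earlier run used.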

real_006 samples 50 fresh rows (offset 10, never seen by the workers
or ethereal_workers corpora). Same harness, same local qwen2.5:latest
judge, same K=10. ~14 min wall total. Local-only, no cloud calls.

Headline findings:

- Cold-pass top-1 = judge-best (rank match): 41/50 (82%) vs real_001's
  8/10 (80%) — substrate generalizes at rank level.
- Strict (rating ≥ 2): 34/50 (68%) — 12-point drop from real_001's
  80%. ~7 of 41 "no-discovery" queries had cold top-1 the judge rated
  1; the corpus has gaps for some role-city combos in the v3 slice.
- Verbatim lift: 9/9 discoveries → warm top-1 (clean, matches real_001 2/2)
- Paraphrase recovery: 6/9 → top-1, 9/9 any-rank
- Quality regressed: 3/50 — Q43 is the structural one
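
The percentages and the 12-point drop in the list above can be rechecked with a few lines of Go. The pct helper is illustrative only; the counts are the ones reported in the findings.

```go
package main

import "fmt"

// pct returns num/den as a whole-number percentage (integer division).
func pct(num, den int) int {
	return num * 100 / den
}

func main() {
	rankMatch := pct(41, 50) // real_006 cold-pass top-1 = judge-best
	strict := pct(34, 50)    // real_006 strict (rating >= 2)
	baseline := pct(8, 10)   // real_001

	fmt.Println(rankMatch, strict, baseline) // 82 68 80
	fmt.Println(baseline - strict)           // 12
	fmt.Println(41 - 34)                     // 7
}
```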

Q43 (Packer at Midway Distribution / Chicago IL) regressed from
rating 5 to rating 2 on warm pass with `warm_boosted_count=0` and
`playbook_recorded=false`. Q18 (Shipping Clerks at the same client+city)
recorded a playbook entry. The regression suggests Q18's recording
leaked into Q43 via the warm-pass playbook corpus retrieval surface
even though the role gate from real_002 should have blocked it.
Three possible paths: extractor failed on one query, gate fires on
boost path but not Shape B inject, or cosine drift puts the recorded
worker close enough to Q43's embedding that warm-pass retrieval picks
it up directly. Diagnosis is the next move.

Three same-(client, city) clusters tested:
- Heritage Foods Gary IN × 3 distinct roles: clean, distinct workers
- Riverfront Steel Columbus OH × 4: cosine-level confusion (Q9/Q25
  surface same worker w-281 for Assemblers vs Quality Techs at cold-
  pass), but no playbook bleed
- Midway Distribution Chicago IL × 3: Q43 regression as above

What this confirms: substrate works on the fresh distribution at the
rank level, verbatim lift is real, paraphrase recovery is real.

What this falsifies: real_002's role-gate fix is not structurally
airtight. The bleed pattern can still fire under conditions the
prior tests didn't reach.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-05 04:54:03 -05:00

2 Commits