golangLAKEHOUSE

History

root cca32344f3 reality_test real_005: negation probe — substrate gap is correctly out-of-scope

5 explicit-negation queries ("Need Forklift Operators in Aurora IL,
NOT in Detroit", "excluding Cornerstone Fabrication roster", etc.)
through the standard playbook_lift harness. Goal: characterize
whether the substrate has negation handling or silently treats
"NOT X" as "X".

Headline: substrate has zero negation handling. Cosine on dense
embeddings tokenizes "NOT in Detroit" identical to "in Detroit"
plus noise — there is no logical-quantifier representation in the
embedding space. This is a structural property of dense embeddings,
not a substrate bug.

Per-query observations:
- Q1 (Aurora IL, NOT Detroit): all top-10 rated 1-2/5 by judge
- Q2 (NOT Beacon Freight): top-1 rated 4/5 — accidentally OK
  because role+city signal pulled non-Beacon worker naturally
- Q3 (excluding Cornerstone): unanimous 1/5 across top-10
- Q4 (NOT Detroit-area): all top-10 rated 1-2/5
- Q5 (exclude Heritage Foods): top-1 rated 4/5 — accidentally OK

The judge IS the safety net: when retrieval can't honor the
constraint, the judge refuses to approve any result. That's the
honesty signal — `discovery=0` for the run aggregates it.

No code change. The architectural answer for production is:
- UI surfaces an "exclude" affordance that populates ExcludeIDs
  (already supported, added in multi-coord stress 200-worker swap)
- Coordinators don't type natural-language negation — they click
- Substrate's role: surface honesty signal (judge ratings) + don't
  pretend to honor unparseable constraints

Adding NL-negation handling at the substrate level would be product
debt — it would let coordinators type sloppier queries that
silently fail when the LLM extractor misses a phrasing. Don't ship
until production traffic demonstrates demand for it.

Findings: reports/reality-tests/real_005_findings.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-30 23:06:06 -05:00

contracts

multi-coord stress Phase 1.5: shared-role contracts + paraphrase handover

2026-04-30 08:03:16 -05:00

negation_queries.txt

reality_test real_005: negation probe — substrate gap is correctly out-of-scope

2026-04-30 23:06:06 -05:00

playbook_lift_queries.txt

playbook_lift: harness expansion + reality test #001 (7/8 lift, 87.5%)

2026-04-30 06:22:21 -05:00

real_coord_queries_v2.txt

reality_test real_003: 40-query paraphrase stress + extractor extension

2026-04-30 21:42:02 -05:00

real_coord_queries.txt

reality_test real_001: real-shape coordinator queries — surfaces cross-role bleed

2026-04-30 20:18:40 -05:00