golangLAKEHOUSE/tests/reality/negation_queries.txt
root cca32344f3 reality_test real_005: negation probe — substrate gap is correctly out-of-scope
5 explicit-negation queries ("Need Forklift Operators in Aurora IL,
NOT in Detroit", "excluding Cornerstone Fabrication roster", etc.)
through the standard playbook_lift harness. Goal: characterize
whether the substrate has negation handling or silently treats
"NOT X" as "X".

Headline: substrate has zero negation handling. The dense embedding
of "NOT in Detroit" lands almost on top of "in Detroit" plus noise,
so cosine similarity cannot separate the two: there is no
representation of logical negation in the embedding space. This is a
structural property of dense embeddings, not a substrate bug.

Per-query observations:
- Q1 (Aurora IL, NOT Detroit): all top-10 rated 1-2/5 by judge
- Q2 (NOT Beacon Freight): top-1 rated 4/5 — accidentally OK
  because role+city signal pulled non-Beacon worker naturally
- Q3 (excluding Cornerstone): unanimous 1/5 across top-10
- Q4 (NOT Detroit-area): all top-10 rated 1-2/5
- Q5 (exclude Heritage Foods): top-1 rated 4/5 — accidentally OK

The judge IS the safety net: when retrieval can't honor the
constraint, the judge refuses to approve any result. That's the
honesty signal — `discovery=0` for the run aggregates it.
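
One way to picture the safety net is a check that flags every query whose retrieved results all fall below an approval cutoff. The sketch below is an assumption-laden illustration: `unapproved`, the map shape, and the cutoff of 4 are hypothetical, the ratings are made up rather than taken from real_005, and it does not claim to reproduce how `discovery` is actually computed.

```go
package main

import (
	"fmt"
	"sort"
)

// unapproved returns the IDs of queries for which no retrieved result
// reached minRating. The cutoff and the map shape are assumptions.
func unapproved(ratings map[string][]int, minRating int) []string {
	var flagged []string
	for id, rs := range ratings {
		approved := false
		for _, r := range rs {
			if r >= minRating {
				approved = true
				break
			}
		}
		if !approved {
			flagged = append(flagged, id)
		}
	}
	sort.Strings(flagged) // map iteration order is random; sort for stable output
	return flagged
}

func main() {
	// Made-up judge ratings on a 1-5 scale, NOT the real_005 data.
	ratings := map[string][]int{
		"query-a": {1, 2, 1},
		"query-b": {4, 2, 1},
		"query-c": {1, 1, 1},
	}
	fmt.Println(unapproved(ratings, 4)) // [query-a query-c]
}
```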

No code change. The architectural answer for production is:
- UI surfaces an "exclude" affordance that populates ExcludeIDs
  (already supported, added in multi-coord stress 200-worker swap)
- Coordinators don't type natural-language negation — they click
- Substrate's role: surface honesty signal (judge ratings) + don't
  pretend to honor unparseable constraints
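
The click-to-exclude path can be sketched as a hard filter applied to the ranked candidates. Only the `ExcludeIDs` name comes from this commit; `SearchRequest`, `Candidate`, `filterExcluded`, and the worker IDs are hypothetical shapes assumed for illustration, and the real request type may differ.

```go
package main

import "fmt"

// Candidate and SearchRequest are hypothetical shapes; only the
// ExcludeIDs field name is taken from the commit message.
type Candidate struct {
	WorkerID string
	Score    float64
}

type SearchRequest struct {
	Query      string
	ExcludeIDs []string
}

// filterExcluded drops every candidate whose WorkerID appears in
// req.ExcludeIDs and preserves the ranking order of the rest.
func filterExcluded(req SearchRequest, ranked []Candidate) []Candidate {
	blocked := make(map[string]struct{}, len(req.ExcludeIDs))
	for _, id := range req.ExcludeIDs {
		blocked[id] = struct{}{}
	}
	kept := make([]Candidate, 0, len(ranked))
	for _, c := range ranked {
		if _, skip := blocked[c.WorkerID]; !skip {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	req := SearchRequest{
		Query:      "Need 3 Warehouse Associates",
		ExcludeIDs: []string{"w-102"}, // populated by the UI click, not parsed from text
	}
	ranked := []Candidate{{"w-101", 0.91}, {"w-102", 0.88}, {"w-103", 0.80}}
	for _, c := range filterExcluded(req, ranked) {
		fmt.Println(c.WorkerID, c.Score)
	}
}
```

Because the exclusion is a set-membership test on IDs, it holds regardless of how close the excluded worker's vector sits to the query.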

Adding NL-negation handling at the substrate level would be product
debt — it would let coordinators type sloppier queries that
silently fail when the LLM extractor misses a phrasing. Don't ship
until production traffic demonstrates demand for it.

Findings: reports/reality-tests/real_005_findings.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 23:06:06 -05:00


# Negation reality-test queries — real_005
#
# Each query carries an explicit negation that should suppress some
# subset of workers/clients/cities. Cosine on dense embeddings has
# known weaknesses around negation: "NOT Detroit" still contains the
# token "Detroit", so the query embedding stays close to
# Detroit-aligned vectors. Without explicit negation handling, the
# system may silently surface exactly the entities the coordinator
# excluded.
#
# Test goal: characterize whether the substrate degrades silently
# (treats "NOT X" as "X") or surfaces an honesty signal.
Need 5 Forklift Operators in Aurora IL, NOT in Detroit (Detroit pool is reserved for another contract)
Need 3 Warehouse Associates, but NOT anyone from Beacon Freight
Looking for Pickers in Indianapolis, excluding the Cornerstone Fabrication roster
1 CNC Operator needed in Flint MI - we cannot use any Detroit-area workers due to non-compete
Need 2 Loaders in Joliet IL but exclude all currently-placed Heritage Foods workers