Ran 5 explicit-negation queries ("Need Forklift Operators in Aurora IL,
NOT in Detroit", "excluding Cornerstone Fabrication roster", etc.)
through the standard playbook_lift harness. Goal: characterize
whether the substrate has negation handling or silently treats
"NOT X" as "X".
Headline: substrate has zero negation handling. Cosine similarity
over dense embeddings scores "NOT in Detroit" almost the same as
"in Detroit" plus noise: the embedding space has no representation
for logical negation. This is a structural property of dense
embeddings, not a substrate bug.
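A toy illustration of the effect, not the substrate's code: the sketch below swaps in a bag-of-words vector for the real embedding model (an assumption made so the example runs without a model), and every name in it is hypothetical. It only shows why adding "not" barely moves a cosine score.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// bagOfWords is a toy stand-in for an embedding model: one dimension per
// lowercase token, valued by its count. Real dense embeddings differ, but
// "not" is likewise just one more feature, not a logical operator.
func bagOfWords(text string) map[string]float64 {
	vec := make(map[string]float64)
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		vec[tok]++
	}
	return vec
}

// cosine returns the cosine similarity of two sparse vectors.
func cosine(a, b map[string]float64) float64 {
	var dot, na, nb float64
	for k, v := range a {
		dot += v * b[k] // missing keys in b read as 0
		na += v * v
	}
	for _, v := range b {
		nb += v * v
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	// Hypothetical worker descriptions and queries.
	detroit := bagOfWords("forklift operator based in detroit")
	aurora := bagOfWords("forklift operator based in aurora")
	plain := bagOfWords("forklift operator in detroit")
	negated := bagOfWords("forklift operator not in detroit")

	fmt.Printf("plain   vs detroit worker: %.2f\n", cosine(plain, detroit))
	fmt.Printf("negated vs detroit worker: %.2f\n", cosine(negated, detroit))
	fmt.Printf("negated vs aurora worker:  %.2f\n", cosine(negated, aurora))
}
```

In this toy the negated query still ranks the Detroit worker above the Aurora one, because "not" adds a single dimension that matches nothing while "detroit" keeps matching.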
Per-query observations:
- Q1 (Aurora IL, NOT Detroit): all top-10 rated 1-2/5 by judge
- Q2 (NOT Beacon Freight): top-1 rated 4/5, accidentally OK
  because the role+city signal naturally pulled a non-Beacon worker
- Q3 (excluding Cornerstone): unanimous 1/5 across top-10
- Q4 (NOT Detroit-area): all top-10 rated 1-2/5
- Q5 (exclude Heritage Foods): top-1 rated 4/5, accidentally OK
The judge IS the safety net: when retrieval can't honor the
constraint, the judge refuses to approve any result. That's the
honesty signal — `discovery=0` for the run aggregates it.
No code change. The architectural answer for production is:
- UI surfaces an "exclude" affordance that populates ExcludeIDs
  (already supported, added in multi-coord stress 200-worker swap)
- Coordinators don't type natural-language negation; they click
- Substrate's role: surface the honesty signal (judge ratings) and
  don't pretend to honor unparseable constraints
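A minimal sketch of how an explicit exclude list could be applied to ranked results. Only the ExcludeIDs name comes from this commit; the Candidate and SearchRequest types, the applyExcludes helper, and the worker IDs are hypothetical, and the real substrate may filter at a different stage.

```go
package main

import "fmt"

// Candidate and SearchRequest are hypothetical shapes for illustration;
// only the ExcludeIDs field name is taken from the commit message.
type Candidate struct {
	WorkerID string
	Score    float64
}

type SearchRequest struct {
	Query      string
	ExcludeIDs []string
}

// applyExcludes drops every candidate whose WorkerID appears in
// req.ExcludeIDs and preserves the ranking order of the rest.
func applyExcludes(req SearchRequest, ranked []Candidate) []Candidate {
	blocked := make(map[string]struct{}, len(req.ExcludeIDs))
	for _, id := range req.ExcludeIDs {
		blocked[id] = struct{}{}
	}
	kept := make([]Candidate, 0, len(ranked))
	for _, c := range ranked {
		if _, skip := blocked[c.WorkerID]; !skip {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	// The query text carries no negation; the exclusion is structured data.
	req := SearchRequest{
		Query:      "Forklift Operators in Aurora IL",
		ExcludeIDs: []string{"w-102", "w-417"}, // made-up IDs
	}
	ranked := []Candidate{
		{WorkerID: "w-102", Score: 0.91},
		{WorkerID: "w-230", Score: 0.88},
		{WorkerID: "w-417", Score: 0.85},
		{WorkerID: "w-388", Score: 0.80},
	}
	for _, c := range applyExcludes(req, ranked) {
		fmt.Println(c.WorkerID, c.Score)
	}
}
```

Because the exclusion is a set-membership check rather than something inferred from query text, it cannot silently miss a phrasing.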
Adding NL-negation handling at the substrate level would be product
debt — it would let coordinators type sloppier queries that
silently fail when the LLM extractor misses a phrasing. Don't ship
until production traffic demonstrates demand for it.
Findings: reports/reality-tests/real_005_findings.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
18 lines
972 B
Plaintext
# Negation reality-test queries - real_005
#
# Each query carries an explicit negation that should suppress some
# subset of workers/clients/cities. Cosine on dense embeddings has
# known weaknesses around negation: "NOT Detroit" still tokenizes
# "Detroit" and pulls Detroit-aligned vectors near. Without explicit
# negation handling, the system may silently surface exactly the
# entities the coordinator excluded.
#
# Test goal: characterize whether the substrate degrades silently
# (treats "NOT X" as "X") or surfaces an honesty signal.

Need 5 Forklift Operators in Aurora IL, NOT in Detroit (Detroit pool is reserved for another contract)
Need 3 Warehouse Associates, but NOT anyone from Beacon Freight
Looking for Pickers in Indianapolis, excluding the Cornerstone Fabrication roster
1 CNC Operator needed in Flint MI - we cannot use any Detroit-area workers due to non-compete
Need 2 Loaders in Joliet IL but exclude all currently-placed Heritage Foods workers