First reality test driver. Two-pass design:
- Pass 1 (cold): matrix.search with use_playbook=false → small-model judge
  rates top-K → record a playbook entry pointing at the highest-rated
  result (which may NOT be top-1 by distance; that gap is the discovery).
- Pass 2 (warm): same queries with use_playbook=true → measure the
  ranking shift. Lift counts as real if the recorded answer becomes top-1.
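
A minimal sketch of the per-query lift check described above, not the code in main.go. The names Result, Lift, bestRated, rankOf and measureLift are hypothetical, and the tie-break (the closer result wins on equal ratings) is an assumption:

```go
package main

import "fmt"

// Result is a hypothetical stand-in for one matrix.search hit.
type Result struct {
	ID       string
	Distance float64
}

// Lift summarizes one query across the cold and warm passes.
type Lift struct {
	RecordedID string // result the judge rated highest in the cold pass
	ColdRank   int    // 0-based position of that result in the cold pass
	WarmRank   int    // 0-based position in the warm pass, -1 if absent
	Discovery  bool   // judge's pick was not top-1 by distance
	Real       bool   // recorded answer is top-1 in the warm pass
}

// bestRated returns the index of the highest rating, or -1 for an empty
// slice. Ties go to the earlier (closer) result; that rule is an assumption.
func bestRated(ratings []int) int {
	best := -1
	for i, r := range ratings {
		if best == -1 || r > ratings[best] {
			best = i
		}
	}
	return best
}

// rankOf returns the 0-based position of id in results, or -1 if absent.
func rankOf(results []Result, id string) int {
	for i, r := range results {
		if r.ID == id {
			return i
		}
	}
	return -1
}

// measureLift picks the judge's best cold result and checks where it lands
// in the warm ranking. ok is false when the inputs are empty or mismatched.
func measureLift(cold []Result, ratings []int, warm []Result) (Lift, bool) {
	if len(cold) == 0 || len(cold) != len(ratings) {
		return Lift{}, false
	}
	i := bestRated(ratings)
	id := cold[i].ID
	w := rankOf(warm, id)
	return Lift{RecordedID: id, ColdRank: i, WarmRank: w, Discovery: i != 0, Real: w == 0}, true
}

func main() {
	// Made-up IDs, distances and ratings, for illustration only.
	cold := []Result{{"w-17", 0.21}, {"w-04", 0.25}, {"w-33", 0.31}}
	ratings := []int{2, 5, 3} // judge prefers the second-closest result
	warm := []Result{{"w-04", 0.12}, {"w-17", 0.21}, {"w-33", 0.31}}

	lift, ok := measureLift(cold, ratings, warm)
	fmt.Printf("ok=%v lift=%+v\n", ok, lift)
}
```

Keeping ColdRank and WarmRank as separate fields lets a report distinguish "judge agreed with distance" (no discovery) from "playbook moved the recorded answer to top-1" (real lift).
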
Files:
- scripts/playbook_lift/main.go              driver (391 LoC)
- scripts/playbook_lift.sh                   stack bring-up + report gen
- tests/reality/playbook_lift_queries.txt    query corpus (5 placeholders;
                                             J writes real 20+)
- reports/reality-tests/README.md            framework + interpretation
- .gitignore                                 track reports/reality-tests/
                                             but ignore per-run JSON evidence
This answers the gate from project_small_model_pipeline_vision.md:
"the playbook + matrix indexer must give the results we're looking
for." Without ground-truth labels, the LLM judge is the proxy: the
same small-model thesis applied to evaluation. The generated reports
are honest about that limitation.
Driver compiles clean; full run requires Ollama + workers/candidates
ingest. Skips cleanly if Ollama absent.
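
One way the "skip if Ollama absent" check could look, as a sketch only. How the driver actually detects Ollama is not stated here. The probe below assumes Ollama's default address (http://localhost:11434) and its /api/tags model-listing endpoint; ollamaReachable, defaultOllamaURL and probeTimeout are hypothetical names:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// Assumed values for illustration; the real driver may use different ones.
const (
	defaultOllamaURL = "http://localhost:11434" // Ollama's default listen address
	probeTimeout     = 2 * time.Second
)

// ollamaReachable reports whether a GET to baseURL+"/api/tags" returns
// 200 OK within probeTimeout. Any error (bad URL, refused connection,
// timeout) counts as "not reachable".
func ollamaReachable(baseURL string) bool {
	client := &http.Client{Timeout: probeTimeout}
	resp, err := client.Get(baseURL + "/api/tags")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	if !ollamaReachable(defaultOllamaURL) {
		fmt.Println("SKIP: Ollama not reachable at", defaultOllamaURL)
		return // normal return, so a missing Ollama is a skip, not a failure
	}
	fmt.Println("Ollama reachable; proceeding with cold and warm passes")
}
```
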
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
19 lines
923 B
Plaintext
# Playbook lift reality test: staffing query corpus.
#
# Each non-blank, non-comment line is one query. The harness will run
# each through matrix.search (cold pass, then warm pass with playbook),
# ask the LLM judge to rate top-K results, and record lift metrics.
#
# Goal: 20 queries, weighted toward the kinds of asks a staffing
# coordinator would actually issue. Specific roles + certifications +
# constraints surface playbook lift better than generic "find a worker"
# style queries.
#
# Placeholders (5). J: replace + extend to 20+ for the real test.

Forklift operator with OSHA-30, warehouse experience, day shift availability
Bilingual customer service rep, Spanish + English, two years call-center experience
CDL Class A driver, clean record, willing to do regional 4-day routes
Production line supervisor with lean manufacturing background
Dental hygienist with three years experience, Indianapolis area
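
The corpus format stated in the header comment (one query per non-blank, non-comment line) could be loaded along these lines. This is a sketch, not the driver's actual loader: readQueries and parseQueryText are hypothetical names, and treating any line whose first non-space character is "#" as a comment is an assumption:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readQueries returns every non-blank, non-comment line from r, trimmed
// of surrounding whitespace. A line is a comment if it starts with "#".
func readQueries(r io.Reader) ([]string, error) {
	var queries []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		queries = append(queries, line)
	}
	return queries, sc.Err()
}

// parseQueryText is a convenience wrapper for in-memory corpora.
func parseQueryText(text string) ([]string, error) {
	return readQueries(strings.NewReader(text))
}

func main() {
	corpus := "# header comment\n#\n\nForklift operator with OSHA-30\nCDL Class A driver, clean record\n"
	queries, err := parseQueryText(corpus)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	for i, q := range queries {
		fmt.Printf("%d: %s\n", i+1, q)
	}
}
```

In a real run the reader would come from os.Open on tests/reality/playbook_lift_queries.txt rather than an in-memory string.
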