golangLAKEHOUSE/scripts/cutover/gen_real_queries.go
root 7f2f112e6a reality_test real_001: real-shape coordinator queries — surfaces cross-role bleed
First retrieval probe with a non-synthetic query distribution. Pulls
N rows from /home/profit/lakehouse/data/datasets/fill_events.parquet
(real-shape demand data) and translates each into the natural language
a coordinator would type: "Need {count} {role}s in {city} {state}
starting at {at} for {client}".
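
For a taste of the output shape (role, count, and timestamp here are
illustrative, not actual rows):

  Need 3 Forklift Operators in Detroit MI starting at 2026-05-04 06:00 for Beacon Freight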

Headline: 8/10 cold-pass top-1 = judge-best on real distribution.
Substrate works on queries it was never trained for. v2-moe + workers
corpus carry the load.

Surfaced finding (the real value of running this): same-client+city
queries cluster, and Shape A's distance boost bleeds across roles
within the cluster. Q#2 (Forklift @ Beacon Freight Detroit) records
e-6193 in the playbook corpus. Q#5 (Pickers same client+city) and
Q#10 (CNC Operator same client+city) inherit e-6193 at warm top-1
even though:
- Neither query has its own recorded playbook.
- Neither warm pass triggers a Shape B inject (boosted=0).
- The roles are different staffing categories.

Q#10 specifically demoted the cold-pass-correct w-3759 (judge rating
4 at rank 0) in favor of a worker the judge had approved for a
different role on a different query.

Why the lift suite missed it: synthetic queries use 7 disjoint
scenario buckets (forklift+OSHA+WI / CDL+IL / etc.). Real demand
clusters on (client, city). The cluster doesn't exist in the
synthetic distribution.
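
A quick way to see the clustering directly (a throwaway probe, not part
of this change; column indices match gen_real_queries.go below):

  package main

  import (
      "context"
      "fmt"
      "log"

      "github.com/apache/arrow-go/v18/arrow/memory"
      "github.com/apache/arrow-go/v18/parquet/file"
      "github.com/apache/arrow-go/v18/parquet/pqarrow"
  )

  // Count fill_events rows per (client, city) pair.
  func main() {
      r, err := file.OpenParquetFile("/home/profit/lakehouse/data/datasets/fill_events.parquet", false)
      if err != nil {
          log.Fatal(err)
      }
      defer r.Close()
      pr, err := pqarrow.NewFileReader(r, pqarrow.ArrowReadProperties{}, memory.DefaultAllocator)
      if err != nil {
          log.Fatal(err)
      }
      tbl, err := pr.ReadTable(context.Background())
      if err != nil {
          log.Fatal(err)
      }
      defer tbl.Release()
      client := tbl.Column(3).Data().Chunk(0)
      city := tbl.Column(5).Data().Chunk(0)
      counts := map[string]int{}
      for i := 0; i < int(tbl.NumRows()); i++ {
          counts[client.ValueStr(i)+" / "+city.ValueStr(i)]++
      }
      for pair, c := range counts {
          fmt.Printf("%3d  %s\n", c, pair)
      }
  }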

Why the judge gate doesn't catch it: the gate (5a3364f) is
per-injection at record time. After approval the worker rides Shape A
distance boosts on all later same-cluster queries with no second
gate call.

Becomes new OPEN #1. Fix candidate: role-scoped playbook corpus
metadata + Shape A boost gate on role match. Cheap; doesn't need
new judge calls.
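
Sketch of the shape that fix could take (names and types hypothetical;
the real corpus schema and boost math live elsewhere):

  // Record the approved role next to each playbook entry...
  type playbookEntry struct {
      WorkerID string  // e.g. "e-6193"
      Role     string  // role the judge approved at record time
      Boost    float64 // Shape A distance boost
  }

  // ...and only let the boost fire on a role match. Cross-role
  // same-cluster queries fall back to plain distance, no judge call.
  func shapeADistance(e playbookEntry, queryRole string, dist float64) float64 {
      if e.Role != queryRole {
          return dist
      }
      return dist - e.Boost
  }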

Files:
- scripts/cutover/gen_real_queries.go: parquet → coordinator NL
- tests/reality/real_coord_queries.txt: 10 generated queries
- reports/reality-tests/playbook_lift_real_001.md: harness output
- reports/reality-tests/real_001_findings.md: the reading

Repro:
  go run scripts/cutover/gen_real_queries.go -limit 10 > tests/reality/real_coord_queries.txt
  QUERIES_FILE=tests/reality/real_coord_queries.txt RUN_ID=real_001 \
    WITH_PARAPHRASE=0 WITH_REJUDGE=0 ./scripts/playbook_lift.sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:18:40 -05:00


// gen_real_queries — pull N rows from fill_events.parquet and translate
// each into a coordinator-style natural-language query.
//
// Output: one query per line, written to stdout (intended for redirect
// into tests/reality/real_coord_queries.txt and then fed to
// scripts/playbook_lift.sh via QUERIES_FILE=<path>, as in the repro above).
//
// Why: the lift harness's standard query corpus is hand-crafted to
// stress multi-constraint matching. Real coordinator demand has a
// different distribution — single-role, single-geo, count + time —
// and we want to probe whether the substrate handles that shape too.
// The fill_events parquet on the Rust side is the closest thing to
// "real demand" we have on disk (123 rows, sourced from staffing
// fixture generation but shaped like genuine fills).
package main

import (
	"context"
	"flag"
	"fmt"
	"log"

	"github.com/apache/arrow-go/v18/arrow/memory"
	"github.com/apache/arrow-go/v18/parquet/file"
	"github.com/apache/arrow-go/v18/parquet/pqarrow"
)

func main() {
	src := flag.String("src", "/home/profit/lakehouse/data/datasets/fill_events.parquet", "fill_events parquet path")
	limit := flag.Int("limit", 10, "number of queries to generate")
	flag.Parse()

	r, err := file.OpenParquetFile(*src, false)
	if err != nil {
		log.Fatalf("open %s: %v", *src, err)
	}
	defer r.Close()

	pr, err := pqarrow.NewFileReader(r, pqarrow.ArrowReadProperties{}, memory.DefaultAllocator)
	if err != nil {
		log.Fatalf("pqarrow reader: %v", err)
	}

	tbl, err := pr.ReadTable(context.Background())
	if err != nil {
		log.Fatalf("read table: %v", err)
	}
	defer tbl.Release()

	// Field order must match the parquet schema (see scripts/cutover dev probe):
	// 3=client, 5=city, 6=state, 7=role, 8=count, 10=at, 12=deadline.
	// Chunk(0) assumes each column comes back as a single chunk, which holds
	// for a file this small (123 rows).
	client := tbl.Column(3).Data().Chunk(0)
	city := tbl.Column(5).Data().Chunk(0)
	state := tbl.Column(6).Data().Chunk(0)
	role := tbl.Column(7).Data().Chunk(0)
	count := tbl.Column(8).Data().Chunk(0)
	at := tbl.Column(10).Data().Chunk(0)
	deadline := tbl.Column(12).Data().Chunk(0)

	n := int(tbl.NumRows())
	if *limit < n {
		n = *limit
	}
fmt.Println("# Real-shape coordinator queries — generated from fill_events.parquet")
fmt.Println("# (real-shape demand data; queries built mechanically from event rows).")
fmt.Printf("# Source: %s (%d rows total, %d emitted)\n", *src, tbl.NumRows(), n)
fmt.Println("#")
fmt.Println("# Format: client + count + role + city/state + start time +")
fmt.Println("# (optional deadline). Mimics the natural language a coordinator would")
fmt.Println("# type into a dispatch tool when triaging the next-up demand.")
fmt.Println()

	for i := 0; i < n; i++ {
		c := client.ValueStr(i)
		cy := city.ValueStr(i)
		st := state.ValueStr(i)
		ro := role.ValueStr(i)
		ct := count.ValueStr(i)
		t := at.ValueStr(i)
		dl := deadline.ValueStr(i)

		// One query per row: the urgent ask a coordinator would type when
		// triaging this demand. Paraphrase variants are the harness's concern
		// (WITH_PARAPHRASE), not this generator's.
		q := fmt.Sprintf("Need %s %s in %s %s starting at %s for %s", ct, pluralize(ro, ct), cy, st, t, c)
		// ValueStr renders a null deadline as "(null)"; skip the clause then.
		if dl != "" && dl != "(null)" {
			q += fmt.Sprintf(", deadline %s", dl)
		}
		fmt.Println(q)
	}
}

// pluralize appends "s" when count != "1":
// "Warehouse Associate" → "Warehouse Associates"; "Loader" → "Loaders".
// Naive, but it fits the staffing-domain vocabulary in fill_events.
func pluralize(role, count string) string {
	if count == "1" {
		return role
	}
	return role + "s"
}