golangLAKEHOUSE

Author	SHA1	Message	Date
root	b314ed1c94	parity: /v1/embed cross-runtime probe (5th probe, 8/8 cosine match) Today's sidecar drop (lakehouse ba928b1) changed Rust's embed transport from gateway → sidecar → Ollama (2 hops) to gateway → Ollama directly. Go's embedd has always been direct. A drift here would mean: same query, different vector → different HNSW top-K → different staffing recommendations. This probe is the regression gate for that surface. Fixtures cover staffing-domain shapes (forklift, welder, OSHA, dental, CNC) plus stress shapes (unicode "Café résumé ⭐ 你好", single char "x", 200-word long fixture). Match metric: cosine similarity ≥ 0.99999. Byte-equal isn't expected — Go round-trips through []float32 internally while Rust stays at Vec<f64>, so JSON serialization introduces small float drift. What matters operationally is vector direction (HNSW uses cosine distance), and both runtimes preserve it when calling the same Ollama with the same model. Result: 8/8 fixtures match including the long + unicode cases. Sidecar drop didn't disturb the embed surface. The probe also forces both endpoints to use `nomic-embed-text` so the v1-vs-v2-moe default difference doesn't pollute the comparison. 5th cross-runtime parity probe joining the family: - validator_parity (6/6) - extract_json_parity (12/12) - session_log_parity (4/4) - materializer_parity (2/2) - embed_parity (8/8) — this commit Cumulative: 32/32 parity assertions across 5 probes covering HTTP shape (validator, embed), CLI output (materializer), unit behavior (extract_json), and persisted shape (session_log). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 06:28:40 -05:00
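The match metric above compares vector direction, not bytes. Below is a minimal Go sketch of that check. It assumes both sides' vectors have already been decoded from JSON into slices. cosineSimilarity, vectorsMatch and parityThreshold are illustrative names, not the probe's actual code.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns dot(a, b) / (|a| * |b|). It errors on a length
// mismatch or a zero-norm vector so a malformed response fails the probe
// instead of yielding NaN.
func cosineSimilarity(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, fmt.Errorf("dimension mismatch: %d vs %d", len(a), len(b))
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0, fmt.Errorf("zero-norm vector")
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb)), nil
}

// parityThreshold is the match cutoff quoted in the commit message.
const parityThreshold = 0.99999

// vectorsMatch widens the Go-side []float32 to float64 and compares direction
// only, so float32 rounding and differing L2 norms do not count as drift.
func vectorsMatch(goVec []float32, rustVec []float64) (bool, error) {
	widened := make([]float64, len(goVec))
	for i, v := range goVec {
		widened[i] = float64(v)
	}
	sim, err := cosineSimilarity(widened, rustVec)
	if err != nil {
		return false, err
	}
	return sim >= parityThreshold, nil
}

func main() {
	// Toy 3-dim vectors; real nomic-embed-text vectors are much longer.
	rustVec := []float64{0.1, 0.2, 0.3}
	goVec := []float32{0.1, 0.2, 0.3}
	match, err := vectorsMatch(goVec, rustVec)
	fmt.Println(match, err)
}
```

Because cosine similarity ignores magnitude, the same check also tolerates the L2-norm difference between the two runtimes noted in the 2026-04-30 embed parity commit.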
root	fa4e1b4e16	parity: session_log probe + Rust observability parity recorded Companion to lakehouse commit 57bde63 (Rust gateway gains trace-id propagation + coordinator session JSONL). The cross-runtime parity probe is the regression gate that prevents silent schema drift between the two runtimes. scripts/cutover/parity/session_log_parity.sh: - 4 fixtures (accepted_grounded, max_iter_exhausted, infra_error, unicode_in_prompt) feed identical input to both helpers - jq -e validity gate + non-trivial-equal guard prevents the "both sides fail identically → spurious match" failure mode (caught one IFS='||' bug during initial authoring — recorded in the script comment) - normalize() strips timestamp + daemon (legitimate per-producer differences); everything else must be byte-equal - Result: 4/4 fixtures match, including unicode scripts/cutover/parity/session_log_helper/main.go: - Tiny stdin/stdout Go helper that round-trips a fixture through validator.SessionRecord serde - Counterpart to crates/gateway/src/bin/parity_session_log.rs docs/ARCHITECTURE_COMPARISON.md decisions tracker: - "Rust observability parity" row added (DONE 2026-05-02) - Cross-runtime probe documented as reusable gate STATE_OF_PLAY refreshed. Both observability pieces (trace-id propagation, session JSONL) now exist on both runtimes. Operators who point Rust gateway and Go validatord at the same session-log path get a unified longitudinal stream queryable via DuckDB. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 05:39:49 -05:00
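The normalize-then-compare step can be sketched in Go as below, together with the validity gate and the non-trivial-equal guard. This is an illustration under assumptions. Each record is taken to be a single JSON object, only the top-level timestamp and daemon keys are treated as volatile, and the field names used in main (outcome, iterations) are hypothetical. The real probe does this with jq in shell.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// volatileKeys are the per-producer fields the probe tolerates differing.
var volatileKeys = []string{"timestamp", "daemon"}

// normalize decodes one JSONL record, drops volatile top-level keys and
// re-encodes it. encoding/json sorts map keys on Marshal, so key order in
// the two runtimes' output stops mattering. It also returns the number of
// fields that survive normalization.
func normalize(line []byte) ([]byte, int, error) {
	var rec map[string]any
	dec := json.NewDecoder(bytes.NewReader(line))
	dec.UseNumber() // keep numeric literals as written; no float64 rounding
	if err := dec.Decode(&rec); err != nil {
		return nil, 0, fmt.Errorf("invalid JSON: %w", err)
	}
	for _, k := range volatileKeys {
		delete(rec, k)
	}
	out, err := json.Marshal(rec)
	return out, len(rec), err
}

// recordsMatch is the validity gate plus non-trivial-equal guard: both sides
// must parse and carry at least one non-volatile field before equality can
// count as a match, so "both sides fail identically" never passes.
func recordsMatch(rustLine, goLine []byte) (bool, error) {
	r, rn, err := normalize(rustLine)
	if err != nil {
		return false, fmt.Errorf("rust side: %w", err)
	}
	g, gn, err := normalize(goLine)
	if err != nil {
		return false, fmt.Errorf("go side: %w", err)
	}
	if rn == 0 || gn == 0 {
		return false, errors.New("trivial record: nothing left to compare")
	}
	return bytes.Equal(r, g), nil
}

func main() {
	rust := []byte(`{"daemon":"gateway","timestamp":"2026-05-02T05:00:00Z","outcome":"accepted_grounded","iterations":1}`)
	goRec := []byte(`{"iterations":1,"outcome":"accepted_grounded","daemon":"validatord","timestamp":"2026-05-02T05:00:01Z"}`)
	ok, err := recordsMatch(rust, goRec)
	fmt.Println(ok, err) // true <nil>
}
```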
root	b0c8a3f227	parity probes: materializer + extract_json (caught + fixed real bug) Two new cross-runtime parity probes joining the validator probe from the gauntlet wave. Pattern: feed identical input through Rust and Go; diff outputs. Each probe surfaced a different signal. ## Materializer parity probe scripts/cutover/parity/materializer_parity.sh runs Bun + Go materializer against an identical synthetic data/_kb/ root, diffs the resulting evidence/ JSONL byte-equivalent (modulo provenance.recorded_at). First run: 0/2 match. Real finding: Go's Provenance.LineOffset had `json:"line_offset,omitempty"` which strips the field when value is 0. Line offset 0 is the FIRST ROW of every source file — a real semantic value, not absent. Bun side always emits it. Fix: drop `omitempty` on Provenance.LineOffset. Updated comment explaining why. Re-run: 2/2 match. On-wire JSON parity holds. ## extract_json parity probe scripts/cutover/parity/extract_json_parity.sh feeds 12 fixture strings through both runtimes' extract_json: - fenced ```json``` blocks - unfenced ``` blocks - bare braces with prose around - first-balanced-of-many - nested objects - unicode in string values - escaped quotes - empty object - top-level array (both return first inner object) - no JSON - depth-balanced but invalid syntax - trailing garbage Substrate gate: cargo test -p gateway extract_json PASS before probe. Result: 12/12 match. Algorithms genuinely equivalent. ## scripts/cutover/parity/extract_json_helper/main.go Tiny Go binary that reads stdin, calls validator.ExtractJSON, prints {matched, value} JSON. Counterpart to the Rust parity_extract_json binary in golangLAKEHOUSE's sibling lakehouse repo (separate commit). ## Pattern crystallized Every cross-runtime port should land with a parity probe. 
Three probes now exist: - validator (5/6 wire-format gap captured 2026-05-02) - materializer (caught + fixed real bug 2026-05-02) - extract_json (12/12 match 2026-05-02) The instrument is reusable — each new shared HTTP/CLI surface gets a probe row added. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 04:43:54 -05:00
root	e8cf113af8	gauntlet 2026-05-02: smoke chain + per-component scrum + parity probe Production-readiness gauntlet exploiting the dual Rust/Go implementation as a measurement instrument. ## Phase 1 — Full smoke chain 21/21 PASS in ~60s. Substrate intact across the full service surface. ## Phase 2 — Per-component scrum (token-volume fix) Prior wave (165KB diff): Kimi 62 tokens out, Qwen 297 → no useful analysis. This wave splits today's commits into 4 focused bundles (36-71KB each): c1 validatord (46KB) → 0 convergent / 11 distinct c2 vectord substrate (36KB) → 0 convergent / 10 distinct c3 materializer (71KB) → 0 convergent / 6 distinct (Opus emitted a BLOCK then self-retracted in same response) c4 replay (45KB) → 0 convergent / 10 distinct Reviewer engagement vs prior wave: Kimi went 62 → ~250 tokens out once bundles dropped below 60KB. scripts/scrum_review.sh hardening: * Diff-size guard (warn >60KB, hard-fail >100KB, SCRUM_FORCE_OVERSIZE=1 override) * Tightened prompt — file path must appear EXACTLY as in diff so post-processor can grep WHERE: lines reliably * Auto-tally step dedupes by (reviewer, location); convergence counts distinct lineages (closes the prior `opus+opus+opus` false-convergence bug) ## Phase 3 — Cross-runtime validator parity probe (the headline finding) scripts/cutover/parity/validator_parity.sh sends 6 identical /v1/validate cases to Rust :3100 AND Go :4110, compares status+body. Result: 6/6 status codes match · 5/6 body shapes diverge. Rust returns serde-tagged enum: {"Schema":{"field":"x","reason":"y"}} Go returns flat exported-fields: {"Kind":"schema","Field":"x","Reason":"y"} Both round-trip inside their own runtime; a caller swapping one for the other would break parsing silently. Captured as new _open_ row in docs/ARCHITECTURE_COMPARISON.md decisions tracker. This is the "use the dual-implementation as a measurement instrument" return — single-repo scrums can't catch this class of cross-runtime drift. 
## Phase 4 — Production assessment ship-with-known-gap. Validator wire-format gap is documented, not regressed. ~50 LOC future fix on Go side (custom MarshalJSON on ValidationError to match Rust's serde shape). Persistent stack config (/tmp/lakehouse-persistent.toml) gains validatord on :3221 + persistent-validatord binary so operators bringing up the persistent stack get the new daemon automatically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 04:05:18 -05:00
root	277884b5eb	multitier_100k: 335k scenarios @ 1,115/sec against 100k corpus, 4/6 at 0% fail J asked for a much more sophisticated test using the 100k corpus from the Rust legacy database. This commit ships: scripts/cutover/multitier/main.go — 6-scenario harness with weighted random selection per goroutine. Mixes search, email/SMS/fill validators (in-process via internal/validator), profile swap with ExcludeIDs, repeat-cache exercise, and playbook record/replay. Scenarios + weights (cumulative scenario fractions): 35% cold_search_email — search + email outreach + EmailValidator 15% surge_fill_validate — search + fill proposal + FillValidator + record 15% profile_swap — original search + ExcludeIDs swap + no-overlap check 15% repeat_cache — same query × 5 (cache effectiveness) 10% sms_validate — SMS draft (≤160 chars, phone for SSN-FP guard) 10% playbook_record_replay — cold → record → warm w/ use_playbook=true Test results (5-min sustained, conc=50, 100k workers indexed): TOTAL 335,257 scenarios @ 1,115/sec cold_search_email 117k @ 0.0% fail · p50 2.2ms · p99 8.6ms surge_fill_validate 50k @ 98.8% fail (substrate bug below) profile_swap 50k @ 0.0% fail · p50 4.5ms · ExcludeIDs verified repeat_cache 50k × 5 = 252k searches @ 0.0% fail · p50 11.7ms sms_validate 33k @ 0.0% fail · phone-pattern guard works playbook_record_replay 33k @ 96.8% fail (substrate bug below) Total successful workflows: ~250k+ Validator integration verified at load: 150,930 EmailValidator passes across cold_search_email + sms_validate 35 + 1,061 successful FillValidator + playbook_record (where the bug didn't fire) zero false positives on the SSN-pattern guard against phone numbers Resource footprint at 100k: vectord 1.23GB RSS (linear with 100k vectors) matrixd 26MB, 75% CPU (1-core saturated at conc=50) Total across 11 daemons: 1.7GB Compare to Rust at 14.9GB — ~10× less even at 100k. SUBSTRATE BUG SURFACED: coder/hnsw v0.6.1 nil-deref in layerNode.search at graph.go:95. 
Triggers on /v1/matrix/playbooks/record under sustained writes to the small playbook_memory index. Both Add and Search paths can panic. Workaround applied (this commit) in internal/vectord/index.go BatchAdd: recover() guard converts panic to error; daemon stays up instead of crashing the request handler. Operator recovery procedure (also documented in the report): curl -X DELETE http://localhost:4215/vectors/index/playbook_memory Next record recreates the index fresh. Real fix DEFERRED — open in docs/ARCHITECTURE_COMPARISON.md Decisions tracker. Three options: a) upstream patch to coder/hnsw b) custom small-index Add path that always rebuilds when len < threshold c) alternate store for playbook_memory (Lance? in-memory map?) Evidence: reports/cutover/multitier_100k.md (full methodology + results + repro + bug analysis). docs/ARCHITECTURE_COMPARISON.md Decisions tracker updated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 06:28:50 -05:00
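The recover() workaround relies on a standard Go pattern: a deferred closure calls recover() and assigns to a named error return. Here is a minimal sketch. The graph interface, safeBatchAdd and the simulated panic are hypothetical stand-ins, not the internal/vectord/index.go code or the coder/hnsw API.

```go
package main

import "fmt"

// graph is a hypothetical stand-in for the HNSW index; only Add matters here.
type graph interface {
	Add(id string, vec []float32)
}

// safeBatchAdd turns a panic inside Add into an ordinary error so the request
// handler can return a failure instead of crashing the daemon. The named
// return value is what lets the deferred closure overwrite err.
func safeBatchAdd(g graph, ids []string, vecs [][]float32) (err error) {
	if len(ids) != len(vecs) {
		return fmt.Errorf("ids/vecs length mismatch: %d vs %d", len(ids), len(vecs))
	}
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("hnsw add panicked (index may need rebuild): %v", r)
		}
	}()
	for i := range ids {
		g.Add(ids[i], vecs[i])
	}
	return nil
}

// panickyGraph simulates the library panic seen under sustained writes.
type panickyGraph struct{}

func (panickyGraph) Add(string, []float32) {
	panic("simulated nil-deref in layerNode.search")
}

// countingGraph accepts every Add.
type countingGraph struct{ n int }

func (g *countingGraph) Add(string, []float32) { g.n++ }

func main() {
	err := safeBatchAdd(panickyGraph{}, []string{"pb-1"}, [][]float32{{0.1, 0.2}})
	fmt.Println("recovered:", err)
}
```

recover() only catches panics on the goroutine that defers it. If the library ever panicked on a goroutine of its own, this guard would not help. The guard also says nothing about the graph's state after the panic, which is why the operator procedure drops and recreates the index.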
root	c164a3da96	g5 cutover: production load test — 0 errors / 101k req · Go direct = 2,772 RPS Sustained-traffic load test against the cutover slice. Three runs, zero correctness errors across 101,770 total requests. Substrate holds up under concurrent load — matrix gate, vectord HNSW, embedd cache, gateway proxy all hold. This was the load test's primary question; latency numbers are secondary. scripts/cutover/loadgen — focused Go load generator. 6-query rotating body mix (Forklift/CNC/Warehouse/Picker/Loader/Shipping). Configurable URL/concurrency/duration. Reports per-status-code counts + p50/p95/p99 latencies + JSON summary on stderr. Three runs: baseline (Bun → Go, conc=1, 10s): 4,085 req · 408 RPS · p50 1.3ms · p99 32ms · max 215ms sustained (Bun → Go, conc=10, 30s): 14,527 req · 484 RPS · p50 4.6ms · p99 92ms · max 372ms direct (→ Go, conc=10, 30s): 83,158 req · 2,772 RPS · p50 2.5ms · p99 8.5ms · max 16ms Critical findings: 1. ZERO correctness errors across 101k requests. No 5xx, no transport errors, no panics. Concurrency-safety verified across matrix gate / vectord / gateway / embedd cache. 2. Direct-to-Go is production-grade. 2,772 RPS at p99 8.5ms on a single host, no scaling cliff at concurrency=10. 3. Bun frontend is the bottleneck. -82% RPS, +982% p99 vs direct. Single-process JS event loop queueing under concurrent requests — known Bun proxy-mode characteristic. The substrate itself isn't the limiter. 4. For staffing-domain demand levels (<1 RPS typical per coordinator), Bun-fronted 484 RPS has 480× headroom. No urgency to optimize Bun out of the data path. If/when concurrent demand grows orders of magnitude, the path is nginx → Go direct for hot endpoints, skip Bun. Substrate is now load-tested and verified production-ready. What this load test does NOT cover (documented in g5_load_test.md): cold-cache embed, larger corpus, mixed read/write, multi-host, full 5-loop traffic with judge gate calls. Each is its own probe shape. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 04:20:41 -05:00
root	4fd560cad6	start_go_stack.sh: third isolation layer (port range :4xxx for persistent) Earlier push exposed the gap in the previous 2-layer isolation: smokes still failed because they tried to bind :3211-:3220 which my persistent stack already had. Smoke catalogd's bind-failure went undetected because poll_health 3212 succeeded responding to the persistent catalogd, and smoke proceeded against the wrong backend with the wrong bucket expectations. Fix: persistent stack now uses :4110 + :4211-:4219 via additional sed in the temp toml (bind addresses + upstream URLs). Smoke harnesses keep :3110 + :3211-:3219. Both reach the SAME chatd at :3220 because chatd is read-mostly (no state to clobber) and operators don't want to maintain two LLM provider key sets. Three isolation layers now in effect: 1. Binary names (bin/persistent-* via symlinks) 2. MinIO buckets (lakehouse-go-persistent vs lakehouse-go-primary) 3. Port range (:4xxx vs :3xxx, with shared chatd on :3220) Verified pre-push: - 11 persistent ports listening on :4xxx + :3220 - 0 smoke ports listening on :3110-:3219 (free for smokes) Pushed while persistent stack live — first cross-isolation test (no port collision, no bucket collision, no name collision). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 03:26:41 -05:00
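The port-range rule above can be written as a small mapping. The actual change is a sed over the temp toml. This Go sketch only illustrates the rule: :3110 and :3211-:3219 shift by +1000, and chatd's :3220 stays shared. persistentPort and remapAddr are hypothetical names.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

const sharedChatdPort = 3220

// persistentPort maps a smoke-range port to its persistent-range twin.
func persistentPort(p int) int {
	switch {
	case p == sharedChatdPort:
		return p // chatd is shared by both stacks
	case p == 3110, p >= 3211 && p <= 3219:
		return p + 1000
	default:
		return p // anything else (e.g. Rust gateway :3100) is untouched
	}
}

// remapAddr rewrites a host:port bind address using persistentPort.
func remapAddr(addr string) (string, error) {
	host, portStr, err := net.SplitHostPort(addr)
	if err != nil {
		return "", err
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return "", fmt.Errorf("bad port in %q: %w", addr, err)
	}
	return net.JoinHostPort(host, strconv.Itoa(persistentPort(port))), nil
}

func main() {
	for _, addr := range []string{"127.0.0.1:3110", "127.0.0.1:3212", "127.0.0.1:3220", "127.0.0.1:3100"} {
		out, err := remapAddr(addr)
		fmt.Println(addr, "->", out, err)
	}
}
```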
root	c48b58ff8d	start_go_stack.sh: 2-layer isolation from smoke harness The 2026-05-01 persistent-stack milestone exposed two collision modes between the long-running Go stack and the pre-push smoke harness: 1. PKILL COLLISION: smoke teardown uses anchored `pkill -f "bin/(storaged|...|gateway)$"`. Same-named persistent processes match → smokes kill 7 of 11 persistent daemons. 2. MINIO STATE COLLISION: persistent stack writes `_vectors/workers.lhv1` to the shared lakehouse-go-primary bucket. Smoke vectord rehydrates from same bucket → sees both smoke-owned and persistent-owned indexes → assertion failures. Both fixed in this commit by adding two isolation layers: LAYER 1 — distinct binary names via symlink: bin/persistent-<daemon> → bin/<daemon> Persistent stack runs as ./bin/persistent-gateway etc. Smoke pattern `bin/(name)$` matches `bin/gateway$` but NOT `bin/persistent-gateway$` (regex group requires bin/ followed immediately by a daemon name; "bin/p..." doesn't qualify). Cmdline lookup verified: 7 persistent procs, 0 match smoke pkill. LAYER 2 — separate MinIO bucket via temp config: Persistent stack writes to lakehouse-go-persistent (configurable via $LH_PERSISTENT_BUCKET). Temp toml at /tmp/lakehouse-persistent.toml inherits everything from lakehouse.toml except [s3].bucket which is sed-replaced. Bucket auto-created via mc if missing. Verified: workers.lhv1 lands in persistent bucket; primary bucket _vectors/ stays empty. Net effect: the persistent stack should survive `git push` (which runs smokes that rehydrate vectord from primary bucket and pkill their own bin/<name>$ daemons). This commit is the first push test WITH the persistent stack live. Caveat: bin/persistent-* symlinks are gitignored already (/bin/ is in .gitignore wholesale), so the symlinks need to be created on each fresh checkout — which start_go_stack.sh does idempotently. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 03:20:00 -05:00
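Go's regexp package can check why the symlink layer works. Its RE2 syntax agrees with POSIX extended regex for a simple group-alternation-plus-`$` pattern like this one. The commit message elides the daemon alternation, so the pattern below uses an illustrative three-daemon subset, and the paths are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// smokePattern is an illustrative subset of the smoke teardown pattern;
// the real alternation lists more daemons.
var smokePattern = regexp.MustCompile(`bin/(storaged|vectord|gateway)$`)

func main() {
	for _, cmd := range []string{
		"./bin/gateway",            // smoke-owned: should be killed
		"./bin/persistent-gateway", // persistent symlink: must survive
		"/opt/lakehouse/bin/vectord",
	} {
		fmt.Printf("%-28s smoke pkill match: %v\n", cmd, smokePattern.MatchString(cmd))
	}
}
```

The match fails for the persistent name because "bin/" must be followed directly by one of the alternatives, and `$` forbids any suffix.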
root	54b2e7db76	start_go_stack.sh: document smoke-vs-persistent-stack pkill conflict Caught immediately after the prior commit pushed: pre-push smokes killed 7 of 11 persistent Go daemons because the smokes' anchored `pkill -f "bin/(name)$"` teardown matches ANY process named `bin/<daemon>`, not just the smokes' own children. Documented in the script header as a KNOWN CONSTRAINT with a workaround (re-run start_go_stack.sh after every push) and a proper-fix sketch (give the persistent stack a different binary name via build tag or symlink). Proper fix deferred until trigger fires — operators living through this once will know to want it. Persistent stack restored (all 11 healthy as of this commit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 02:56:52 -05:00
root	09904d5222	cutover: persistent Go stack milestone — first long-running deployment + first Go-emitted audit_baselines entry J's "let's go" instruction: leave OPEN list behind, push the Go substrate forward into actual deployment shape. This commit marks the first time the Go side has run as long-running daemons rather than per-harness transient processes, and the first time the shared cross-runtime longitudinal log has carried a Go-emitted entry alongside the Rust ones. What landed: scripts/cutover/start_go_stack.sh — the persistent-stack runbook. Brings up all 11 daemons (storaged → catalogd → ingestd → queryd → embedd → vectord → pathwayd → observerd → matrixd → gateway, plus chatd-if-not-already-up) in dependency order via nohup + disown. Anchored pkill per feedback_pkill_scope (never bare "bin/"). Logs land in /tmp/gostack-logs/<bin>.log, one per daemon. Verified live state: - All 11 services healthy on :3110 + :3211-:3220 - gateway → embedd proxy returns nomic-embed-text-v2-moe vectors - chatd reports 5/5 providers loaded - No port collision with Rust gateway on :3100 - Daemons stay up after exit of the start script (production shape, not harness-transient) audit_baselines.jsonl crosses the runtime boundary: - 7 Rust-emitted entries (last: ca7375ea 2026-04-27) - 1 Go-emitted entry (ee2a40c 2026-05-01T07:53:54Z) appended via ./bin/audit_full -append-baseline - Same envelope shape, same metric set, same drift comparator semantics — operators running either runtime grow the same log What this DOES prove: - Substrate parity at deployment shape (not just unit tests) - Cross-runtime artifact write-side compatibility (was previously proven on read side via audit_baselines roundtrip) - The deploy machinery works end-to-end for the persistent case What this does NOT prove (still ahead): - Real coordinator traffic against the Go stack (no nginx flip yet; devop.live/lakehouse/ still serves through Rust) - Go-side production materializer (Phase 2 is observer-only) - 
Replay tool parity (Phase 7 is observer-only) - The 5-loop product gate against actual humans reports/cutover/SUMMARY.md now logs three new rows: - audit-FULL with 12/12 phases ported - First Go-emitted audit_baselines entry - Persistent Go stack live Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 02:55:29 -05:00
root	0d4f033b34	audit_baselines: round-trip validation against live Rust data Same shape of proof as embed_parity.sh for the embed endpoint: take the just-shipped Go port (ca142b9) and validate it against the actual production data the Rust legacy emits, not just unit- test fixtures. Locks the cross-runtime parity that operators running mixed pipelines depend on. scripts/cutover/audit_baselines_validate.go: - Reads /home/profit/lakehouse/data/_kb/audit_baselines.jsonl - Parses every entry via the Go AuditBaseline struct - Round-trips the last entry: encode → decode → field-by-field equality check (catches any silently-dropped JSON keys) - Calls LoadLastBaseline against the live file (proves the public API works on real shapes, not just inline parsing) - Computes BuildAuditDriftTable(first → last) — full-window lineage drift over the captured baselines Live-data probe results (reports/cutover/audit_baselines_roundtrip.md): - 7 entries parse without error - Round-trip is byte-equal on every metric + every header field - Drift table fires the expected verdicts: - p2_evidence_rows 12→82 (+583%) → warn (above 20% threshold) - p3_accepted/partial/rejected/human 0→non-zero → warn (the zero-baseline edge case TestBuildAuditDriftTable_ZeroBaseline was designed to lock — verified now firing on real history) - p4_* metrics +0% → ok (stable across the window) What this does NOT prove (documented in the report): the Go-side audit-FULL pipeline that PRODUCES baselines doesn't exist yet. Only the load/append/drift substrate is ported. Operators running audit-full from Go would still need a metric-collection pass — that's a separate port deliberately not in this wave. reports/cutover/SUMMARY.md gains a new row alongside the embed parity entries; cutover-prep verification log keeps the discipline of "verified against real data, not just fixtures." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 00:20:18 -05:00
root	3263254f1c	reality_test real_003: 40-query paraphrase stress + extractor extension Stress-tests the role gate with 40 queries (10 fill_events rows × 4 styles): need, client_first, looking, shorthand. Each row's role + client + city stays the same; only the surface phrasing changes. real_003 (original extractor) confirmed the shorthand-vs-shorthand failure mode: CNC Operator shorthand recording leaked w-2404 onto Forklift Operator shorthand query within the same Beacon Freight Detroit cluster. Both record + query had empty role (extractor returns "" for shorthand because there's no separator between role and city), gate disabled, distance check passed, bleed fired. Fix: extended extractRoleFromNeed to handle client_first ("{client} needs N {role} in...") and looking ("Looking for N {role} at...") patterns. Shorthand left intentionally unmatched — "Forklift Operator Detroit" is shape-indistinguishable from "Forklift" + "Operator Detroit" without an LLM extractor or known-cities lookup. real_003b (extended extractor) verifies bleed closed across all 4 styles for this dataset. Forklift Operator queries keep w-2136 (the cold-pass-correct match) regardless of which style the query came in. Same-role boosts now fire correctly across styles — a CNC Operator recording made in `looking` style boosts the CNC need-form query. scripts/cutover/gen_real_queries.go: added -styles flag with values need|client_first|looking|shorthand|all (default need preserves real_001/002 behavior). tests/reality/real_coord_queries_v2.txt is the 40-query stress file. scripts/playbook_lift/main_test.go: 10 sub-tests lock the four documented patterns + shorthand limitation + lift-suite-style queries (no clean role, returns empty as expected).
Aggregate metrics: - real_003 (original): disc=7, lift=7, boost=14, meanΔ=-0.108 - real_003b (extended): disc=11, lift=10, boost=31, meanΔ=-0.202 The growth reflects more LEGITIMATE same-role same-cluster transfer firing across styles, not bleed (verified by per-cluster bleed table — Forklift Operator queries unchanged across all 4 styles). Known limitation documented in real_003_findings.md: same-cluster, same-role queries in shorthand still embed close enough that a shorthand recording could bleed onto a different-role shorthand query if both record + query strip role. Closing this requires LLM extraction or known-cities lookup at record + query time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 21:42:02 -05:00
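Below is a regex-based sketch of the three recognized query styles, plus the deliberate empty result for shorthand. extractRole, the patterns and the naive plural-stripping are assumptions for illustration, not the code of extractRoleFromNeed. The example queries follow the templates quoted in these commits.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rolePatterns cover the need, client_first and looking styles, in that order.
var rolePatterns = []*regexp.Regexp{
	regexp.MustCompile(`^Need \d+ (.+?) in `),         // need
	regexp.MustCompile(`^.+? needs \d+ (.+?) in `),    // client_first
	regexp.MustCompile(`^Looking for \d+ (.+?) at `),  // looking
}

// extractRole returns the singular role, or "" when no pattern matches.
// Shorthand ("Forklift Operator Detroit") deliberately returns "", because
// role and city cannot be separated without extra knowledge.
func extractRole(q string) string {
	for _, re := range rolePatterns {
		if m := re.FindStringSubmatch(q); m != nil {
			return singular(m[1])
		}
	}
	return ""
}

// singular naively strips one trailing "s" (good enough for "Operators",
// "Pickers"); words ending in "ss" are left alone.
func singular(role string) string {
	if strings.HasSuffix(role, "ss") {
		return role
	}
	return strings.TrimSuffix(role, "s")
}

func main() {
	for _, q := range []string{
		"Need 3 Forklift Operators in Detroit MI starting at 7am for Beacon Freight",
		"Beacon Freight needs 2 CNC Operators in Detroit MI",
		"Looking for 4 Pickers at Beacon Freight in Detroit",
		"Forklift Operator Detroit",
	} {
		fmt.Printf("%q -> %q\n", q, extractRole(q))
	}
}
```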
root	7f2f112e6a	reality_test real_001: real-shape coordinator queries — surfaces cross-role bleed First retrieval probe with non-synthetic query distribution. Pulls N rows from /home/profit/lakehouse/data/datasets/fill_events.parquet (real-shape demand data) and translates each to the natural language a coordinator would type: "Need {count} {role}s in {city} {state} starting at {at} for {client}". Headline: 8/10 cold-pass top-1 = judge-best on real distribution. Substrate works on queries it was never trained for. v2-moe + workers corpus carry the load. Surfaced finding (the real value of running this): same-client+city queries cluster, and Shape A's distance boost bleeds across roles within the cluster. Q#2 (Forklift @ Beacon Freight Detroit) records e-6193 in the playbook corpus. Q#5 (Pickers same client+city) and Q#10 (CNC Operator same client+city) inherit e-6193 at warm top-1 even though: - Neither query has its own recorded playbook. - Neither warm pass triggers a Shape B inject (boosted=0). - The roles are different staffing categories. Q#10 specifically demoted the cold-pass-correct w-3759 (judge rating 4 at rank 0) for a worker who was approved by the judge for a different role on a different query. Why the lift suite missed it: synthetic queries use 7 disjoint scenario buckets (forklift+OSHA+WI / CDL+IL / etc.). Real demand clusters on (client, city). The cluster doesn't exist in the synthetic distribution. Why the judge gate doesn't catch it: the gate (5a3364f) is per-injection at record time. After approval the worker rides Shape A distance boosts on all later same-cluster queries with no second gate call. Becomes new OPEN #1. Fix candidate: role-scoped playbook corpus metadata + Shape A boost gate on role match. Cheap; doesn't need new judge calls. 
Files: - scripts/cutover/gen_real_queries.go: parquet → coordinator NL - tests/reality/real_coord_queries.txt: 10 generated queries - reports/reality-tests/playbook_lift_real_001.md: harness output - reports/reality-tests/real_001_findings.md: the reading Repro: go run scripts/cutover/gen_real_queries.go -limit 10 > tests/reality/real_coord_queries.txt QUERIES_FILE=tests/reality/real_coord_queries.txt RUN_ID=real_001 \ WITH_PARAPHRASE=0 WITH_REJUDGE=0 ./scripts/playbook_lift.sh Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 20:18:40 -05:00
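The fix candidate (role metadata on each playbook entry, with the Shape A boost gated on role match) can be sketched as below. playbookHit, boostFactor and the multiplicative boost are hypothetical, because these commits do not describe the actual Shape A formula. This sketch withholds the boost when either role is empty. The later real_003 commit describes the opposite behavior: an empty role disabled the gate and the boost still applied, which is the path that let shorthand recordings bleed.

```go
package main

import "fmt"

// playbookHit is a hypothetical stand-in for a recorded playbook entry.
type playbookHit struct {
	WorkerID string
	Role     string  // role captured at record time; "" if extraction failed
	Distance float64 // lower ranks higher
}

// boostFactor is illustrative only.
const boostFactor = 0.8

// boostedDistance applies the distance boost only when the recording's role
// equals the query's role and both are known.
func boostedDistance(queryRole string, h playbookHit) float64 {
	if queryRole == "" || h.Role == "" || queryRole != h.Role {
		return h.Distance
	}
	return h.Distance * boostFactor
}

func main() {
	// Forklift recording seen by a Forklift query and by a CNC query.
	h := playbookHit{WorkerID: "e-6193", Role: "Forklift Operator", Distance: 0.5}
	fmt.Println(boostedDistance("Forklift Operator", h), boostedDistance("CNC Operator", h))
}
```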
root	5687ec65c2	G5 cutover prep: embed parity probe — Rust /ai/embed ↔ Go /v1/embed verified First concrete cutover artifact: scripts/cutover/embed_parity.sh brings up Go embedd + gateway alongside the live Rust gateway, hits both /ai/embed and /v1/embed with the same forced model, and emits a per-date verdict report under reports/cutover/. Why embed first: the parity invariant is one math identity (cosine sim of vectors against same input). Retrieve has thousands of edge cases. If embed parity holds, all downstream vector consumers inherit confidence; if it doesn't, we catch it in 30s instead of after a flip. Verdict 2026-04-30: 5/5 samples cosine=1.000000 with model forced to nomic-embed-text (v1). Same with nomic-embed-text-v2-moe (both Ollamas have it loaded). Math is provably equivalent across the gateway plumbing. Drift catalog (reports/cutover/SUMMARY.md): - URL: Rust /ai/embed vs Go /v1/embed - Wire: Rust {embeddings, dimensions} (plural) vs Go {vectors, dimension} (singular). Wire-format adapter is the only real cutover work for this endpoint. - L2 norm: Rust unit vectors (~1.0); Go raw Ollama (~20-23). Same direction (cos=1.0); harmless under cosine-distance HNSW (which is Go vectord's default), but worth fixing in internal/embed/ before extending to euclidean indexes. reports/cutover/ now tracked (joined the scrum/ + reality-tests/ exemptions in .gitignore). Next probe: /v1/matrix/retrieve ↔ Rust /vectors/hybrid for the real user-facing retrieve path. Embed parity gives that probe a clean foundation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 20:07:04 -05:00

14 Commits