G5 cutover prep: embed parity probe — Rust /ai/embed ↔ Go /v1/embed verified

First concrete cutover artifact: scripts/cutover/embed_parity.sh
brings up Go embedd + gateway alongside the live Rust gateway,
hits both /ai/embed and /v1/embed with the same forced model, and
emits a per-date verdict report under reports/cutover/.

Why embed first: the parity invariant is one math identity (cosine
sim of vectors against same input). Retrieve has thousands of edge
cases. If embed parity holds, all downstream vector consumers
inherit confidence; if it doesn't, we catch it in 30s instead of
after a flip.

Verdict 2026-04-30: 5/5 samples cosine=1.000000 with model forced
to nomic-embed-text (v1). Same with nomic-embed-text-v2-moe (both
Ollamas have it loaded). On these samples the vectors are
direction-identical across the gateway plumbing.

Drift catalog (reports/cutover/SUMMARY.md):
- URL: Rust /ai/embed vs Go /v1/embed
- Wire: Rust {embeddings, dimensions} (plural) vs Go {vectors,
  dimension} (singular). Wire-format adapter is the only real
  cutover work for this endpoint.
- L2 norm: Rust unit vectors (~1.0); Go raw Ollama (~20-23). Same
  direction (cos=1.0); harmless under cosine-distance HNSW (which
  is Go vectord's default), but worth fixing in internal/embed/
  before extending to euclidean indexes.

reports/cutover/ now tracked (joined the scrum/ + reality-tests/
exemptions in .gitignore).

Next probe: /v1/matrix/retrieve ↔ Rust /vectors/hybrid for the
real user-facing retrieve path. Embed parity gives that probe a
clean foundation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
root 2026-04-30 20:07:04 -05:00
parent a2fa9a2ce7
commit 5687ec65c2
6 changed files with 337 additions and 0 deletions

.gitignore vendored

@@ -40,6 +40,7 @@ vendor/
/reports/*
!/reports/scrum/
!/reports/reality-tests/
!/reports/cutover/
# Inside the audit directory, the per-run _evidence/ dump (smoke logs,
# command output) IS runtime — track the dir, ignore its contents.
/reports/scrum/_evidence/*


@@ -258,6 +258,9 @@ The list is intentionally short. Items move to closed when the work demands them
| `7e6431e` | langfuse: Go-side client + Phase 1c instrumentation |
| `08a0867` | multi_coord_stress: fresh_workers two-tier index — fresh-resume now top-1 (3/3) |
| `5d49967` | multi_coord_stress: full Langfuse coverage — every phase + every call (111 observations) |
| `68d9e55` | shared: auto-emit Langfuse trace+span per HTTP request — closes OPEN #2 |
| `a2fa9a2` | scrum_review: pipe diff via temp files — fixes argv overflow on large bundles |
| (prep) | G5 cutover prep: `embed_parity` probe: Rust `/ai/embed` ↔ Go `/v1/embed` 5/5 cos=1.000 (both v1 and v2-moe). Verdict + drift catalog in `reports/cutover/SUMMARY.md`. Wire-format remap (`embeddings`/`vectors`, `dimensions`/`dimension`) is the only real cutover work; vector direction matched on every sample, though Go returns unnormalized magnitudes. |
Plus on Rust side (`8de94eb`, `3d06868`): qwen2.5 → qwen3.5:latest backport in active defaults; distillation acceptance reports regenerated (run_hash refresh, reproducibility property still holds).


@@ -0,0 +1,59 @@
# G5 cutover prep — verified-parity log
What works on Go gateway, what's been side-by-side compared to Rust,
what's safe to flip. Append a row when a new endpoint clears parity.
| Endpoint | Date | Rust path | Go path | Verdict | Notes |
|---|---|---|---|---|---|
| `embed` (forced v1) | 2026-04-30 | `/ai/embed` | `/v1/embed` | ✅ PASS 5/5 cos=1.000 | direction-identical with `model=nomic-embed-text` forced both sides; not bit-identical, since Go returns unnormalized vectors (see L2 normalization below) |
| `embed` (forced v2-moe) | 2026-04-30 | `/ai/embed` | `/v1/embed` | ✅ PASS 5/5 cos=1.000 | direction-identical with `model=nomic-embed-text-v2-moe` forced both sides; both Ollamas have the model |
## Wire-format drift catalog
The Go gateway is *not* a literal nginx-swap drop-in for the Rust
gateway. Anything that flips needs a wire-shape adapter. Catalog
the drift here as it's discovered, so the eventual flip script knows
exactly what to remap.
### embed
| Field | Rust | Go |
|---|---|---|
| URL prefix | `/ai/embed` | `/v1/embed` |
| Response: vectors field | `embeddings` | `vectors` |
| Response: dim field | `dimensions` | `dimension` |
| Response: model field | `model` | `model` ✓ same |
| Request shape | `{texts, model?}` | `{texts, model?}` ✓ same |
| L2 normalization | unit vectors (‖v‖ ≈ 1.0) | raw Ollama output (‖v‖ ≈ 20-23) |
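The field renames in the table above can be sketched as a small response adapter. This is a minimal illustration only: the function name `go_to_rust_embed` is hypothetical, and where such an adapter would live (Go gateway, nginx, or a Bun-side shim) is not decided here. It covers the field names only and leaves vector magnitudes untouched.

```python
from typing import Any


def go_to_rust_embed(go_resp: dict[str, Any]) -> dict[str, Any]:
    """Rename Go embed response fields to the Rust wire shape.

    Hypothetical helper for illustration; not part of either gateway.
    """
    vectors = go_resp["vectors"]
    # Fall back to the first vector's length if `dimension` is absent.
    dim = go_resp.get("dimension", len(vectors[0]) if vectors else 0)
    return {
        "embeddings": vectors,   # Go `vectors`   -> Rust `embeddings`
        "model": go_resp["model"],
        "dimensions": dim,       # Go `dimension` -> Rust `dimensions`
    }


if __name__ == "__main__":
    sample = {"vectors": [[0.1, 0.2, 0.3]], "model": "nomic-embed-text", "dimension": 3}
    print(go_to_rust_embed(sample))
```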
**The L2 normalization difference is real but currently harmless:** vectors
point in identical directions (cos=1.000) but Go has raw magnitudes. Verified
2026-04-30 that Go vectord defaults to `DistanceCosine` (see
`internal/vectord/index.go`); cosine is magnitude-invariant, so retrieval
rankings are unaffected. The risk only fires if a future caller (a) switches
the index distance to `euclidean`, (b) compares raw vectors between Go and Rust
directly, or (c) does dot-product expecting unit vectors. Adding a
normalization step in `internal/embed/embed.go` would make the cutover safer
and is cheap — but not blocking.
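To illustrate why the magnitude gap is invisible to cosine but not to euclidean distance, here is a small sketch on toy vectors (not probe data). The helper names are made up for this example, and the scale factor of 20 is only meant to resemble the Go-side norms reported above.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    """Scale v to unit length; a zero vector is returned unchanged."""
    n = l2_norm(v)
    if n == 0.0:
        return list(v)
    return [x / n for x in v]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = l2_norm(a), l2_norm(b)
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


unit = normalize([3.0, 4.0])      # stands in for a Rust-side unit vector
raw = [x * 20.0 for x in unit]    # same direction, Go-like raw magnitude

print("cosine:", cosine(unit, raw))                    # magnitude-invariant
print("euclidean, raw:", euclidean(unit, raw))         # dominated by the scale gap
print("euclidean, normalized:", euclidean(unit, normalize(raw)))
```

Normalizing the Go vector before comparison closes the euclidean gap, which is the effect a normalization step on the Go side would have.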
## Repro
```bash
./scripts/cutover/embed_parity.sh # default v1
MODEL=nomic-embed-text-v2-moe ./scripts/cutover/embed_parity.sh # measure embedder drift
```
Each run drops a per-date verdict at `reports/cutover/embed_parity_<DATE>.md`.
## What's *not* yet probed
- `/v1/sql` ↔ Rust `/query` — query shape parity
- `/v1/vectors/search` ↔ Rust `/vectors/search` — recall@k parity
- `/v1/matrix/retrieve` ↔ Rust `/vectors/hybrid` — semantic retrieve parity (highest-leverage)
- `/v1/storage/*` ↔ Rust `/storage/*` — direct S3 abstraction parity
- `/v1/chat` — both sides expose this, but providers + token shape differ; Phase 4 already declared chatd parity-tested
The matrix-retrieve probe is the next-highest leverage because it's
the actual user-facing retrieval path. Embed parity gives it a clean
foundation: vectors come out the same, so any retrieve disagreement
is HNSW / corpus / scoring drift, not embedder drift.
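One way a retrieve probe could quantify disagreement is top-k overlap between the two ranked result lists. The sketch below is an assumption about how such a metric might look, not a description of any existing script; `overlap_at_k` and the ID lists are hypothetical.

```python
def overlap_at_k(rust_ids, go_ids, k):
    """Fraction of the Rust top-k IDs that also appear in the Go top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    rust_top = rust_ids[:k]
    if not rust_top:
        # Both empty counts as full agreement; one-sided empty as none.
        return 1.0 if not go_ids[:k] else 0.0
    go_top = set(go_ids[:k])
    return sum(1 for doc_id in rust_top if doc_id in go_top) / len(rust_top)


if __name__ == "__main__":
    # Made-up document IDs, for illustration only.
    print(overlap_at_k(["a", "b", "c"], ["b", "a", "d"], 3))
```

Overlap ignores rank order inside the top-k, so a probe that cares about ordering would need an additional rank-sensitive measure.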


@@ -0,0 +1,33 @@
# Embed parity probe — 2026-04-30T20:04-05:00
Forced model: `nomic-embed-text` on both sides (isolates plumbing from
default-model drift; Rust default = v1, Go default = v2-moe).
| # | Sample (head) | Dim R/G | Cosine | L2 R | L2 G | Max\|Δ\| |
|---|---|---|---|---|---|---|
| 1 | `hello` | 768 / 768 | 1.000000 | 1.000000 | 23.500095 | 3.943768 |
| 2 | `forklift operator with OSHA cert` | 768 / 768 | 1.000000 | 1.000000 | 21.434569 | 3.591839 |
| 3 | `Need 5 production workers in Aurora IL f` | 768 / 768 | 1.000000 | 1.000000 | 20.636244 | 3.937658 |
| 4 | `résumé: 12 yrs warehouse — pick/pack` | 768 / 768 | 1.000000 | 1.000000 | 19.695088 | 3.624522 |
| 5 | `Q: who's available next Friday? A: Bob, ` | 768 / 768 | 1.000000 | 1.000000 | 22.052443 | 3.720862 |
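The table pairs cos=1.000000 with a Max|Δ| near 3.9, which can look contradictory. It is consistent if each Go vector is a positive scalar multiple s of the Rust unit vector: then every component delta is (s − 1)·r_i, so Max|Δ| = (s − 1)·max|r_i|. The sketch below checks that relation on toy data and then applies it to row 1, under that scalar-multiple assumption.

```python
import math


def l2(v):
    return math.sqrt(sum(x * x for x in v))


def max_abs_delta(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# Toy data, not taken from the probe: a unit vector and a scaled copy.
r = [0.6, -0.8, 0.0]
s = 23.5
g = [s * x for x in r]

largest = max(abs(x) for x in r)
# The two printed values should agree up to floating-point error.
print(max_abs_delta(r, g), (s - 1.0) * largest)

# Applying the same relation to row 1 of the table (L2 G and Max|Δ| columns)
# gives the largest Rust component magnitude those numbers would imply.
implied_largest = 3.943768 / (23.500095 - 1.0)
print(round(implied_largest, 3))
```

Under this reading, Max|Δ| mostly reflects the norm gap and carries little independent information until both sides are normalized.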
## Verdict
**PASS** — 5/5 samples ≥ 0.9990 cosine similarity. Gateway plumbing is at-parity for embed.
First-flip ready: nginx-side or Bun-side routing of `/ai/embed` to Go's `/v1/embed`
(with the wire-format remap noted in §Drift below) is safe to attempt.
## Drift notes
- **URL prefix**: Rust uses `/ai/embed` (nested under `/ai`); Go uses `/v1/embed` (gateway strips `/v1` then forwards to embedd at `:3216/embed`).
- **Wire format**: Rust returns `{embeddings, model, dimensions}` (plural); Go returns `{vectors, model, dimension}` (singular). A flip needs either a wire-shape adapter on the Go side, or callers updated to handle both shapes.
- **Default model**: Rust default = `nomic-embed-text` (v1, 137M); Go default = `nomic-embed-text-v2-moe` (v2 MoE, 475M). This probe forces v1 on both to isolate plumbing parity. The v2-moe upgrade is intentional and a separate dimension.
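The second option in the wire-format note, callers that accept both shapes, could look roughly like the following. This is a sketch under the field names listed above; `extract_vectors` and `extract_dimension` are hypothetical helpers, not existing client code.

```python
def extract_vectors(resp):
    """Return the vector list from either a Rust- or Go-shaped embed response."""
    for key in ("embeddings", "vectors"):   # Rust name first, then Go name
        if key in resp:
            return resp[key]
    raise KeyError("embed response has neither 'embeddings' nor 'vectors'")


def extract_dimension(resp):
    """Return the reported dimension, falling back to the first vector's length."""
    for key in ("dimensions", "dimension"):
        if key in resp:
            return resp[key]
    vectors = extract_vectors(resp)
    return len(vectors[0]) if vectors else 0


if __name__ == "__main__":
    rust_like = {"embeddings": [[0.1, 0.2]], "model": "nomic-embed-text", "dimensions": 2}
    go_like = {"vectors": [[0.1, 0.2]], "model": "nomic-embed-text", "dimension": 2}
    print(extract_vectors(rust_like) == extract_vectors(go_like))
```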
## Repro
```bash
cd /home/profit/golangLAKEHOUSE
./scripts/cutover/embed_parity.sh # default: model=nomic-embed-text
MODEL=nomic-embed-text-v2-moe ./scripts/cutover/embed_parity.sh # measure embedder drift
```


@@ -0,0 +1,33 @@
# Embed parity probe — 2026-04-30T20:04-05:00
Forced model: `nomic-embed-text-v2-moe` on both sides (isolates plumbing from
default-model drift; Rust default = v1, Go default = v2-moe).
| # | Sample (head) | Dim R/G | Cosine | L2 R | L2 G | Max\|Δ\| |
|---|---|---|---|---|---|---|
| 1 | `hello` | 768 / 768 | 1.000000 | 1.000000 | 13.441846 | 3.401060 |
| 2 | `forklift operator with OSHA cert` | 768 / 768 | 1.000000 | 1.000000 | 13.752762 | 1.640122 |
| 3 | `Need 5 production workers in Aurora IL f` | 768 / 768 | 1.000000 | 1.000000 | 13.882167 | 1.529343 |
| 4 | `résumé: 12 yrs warehouse — pick/pack` | 768 / 768 | 1.000000 | 1.000000 | 12.507687 | 1.533719 |
| 5 | `Q: who's available next Friday? A: Bob, ` | 768 / 768 | 1.000000 | 1.000000 | 13.978468 | 1.829930 |
## Verdict
**PASS** — 5/5 samples ≥ 0.9990 cosine similarity. Gateway plumbing is at-parity for embed.
First-flip ready: nginx-side or Bun-side routing of `/ai/embed` to Go's `/v1/embed`
(with the wire-format remap noted in §Drift below) is safe to attempt.
## Drift notes
- **URL prefix**: Rust uses `/ai/embed` (nested under `/ai`); Go uses `/v1/embed` (gateway strips `/v1` then forwards to embedd at `:3216/embed`).
- **Wire format**: Rust returns `{embeddings, model, dimensions}` (plural); Go returns `{vectors, model, dimension}` (singular). A flip needs either a wire-shape adapter on the Go side, or callers updated to handle both shapes.
- **Default model**: Rust default = `nomic-embed-text` (v1, 137M); Go default = `nomic-embed-text-v2-moe` (v2 MoE, 475M). This probe forces `nomic-embed-text-v2-moe` on both to isolate plumbing parity. The v2-moe upgrade is intentional and a separate dimension.
## Repro
```bash
cd /home/profit/golangLAKEHOUSE
./scripts/cutover/embed_parity.sh # default: model=nomic-embed-text
MODEL=nomic-embed-text-v2-moe ./scripts/cutover/embed_parity.sh # measure embedder drift
```

scripts/cutover/embed_parity.sh Executable file

@@ -0,0 +1,208 @@
#!/usr/bin/env bash
# scripts/cutover/embed_parity.sh
#
# G5 cutover prep — first-flip probe on the cleanest endpoint.
#
# Brings up the Go embedd + gateway on :3216/:3110, then for a fixed
# corpus of texts, hits both:
# - Rust: POST localhost:3100/ai/embed {texts:[...], model:"nomic-embed-text"}
# - Go: POST localhost:3110/v1/embed {texts:[...], model:"nomic-embed-text"}
# and computes cosine similarity + L2 norm + max abs component delta.
#
# Verdict goes to reports/cutover/embed_parity_<DATE>.md.
#
# IMPORTANT: the same model (MODEL, default "nomic-embed-text") is
# forced on both sides so we isolate "is the gateway plumbing
# equivalent?" from "is the default model the same?" (Rust default =
# v1, Go default = v2-moe; different models = different vectors by design).
#
# Why this is the first flip: vectors have a single trivially-
# measurable parity invariant (cosine sim + L2 norm). Retrieve has
# thousands of edge cases. If embed parity holds, all downstream
# vector-using endpoints inherit confidence. If it doesn't, we catch
# the issue in 30 seconds instead of after a flip.
set -euo pipefail
cd "$(dirname "$0")/../.."
REPO="$(pwd)"
DATE="$(date +%Y%m%d)"
REPORT="reports/cutover/embed_parity_${DATE}.md"
mkdir -p reports/cutover
RUST_URL="${RUST_URL:-http://127.0.0.1:3100}"
GO_URL="${GO_URL:-http://127.0.0.1:3110}"
MODEL="${MODEL:-nomic-embed-text}"
echo "[cutover] embed parity probe — Rust ${RUST_URL}/ai/embed vs Go ${GO_URL}/v1/embed"
echo "[cutover] model forced to: ${MODEL}"
# Verify Rust side is up before we bother launching Go.
if ! curl -sSf -m 3 "${RUST_URL}/health" >/dev/null 2>&1; then
echo "[cutover] Rust gateway not up at ${RUST_URL} — start lakehouse.service first"
exit 1
fi
# Anchored pkill — bin/(name)$ never matches /bin/ system tools
# (per feedback_pkill_scope; took out MinIO once with a bare pattern).
pkill -f "bin/(embedd|gateway)$" 2>/dev/null || true
sleep 0.3
PIDS=()
TMP="$(mktemp -d)"
CFG="$TMP/cutover.toml"
cleanup() {
echo "[cutover] cleanup"
for p in "${PIDS[@]:-}"; do [ -n "${p:-}" ] && kill "$p" 2>/dev/null || true; done
rm -rf "$TMP"
}
trap cleanup EXIT INT TERM
# Minimal config — only the two daemons under test. Other daemons
# (storaged/catalogd/...) aren't required because the gateway proxies
# lazily and we never hit a non-embed path.
cat > "$CFG" <<EOF
[gateway]
bind = "127.0.0.1:3110"
embedd_url = "http://127.0.0.1:3216"
[embedd]
bind = "127.0.0.1:3216"
provider_url = "http://localhost:11434"
default_model = "${MODEL}"
EOF
poll_health() {
local port="$1" name="$2"
for _ in $(seq 1 50); do
if curl -sSf -m 1 "http://127.0.0.1:${port}/health" >/dev/null 2>&1; then return 0; fi
sleep 0.1
done
echo "[cutover] ${name} (port ${port}) failed to come up"
return 1
}
echo "[cutover] launching embedd + gateway..."
./bin/embedd -config "$CFG" > /tmp/cutover_embedd.log 2>&1 & PIDS+=($!)
poll_health 3216 embedd
./bin/gateway -config "$CFG" > /tmp/cutover_gateway.log 2>&1 & PIDS+=($!)
poll_health 3110 gateway
# Sample corpus — short, medium, long, special chars, domain-flavored.
SAMPLES=(
"hello"
"forklift operator with OSHA cert"
"Need 5 production workers in Aurora IL for night shift starting Monday"
"résumé: 12 yrs warehouse — pick/pack, RF scanner, pallet jack — bilingual"
"Q: who's available next Friday? A: Bob, Carol, Dan."
)
echo "[cutover] running ${#SAMPLES[@]} parity samples..."
REPORT_TMP="$TMP/report.md"
{
echo "# Embed parity probe — $(date -Iminutes)"
echo
echo "Forced model: \`${MODEL}\` on both sides (isolates plumbing from"
echo "default-model drift; Rust default = v1, Go default = v2-moe)."
echo
echo "| # | Sample (head) | Dim R/G | Cosine | L2 R | L2 G | Max\\|Δ\\| |"
echo "|---|---|---|---|---|---|---|"
} > "$REPORT_TMP"
PASS=0
FAIL=0
i=0
for text in "${SAMPLES[@]}"; do
i=$((i+1))
body=$(jq -nc --arg t "$text" --arg m "$MODEL" '{texts:[$t], model:$m}')
rust_resp=$(curl -sS -m 30 -X POST "${RUST_URL}/ai/embed" \
-H 'content-type: application/json' --data "$body")
go_resp=$(curl -sS -m 30 -X POST "${GO_URL}/v1/embed" \
-H 'content-type: application/json' --data "$body")
# Hand to python3 for vector math (bash can't). Responses are passed via
# the environment and the heredoc is quoted, so neither bash expansion nor
# Python string-literal escaping can mangle the JSON.
result=$(RUST_RESP="$rust_resp" GO_RESP="$go_resp" python3 - <<'PYEOF'
import json, math, os
rust = json.loads(os.environ["RUST_RESP"])
go = json.loads(os.environ["GO_RESP"])
# Rust: {embeddings: [[...]], model: ..., dimensions: int}
# Go: {vectors: [[...]], model: ..., dimension: int}
rv = rust["embeddings"][0]
gv = go["vectors"][0]
rd = rust.get("dimensions", len(rv))
gd = go.get("dimension", len(gv))
def l2(v): return math.sqrt(sum(x*x for x in v))
def cos(a, b):
    dot = sum(x*y for x, y in zip(a, b))
    na, nb = l2(a), l2(b)
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0
if rd != gd:
    print(f"DIM_MISMATCH|{rd}|{gd}|0.0|0.0|0.0|0.0")
else:
    c = cos(rv, gv)
    nr = l2(rv)
    ng = l2(gv)
    md = max(abs(x - y) for x, y in zip(rv, gv))
    print(f"OK|{rd}|{gd}|{c:.6f}|{nr:.6f}|{ng:.6f}|{md:.6f}")
PYEOF
)
status=$(echo "$result" | cut -d'|' -f1)
rd=$(echo "$result" | cut -d'|' -f2)
gd=$(echo "$result" | cut -d'|' -f3)
cosv=$(echo "$result" | cut -d'|' -f4)
l2r=$(echo "$result" | cut -d'|' -f5)
l2g=$(echo "$result" | cut -d'|' -f6)
maxd=$(echo "$result" | cut -d'|' -f7)
head=$(echo "$text" | cut -c1-40)
if [ "$status" = "OK" ] && [ "$(awk -v c="$cosv" 'BEGIN{print (c>=0.9990)?1:0}')" = "1" ]; then
PASS=$((PASS+1))
verdict_row="✅"
else
FAIL=$((FAIL+1))
verdict_row="❌"
fi
echo "[cutover] sample $i: status=$status cos=$cosv ${verdict_row}"
echo "| $i | \`$head\` | $rd / $gd | $cosv | $l2r | $l2g | $maxd |" >> "$REPORT_TMP"
done
{
echo
echo "## Verdict"
echo
if [ "$FAIL" -eq 0 ]; then
echo "**PASS** — ${PASS}/${#SAMPLES[@]} samples ≥ 0.9990 cosine similarity. Gateway plumbing is at-parity for embed."
echo
echo "First-flip ready: nginx-side or Bun-side routing of \`/ai/embed\` to Go's \`/v1/embed\`"
echo "(with the wire-format remap noted in §Drift below) is safe to attempt."
else
echo "**FAIL** — ${FAIL}/${#SAMPLES[@]} samples below 0.9990 cosine. Investigate before flipping."
fi
echo
echo "## Drift notes"
echo
echo "- **URL prefix**: Rust uses \`/ai/embed\` (nested under \`/ai\`); Go uses \`/v1/embed\` (gateway strips \`/v1\` then forwards to embedd at \`:3216/embed\`)."
echo "- **Wire format**: Rust returns \`{embeddings, model, dimensions}\` (plural); Go returns \`{vectors, model, dimension}\` (singular). A flip needs either a wire-shape adapter on the Go side, or callers updated to handle both shapes."
echo "- **Default model**: Rust default = \`nomic-embed-text\` (v1, 137M); Go default = \`nomic-embed-text-v2-moe\` (v2 MoE, 475M). This probe forces \`${MODEL}\` on both to isolate plumbing parity. The v2-moe upgrade is intentional and a separate dimension."
echo
echo "## Repro"
echo
echo "\`\`\`bash"
echo "cd $(realpath .)"
echo "./scripts/cutover/embed_parity.sh # default: model=nomic-embed-text"
echo "MODEL=nomic-embed-text-v2-moe ./scripts/cutover/embed_parity.sh # measure embedder drift"
echo "\`\`\`"
} >> "$REPORT_TMP"
cp "$REPORT_TMP" "$REPORT"
echo "[cutover] report → $REPORT"
echo
echo "[cutover] verdict: ${PASS} pass / ${FAIL} fail (threshold cos ≥ 0.9990)"
[ "$FAIL" -eq 0 ]