First concrete cutover artifact: scripts/cutover/embed_parity.sh
brings up Go embedd + gateway alongside the live Rust gateway,
hits both /ai/embed and /v1/embed with the same forced model, and
emits a per-date verdict report under reports/cutover/.
Why embed first: parity reduces to a single check, the cosine
similarity of the two gateways' vectors for the same input.
Retrieve, by contrast, has thousands of edge cases. If embed parity
holds, every downstream vector consumer inherits that confidence;
if it doesn't, we catch it in 30s instead of after a flip.
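A minimal sketch of that check in Go, assuming the vectors have already
been decoded into float64 slices. The function name is hypothetical and
is not taken from embed_parity.sh:

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// cosineSimilarity returns dot(a, b) / (|a| * |b|).
// Hypothetical helper for illustration; not the script's actual code.
func cosineSimilarity(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, fmt.Errorf("dimension mismatch: %d vs %d", len(a), len(b))
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0, errors.New("zero-norm vector")
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb)), nil
}

func main() {
	unit := []float64{0.6, 0.8} // unit-length vector
	raw := []float64{12, 16}    // same direction, L2 norm 20
	cos, err := cosineSimilarity(unit, raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("cosine=%.6f\n", cos) // prints cosine=1.000000
}
```

The toy vectors differ in length by a factor of 20 yet score 1.0, which
is why cosine alone cannot surface a normalization difference.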
Verdict 2026-04-30: 5/5 samples cosine=1.000000 with model forced
to nomic-embed-text (v1). Same result with nomic-embed-text-v2-moe
(both Ollamas have it loaded). On every sample the two gateways
return vectors with the same direction to six decimal places, so
the gateway plumbing does not alter the embedding.
Drift catalog (reports/cutover/SUMMARY.md):
- URL: Rust /ai/embed vs Go /v1/embed
- Wire: Rust {embeddings, dimensions} (plural) vs Go {vectors,
dimension} (singular). Wire-format adapter is the only real
cutover work for this endpoint.
- L2 norm: Rust unit vectors (~1.0); Go raw Ollama (~20-23). Same
direction (cos=1.0); harmless under cosine-distance HNSW (which
is Go vectord's default), but worth fixing in internal/embed/
before extending to euclidean indexes.
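A possible shape for the L2-norm fix, sketched in Go. The function
names are hypothetical, and where this would sit inside internal/embed/
is an assumption:

```go
package main

import (
	"fmt"
	"math"
)

// l2Norm returns the Euclidean length of v.
func l2Norm(v []float64) float64 {
	var sum float64
	for _, x := range v {
		sum += x * x
	}
	return math.Sqrt(sum)
}

// normalizeL2 returns a unit-length copy of v.
// A zero vector is returned as an unchanged copy to avoid dividing by zero.
// Hypothetical helper; not the existing internal/embed/ API.
func normalizeL2(v []float64) []float64 {
	out := make([]float64, len(v))
	n := l2Norm(v)
	if n == 0 {
		copy(out, v)
		return out
	}
	for i, x := range v {
		out[i] = x / n
	}
	return out
}

func main() {
	raw := []float64{12, 16} // stand-in for a raw Ollama vector
	unit := normalizeL2(raw)
	fmt.Printf("before=%.4f after=%.4f\n", l2Norm(raw), l2Norm(unit))
	// prints before=20.0000 after=1.0000
}
```

Normalizing before the response is written would leave cosine results
unchanged while making euclidean distances comparable with the Rust side.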
reports/cutover/ now tracked (joined the scrum/ + reality-tests/
exemptions in .gitignore).
Next probe: /v1/matrix/retrieve ↔ Rust /vectors/hybrid for the
real user-facing retrieve path. Embed parity gives that probe a
clean foundation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Embed parity probe — 2026-04-30T20:04-05:00
Forced model: nomic-embed-text-v2-moe on both sides (isolates plumbing from
default-model drift; Rust default = v1, Go default = v2-moe).
| # | Sample (head) | Dim R/G | Cosine | L2 R | L2 G | Max\|Δ\| |
|---|---|---|---|---|---|---|
| 1 | hello | 768 / 768 | 1.000000 | 1.000000 | 13.441846 | 3.401060 |
| 2 | forklift operator with OSHA cert | 768 / 768 | 1.000000 | 1.000000 | 13.752762 | 1.640122 |
| 3 | Need 5 production workers in Aurora IL f | 768 / 768 | 1.000000 | 1.000000 | 12.507687 | 1.529343 |
| 4 | résumé: 12 yrs warehouse — pick/pack | 768 / 768 | 1.000000 | 1.000000 | 12.507687 | 1.533719 |
| 5 | Q: who's available next Friday? A: Bob, | 768 / 768 | 1.000000 | 1.000000 | 13.978468 | 1.829930 |
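
The Max|Δ| column is read here as the largest element-wise absolute difference between the Rust and Go vectors; that reading is an assumption, since the report does not define the column. Under it, a large Max|Δ| next to a cosine of 1.0 is expected whenever one side is unit-length and the other is not. A small Go sketch with a hypothetical helper name:

```go
package main

import (
	"fmt"
	"math"
)

// maxAbsDelta returns the largest element-wise |a[i] - b[i]|.
// If lengths differ, only the overlapping prefix is compared.
// Hypothetical helper; the probe script may compute this differently.
func maxAbsDelta(a, b []float64) float64 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	var worst float64
	for i := 0; i < n; i++ {
		if d := math.Abs(a[i] - b[i]); d > worst {
			worst = d
		}
	}
	return worst
}

func main() {
	unit := []float64{0.6, 0.8} // unit-length vector
	raw := []float64{12, 16}    // same direction, L2 norm 20
	fmt.Printf("max|delta|=%.1f\n", maxAbsDelta(unit, raw)) // prints max|delta|=15.2
}
```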
Verdict
PASS — 5/5 samples ≥ 0.9990 cosine similarity. Gateway plumbing is at-parity for embed.
First-flip ready: nginx-side or Bun-side routing of /ai/embed to Go's /v1/embed
(with the wire-format remap noted in §Drift below) is safe to attempt.
Drift notes
- URL prefix: Rust uses `/ai/embed` (nested under `/ai`); Go uses `/v1/embed` (gateway strips `/v1` then forwards to embedd at `:3216/embed`).
- Wire format: Rust returns `{embeddings, model, dimensions}` (plural); Go returns `{vectors, model, dimension}` (singular). A flip needs either a wire-shape adapter on the Go side, or callers updated to handle both shapes.
- Default model: Rust default = `nomic-embed-text` (v1, 137M); Go default = `nomic-embed-text-v2-moe` (v2 MoE, 475M). The probe forces the same model on both sides to isolate plumbing parity (v1 by default; this run forced v2-moe, per the header). The v2-moe upgrade is intentional and a separate concern.
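
One way the Go-side wire-shape adapter could look, as a hedged sketch. Only the JSON key names come from the notes above; the field types (a list of float vectors, an integer dimension) and all Go identifiers are assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// goEmbedResponse mirrors the Go /v1/embed keys listed in the drift notes.
// Field types are assumed, not confirmed by the report.
type goEmbedResponse struct {
	Vectors   [][]float64 `json:"vectors"`
	Model     string      `json:"model"`
	Dimension int         `json:"dimension"`
}

// rustEmbedResponse mirrors the Rust /ai/embed keys listed in the drift notes.
type rustEmbedResponse struct {
	Embeddings [][]float64 `json:"embeddings"`
	Model      string      `json:"model"`
	Dimensions int         `json:"dimensions"`
}

// toRustShape re-encodes a Go-shaped body using the Rust key names.
// Hypothetical helper for illustration.
func toRustShape(goBody []byte) ([]byte, error) {
	var in goEmbedResponse
	if err := json.Unmarshal(goBody, &in); err != nil {
		return nil, fmt.Errorf("decode Go embed response: %w", err)
	}
	return json.Marshal(rustEmbedResponse{
		Embeddings: in.Vectors,
		Model:      in.Model,
		Dimensions: in.Dimension,
	})
}

func main() {
	goBody := []byte(`{"vectors":[[0.6,0.8]],"model":"nomic-embed-text","dimension":2}`)
	out, err := toRustShape(goBody)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// prints {"embeddings":[[0.6,0.8]],"model":"nomic-embed-text","dimensions":2}
}
```

Remapping on the Go side keeps existing callers untouched during the flip; the alternative from the notes, teaching callers both shapes, moves the same mapping to the client.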
Repro
cd /home/profit/golangLAKEHOUSE
./scripts/cutover/embed_parity.sh # default: model=nomic-embed-text
MODEL=nomic-embed-text-v2-moe ./scripts/cutover/embed_parity.sh # measure embedder drift