First concrete cutover artifact: scripts/cutover/embed_parity.sh
brings up the Go embedd + gateway alongside the live Rust gateway,
hits both /ai/embed and /v1/embed with the same forced model, and
emits a per-date verdict report under reports/cutover/.

Why embed first: the parity invariant is a single math identity
(cosine similarity of the two gateways' vectors for the same input
should be 1.0). Retrieve has thousands of edge cases. If embed
parity holds, all downstream vector consumers inherit confidence;
if it doesn't, we catch it in 30s instead of after a flip.

Verdict 2026-04-30: 5/5 samples cosine=1.000000 with model forced
to nomic-embed-text (v1). Same with nomic-embed-text-v2-moe (both
Ollamas have it loaded). On these samples the embedding direction
is identical across the gateway plumbing.

Drift catalog (reports/cutover/SUMMARY.md):
- URL: Rust /ai/embed vs Go /v1/embed
- Wire: Rust {embeddings, dimensions} (plural) vs Go {vectors,
  dimension} (singular). Wire-format adapter is the only real
  cutover work for this endpoint.
- L2 norm: Rust unit vectors (~1.0); Go raw Ollama (~20-23). Same
  direction (cos=1.0); harmless under cosine-distance HNSW (which
  is Go vectord's default), but worth fixing in internal/embed/
  before extending to euclidean indexes.

reports/cutover/ now tracked (joined the scrum/ + reality-tests/
exemptions in .gitignore).

Next probe: /v1/matrix/retrieve ↔ Rust /vectors/hybrid for the
real user-facing retrieve path. Embed parity gives that probe a
clean foundation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# G5 cutover prep: verified-parity log

What works on the Go gateway, what has been compared side by side with Rust,
and what's safe to flip. Append a row when a new endpoint clears parity.

| Endpoint | Date | Rust path | Go path | Verdict | Notes |
|---|---|---|---|---|---|
| `embed` (forced v1) | 2026-04-30 | `/ai/embed` | `/v1/embed` | ✅ PASS 5/5 cos=1.000 | same direction on every sample with `model=nomic-embed-text` forced on both sides (magnitudes differ; see L2 normalization under `embed`) |
| `embed` (forced v2-moe) | 2026-04-30 | `/ai/embed` | `/v1/embed` | ✅ PASS 5/5 cos=1.000 | same direction on every sample with `model=nomic-embed-text-v2-moe` forced on both sides; both Ollamas have the model |
## Wire-format drift catalog

The Go gateway is *not* a literal nginx-swap drop-in for the Rust
gateway. Anything that flips needs a wire-shape adapter. Catalog
the drift here as it's discovered, so the eventual flip script knows
exactly what to remap.
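As a rough illustration of what such a remap could look like for the embed response, the sketch below re-encodes a Go-shaped body (`vectors`, `dimension`) in the Rust shape (`embeddings`, `dimensions`). The field names come from the `embed` catalog in this log; the struct and function names are hypothetical, `float64` elements are an assumption, and the sketch does not touch the URL prefix or vector magnitudes.

```go
package main

import "encoding/json"

// goEmbedResponse mirrors the Go gateway's /v1/embed response fields
// as cataloged in this log. The type itself is hypothetical.
type goEmbedResponse struct {
	Vectors   [][]float64 `json:"vectors"`
	Dimension int         `json:"dimension"`
	Model     string      `json:"model"`
}

// rustEmbedResponse mirrors the Rust gateway's /ai/embed response fields.
type rustEmbedResponse struct {
	Embeddings [][]float64 `json:"embeddings"`
	Dimensions int         `json:"dimensions"`
	Model      string      `json:"model"`
}

// adaptEmbedResponse decodes a Go-shaped response body and re-encodes it
// with the Rust field names. Values are copied through unchanged.
func adaptEmbedResponse(goBody []byte) ([]byte, error) {
	var in goEmbedResponse
	if err := json.Unmarshal(goBody, &in); err != nil {
		return nil, err
	}
	out := rustEmbedResponse{
		Embeddings: in.Vectors,
		Dimensions: in.Dimension,
		Model:      in.Model,
	}
	return json.Marshal(out)
}
```

A flip script or proxy layer could call `adaptEmbedResponse` on each `/v1/embed` body before handing it to a caller that still expects the Rust shape.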
### embed

| Field | Rust | Go |
|---|---|---|
| URL prefix | `/ai/embed` | `/v1/embed` |
| Response: vectors field | `embeddings` | `vectors` |
| Response: dim field | `dimensions` | `dimension` |
| Response: model field | `model` | `model` ✓ same |
| Request shape | `{texts, model?}` | `{texts, model?}` ✓ same |
| L2 normalization | unit vectors (‖v‖ ≈ 1.0) | raw Ollama output (‖v‖ ≈ 20-23) |

**The L2 normalization difference is real but currently harmless:** vectors
point in identical directions (cos=1.000) but Go has raw magnitudes. Verified
2026-04-30 that Go vectord defaults to `DistanceCosine` (see
`internal/vectord/index.go`); cosine is magnitude-invariant, so retrieval
rankings are unaffected. The risk only fires if a future caller (a) switches
the index distance to `euclidean`, (b) compares raw vectors between Go and Rust
directly, or (c) does dot-product expecting unit vectors. Adding a
normalization step in `internal/embed/embed.go` would make the cutover safer
and is cheap, but it is not blocking.
## Repro

```bash
./scripts/cutover/embed_parity.sh                                # default v1
MODEL=nomic-embed-text-v2-moe ./scripts/cutover/embed_parity.sh  # force the v2-moe embedder
```

Each run drops a per-date verdict at `reports/cutover/embed_parity_<DATE>.md`.
## What's *not* yet probed

- `/v1/sql` ↔ Rust `/query`: query shape parity
- `/v1/vectors/search` ↔ Rust `/vectors/search`: recall@k parity
- `/v1/matrix/retrieve` ↔ Rust `/vectors/hybrid`: semantic retrieve parity (highest-leverage)
- `/v1/storage/*` ↔ Rust `/storage/*`: direct S3 abstraction parity
- `/v1/chat`: both sides expose this, but providers + token shape differ; Phase 4 already declared chatd parity-tested

The matrix-retrieve probe comes next because it has the highest leverage:
it's the actual user-facing retrieval path. Embed parity gives it a clean
foundation: vectors point the same way on both sides, so any retrieve
disagreement is HNSW / corpus / scoring drift, not embedder drift.
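For the search probe listed above, recall@k parity could be measured roughly as follows. This is only a sketch: it assumes both gateways return ranked result IDs as strings, that IDs are unique within a result list, and that the Rust results serve as the reference. None of that is confirmed by this log, and `recallAtK` is a hypothetical name.

```go
package main

// recallAtK returns the fraction of the reference top-k IDs that also
// appear in the candidate top-k. It returns 0 when k <= 0 or the
// reference list is empty.
func recallAtK(reference, candidate []string, k int) float64 {
	if k <= 0 || len(reference) == 0 {
		return 0
	}
	refTop := reference
	if len(refTop) > k {
		refTop = refTop[:k]
	}
	candTop := candidate
	if len(candTop) > k {
		candTop = candTop[:k]
	}
	// Build a set of the candidate's top-k IDs for constant-time lookups.
	seen := make(map[string]struct{}, len(candTop))
	for _, id := range candTop {
		seen[id] = struct{}{}
	}
	hits := 0
	for _, id := range refTop {
		if _, ok := seen[id]; ok {
			hits++
		}
	}
	return float64(hits) / float64(len(refTop))
}
```

The measure ignores rank order inside the top k, so two lists with the same IDs in a different order still score 1.0; a stricter probe would also compare positions.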