
# G5 cutover prep — verified-parity log

What works on the Go gateway, what has been compared side by side against
Rust, and what is safe to flip. Append a row when a new endpoint clears parity.

| Endpoint | Date | Rust path | Go path | Verdict | Notes |
|---|---|---|---|---|---|
| `embed` (forced v1)     | 2026-04-30 | `/ai/embed`              | `/v1/embed`              | ✅ PASS 5/5 cos=1.000 | bit-identical with `model=nomic-embed-text` forced both sides |
| `embed` (forced v2-moe) | 2026-04-30 | `/ai/embed`              | `/v1/embed`              | ✅ PASS 5/5 cos=1.000 | bit-identical with `model=nomic-embed-text-v2-moe` forced both sides — both Ollamas have the model |
| `audit_baselines.jsonl` | 2026-05-01 | `data/_kb/audit_baselines.jsonl` | `internal/distillation` `LoadLastBaseline` / `AppendBaseline` / `BuildAuditDriftTable` | ✅ PASS round-trip | Live Rust file (7 entries) parses + round-trips byte-equal; lineage drift table fires correctly on zero-baseline metrics. See `audit_baselines_roundtrip.md`. |
| `audit-FULL` (phases 0/3/4) | 2026-05-01 | `scripts/distillation/audit_full.ts` | `cmd/audit_full` + `internal/distillation` `RunAuditFull` | ✅ PASS metric-equal | Go-side run against live Rust root: all 8 ported metrics (p3_*, p4_*) byte-equal to the last Rust-emitted `audit_baselines.jsonl` entry. 6/6 required checks pass. Phases 1, 2, 5, 6, and 7 are deferred because they depend on broader Rust-side pieces (materializer / replay / run-summaries) not yet ported. See `audit_full_go_vs_rust.md`. |
| `audit-FULL` (phases 0/1/2/3/4/5/7 — observer mode) | 2026-05-01 | `scripts/distillation/audit_full.ts` | `cmd/audit_full` + `internal/distillation` `RunAuditFull` | ✅ PASS 12/12 | Skips reduced from 4 → 1: phase 1 invokes `go test`, phases 2/5/7 read existing artifacts as observers (no live materializer/replay invocation). Only phase 6 (TS-only acceptance harness) remains skipped. `p2_evidence_rows=1055` matches Rust `summary.json` `collect.records_out=1055` byte-equal. Updated `audit_full_go_vs_rust.md`. |
| `audit_baselines.jsonl` write side | 2026-05-01 | `data/_kb/audit_baselines.jsonl` (Rust-emitted, 7 entries) | Go-emitted entry #8 via `cmd/audit_full -append-baseline` | ✅ Mixed-runtime log | First Go-side entry written to the shared longitudinal log: `git_commit=ee2a40c5...` (golangLAKEHOUSE SHA, distinguishable from prior Rust SHAs like `ca7375ea`). All 10 metric fields match Rust shape exactly — drift comparator fires correctly across the runtime boundary. |
| Full Go stack (persistent) | 2026-05-01 | per-binary on :31xx | 11 daemons (storaged/catalogd/ingestd/queryd/embedd/vectord/pathwayd/observerd/matrixd/gateway/chatd) | ✅ All 11 healthy | First time the Go stack runs as long-running daemons rather than per-harness transient processes. Brought up via `scripts/cutover/start_go_stack.sh`; gateway proxies `/v1/embed` correctly through to embedd; all 5 chatd providers loaded. Live alongside the Rust gateway on :3100 (no port conflict). |
| **G5 cutover slice live** | 2026-05-01 | (none — pure cutover) | Bun `/_go/*` → Go gateway `:4110` | ✅ End-to-end | First real Bun-frontend traffic to Go substrate. Rust legacy `mcp-server/index.ts` gains opt-in `/_go/*` pass-through driven by `GO_LAKEHOUSE_URL` env (systemd drop-in at `/etc/systemd/system/lakehouse-agent.service.d/go-cutover.conf`). `/_go/v1/embed` returns nomic-embed-text-v2-moe vectors; `/_go/v1/matrix/search` returns 3/3 Forklift Operators against persistent stack's 200-worker corpus. Reversible (unset env or revert systemd unit). See `g5_first_slice_live.md`. |
| **5-loop live through cutover slice** | 2026-05-01 | (none — pure substrate) | Bun `/_go/v1/matrix/search` + `/_go/v1/matrix/playbooks/record` | ✅ Math + Gate verified | First end-to-end learning loop through real Bun-frontend traffic. Cold dist 0.4449 → warm dist 0.2224 (BoostFactor=0.5 for score=1.0; 0.4449×0.5=0.2225 expected, 0.2224 observed — 4-decimal exact). Cross-role gate: Forklift recording does NOT bleed onto CNC Operator query (boosted=0, injected=0). Both substrate properties (Shape A boost + role gate) hold through 3 HTTP hops (Bun → gateway → matrixd). See `g5_first_loop_live.md`. |
| **Production load test** | 2026-05-01 | (none — pure load probe) | Bun `/_go/v1/matrix/search` + direct Go `:4110` | ✅ 0 errors / 101k req | Three runs, **zero correctness errors**. Direct-to-Go: 2,772 RPS @ p50 2.5ms / p99 8.5ms (production-grade). Via Bun: 484 RPS @ p50 4.6ms / p99 92ms (Bun event-loop is the bottleneck — 5.7× RPS hit, 11× p99 inflation; substrate itself is fine). For staffing-domain demand (<1 RPS typical), Bun-fronted has 480× headroom. See `g5_load_test.md`. |
| **Big load test (5K corpus, 200 bodies)** | 2026-05-01 | (none — pure load probe) | Direct Go `:4110/v1/matrix/search` + `:4110/v1/embed` | ✅ **0 errors / 5.87M req** | Concurrency sweep (10/50/100/200) + mixed embed+search workload. Peak: 8,114 RPS @ conc=200 (search). Mixed: 16,889 RPS combined. Saturation at conc=100+ — matrixd pegs 1 CPU core. **Total RSS ~370MB** across 11 daemons (40× lower than Rust 14.9G). matrixd identified as horizontal-scale target. See `g5_load_test_big.md`. |
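
The warm-distance arithmetic in the 5-loop row can be re-checked in a few lines of Go. This is a minimal sketch, not code from matrixd: `expectedWarm` and `withinTol` are hypothetical helper names, and the multiplicative form (warm = cold × BoostFactor) is an assumption read off the row's own arithmetic.

```go
package main

import (
	"fmt"
	"math"
)

// expectedWarm applies a multiplicative boost to a cold distance.
// Assumption: warm = cold * boostFactor, as the table row's arithmetic implies.
func expectedWarm(cold, boostFactor float64) float64 {
	return cold * boostFactor
}

// withinTol reports whether a and b differ by at most tol.
func withinTol(a, b, tol float64) bool {
	return math.Abs(a-b) <= tol
}

func main() {
	want := expectedWarm(0.4449, 0.5) // 0.22245 before rounding to 4 decimals
	observed := 0.2224                // value recorded in the table
	fmt.Printf("expected=%.5f observed=%.4f ok=%v\n",
		want, observed, withinTol(want, observed, 1e-4))
}
```

The unrounded product is 0.22245, so the observed 0.2224 differs by 0.00005, inside a 1e-4 tolerance.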

## Wire-format drift catalog

The Go gateway is *not* a literal nginx-swap drop-in for the Rust
gateway. Anything that flips needs a wire-shape adapter. Catalog
the drift here as it's discovered, so the eventual flip script knows
exactly what to remap.

### embed

| Field | Rust | Go |
|---|---|---|
| URL prefix | `/ai/embed` | `/v1/embed` |
| Response: vectors field | `embeddings` | `vectors` |
| Response: dim field | `dimensions` | `dimension` |
| Response: model field | `model` | `model` ✓ same |
| Request shape | `{texts, model?}` | `{texts, model?}` ✓ same |
| L2 normalization | unit vectors (‖v‖ ≈ 1.0) | raw Ollama output (‖v‖ ≈ 20-23) |
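
A wire-shape adapter for the embed response could look roughly like the sketch below, which renames the Go fields to the Rust names listed in the table. The type and function names (`goEmbedResponse`, `rustEmbedResponse`, `toRustShape`) are hypothetical, and the element types are assumptions; only the JSON keys come from the catalog. It does not touch vector magnitudes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// goEmbedResponse mirrors the Go-side JSON keys from the drift table.
// Field types are assumptions made for this sketch.
type goEmbedResponse struct {
	Vectors   [][]float64 `json:"vectors"`
	Dimension int         `json:"dimension"`
	Model     string      `json:"model"`
}

// rustEmbedResponse mirrors the Rust-side JSON keys from the drift table.
type rustEmbedResponse struct {
	Embeddings [][]float64 `json:"embeddings"`
	Dimensions int         `json:"dimensions"`
	Model      string      `json:"model"`
}

// toRustShape decodes a Go-shaped embed response and re-encodes it
// with the Rust field names. Values are passed through unchanged.
func toRustShape(goBody []byte) ([]byte, error) {
	var in goEmbedResponse
	if err := json.Unmarshal(goBody, &in); err != nil {
		return nil, fmt.Errorf("decode go embed response: %w", err)
	}
	out := rustEmbedResponse{
		Embeddings: in.Vectors,
		Dimensions: in.Dimension,
		Model:      in.Model,
	}
	return json.Marshal(out)
}

func main() {
	goBody := []byte(`{"vectors":[[0.5,1.5]],"dimension":2,"model":"nomic-embed-text"}`)
	rustBody, err := toRustShape(goBody)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(rustBody))
}
```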

**The L2 normalization difference is real but currently harmless:** vectors
point in identical directions (cos=1.000) but Go has raw magnitudes. Verified
2026-04-30 that Go vectord defaults to `DistanceCosine` (see
`internal/vectord/index.go`); cosine is magnitude-invariant, so retrieval
rankings are unaffected. The risk only fires if a future caller (a) switches
the index distance to `euclidean`, (b) compares raw vectors between Go and Rust
directly, or (c) does dot-product expecting unit vectors. Adding a
normalization step in `internal/embed/embed.go` would make the cutover safer
and is cheap — but not blocking.

## Repro

```bash
./scripts/cutover/embed_parity.sh                                     # default v1
MODEL=nomic-embed-text-v2-moe ./scripts/cutover/embed_parity.sh       # force the v2-moe embedder
```

Each run drops a per-date verdict at `reports/cutover/embed_parity_<DATE>.md`.
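
The script itself is not reproduced here. As an illustration of how a verdict such as `PASS 5/5 cos=1.000` could be computed from paired Rust and Go vectors, here is a hedged Go sketch; `parityVerdict`, the 0.999 threshold, and the sample vectors are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of a and b.
func cosine(a, b []float64) (float64, error) {
	if len(a) != len(b) || len(a) == 0 {
		return 0, fmt.Errorf("dimension mismatch: %d vs %d", len(a), len(b))
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0, fmt.Errorf("zero vector")
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb)), nil
}

// parityVerdict compares vectors pairwise and returns how many pairs
// reach minCos, plus the lowest cosine seen (capped at 1).
func parityVerdict(rust, golang [][]float64, minCos float64) (int, float64, error) {
	if len(rust) != len(golang) {
		return 0, 0, fmt.Errorf("pair count mismatch: %d vs %d", len(rust), len(golang))
	}
	passed, worst := 0, 1.0
	for i := range rust {
		c, err := cosine(rust[i], golang[i])
		if err != nil {
			return 0, 0, fmt.Errorf("pair %d: %w", i, err)
		}
		if c < worst {
			worst = c
		}
		if c >= minCos {
			passed++
		}
	}
	return passed, worst, nil
}

func main() {
	rust := [][]float64{{0.6, 0.8}, {1, 0}}  // unit vectors
	golang := [][]float64{{12, 16}, {21, 0}} // same directions, raw magnitudes
	passed, worst, err := parityVerdict(rust, golang, 0.999)
	if err != nil {
		panic(err)
	}
	fmt.Printf("PASS %d/%d cos=%.3f\n", passed, len(rust), worst)
}
```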

## What's *not* yet probed

- `/v1/sql` ↔ Rust `/query` — query shape parity
- `/v1/vectors/search` ↔ Rust `/vectors/search` — recall@k parity
- `/v1/matrix/retrieve` ↔ Rust `/vectors/hybrid` — semantic retrieve parity (highest-leverage)
- `/v1/storage/*` ↔ Rust `/storage/*` — direct S3 abstraction parity
- `/v1/chat` — both sides expose this, but providers + token shape differ; Phase 4 already declared chatd parity-tested

The matrix-retrieve probe is the highest-leverage next step because it's
the actual user-facing retrieval path. Embed parity gives it a clean
foundation: vectors point in the same direction on both sides (cos=1.000),
so any retrieve disagreement is HNSW / corpus / scoring drift, not embedder drift.
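
For the recall@k item in the list above, one possible metric is the fraction of Rust's top-k IDs that also appear in Go's top-k. The sketch below treats the Rust result as the reference; `recallAtK` and `topK` are hypothetical names, and string IDs that are unique within each list are an assumption.

```go
package main

import "fmt"

// topK returns the first k IDs, or the whole list if it is shorter.
func topK(ids []string, k int) []string {
	if len(ids) > k {
		return ids[:k]
	}
	return ids
}

// recallAtK returns the fraction of the reference's top-k IDs that
// also appear anywhere in the candidate's top-k. Order is ignored.
func recallAtK(reference, candidate []string, k int) float64 {
	if k <= 0 {
		return 0
	}
	ref := topK(reference, k)
	if len(ref) == 0 {
		return 0
	}
	seen := make(map[string]struct{}, k)
	for _, id := range topK(candidate, k) {
		seen[id] = struct{}{}
	}
	hits := 0
	for _, id := range ref {
		if _, ok := seen[id]; ok {
			hits++
		}
	}
	return float64(hits) / float64(len(ref))
}

func main() {
	rust := []string{"w1", "w2", "w3", "w4"}   // reference ranking
	golang := []string{"w2", "w1", "w5", "w3"} // candidate ranking
	fmt.Printf("recall@3=%.2f recall@4=%.2f\n",
		recallAtK(rust, golang, 3), recallAtK(rust, golang, 4))
}
```

Because the metric ignores order within the top-k, a probe that also cares about rank position would need an additional order-sensitive comparison.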