root 0d4f033b34 audit_baselines: round-trip validation against live Rust data

Same shape of proof as embed_parity.sh for the embed endpoint:
take the just-shipped Go port (ca142b9) and validate it against
the actual production data the Rust legacy emits, not just unit-
test fixtures. Locks the cross-runtime parity that operators
running mixed pipelines depend on.

scripts/cutover/audit_baselines_validate.go:
- Reads /home/profit/lakehouse/data/_kb/audit_baselines.jsonl
- Parses every entry via the Go AuditBaseline struct
- Round-trips the last entry: encode → decode → field-by-field
  equality check (catches any silently-dropped JSON keys)
- Calls LoadLastBaseline against the live file (proves the public
  API works on real shapes, not just inline parsing)
- Computes BuildAuditDriftTable(first → last) — full-window
  lineage drift over the captured baselines
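
The parse and round-trip steps above can be sketched roughly as follows. This is a minimal illustration, not the actual validator: the `AuditBaseline` fields shown here, the helper names (`parseJSONL`, `roundTripEqual`, `droppedKeys`), and the sample lines are all assumptions made for the example.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// AuditBaseline is a hypothetical stand-in for the real struct in
// internal/distillation; these field names are assumptions.
type AuditBaseline struct {
	Timestamp string             `json:"timestamp"`
	Metrics   map[string]float64 `json:"metrics"`
}

// parseJSONL decodes one AuditBaseline per non-empty line.
func parseJSONL(data string) ([]AuditBaseline, error) {
	var out []AuditBaseline
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var b AuditBaseline
		if err := json.Unmarshal([]byte(line), &b); err != nil {
			return nil, fmt.Errorf("entry %d: %w", len(out)+1, err)
		}
		out = append(out, b)
	}
	return out, sc.Err()
}

// roundTripEqual encodes b, decodes the result, and compares every field.
func roundTripEqual(b AuditBaseline) bool {
	enc, err := json.Marshal(b)
	if err != nil {
		return false
	}
	var back AuditBaseline
	if err := json.Unmarshal(enc, &back); err != nil {
		return false
	}
	return reflect.DeepEqual(b, back)
}

// droppedKeys lists top-level keys in raw that vanish after a pass
// through the typed struct.
func droppedKeys(raw []byte) ([]string, error) {
	var typed AuditBaseline
	if err := json.Unmarshal(raw, &typed); err != nil {
		return nil, err
	}
	enc, err := json.Marshal(typed)
	if err != nil {
		return nil, err
	}
	var before, after map[string]json.RawMessage
	if err := json.Unmarshal(raw, &before); err != nil {
		return nil, err
	}
	if err := json.Unmarshal(enc, &after); err != nil {
		return nil, err
	}
	var missing []string
	for k := range before {
		if _, ok := after[k]; !ok {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing, nil
}

func main() {
	// Illustrative sample lines, not taken from the live file.
	data := `{"timestamp":"t1","metrics":{"p2_evidence_rows":12}}
{"timestamp":"t2","metrics":{"p2_evidence_rows":82},"extra":true}`
	entries, err := parseJSONL(data)
	if err != nil {
		panic(err)
	}
	last := entries[len(entries)-1]
	fmt.Println("entries:", len(entries), "round-trip equal:", roundTripEqual(last))
	missing, _ := droppedKeys([]byte(`{"timestamp":"t2","metrics":{},"extra":true}`))
	fmt.Println("dropped keys:", missing)
}
```

Comparing the raw key set against the re-encoded key set is one way to surface silently dropped keys; a struct-only comparison would not notice a key the struct never declared.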

Live-data probe results (reports/cutover/audit_baselines_roundtrip.md):
- 7 entries parse without error
- Round-trip is byte-equal on every metric + every header field
- Drift table fires the expected verdicts:
  - p2_evidence_rows 12→82 (+583%) → warn (above 20% threshold)
  - p3_accepted/partial/rejected/human 0→non-zero → warn (the
    zero-baseline edge case TestBuildAuditDriftTable_ZeroBaseline
    was designed to lock — verified now firing on real history)
  - p4_* metrics +0% → ok (stable across the window)
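
The verdict logic behind these results might look something like the sketch below. It is not the real `BuildAuditDriftTable`: `driftVerdict` is a hypothetical helper, and both the 20% threshold and the "zero baseline to non-zero is a warn" rule are inferred from the results listed above.

```go
package main

import (
	"fmt"
	"math"
)

// warnThreshold mirrors the 20% threshold quoted in the results above.
const warnThreshold = 0.20

// driftVerdict is a hypothetical helper. It returns the relative change
// from first to last and a verdict string.
func driftVerdict(first, last float64) (pct float64, verdict string) {
	if first == 0 {
		if last == 0 {
			return 0, "ok"
		}
		// Relative change is undefined against a zero baseline;
		// this sketch assumes any movement off zero is a warn.
		return math.Copysign(math.Inf(1), last), "warn"
	}
	pct = (last - first) / first
	if math.Abs(pct) > warnThreshold {
		return pct, "warn"
	}
	return pct, "ok"
}

func main() {
	pct, v := driftVerdict(12, 82)
	fmt.Printf("p2_evidence_rows %+.0f%% %s\n", pct*100, v) // p2_evidence_rows +583% warn
}
```

Handling the zero baseline as an explicit branch avoids a division by zero and keeps the verdict deterministic for metrics that first appear partway through the window.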

What this does NOT prove (documented in the report): the Go-side
audit-FULL pipeline that PRODUCES baselines doesn't exist yet.
Only the load/append/drift substrate is ported. Operators running
audit-full from Go would still need a metric-collection pass —
that's a separate port deliberately not in this wave.

reports/cutover/SUMMARY.md gains a new row alongside the embed
parity entries; cutover-prep verification log keeps the
discipline of "verified against real data, not just fixtures."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 00:20:18 -05:00

3.2 KiB


G5 cutover prep — verified-parity log

What works on Go gateway, what's been side-by-side compared to Rust, what's safe to flip. Append a row when a new endpoint clears parity.

| Endpoint | Date | Rust path | Go path | Verdict | Notes |
|---|---|---|---|---|---|
| `embed` (forced v1) | 2026-04-30 | `/ai/embed` | `/v1/embed` | ✅ PASS 5/5 cos=1.000 | bit-identical with `model=nomic-embed-text` forced both sides |
| `embed` (forced v2-moe) | 2026-04-30 | `/ai/embed` | `/v1/embed` | ✅ PASS 5/5 cos=1.000 | bit-identical with `model=nomic-embed-text-v2-moe` forced both sides; both Ollamas have the model |
| `audit_baselines.jsonl` | 2026-05-01 | `data/_kb/audit_baselines.jsonl` | `internal/distillation` `LoadLastBaseline` / `AppendBaseline` / `BuildAuditDriftTable` | ✅ PASS round-trip | Live Rust file (7 entries) parses + round-trips byte-equal; lineage drift table fires correctly on zero-baseline metrics. See `audit_baselines_roundtrip.md`. |

Wire-format drift catalog

The Go gateway is not a literal nginx-swap drop-in for the Rust gateway. Anything that flips needs a wire-shape adapter. Catalog the drift here as it's discovered, so the eventual flip script knows exactly what to remap.

embed

| Field | Rust | Go |
|---|---|---|
| URL prefix | `/ai/embed` | `/v1/embed` |
| Response: vectors field | `embeddings` | `vectors` |
| Response: dim field | `dimensions` | `dimension` |
| Response: model field | `model` | `model` ✓ same |
| Request shape | `{texts, model?}` | `{texts, model?}` ✓ same |
| L2 normalization | unit vectors (‖v‖ ≈ 1.0) | raw Ollama output (‖v‖ ≈ 20-23) |
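
A wire-shape adapter for the field renames above could be as small as the following sketch. It assumes the flip keeps Rust-shaped clients and therefore rewrites a Go response into the Rust field names; the struct names, the `toRustShape` helper, and the element types (`[][]float64`, `int`) are assumptions, and the sketch does not touch the URL prefix or the normalization difference.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// goEmbedResponse follows the Go column of the table; types are assumed.
type goEmbedResponse struct {
	Vectors   [][]float64 `json:"vectors"`
	Dimension int         `json:"dimension"`
	Model     string      `json:"model"`
}

// rustEmbedResponse follows the Rust column of the table; types are assumed.
type rustEmbedResponse struct {
	Embeddings [][]float64 `json:"embeddings"`
	Dimensions int         `json:"dimensions"`
	Model      string      `json:"model"`
}

// toRustShape is a hypothetical adapter: it re-keys a Go embed response
// so that a client expecting the Rust wire shape can read it.
func toRustShape(goBody []byte) ([]byte, error) {
	var in goEmbedResponse
	if err := json.Unmarshal(goBody, &in); err != nil {
		return nil, fmt.Errorf("decode Go embed response: %w", err)
	}
	return json.Marshal(rustEmbedResponse{
		Embeddings: in.Vectors,
		Dimensions: in.Dimension,
		Model:      in.Model,
	})
}

func main() {
	out, err := toRustShape([]byte(`{"vectors":[[0.5,1]],"dimension":2,"model":"nomic-embed-text"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```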

The L2 normalization difference is real but currently harmless: the vectors point in identical directions (cos=1.000), but Go returns raw magnitudes while Rust returns unit vectors. Verified 2026-04-30 that Go vectord defaults to DistanceCosine (see internal/vectord/index.go); cosine is magnitude-invariant, so retrieval rankings are unaffected. The risk only fires if a future caller (a) switches the index distance to euclidean, (b) compares raw vectors between Go and Rust directly, or (c) computes dot products expecting unit vectors. Adding a normalization step in internal/embed/embed.go would be cheap and would make the cutover safer, but it is not blocking.
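
To make the magnitude point concrete, here is a minimal sketch of an L2 normalization step and of why cosine is unaffected while dot product is not. The function names are illustrative and this is not the code in internal/embed/embed.go; the sample vector is chosen to have norm 20, in the range reported for raw Ollama output.

```go
package main

import (
	"fmt"
	"math"
)

// dot returns the inner product of two equal-length vectors.
func dot(a, b []float64) float64 {
	var s float64
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// l2Normalize returns v scaled to unit length. A zero vector has no
// direction, so it is returned as zeros rather than dividing by zero.
func l2Normalize(v []float64) []float64 {
	out := make([]float64, len(v))
	norm := math.Sqrt(dot(v, v))
	if norm == 0 {
		return out
	}
	for i, x := range v {
		out[i] = x / norm
	}
	return out
}

// cosine returns the cosine similarity of two non-zero vectors.
func cosine(a, b []float64) float64 {
	return dot(a, b) / (math.Sqrt(dot(a, a)) * math.Sqrt(dot(b, b)))
}

func main() {
	raw := []float64{12, 16} // norm 20
	unit := l2Normalize(raw) // norm 1, same direction
	fmt.Printf("cos(raw, unit) = %.3f\n", cosine(raw, unit))
	fmt.Printf("dot(raw, raw) = %.1f, dot(unit, unit) = %.1f\n", dot(raw, raw), dot(unit, unit))
}
```

Cosine divides out both magnitudes, so the raw and unit forms compare as identical; the dot products differ by the square of the norm, which is the case (c) hazard described above.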

Repro

./scripts/cutover/embed_parity.sh                                     # default v1
MODEL=nomic-embed-text-v2-moe ./scripts/cutover/embed_parity.sh       # measure embedder

Each run drops a per-date verdict at reports/cutover/embed_parity_<DATE>.md.

What's not yet probed

/v1/sql ↔ Rust /query — query shape parity
/v1/vectors/search ↔ Rust /vectors/search — recall@k parity
/v1/matrix/retrieve ↔ Rust /vectors/hybrid — semantic retrieve parity (highest-leverage)
/v1/storage/* ↔ Rust /storage/* — direct S3 abstraction parity
/v1/chat — both sides expose this, but providers + token shape differ; Phase 4 already declared chatd parity-tested

The matrix-retrieve probe is the highest-leverage next step because it exercises the actual user-facing retrieval path. Embed parity gives it a clean foundation: vectors point in the same direction on both sides (cos=1.000), so any retrieve disagreement is HNSW, corpus, or scoring drift, not embedder drift.
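
One possible way to score a future search or retrieve probe is top-k overlap between the two gateways' result lists, treating the Rust results as the reference. The sketch below is only an assumption about how such a probe could be scored; `overlapAtK` and the document IDs are hypothetical, and no such probe exists yet.

```go
package main

import "fmt"

// overlapAtK returns the fraction of the first k reference IDs that
// also appear among the first k candidate IDs. If the reference list
// is shorter than k, k is clamped to its length.
func overlapAtK(reference, candidate []string, k int) float64 {
	if k > len(reference) {
		k = len(reference)
	}
	if k <= 0 {
		return 0
	}
	top := candidate
	if len(top) > k {
		top = top[:k]
	}
	seen := make(map[string]struct{}, len(top))
	for _, id := range top {
		seen[id] = struct{}{}
	}
	hits := 0
	for _, id := range reference[:k] {
		if _, ok := seen[id]; ok {
			hits++
		}
	}
	return float64(hits) / float64(k)
}

func main() {
	rust := []string{"doc-a", "doc-b", "doc-c"} // hypothetical Rust result IDs
	goIDs := []string{"doc-b", "doc-a", "doc-d"} // hypothetical Go result IDs
	fmt.Printf("overlap@3 = %.2f\n", overlapAtK(rust, goIDs, 3))
}
```

Overlap ignores rank order inside the top k, so a probe that cares about ordering would need a rank-aware measure in addition.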
