root 55b8c76a8c distillation: audit-FULL pipeline port (phases 0/3/4) — cross-runtime metric parity verified
Ports the metric-collection passes from scripts/distillation/audit_full.ts.
This is the substrate that PRODUCES audit_baselines.jsonl entries:
the half that OPEN #2 left as "deferred to next wave" after the
read/write substrate landed in ca142b9.

Phase coverage:
  Phase 0 (file presence)             ported
  Phase 1 (schema validators)         skipped (Go's `go test` covers it)
  Phase 2 (materializer dry-run)      deferred (Go materializer not yet ported)
  Phase 3 (scored-runs distribution)  ported
  Phase 4 (contamination firewall)    ported
  Phase 5 (receipts validation)       deferred (Go run-summary JSON not yet emitted)
  Phase 6 (replay sanity)             deferred (Go replay tool not ported)
  Phase 7 (run summary lineage)       deferred (same)
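A minimal sketch of how this split can be carried through aggregation,
so that skipped and deferred phases never count as failures. The Status
values, field names, and check names are hypothetical; the real
PhaseCheck in internal/distillation/audit_full.go mirrors the Rust
shape and may differ.

```go
package main

import "fmt"

// Status is a hypothetical per-check outcome.
type Status string

const (
	StatusPass Status = "pass"
	StatusFail Status = "fail"
	StatusSkip Status = "skip" // phase skipped or deferred; never counts as a failure
)

// PhaseCheck is a hypothetical stand-in for the real struct in audit_full.go.
type PhaseCheck struct {
	Phase    int
	Name     string
	Required bool
	Status   Status
}

// Summary tallies required checks that ran, plus skipped checks.
type Summary struct {
	RequiredPassed, RequiredTotal, Skipped int
	OK                                     bool
}

// Summarize counts required checks that actually ran; skipped checks are tallied separately.
func Summarize(checks []PhaseCheck) Summary {
	var s Summary
	for _, c := range checks {
		if c.Status == StatusSkip {
			s.Skipped++
			continue
		}
		if c.Required {
			s.RequiredTotal++
			if c.Status == StatusPass {
				s.RequiredPassed++
			}
		}
	}
	s.OK = s.RequiredPassed == s.RequiredTotal
	return s
}

func main() {
	// Illustrative check names, not the real ones.
	checks := []PhaseCheck{
		{Phase: 0, Name: "files_present", Required: true, Status: StatusPass},
		{Phase: 3, Name: "scored_runs_distribution", Required: true, Status: StatusPass},
		{Phase: 4, Name: "sft_firewall", Required: true, Status: StatusPass},
		{Phase: 2, Name: "materializer_dry_run", Required: true, Status: StatusSkip},
	}
	s := Summarize(checks)
	fmt.Printf("%d/%d required checks pass, %d skipped, ok=%v\n",
		s.RequiredPassed, s.RequiredTotal, s.Skipped, s.OK)
}
```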

Cross-runtime parity verified end-to-end:
  Go-side audit-full against /home/profit/lakehouse produced
  metrics IDENTICAL to the last Rust-emitted audit_baselines.jsonl
  entry. All 8 ported metrics match byte-for-byte:
    p3_accepted=386, p3_partial=132, p3_rejected=57, p3_human=480,
    p4_sft_rows=353, p4_rag_rows=448, p4_pref_pairs=83, p4_total_quarantined=1325
  6/6 required checks pass on live data.

Components:
- internal/distillation/audit_full.go: PhaseCheck struct (mirrors
  Rust shape), PhaseCheckReport aggregation, RunAuditFull
  orchestrator, auditPhase0/3/4 implementations, FormatAuditFullReport
  Markdown writer.
- cmd/audit_full/main.go: CLI binary with -root, -out, -json,
  -append-baseline flags. Operators run "./bin/audit_full
  -append-baseline" to grow the longitudinal log alongside the
  Rust pipeline (entries are interchangeable — same envelope shape).
- 6 new tests: empty-root failure handling, full-fixture clean PASS
  (locks all 8 metrics + all 6 required checks), SFT firewall
  contamination detection, preference self-pair detection, sig_hash
  regex correctness (rejects wrong-length + uppercase), Markdown
  formatter smoke.

Live-data probe captured at reports/cutover/audit_full_go_vs_rust.md
(linked from reports/cutover/SUMMARY.md). Same shape as the
audit_baselines round-trip evidence — both Go-side ports of the
distillation surface are now validated against real Rust data, not
just fixtures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 01:30:23 -05:00


# G5 cutover prep — verified-parity log
What works on the Go gateway, what's been compared side-by-side with Rust,
and what's safe to flip. Append a row when a new endpoint clears parity.

| Endpoint | Date | Rust path | Go path | Verdict | Notes |
|---|---|---|---|---|---|
| `embed` (forced v1) | 2026-04-30 | `/ai/embed` | `/v1/embed` | ✅ PASS 5/5 cos=1.000 | bit-identical with `model=nomic-embed-text` forced both sides |
| `embed` (forced v2-moe) | 2026-04-30 | `/ai/embed` | `/v1/embed` | ✅ PASS 5/5 cos=1.000 | bit-identical with `model=nomic-embed-text-v2-moe` forced both sides — both Ollamas have the model |
| `audit_baselines.jsonl` | 2026-05-01 | `data/_kb/audit_baselines.jsonl` | `internal/distillation` `LoadLastBaseline` / `AppendBaseline` / `BuildAuditDriftTable` | ✅ PASS round-trip | Live Rust file (7 entries) parses + round-trips byte-equal; lineage drift table fires correctly on zero-baseline metrics. See `audit_baselines_roundtrip.md`. |
| `audit-FULL` (phases 0/3/4) | 2026-05-01 | `scripts/distillation/audit_full.ts` | `cmd/audit_full` + `internal/distillation` `RunAuditFull` | ✅ PASS metric-equal | Go-side run against live Rust root: all 8 ported metrics (p3_*, p4_*) byte-equal to the last Rust-emitted `audit_baselines.jsonl` entry. 6/6 required checks pass. Phases 1, 2, 5, 6, 7 not ported: phase 1 is skipped (covered by `go test`), and 2/5/6/7 are deferred because they depend on Rust-side pieces (materializer, replay, run summaries) not yet ported. See `audit_full_go_vs_rust.md`. |

## Wire-format drift catalog
The Go gateway is *not* a literal nginx-swap drop-in for the Rust
gateway. Anything that flips needs a wire-shape adapter. Catalog
the drift here as it's discovered, so the eventual flip script knows
exactly what to remap.
### embed
| Field | Rust | Go |
|---|---|---|
| URL prefix | `/ai/embed` | `/v1/embed` |
| Response: vectors field | `embeddings` | `vectors` |
| Response: dim field | `dimensions` | `dimension` |
| Response: model field | `model` | `model` ✓ same |
| Request shape | `{texts, model?}` | `{texts, model?}` ✓ same |
| L2 normalization | unit vectors (‖v‖ ≈ 1.0) | raw Ollama output (‖v‖ ≈ 20-23) |

**The L2 normalization difference is real but currently harmless:** vectors
point in identical directions (cos=1.000) but Go has raw magnitudes. Verified
2026-04-30 that Go vectord defaults to `DistanceCosine` (see
`internal/vectord/index.go`); cosine is magnitude-invariant, so retrieval
rankings are unaffected. The risk only fires if a future caller (a) switches
the index distance to `euclidean`, (b) compares raw vectors between Go and Rust
directly, or (c) computes dot products that assume unit vectors. Adding a
normalization step in `internal/embed/embed.go` would make the cutover safer
and is cheap — but not blocking.
## Repro
```bash
./scripts/cutover/embed_parity.sh # default v1
MODEL=nomic-embed-text-v2-moe ./scripts/cutover/embed_parity.sh # force the v2-moe embedder
```
Each run writes a per-date verdict to `reports/cutover/embed_parity_<DATE>.md`.
## What's *not* yet probed
- `/v1/sql` ↔ Rust `/query` — query shape parity
- `/v1/vectors/search` ↔ Rust `/vectors/search` — recall@k parity
- `/v1/matrix/retrieve` ↔ Rust `/vectors/hybrid` — semantic retrieve parity (highest-leverage)
- `/v1/storage/*` ↔ Rust `/storage/*` — direct S3 abstraction parity
- `/v1/chat`: both sides expose it, but providers and token shape differ; chatd was already declared parity-tested in Phase 4

The matrix-retrieve probe is the next-highest leverage because it's
the actual user-facing retrieval path. Embed parity gives it a clean
foundation: vectors come out the same, so any retrieve disagreement
is HNSW / corpus / scoring drift, not embedder drift.
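One plausible score for the retrieve and search probes is overlap@k: the fraction of Rust's top-k IDs that also appear in Go's top-k. This is a sketch under that assumption, not an existing probe; it treats Rust as the reference and ignores rank order within the top k.

```go
package main

import "fmt"

// overlapAtK returns the fraction of the reference top-k IDs that also appear
// in the candidate's top-k. k is clipped to len(ref); an empty reference scores 0.
func overlapAtK(ref, got []string, k int) float64 {
	if k > len(ref) {
		k = len(ref)
	}
	if k <= 0 {
		return 0
	}
	want := make(map[string]bool, k)
	for _, id := range ref[:k] {
		want[id] = true
	}
	n := k
	if len(got) < n {
		n = len(got)
	}
	hits := 0
	for _, id := range got[:n] {
		if want[id] {
			hits++
			delete(want, id) // count each reference ID at most once
		}
	}
	return float64(hits) / float64(k)
}

func main() {
	// Illustrative IDs: Rust /vectors/hybrid as reference, Go /v1/matrix/retrieve as candidate.
	rust := []string{"doc-7", "doc-2", "doc-9", "doc-4"}
	goIDs := []string{"doc-2", "doc-7", "doc-5", "doc-9"}
	fmt.Printf("overlap@3 = %.2f\n", overlapAtK(rust, goIDs, 3))
}
```

Because overlap@k ignores order, a rank-sensitive comparison would still be needed to catch scoring drift that reshuffles the same top-k set.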