root 55b8c76a8c distillation: audit-FULL pipeline port (phases 0/3/4) — cross-runtime metric parity verified

Ports the metric-collection passes from scripts/distillation/audit_full.ts.
The substrate that PRODUCES audit_baselines.jsonl entries — the
half OPEN #2 left as "deferred to next wave" after the read/write
substrate landed in ca142b9.

Phase coverage:
  Phase 0 (file presence)             ported
  Phase 1 (schema validators)         skipped (Go's `go test` covers it)
  Phase 2 (materializer dry-run)      deferred (Go materializer not yet ported)
  Phase 3 (scored-runs distribution)  ported
  Phase 4 (contamination firewall)    ported
  Phase 5 (receipts validation)       deferred (Go run-summary JSON not yet emitted)
  Phase 6 (replay sanity)             deferred (Go replay tool not ported)
  Phase 7 (run summary lineage)       deferred (same)

Cross-runtime parity verified end-to-end:
  Go-side audit-full against /home/profit/lakehouse produced
  metrics IDENTICAL to the last Rust-emitted audit_baselines.jsonl
  entry. All 8 ported metrics match byte-for-byte:
    p3_accepted=386, p3_partial=132, p3_rejected=57, p3_human=480,
    p4_sft_rows=353, p4_rag_rows=448, p4_pref_pairs=83, p4_total_quarantined=1325
  6/6 required checks pass on live data.

Components:
- internal/distillation/audit_full.go: PhaseCheck struct (mirrors
  Rust shape), PhaseCheckReport aggregation, RunAuditFull
  orchestrator, auditPhase0/3/4 implementations, FormatAuditFullReport
  Markdown writer.
- cmd/audit_full/main.go: CLI binary with -root, -out, -json,
  -append-baseline flags. Operators run "./bin/audit_full
  -append-baseline" to grow the longitudinal log alongside the
  Rust pipeline (entries are interchangeable — same envelope shape).
- 6 new tests: empty-root failure handling, full-fixture clean PASS
  (locks all 8 metrics + all 6 required checks), SFT firewall
  contamination detection, preference self-pair detection, sig_hash
  regex correctness (rejects wrong-length + uppercase), Markdown
  formatter smoke.

Live-data probe captured at reports/cutover/audit_full_go_vs_rust.md
(linked from reports/cutover/SUMMARY.md). Same shape as the
audit_baselines round-trip evidence — both Go-side ports of the
distillation surface are now validated against real Rust data, not
just fixtures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 01:30:23 -05:00

3.7 KiB

Raw Blame History

G5 cutover prep — verified-parity log

What works on Go gateway, what's been side-by-side compared to Rust, what's safe to flip. Append a row when a new endpoint clears parity.

Endpoint	Date	Rust path	Go path	Verdict	Notes
`embed` (forced v1)	2026-04-30	`/ai/embed`	`/v1/embed`	✅ PASS 5/5 cos=1.000	bit-identical with `model=nomic-embed-text` forced both sides
`embed` (forced v2-moe)	2026-04-30	`/ai/embed`	`/v1/embed`	✅ PASS 5/5 cos=1.000	bit-identical with `model=nomic-embed-text-v2-moe` forced both sides — both Ollamas have the model
`audit_baselines.jsonl`	2026-05-01	`data/_kb/audit_baselines.jsonl`	`internal/distillation` `LoadLastBaseline` / `AppendBaseline` / `BuildAuditDriftTable`	✅ PASS round-trip	Live Rust file (7 entries) parses + round-trips byte-equal; lineage drift table fires correctly on zero-baseline metrics. See `audit_baselines_roundtrip.md`.
`audit-FULL` (phases 0/3/4)	2026-05-01	`scripts/distillation/audit_full.ts`	`cmd/audit_full` + `internal/distillation` `RunAuditFull`	✅ PASS metric-equal	Go-side run against live Rust root: all 8 ported metrics (p3_, p4_) byte-equal to the last Rust-emitted `audit_baselines.jsonl` entry. 6/6 required checks pass. 4 phases (1, 2, 5, 6, 7) deferred — depend on broader Rust-side pieces (materializer / replay / run-summaries) not yet ported. See `audit_full_go_vs_rust.md`.

Wire-format drift catalog

The Go gateway is not a literal nginx-swap drop-in for the Rust gateway. Anything that flips needs a wire-shape adapter. Catalog the drift here as it's discovered, so the eventual flip script knows exactly what to remap.

embed

Field	Rust	Go
URL prefix	`/ai/embed`	`/v1/embed`
Response: vectors field	`embeddings`	`vectors`
Response: dim field	`dimensions`	`dimension`
Response: model field	`model`	`model` ✓ same
Request shape	`{texts, model?}`	`{texts, model?}` ✓ same
L2 normalization	unit vectors (‖v‖ ≈ 1.0)	raw Ollama output (‖v‖ ≈ 20-23)

The L2 normalization difference is real but currently harmless: vectors point in identical directions (cos=1.000) but Go has raw magnitudes. Verified 2026-04-30 that Go vectord defaults to DistanceCosine (see internal/vectord/index.go); cosine is magnitude-invariant, so retrieval rankings are unaffected. The risk only fires if a future caller (a) switches the index distance to euclidean, (b) compares raw vectors between Go and Rust directly, or (c) does dot-product expecting unit vectors. Adding a normalization step in internal/embed/embed.go would make the cutover safer and is cheap — but not blocking.

Repro

./scripts/cutover/embed_parity.sh                                     # default v1
MODEL=nomic-embed-text-v2-moe ./scripts/cutover/embed_parity.sh       # measure embedder

Each run drops a per-date verdict at reports/cutover/embed_parity_<DATE>.md.

What's not yet probed

/v1/sql ↔ Rust /query — query shape parity
/v1/vectors/search ↔ Rust /vectors/search — recall@k parity
/v1/matrix/retrieve ↔ Rust /vectors/hybrid — semantic retrieve parity (highest-leverage)
/v1/storage/* ↔ Rust /storage/* — direct S3 abstraction parity
/v1/chat — both sides expose this, but providers + token shape differ; Phase 4 already declared chatd parity-tested

The matrix-retrieve probe is the next-highest leverage because it's the actual user-facing retrieval path. Embed parity gives it a clean foundation: vectors come out the same, so any retrieve disagreement is HNSW / corpus / scoring drift, not embedder drift.

3.7 KiB Raw Blame History