root ee2a40c505 audit-FULL: port phases 1/2/5/7 — only acceptance.ts (TS-only) remains skipped
Closes 4 of the 5 phases the initial audit-FULL port left as
deferred. The pattern: most "deferred" phases didn't actually need
the un-ported Rust pieces — they were observer-mode by design and
just needed to read existing on-disk artifacts.

Phase 1 (schema validators) → ported via exec.Command:
  Invokes `go test ./internal/distillation/...` — the Go equivalent
  of Rust's `bun test auditor/schemas/distillation/`. New
  GoTestModule field on AuditFullOptions controls the package
  pattern; empty disables the invocation (test mode, prevents
  recursion when audit-full is invoked from inside `go test`).
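
  A minimal sketch of the empty-disables guard described above. Only
  the GoTestModule field name comes from this commit; the helper name
  phase1Command and the reduced AuditFullOptions struct are
  hypothetical, and the real code may wire the command differently.

```go
package main

import (
	"fmt"
	"os/exec"
)

// AuditFullOptions is reduced to the one field this commit names;
// the real struct carries more options that are not shown here.
type AuditFullOptions struct {
	// GoTestModule is the package pattern handed to `go test`.
	// An empty value disables the phase 1 invocation.
	GoTestModule string
}

// phase1Command (hypothetical helper) builds the phase 1 command, or
// returns nil when the invocation is disabled. Returning the command
// instead of running it keeps the guard checkable without spawning
// a nested `go test`.
func phase1Command(opts AuditFullOptions) *exec.Cmd {
	if opts.GoTestModule == "" {
		return nil
	}
	return exec.Command("go", "test", opts.GoTestModule)
}

func main() {
	if phase1Command(AuditFullOptions{}) == nil {
		fmt.Println("phase 1 skipped: GoTestModule is empty")
	}
	cmd := phase1Command(AuditFullOptions{GoTestModule: "./internal/distillation/..."})
	fmt.Println("phase 1 would run:", cmd.Args)
}
```

  Tests that call into audit-full would leave GoTestModule empty so
  the phase never re-enters `go test`.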

Phase 2 (evidence materialization) → ported as observer:
  Reads data/evidence/ directly and tallies rows + tier-1 source
  hits. Doesn't re-run the materializer (which is Rust-side TS).
  Emits p2_evidence_rows + p2_evidence_skips metrics matching
  Rust shape — drop-in audit_baselines.jsonl entries possible.
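
  The observer-style row tally could look roughly like the sketch
  below. It assumes data/evidence/ holds *.jsonl files with one row
  per non-blank line, which this commit does not confirm. The tier-1
  source-hit tally is left out because its rule is not described
  here, and both function names are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// countRows counts non-blank lines, treating each as one evidence row.
func countRows(data []byte) int {
	n := 0
	for _, line := range bytes.Split(data, []byte("\n")) {
		if len(bytes.TrimSpace(line)) > 0 {
			n++
		}
	}
	return n
}

// tallyEvidenceDir sums rows across every *.jsonl file in dir.
// It only reads what is already on disk; nothing is re-materialized.
func tallyEvidenceDir(dir string) (int, error) {
	files, err := filepath.Glob(filepath.Join(dir, "*.jsonl"))
	if err != nil {
		return 0, err
	}
	total := 0
	for _, f := range files {
		data, err := os.ReadFile(f)
		if err != nil {
			return 0, err
		}
		total += countRows(data)
	}
	return total, nil
}

func main() {
	rows, err := tallyEvidenceDir("data/evidence")
	if err != nil {
		fmt.Println("tally failed:", err)
		return
	}
	fmt.Printf("p2_evidence_rows=%d\n", rows)
}
```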

Phase 5 (run summary) → ported as observer:
  Reads reports/distillation/{run_id}/summary.json + 5 stage
  receipts. Validates schema_version=1, run_hash sha256, git_commit
  40-char hex, all stage receipts decode as JSON. Full schema
  validation (StageReceipt schema) is intentionally NOT ported —
  it would require porting the TS schemas/distillation/ validators
  in full; basic shape checks catch the load-bearing invariants.
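
  A sketch of the basic shape checks listed above. The JSON field
  names are taken from this message; treating run_hash as 64 bare
  lowercase hex characters and schema_version as a JSON number are
  assumptions, and checkSummary is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

var (
	sha256Hex = regexp.MustCompile(`^[0-9a-f]{64}$`) // assumed: bare lowercase hex
	commitHex = regexp.MustCompile(`^[0-9a-f]{40}$`)
)

// runSummary decodes only the fields the shape checks look at.
type runSummary struct {
	SchemaVersion int    `json:"schema_version"`
	RunHash       string `json:"run_hash"`
	GitCommit     string `json:"git_commit"`
}

// checkSummary returns one message per violated invariant;
// an empty result means the summary passed the shape checks.
func checkSummary(raw []byte) []string {
	var s runSummary
	if err := json.Unmarshal(raw, &s); err != nil {
		return []string{"summary does not decode as JSON: " + err.Error()}
	}
	var problems []string
	if s.SchemaVersion != 1 {
		problems = append(problems, fmt.Sprintf("schema_version=%d, want 1", s.SchemaVersion))
	}
	if !sha256Hex.MatchString(s.RunHash) {
		problems = append(problems, "run_hash is not 64 hex chars")
	}
	if !commitHex.MatchString(s.GitCommit) {
		problems = append(problems, "git_commit is not 40 hex chars")
	}
	return problems
}

func main() {
	short := []byte(`{"schema_version":1,"run_hash":"abc123","git_commit":"0123456789abcdef0123456789abcdef01234567"}`)
	for _, p := range checkSummary(short) {
		fmt.Println("FAIL:", p)
	}
}
```

  The stage receipts need less than this: a call such as
  json.Valid(receiptBytes) is enough to confirm that each one decodes
  as JSON.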

Phase 7 (replay log) → ported as observer:
  Reads data/_kb/replay_runs.jsonl, validates last 50 rows parse
  as JSON. Skips the live-replay invocation that Rust's phase 7
  also does — porting Rust replay.ts is substantial and not in
  scope. The "log shape sanity" check is what audit-full actually
  needs; the live invocation is a separate concern.
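
  The "log shape sanity" check can be sketched as below, assuming one
  JSON row per non-blank line in replay_runs.jsonl. checkTail is a
  hypothetical name; the window of 50 comes from the description
  above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
)

// checkTail inspects the last n non-blank rows of a JSONL log and
// reports how many rows were checked and how many failed to parse.
func checkTail(data []byte, n int) (checked, malformed int) {
	var rows [][]byte
	for _, line := range bytes.Split(data, []byte("\n")) {
		if len(bytes.TrimSpace(line)) > 0 {
			rows = append(rows, line)
		}
	}
	if len(rows) > n {
		rows = rows[len(rows)-n:]
	}
	for _, r := range rows {
		if !json.Valid(r) {
			malformed++
		}
	}
	return len(rows), malformed
}

func main() {
	data, err := os.ReadFile("data/_kb/replay_runs.jsonl")
	if err != nil {
		fmt.Println("cannot read replay log:", err)
		return
	}
	checked, malformed := checkTail(data, 50)
	fmt.Printf("checked=%d malformed=%d\n", checked, malformed)
}
```

  Restricting the check to the tail keeps it cheap on a long log
  while still catching a truncated or corrupted recent write.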

Phase 6 (acceptance gate) — STILL SKIPPED:
  Rust acceptance.ts is a TS-only fixture harness with bun-specific
  deps. Porting the fixtures (tests/fixtures/distillation/acceptance/)
  + the 22-invariant runner to Go is an ADR-worthy undertaking.
  Documented in the header comment.

Live-data probe (against /home/profit/lakehouse):
  Skips count: 4 → 1 (only phase 6).
  Required checks: 6/6 → 12/12 PASS.
  New metric: p2_evidence_rows=1055, BYTE-EQUAL to the Rust
  pipeline's collect.records_out from the latest summary.json.
  Cross-runtime parity now extends across phases 0/1/2/3/4/5/7.

5 new tests and 1 updated test:
- TestPhase2_EvidenceTallyFromOnDisk: row + tier-1-hit tallying
- TestPhase5_FullSummaryFlow: complete run-summary fixture passes
- TestPhase5_ShortRunHashCaught: bad run_hash fails required check
- TestPhase7_ReplayLogReadsFromDisk: row-count reporting
- TestPhase7_MalformedTailRowsCaught: structural parse failure
- TestRunAuditFull_FullFixtureFlow updated to seed evidence/ +
  reports/distillation/ for the phases now wired.

Cleanup: removed local sortStrings helper (replaced with sort.Strings
now that `sort` is imported for phase 5's mtime-sort).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 02:35:13 -05:00


G5 cutover prep — verified-parity log

What works on the Go gateway, what has been compared side by side against Rust, and what is safe to flip. Append a row when a new endpoint clears parity.

| Endpoint | Date | Rust path | Go path | Verdict | Notes |
|---|---|---|---|---|---|
| embed (forced v1) | 2026-04-30 | /ai/embed | /v1/embed | PASS 5/5 cos=1.000 | bit-identical with model=nomic-embed-text forced both sides |
| embed (forced v2-moe) | 2026-04-30 | /ai/embed | /v1/embed | PASS 5/5 cos=1.000 | bit-identical with model=nomic-embed-text-v2-moe forced both sides; both Ollamas have the model |
| audit_baselines.jsonl | 2026-05-01 | data/_kb/audit_baselines.jsonl | internal/distillation LoadLastBaseline / AppendBaseline / BuildAuditDriftTable | PASS round-trip | Live Rust file (7 entries) parses + round-trips byte-equal; lineage drift table fires correctly on zero-baseline metrics. See audit_baselines_roundtrip.md. |
| audit-FULL (phases 0/3/4) | 2026-05-01 | scripts/distillation/audit_full.ts | cmd/audit_full + internal/distillation RunAuditFull | PASS metric-equal | Go-side run against live Rust root: all 8 ported metrics (p3_, p4_) byte-equal to the last Rust-emitted audit_baselines.jsonl entry. 6/6 required checks pass. 5 phases (1, 2, 5, 6, 7) deferred; they depend on broader Rust-side pieces (materializer / replay / run-summaries) not yet ported. See audit_full_go_vs_rust.md. |
| audit-FULL (phases 0/1/2/3/4/5/7, observer mode) | 2026-05-01 | scripts/distillation/audit_full.ts | cmd/audit_full + internal/distillation RunAuditFull | PASS 12/12 | Skips reduced from 4 → 1: phase 1 invokes go test, phases 2/5/7 read existing artifacts as observers (no live materializer/replay invocation). Only phase 6 (TS-only acceptance harness) remains skipped. p2_evidence_rows=1055 matches Rust summary.json collect.records_out=1055 byte-equal. Updated audit_full_go_vs_rust.md. |

Wire-format drift catalog

The Go gateway is not a literal nginx-swap drop-in for the Rust gateway. Anything that flips needs a wire-shape adapter. Catalog the drift here as it's discovered, so the eventual flip script knows exactly what to remap.

embed

| Field | Rust | Go |
|---|---|---|
| URL prefix | /ai/embed | /v1/embed |
| Response: vectors field | embeddings | vectors |
| Response: dim field | dimensions | dimension |
| Response: model field | model | model ✓ same |
| Request shape | {texts, model?} | {texts, model?} ✓ same |
| L2 normalization | unit vectors (‖v‖ ≈ 1.0) | raw Ollama output (‖v‖ ≈ 20-23) |
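
The field renames in the table could be remapped by a small adapter along these lines. This is only a sketch: the struct and function names are hypothetical, float64 elements are an assumption, and any response fields beyond the three cataloged here would need to be carried through as well.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// goEmbedResponse is the Go gateway's /v1/embed response shape as cataloged.
type goEmbedResponse struct {
	Vectors   [][]float64 `json:"vectors"`
	Dimension int         `json:"dimension"`
	Model     string      `json:"model"`
}

// rustEmbedResponse is the Rust gateway's /ai/embed response shape as cataloged.
type rustEmbedResponse struct {
	Embeddings [][]float64 `json:"embeddings"`
	Dimensions int         `json:"dimensions"`
	Model      string      `json:"model"`
}

// toRustShape rewrites a Go-gateway embed response body into the
// field names a Rust-gateway client expects. Values are not altered,
// so the L2 magnitude difference is NOT corrected here.
func toRustShape(goBody []byte) ([]byte, error) {
	var in goEmbedResponse
	if err := json.Unmarshal(goBody, &in); err != nil {
		return nil, err
	}
	return json.Marshal(rustEmbedResponse{
		Embeddings: in.Vectors,
		Dimensions: in.Dimension,
		Model:      in.Model,
	})
}

func main() {
	out, err := toRustShape([]byte(`{"vectors":[[0.5,1]],"dimension":2,"model":"nomic-embed-text"}`))
	if err != nil {
		fmt.Println("adapt failed:", err)
		return
	}
	fmt.Println(string(out))
}
```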

The L2 normalization difference is real but currently harmless: vectors point in identical directions (cos=1.000) but Go has raw magnitudes. Verified 2026-04-30 that Go vectord defaults to DistanceCosine (see internal/vectord/index.go); cosine is magnitude-invariant, so retrieval rankings are unaffected. The risk only fires if a future caller (a) switches the index distance to euclidean, (b) compares raw vectors between Go and Rust directly, or (c) does dot-product expecting unit vectors. Adding a normalization step in internal/embed/embed.go would make the cutover safer and is cheap — but not blocking.
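
A normalization step of the kind suggested above might look like the following sketch. The function names are hypothetical and nothing here is taken from internal/embed/embed.go; the cosine helper is included only to show that scaling a vector leaves its direction unchanged.

```go
package main

import (
	"fmt"
	"math"
)

// l2Normalize returns a copy of v scaled to unit length.
// A zero vector is returned unchanged (all zeros) to avoid dividing by zero.
func l2Normalize(v []float64) []float64 {
	var sum float64
	for _, x := range v {
		sum += x * x
	}
	out := make([]float64, len(v))
	norm := math.Sqrt(sum)
	if norm == 0 {
		return out
	}
	for i, x := range v {
		out[i] = x / norm
	}
	return out
}

// cosine returns the cosine similarity of a and b (0 if either has zero length).
// It assumes len(a) == len(b).
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	raw := []float64{3, 4} // magnitude 5, standing in for a raw embedder output
	unit := l2Normalize(raw)
	fmt.Println("normalized:", unit)
	fmt.Println("cosine(raw, unit):", cosine(raw, unit))
}
```

Because cosine divides out both magnitudes, the raw and normalized vectors score as the same direction, which is why the current cosine-distance index is unaffected while euclidean or dot-product callers would not be.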

Repro

./scripts/cutover/embed_parity.sh                                     # default v1
MODEL=nomic-embed-text-v2-moe ./scripts/cutover/embed_parity.sh       # measure embedder

Each run drops a per-date verdict at reports/cutover/embed_parity_<DATE>.md.

What's not yet probed

  • /v1/sql ↔ Rust /query — query shape parity
  • /v1/vectors/search ↔ Rust /vectors/search — recall@k parity
  • /v1/matrix/retrieve ↔ Rust /vectors/hybrid — semantic retrieve parity (highest-leverage)
  • /v1/storage/* ↔ Rust /storage/* — direct S3 abstraction parity
  • /v1/chat — both sides expose this, but providers + token shape differ; Phase 4 already declared chatd parity-tested

The matrix-retrieve probe is the highest-leverage next step because it exercises the actual user-facing retrieval path. Embed parity gives it a clean foundation: the vectors agree in direction (cos=1.000), so any retrieve disagreement is HNSW / corpus / scoring drift, not embedder drift.