root 0d4f033b34 audit_baselines: round-trip validation against live Rust data

Same shape of proof as embed_parity.sh for the embed endpoint:
take the just-shipped Go port (ca142b9) and validate it against
the actual production data the legacy Rust runtime emits, not
just unit-test fixtures. This locks in the cross-runtime parity
that operators running mixed pipelines depend on.

scripts/cutover/audit_baselines_validate.go:
- Reads /home/profit/lakehouse/data/_kb/audit_baselines.jsonl
- Parses every entry via the Go AuditBaseline struct
- Round-trips the last entry: encode → decode → field-by-field
  equality check (catches any silently-dropped JSON keys)
- Calls LoadLastBaseline against the live file (proves the public
  API works on real shapes, not just inline parsing)
- Computes BuildAuditDriftTable(first → last) — full-window
  lineage drift over the captured baselines

Live-data probe results (reports/cutover/audit_baselines_roundtrip.md):
- 7 entries parse without error
- Round-trip is byte-equal on every metric + every header field
- Drift table fires the expected verdicts:
  - p2_evidence_rows 12→82 (+583%) → warn (above 20% threshold)
  - p3_accepted/partial/rejected/human 0→non-zero → warn (the
    zero-baseline edge case TestBuildAuditDriftTable_ZeroBaseline
    was designed to lock — verified now firing on real history)
  - p4_* metrics +0% → ok (stable across the window)
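The verdict rules listed above reduce to a small per-metric function. This is a sketch under stated assumptions: the 20% threshold comes from the report, but applying it to the absolute change, treating exactly 20% as ok (strict greater-than), and the Drift and classify names are guesses, not BuildAuditDriftTable's actual code.

```go
package main

import (
	"fmt"
	"math"
)

// warnThresholdPct is the 20% warn threshold quoted in the report.
const warnThresholdPct = 20.0

// Drift is one row of a hypothetical drift table.
type Drift struct {
	Baseline, Current int64
	Pct               float64 // only meaningful when PctDefined is true
	PctDefined        bool
	Flag              string // "ok" or "warn"
}

// classify compares current against baseline. A zero baseline has no
// defined percentage change, so any move away from zero is flagged warn
// instead of dividing by zero.
func classify(baseline, current int64) Drift {
	d := Drift{Baseline: baseline, Current: current, Flag: "ok"}
	if baseline == 0 {
		if current != 0 {
			d.Flag = "warn"
		}
		return d
	}
	d.Pct = float64(current-baseline) * 100 / float64(baseline)
	d.PctDefined = true
	if math.Abs(d.Pct) > warnThresholdPct {
		d.Flag = "warn"
	}
	return d
}

func main() {
	// Three pairs taken from the report: a large rise, a zero baseline, a stable metric.
	for _, pair := range [][2]int64{{12, 82}, {0, 386}, {83, 83}} {
		d := classify(pair[0], pair[1])
		pct := "-"
		if d.PctDefined {
			pct = fmt.Sprintf("%+.1f%%", d.Pct)
		}
		fmt.Printf("%6d %6d %9s %s\n", d.Baseline, d.Current, pct, d.Flag)
	}
}
```

Multiplying by 100 before dividing keeps round percentages such as 20% exact in floating point, so the threshold comparison is not perturbed by rounding.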

What this does NOT prove (documented in the report): the Go-side
audit-FULL pipeline that PRODUCES baselines doesn't exist yet.
Only the load/append/drift substrate is ported. Operators running
audit-full from Go would still need a metric-collection pass —
that's a separate port deliberately not in this wave.

reports/cutover/SUMMARY.md gains a new row alongside the embed
parity entries, so the cutover-prep verification log keeps its
discipline of "verified against real data, not just fixtures."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 00:20:18 -05:00

Audit-baselines port — round-trip validation against live Rust data

Proves that the Go port at internal/distillation/audit_baseline.go parses and round-trips the live Rust-side data/_kb/audit_baselines.jsonl, and produces a meaningful drift signal from it. This is the same shape of proof as embed_parity.sh for the embed endpoint earlier in the session: the port is verified against real-shape data, not just fixtures.

Verdict

PASS. The Go port reads the live file end-to-end. JSON round-trip is byte-equal on every field. BuildAuditDriftTable produces the expected verdict tiers when fed real-history data.

Live-data probe output

loaded 7 baselines from /home/profit/lakehouse/data/_kb/audit_baselines.jsonl

✓ round-trip parity (encode → decode → match)
✓ LoadLastBaseline returns the most recent entry

Lineage drift: first (2026-04-27T04:47:30.220Z) → last (2026-04-27T15:43:38.019Z)
  span: 7 entries

metric                         baseline      current         Δ% flag
p2_evidence_rows                     12           82    +583.3% warn
p2_evidence_skips                     2            2      +0.0% ok
p3_accepted                           0          386          - warn
p3_human                              0          480          - warn
p3_partial                            0          132          - warn
p3_rejected                           0           57          - warn
p4_pref_pairs                        83           83      +0.0% ok
p4_rag_rows                         448          448      +0.0% ok
p4_sft_rows                         353          353      +0.0% ok
p4_total_quarantined               1325         1325      +0.0% ok

verdict: 5/10 metrics flagged warn, 0 first-run
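Assembling the table and the verdict line from two metric maps could look like the sketch below. The metric values are copied from the probe output; the sorted-union ordering, the rule that a missing earlier baseline marks every row first-run, and the buildTable and summary names are assumptions rather than the port's code.

```go
package main

import (
	"fmt"
	"sort"
)

// warnThresholdPct is the 20% threshold quoted in the report.
const warnThresholdPct = 20.0

// Values copied from the probe output (first and last baseline).
var firstMetrics = map[string]int64{
	"p2_evidence_rows": 12, "p2_evidence_skips": 2,
	"p3_accepted": 0, "p3_human": 0, "p3_partial": 0, "p3_rejected": 0,
	"p4_pref_pairs": 83, "p4_rag_rows": 448, "p4_sft_rows": 353, "p4_total_quarantined": 1325,
}

var lastMetrics = map[string]int64{
	"p2_evidence_rows": 82, "p2_evidence_skips": 2,
	"p3_accepted": 386, "p3_human": 480, "p3_partial": 132, "p3_rejected": 57,
	"p4_pref_pairs": 83, "p4_rag_rows": 448, "p4_sft_rows": 353, "p4_total_quarantined": 1325,
}

type row struct {
	Metric            string
	Baseline, Current int64
	Flag              string // "ok", "warn", or "first-run"
}

// flagFor warns on zero→nonzero, otherwise when |Δ%| exceeds the threshold.
func flagFor(baseline, current int64) string {
	if baseline == 0 {
		if current != 0 {
			return "warn"
		}
		return "ok"
	}
	pct := float64(current-baseline) * 100 / float64(baseline)
	if pct > warnThresholdPct || pct < -warnThresholdPct {
		return "warn"
	}
	return "ok"
}

// buildTable compares the union of metric names, sorted alphabetically.
// A nil baseline (no earlier entry at all) marks every row "first-run".
func buildTable(baseline, current map[string]int64) []row {
	seen := map[string]bool{}
	for name := range baseline {
		seen[name] = true
	}
	for name := range current {
		seen[name] = true
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)

	rows := make([]row, 0, len(names))
	for _, name := range names {
		r := row{Metric: name, Baseline: baseline[name], Current: current[name]}
		if baseline == nil {
			r.Flag = "first-run"
		} else {
			r.Flag = flagFor(r.Baseline, r.Current)
		}
		rows = append(rows, r)
	}
	return rows
}

// summary renders the tally line in the same shape as the probe output.
func summary(rows []row) string {
	warn, firstRun := 0, 0
	for _, r := range rows {
		switch r.Flag {
		case "warn":
			warn++
		case "first-run":
			firstRun++
		}
	}
	return fmt.Sprintf("verdict: %d/%d metrics flagged warn, %d first-run", warn, len(rows), firstRun)
}

func main() {
	rows := buildTable(firstMetrics, lastMetrics)
	for _, r := range rows {
		fmt.Printf("%-30s %10d %10d %s\n", r.Metric, r.Baseline, r.Current, r.Flag)
	}
	fmt.Println(summary(rows))
}
```

Sorting the union of names keeps the output order deterministic despite Go's randomized map iteration, and also surfaces metrics that disappear between baselines.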

What this confirms

Field-name parity is exact. All 10 metric fields decode into the Go AuditBaseline.Metrics map[string]int64 shape; no silently-dropped keys.
Header fields map cleanly. recorded_at and git_commit are the only non-Metrics fields in the Rust shape, and both are already present on the Go struct.
The zero-baseline edge case fires correctly. p3_accepted went 0→386 between the first and last baseline, a metric that didn't exist in the early window. The drift table flagged it warn (zero→nonzero is always notable) without tripping over the division by zero. This is the specific case TestBuildAuditDriftTable_ZeroBaseline was designed to lock, and it now fires on real history.
The +583% drift on p2_evidence_rows is honest signal. The pipeline scaled from 12 to 82 evidence rows over the captured window (about 6.8× the baseline), well above the 20% warn threshold. An operator running this in CI would see "the audit pipeline emitted roughly 7× the baseline evidence; investigate", which is exactly the point of audit_baselines.
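Comparing two decoded structs only covers fields the struct already declares. A complementary check for silently-dropped keys compares the top-level keys of a raw line with the keys that survive a decode and re-encode. This is a sketch, assuming the same hypothetical AuditBaseline shape (metrics nested under a metrics key); droppedKeys is an illustrative helper, not part of the port.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// AuditBaseline is a hypothetical mirror of the Go struct (tag names assumed).
type AuditBaseline struct {
	RecordedAt string           `json:"recorded_at"`
	GitCommit  string           `json:"git_commit"`
	Metrics    map[string]int64 `json:"metrics"`
}

// droppedKeys decodes raw into target (a pointer), re-encodes it, and returns
// the top-level keys of raw that did not survive, sorted for stable output.
// Fields tagged omitempty would also show up here when empty.
func droppedKeys(raw []byte, target any) ([]string, error) {
	var before map[string]json.RawMessage
	if err := json.Unmarshal(raw, &before); err != nil {
		return nil, err
	}
	if err := json.Unmarshal(raw, target); err != nil {
		return nil, err
	}
	reencoded, err := json.Marshal(target)
	if err != nil {
		return nil, err
	}
	var after map[string]json.RawMessage
	if err := json.Unmarshal(reencoded, &after); err != nil {
		return nil, err
	}
	var missing []string
	for key := range before {
		if _, ok := after[key]; !ok {
			missing = append(missing, key)
		}
	}
	sort.Strings(missing)
	return missing, nil
}

func main() {
	// Illustrative line with one key the struct does not declare.
	line := []byte(`{"recorded_at":"2026-04-27T15:43:38.019Z","git_commit":"0000000","metrics":{"p3_accepted":386},"extra_field":true}`)
	missing, err := droppedKeys(line, &AuditBaseline{})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("dropped keys:", missing)
}
```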

Repro

go run ./scripts/cutover/audit_baselines_validate
# Or override path:
go run ./scripts/cutover/audit_baselines_validate \
  -path /path/to/audit_baselines.jsonl
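A minimal sketch of how the -path override might be wired with Go's standard flag package. The default is the path printed in the probe output; the use of a FlagSet, the parseArgs helper, and the help text are assumptions, not read from scripts/cutover.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// defaultPath is the location printed in the probe output.
const defaultPath = "/home/profit/lakehouse/data/_kb/audit_baselines.jsonl"

// parseArgs is a hypothetical helper: it returns the -path value, or the
// default when the flag is absent. ContinueOnError turns bad flags into an
// error instead of exiting, which keeps the function testable.
func parseArgs(args []string) (string, error) {
	fs := flag.NewFlagSet("audit_baselines_validate", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // the caller decides how to report errors
	path := fs.String("path", defaultPath, "path to audit_baselines.jsonl")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *path, nil
}

func main() {
	path, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "usage error:", err)
		os.Exit(2)
	}
	fmt.Println("reading", path)
}
```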

What this does NOT prove

The Go-side audit-FULL pipeline that PRODUCES baselines doesn't exist yet — only the load/append/drift substrate. Operators running audit-full from Go would still need a metric-collection pass equivalent to the Rust auditPhase0..auditPhase7 chain. That's a separate port, deliberately not in this wave.
The git_commit field carries Rust git history (commits like ca7375ea from the Rust legacy repo). A Go-side audit-full would stamp golangLAKEHOUSE SHAs. The two are separate lineages — the file format is shared, but the git-commit references trace back to whichever repo emitted the entry.
