golangLAKEHOUSE/reports/cutover/audit_baselines_roundtrip.md
root 0d4f033b34 audit_baselines: round-trip validation against live Rust data
Same shape of proof as embed_parity.sh for the embed endpoint:
take the just-shipped Go port (ca142b9) and validate it against
the actual production data the Rust legacy emits, not just unit-
test fixtures. Locks the cross-runtime parity that operators
running mixed pipelines depend on.

scripts/cutover/audit_baselines_validate.go:
- Reads /home/profit/lakehouse/data/_kb/audit_baselines.jsonl
- Parses every entry via the Go AuditBaseline struct
- Round-trips the last entry: encode → decode → field-by-field
  equality check (catches any silently-dropped JSON keys)
- Calls LoadLastBaseline against the live file (proves the public
  API works on real shapes, not just inline parsing)
- Computes BuildAuditDriftTable(first → last) — full-window
  lineage drift over the captured baselines

Live-data probe results (reports/cutover/audit_baselines_roundtrip.md):
- 7 entries parse without error
- Round-trip is byte-equal on every metric + every header field
- Drift table fires the expected verdicts:
  - p2_evidence_rows 12→82 (+583%) → warn (above 20% threshold)
  - p3_accepted/partial/rejected/human 0→non-zero → warn (the
    zero-baseline edge case TestBuildAuditDriftTable_ZeroBaseline
    was designed to lock — verified now firing on real history)
  - p4_* metrics +0% → ok (stable across the window)

What this does NOT prove (documented in the report): the Go-side
audit-FULL pipeline that PRODUCES baselines doesn't exist yet.
Only the load/append/drift substrate is ported. Operators running
audit-full from Go would still need a metric-collection pass —
that's a separate port deliberately not in this wave.

reports/cutover/SUMMARY.md gains a new row alongside the embed
parity entries; cutover-prep verification log keeps the
discipline of "verified against real data, not just fixtures."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:20:18 -05:00


# Audit-baselines port — round-trip validation against live Rust data
Proves the Go port at `internal/distillation/audit_baseline.go`
parses, round-trips, and produces meaningful drift signal against
the live Rust-side `data/_kb/audit_baselines.jsonl`. Same shape of
proof as `embed_parity.sh` for the embed endpoint earlier in the
session — port verified against real-shape data, not just fixtures.
## Verdict
**PASS.** The Go port reads the live file end-to-end. JSON
round-trip is byte-equal on every field. `BuildAuditDriftTable`
produces the expected verdict tiers when fed real-history data.
## Live-data probe output
```
loaded 7 baselines from /home/profit/lakehouse/data/_kb/audit_baselines.jsonl
✓ round-trip parity (encode → decode → match)
✓ LoadLastBaseline returns the most recent entry
Lineage drift: first (2026-04-27T04:47:30.220Z) → last (2026-04-27T15:43:38.019Z)
span: 7 entries
metric                baseline  current       Δ%  flag
p2_evidence_rows            12       82  +583.3%  warn
p2_evidence_skips            2        2    +0.0%  ok
p3_accepted                  0      386        -  warn
p3_human                     0      480        -  warn
p3_partial                   0      132        -  warn
p3_rejected                  0       57        -  warn
p4_pref_pairs               83       83    +0.0%  ok
p4_rag_rows                448      448    +0.0%  ok
p4_sft_rows                353      353    +0.0%  ok
p4_total_quarantined      1325     1325    +0.0%  ok
verdict: 5/10 metrics flagged warn, 0 first-run
```
## What this confirms
1. **Field-name parity is exact.** All 10 metric fields decode
into the Go `AuditBaseline.Metrics map[string]int64` shape; no
silently-dropped keys.
2. **Header fields map cleanly.** `recorded_at` + `git_commit` are
the only non-Metrics fields in the Rust shape, both already
present on the Go struct.
3. **The zero-baseline edge case fires correctly.** `p3_accepted`
went 0→386 between the first and last baseline: a metric that
didn't exist in the early window. The drift table flagged it
`warn` (zero→nonzero is always notable) without failing on
division by zero. This is the specific case
`TestBuildAuditDriftTable_ZeroBaseline` was designed to lock,
and the behavior now holds on real history, not just the fixture.
4. **The +583% drift on `p2_evidence_rows` is honest signal.** The
pipeline scaled from 12 to 82 evidence rows over the captured
window, well above the 20% warn threshold. An operator running
this in CI would see "the audit pipeline output roughly 7× the
baseline evidence; investigate", which is exactly the
point of audit_baselines.
## Repro
```bash
go run ./scripts/cutover/audit_baselines_validate
# Or override path:
go run ./scripts/cutover/audit_baselines_validate \
-path /path/to/audit_baselines.jsonl
```
## What this does NOT prove
- The Go-side audit-FULL pipeline that PRODUCES baselines doesn't
exist yet — only the load/append/drift substrate. Operators
running audit-full from Go would still need a metric-collection
pass equivalent to the Rust `auditPhase0..auditPhase7` chain.
That's a separate port, deliberately not in this wave.
- The `git_commit` field carries Rust git history (commits like
`ca7375ea` from the Rust legacy repo). A Go-side audit-full
would stamp `golangLAKEHOUSE` SHAs. The two are separate
lineages — the file format is shared, but the git-commit
references trace back to whichever repo emitted the entry.