distillation: audit-baselines lineage port — fully closes the OPEN #2 surface

The original OPEN #2 line called for "SFT export pipeline +
audit_baselines lineage." Commit 7bb432f shipped the SFT export.
This commit ports the audit_baselines half — the longitudinal
drift signal that distinguishes "metrics shifted because the world
changed" from "metrics shifted because we broke something."

Mirrors the substrate of the Rust pipeline's scripts/distillation/audit_full.ts:

- LoadLastBaseline(path) reads the most recent entry from
  data/_kb/audit_baselines.jsonl. Returns (nil, nil) on missing
  file (first run), errors on truncated last line (partial-write
  detection — operators don't lose drift signal silently).
- AppendBaseline(path, baseline) appends one entry as a JSON line.
  Atomic at the line level via bufio + O_APPEND. Creates the
  parent directory if missing.
- BuildAuditDriftTable(prior, current, threshold) computes
  per-metric drift. flag values mirror Rust exactly: first_run,
  ok, warn. DefaultDriftWarnThreshold = 0.20 = Rust's 20%.
- FormatAuditDriftTable renders a fixed-width text grid for
  stdout dumps in audit-full runs.

Edge cases handled:
- Zero-baseline: prior=0 means no division — PctChange stays nil.
  current=0 → ok (no change). current>0 → warn (zero→nonzero is
  always notable, never silently fine).
- New metric in current: flagged first_run, not "0%-change".
  Operators see "this is a new signal we haven't tracked before."
- Sort: stable by metric name for deterministic JSON output and
  clean CI diffs.

Generic on metric name (vs Rust's pinned p2_evidence_rows etc.):
the Rust phase numbering doesn't translate to Go directly. The
AuditBaselineRustCompat constant pins the Rust names so operators
running both runtimes use the same labels, which makes drift
comparison meaningful across the two pipelines.

13 new tests covering: missing file, last-line-wins, blank-line
tolerance, malformed-line errors, append round-trip, append-to-
existing, schema validation, first-run, threshold boundary,
zero-baseline, new-metric-in-current, sort-by-metric stability,
formatter output rendering.

OPEN #2's "audit_baselines lineage" half now closed. The
distillation package surface is at parity with the Rust pipeline:
scorer, scored runs, SFT export, audit baselines all available
on the Go side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
commit ca142b9271
parent 7bb432f6c8
Author: root
Date: 2026-05-01 00:11:47 -05:00
3 changed files with 535 additions and 0 deletions


@@ -268,6 +268,7 @@ a steady state. Future items will land here as production triggers fire.
| (close-1) | **OPEN #1: vectord merge endpoint**: `POST /v1/vectors/index/{src}/merge` with body `{dest, clear_source}`. Idempotent on re-runs (existing-in-dest items skipped). New `Index.IDs()` snapshot method backs it; new `i.ids` tracker field is the canonical ID set (independent of meta map's nil-vs-{} sparseness). 4 cmd-level tests + 1 unit test. |
| (close-2) | **OPEN #2: distillation SFT export substrate**: `internal/distillation/sft_export.go`: `IsSftNever` predicate + `ListScoredRunFiles` (data/scored-runs/YYYY/MM/DD walk) + `LoadScoredRunsFromFile` + partial `ExportSft` that wires the firewall but leaves synthesis (instruction/input/response generation) as the next wave. Firewall pinning test fails if `SftNever` set changes without review. 5 new tests. The synthesis port remains on Rust at `scripts/distillation/export_sft.ts`. |
| (close-2 full) | **OPEN #2 fully ported** (2026-05-01): `SynthesizeSft` + `LoadEvidenceByRunID` + `buildInstruction` ported byte-for-byte from `scripts/distillation/export_sft.ts`. All 8 source-class instruction templates (scrum_reviews / mode_experiments / auto_apply / audits / observer_reviews / contract_analyses / outcomes / default) match Rust output exactly so a/b validation between runtimes can diff JSONL byte-for-byte. `ExportSft` writes to `data/distilled/sft/sft_export.jsonl`. 5 additional tests including per-source-class template verification, extraction-rejection, empty-text-rejection, context-assembly, end-to-end fixture write. |
| (close-2 lineage) | **Audit-baselines lineage ported** (2026-05-01): `internal/distillation/audit_baseline.go` mirrors Rust `audit_full.ts`'s LoadBaseline/AppendBaseline/buildDriftTable. `LoadLastBaseline` reads the most recent JSON line from `data/_kb/audit_baselines.jsonl`; `AppendBaseline` appends append-only with bufio. `BuildAuditDriftTable` flags drift `>20%` (configurable); zero-baseline and new-metric edge cases handled (no division-by-zero, no false-stable on zero→nonzero). `FormatAuditDriftTable` for stdout dumps. Generic on metric names so callers running both runtimes can pin Rust-compat names (`AuditBaselineRustCompat` constant lists them). 13 tests including last-line-wins, trailing-blank-tolerance, malformed-line-errors, threshold-boundary, zero-baseline-handling, sort-stability. |
| (close-3) | **OPEN #3: distribution drift via PSI**: `internal/drift/drift.go`: `ComputeDistributionDrift` returns Population Stability Index + verdict tier (stable < 0.10, minor 0.10–0.25, major > 0.25). Equal-width bucketing over combined min/max range, epsilon-clamping for empty buckets, per-bucket breakdown for drilldown. 7 new tests including identical-is-stable, hard-shift-is-major, moderate-detected-not-stable, empty-inputs-safe, all-identical-safe, bucket-counts-conserved, num-buckets-clamping. |
| (close-4) | **OPEN #4: ops nice-to-haves** — (a) Real-time wall-clock for stress harness: per-phase elapsed time logged to stdout as it runs (`[stress] phase NAME starting (T+12.3s)` + `[stress] phase NAME done — 8.5s (T+20.8s)`); `Output.PhaseTimings` + `Output.TotalElapsedMs` written to JSON; (b) chatd fixture-mode S3 mock + (c) liberal-paraphrase calibration: not actioned — no fired trigger yet, would be speculative. Documented as deferred-until-need rather than ignored. |


@@ -0,0 +1,256 @@
package distillation

// Audit-baseline lineage — the longitudinal signal that distinguishes
// "metrics shifted because the world changed" from "metrics shifted
// because we broke something." Mirrors the Rust audit_full.ts
// LoadBaseline/AppendBaseline/buildDriftTable shape so a Go-side
// audit run can be compared against Rust-side baselines and
// vice-versa during the migration.
//
// Storage: data/_kb/audit_baselines.jsonl, one AuditBaseline per
// line, append-only. The LAST line is the most recent. New runs
// read the prior baseline, compute drift vs current metrics, then
// append a fresh entry.
//
// Why generic on metric name (vs Rust's pinned p2_evidence_rows
// etc.): the Rust phase numbering (p0..p7) doesn't translate to Go
// directly. Operators with mixed Rust+Go pipelines should use the
// SAME metric names on both sides so the drift table compares
// like-for-like. Helper constants below pin the Rust-compat names
// for callers running both runtimes.

import (
    "bufio"
    "bytes"
    "encoding/json"
    "errors"
    "fmt"
    "math"
    "os"
    "path/filepath"
    "strings"
)

// AuditBaseline is one entry in the audit_baselines.jsonl
// longitudinal log. Schema-stable; new metrics land as new keys
// in the Metrics map (additive — readers tolerate unknown keys).
type AuditBaseline struct {
    RecordedAt string           `json:"recorded_at"`          // ISO 8601 UTC
    GitCommit  string           `json:"git_commit,omitempty"` // sha of the run's HEAD
    Metrics    map[string]int64 `json:"metrics"`
}

// AuditBaselineRustCompat lists the metric names the Rust pipeline
// emits at audit_full.ts. Go-side callers running an equivalent
// audit should use these names so drift compares across runtimes.
// Adding new names here requires the Rust side to mint them too.
var AuditBaselineRustCompat = []string{
    "p2_evidence_rows",
    "p2_evidence_skips",
    "p3_accepted",
    "p3_partial",
    "p3_rejected",
    "p3_human",
    "p4_rag_rows",
    "p4_sft_rows",
    "p4_pref_pairs",
    "p4_total_quarantined",
}

// DefaultBaselinePath returns the canonical audit baselines path
// rooted at the lakehouse data dir. Matches Rust's BASELINE_PATH_FOR.
func DefaultBaselinePath(root string) string {
    return filepath.Join(root, "data", "_kb", "audit_baselines.jsonl")
}

// LoadLastBaseline reads audit_baselines.jsonl and returns the
// most recent entry — i.e. the LAST non-empty JSON line. Missing
// file or empty file returns (nil, nil), not an error: a fresh
// pipeline has no baseline yet, and the caller should treat that
// as "first run" via BuildAuditDriftTable's nil-prior handling.
//
// Malformed last line returns an error (rather than silently
// skipping to the previous line) so operators don't lose drift
// signal under partial-write corruption.
func LoadLastBaseline(path string) (*AuditBaseline, error) {
    data, err := os.ReadFile(path)
    if os.IsNotExist(err) {
        return nil, nil
    }
    if err != nil {
        return nil, fmt.Errorf("read baselines: %w", err)
    }
    lines := strings.Split(string(data), "\n")
    // Walk back to the last non-empty line.
    for i := len(lines) - 1; i >= 0; i-- {
        s := strings.TrimSpace(lines[i])
        if s == "" {
            continue
        }
        var b AuditBaseline
        if err := json.Unmarshal([]byte(s), &b); err != nil {
            return nil, fmt.Errorf("decode last baseline (line %d): %w", i+1, err)
        }
        return &b, nil
    }
    return nil, nil
}

// AppendBaseline appends one AuditBaseline as a JSON line to
// audit_baselines.jsonl. Creates the parent directory if missing.
// Atomic write at the line level: a partial write on disk-full or
// crash leaves the file with at most one truncated trailing line,
// which LoadLastBaseline will surface as a decode error.
func AppendBaseline(path string, b AuditBaseline) error {
    if b.RecordedAt == "" {
        return errors.New("audit_baseline: RecordedAt required")
    }
    if b.Metrics == nil {
        return errors.New("audit_baseline: Metrics required (use empty map for zero-metric run)")
    }
    if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
        return fmt.Errorf("mkdir baseline dir: %w", err)
    }
    line, err := json.Marshal(b)
    if err != nil {
        return fmt.Errorf("encode baseline: %w", err)
    }
    f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
    if err != nil {
        return fmt.Errorf("open baselines: %w", err)
    }
    defer f.Close()
    w := bufio.NewWriter(f)
    if _, err := w.Write(line); err != nil {
        return fmt.Errorf("write baseline: %w", err)
    }
    if err := w.WriteByte('\n'); err != nil {
        return fmt.Errorf("write newline: %w", err)
    }
    return w.Flush()
}

// AuditDriftFlag categorizes a single metric's drift verdict.
// Mirrors the Rust DriftRow.flag values exactly.
type AuditDriftFlag string

const (
    AuditDriftFlagFirstRun AuditDriftFlag = "first_run" // no prior baseline → can't compute change
    AuditDriftFlagOK       AuditDriftFlag = "ok"        // |Δ%| ≤ threshold
    AuditDriftFlagWarn     AuditDriftFlag = "warn"      // |Δ%| > threshold
)

// DefaultDriftWarnThreshold is 20% — matches Rust's hard-coded
// `Math.abs(pct) > 0.20`. Operators tuning sensitivity per metric
// can pass a different value to BuildAuditDriftTable.
const DefaultDriftWarnThreshold = 0.20

// AuditDriftRow is one metric's drift verdict. PctChange is nil
// when the prior baseline was zero (division-by-zero) OR when this
// is the first run. Encoded as *float64 so JSON emits null rather
// than a misleading 0.0 for "unknowable" cases.
type AuditDriftRow struct {
    Metric    string         `json:"metric"`
    Baseline  *int64         `json:"baseline"`
    Current   int64          `json:"current"`
    PctChange *float64       `json:"pct_change"`
    Flag      AuditDriftFlag `json:"flag"`
}

// BuildAuditDriftTable computes per-metric drift between a prior
// baseline (nil = first run) and the current metric snapshot. The
// result is sorted by metric name for stable display.
//
// Threshold is the absolute percent-change above which a metric is
// flagged "warn". Pass DefaultDriftWarnThreshold (0.20 = 20%) to
// match Rust audit_full.ts. Use a per-metric threshold map by
// calling BuildAuditDriftTable once per metric subset.
func BuildAuditDriftTable(prior *AuditBaseline, current map[string]int64, threshold float64) []AuditDriftRow {
    if threshold <= 0 {
        threshold = DefaultDriftWarnThreshold
    }
    // Union of metric names so a metric that disappeared from
    // current still surfaces as "current=0, drifted -100%".
    names := make(map[string]struct{}, len(current))
    for k := range current {
        names[k] = struct{}{}
    }
    if prior != nil {
        for k := range prior.Metrics {
            names[k] = struct{}{}
        }
    }
    rows := make([]AuditDriftRow, 0, len(names))
    for name := range names {
        row := AuditDriftRow{Metric: name, Current: current[name]}
        if prior == nil {
            row.Flag = AuditDriftFlagFirstRun
            rows = append(rows, row)
            continue
        }
        priorVal, hadPrior := prior.Metrics[name]
        if !hadPrior {
            // New metric in current — treat as first-run for THIS metric.
            row.Flag = AuditDriftFlagFirstRun
            rows = append(rows, row)
            continue
        }
        row.Baseline = &priorVal
        if priorVal == 0 {
            // Division-by-zero: leave PctChange nil. If current is
            // also 0 → ok (no change). Otherwise → warn (the metric
            // went from zero to non-zero, which is always notable).
            if current[name] == 0 {
                row.Flag = AuditDriftFlagOK
            } else {
                row.Flag = AuditDriftFlagWarn
            }
            rows = append(rows, row)
            continue
        }
        pct := float64(current[name]-priorVal) / float64(priorVal)
        row.PctChange = &pct
        if math.Abs(pct) > threshold {
            row.Flag = AuditDriftFlagWarn
        } else {
            row.Flag = AuditDriftFlagOK
        }
        rows = append(rows, row)
    }
    // Sort for stable display + deterministic JSON output. Simple
    // exchange sort by name; size is at most a few dozen metrics,
    // so the O(n²) cost is irrelevant.
    for i := 0; i < len(rows); i++ {
        for j := i + 1; j < len(rows); j++ {
            if rows[i].Metric > rows[j].Metric {
                rows[i], rows[j] = rows[j], rows[i]
            }
        }
    }
    return rows
}

// FormatAuditDriftTable renders a drift table as a fixed-width
// text grid — useful for stdout dumps in audit-full runs. Matches
// the Rust output shape so an operator can grep across runtimes
// without re-learning the layout.
func FormatAuditDriftTable(rows []AuditDriftRow) string {
    if len(rows) == 0 {
        return "(no metrics)\n"
    }
    var buf bytes.Buffer
    fmt.Fprintf(&buf, "%-26s %12s %12s %10s %s\n", "metric", "baseline", "current", "Δ%", "flag")
    for _, r := range rows {
        baseline := "-"
        if r.Baseline != nil {
            baseline = fmt.Sprintf("%d", *r.Baseline)
        }
        pct := "-"
        if r.PctChange != nil {
            pct = fmt.Sprintf("%+.1f%%", *r.PctChange*100)
        }
        fmt.Fprintf(&buf, "%-26s %12s %12d %10s %s\n",
            r.Metric, baseline, r.Current, pct, r.Flag)
    }
    return buf.String()
}


@@ -0,0 +1,278 @@
package distillation

import (
    "os"
    "path/filepath"
    "strings"
    "testing"
)

// TestLoadLastBaseline_MissingFile: no baselines file yet → (nil, nil),
// not an error. First-run pipelines have no baseline.
func TestLoadLastBaseline_MissingFile(t *testing.T) {
    tmp := t.TempDir()
    got, err := LoadLastBaseline(filepath.Join(tmp, "nonexistent.jsonl"))
    if err != nil {
        t.Fatalf("expected nil error on missing file, got %v", err)
    }
    if got != nil {
        t.Errorf("expected nil baseline on missing file, got %+v", got)
    }
}

// TestLoadLastBaseline_LastLineWins locks the "last line is most
// recent" semantic — append-only log read backward.
func TestLoadLastBaseline_LastLineWins(t *testing.T) {
    tmp := t.TempDir()
    path := filepath.Join(tmp, "baselines.jsonl")
    jsonl := `{"recorded_at":"2026-04-01T00:00:00Z","metrics":{"a":1}}
{"recorded_at":"2026-04-15T00:00:00Z","metrics":{"a":2}}
{"recorded_at":"2026-04-30T00:00:00Z","metrics":{"a":3}}
`
    if err := os.WriteFile(path, []byte(jsonl), 0o644); err != nil {
        t.Fatalf("write: %v", err)
    }
    got, err := LoadLastBaseline(path)
    if err != nil {
        t.Fatalf("LoadLastBaseline: %v", err)
    }
    if got == nil || got.RecordedAt != "2026-04-30T00:00:00Z" {
        t.Errorf("expected last-line baseline (2026-04-30), got %+v", got)
    }
    if got.Metrics["a"] != 3 {
        t.Errorf("expected metrics.a=3, got %d", got.Metrics["a"])
    }
}

// TestLoadLastBaseline_TolerateTrailingBlankLines: writers append
// "\n" — a stray blank line at the end mustn't trigger malformed-
// JSON errors.
func TestLoadLastBaseline_TolerateTrailingBlankLines(t *testing.T) {
    tmp := t.TempDir()
    path := filepath.Join(tmp, "baselines.jsonl")
    jsonl := `{"recorded_at":"2026-04-30T00:00:00Z","metrics":{"a":3}}
`
    if err := os.WriteFile(path, []byte(jsonl), 0o644); err != nil {
        t.Fatalf("write: %v", err)
    }
    got, err := LoadLastBaseline(path)
    if err != nil {
        t.Fatalf("LoadLastBaseline: %v", err)
    }
    if got == nil || got.Metrics["a"] != 3 {
        t.Errorf("expected baseline with metrics.a=3, got %+v", got)
    }
}

// TestLoadLastBaseline_MalformedLastLineErrors: partial-write on
// disk-full would leave a truncated last line. Rather than
// silently skip back to a stale baseline, surface the error so
// operators don't lose drift signal.
func TestLoadLastBaseline_MalformedLastLineErrors(t *testing.T) {
    tmp := t.TempDir()
    path := filepath.Join(tmp, "baselines.jsonl")
    jsonl := `{"recorded_at":"2026-04-15T00:00:00Z","metrics":{"a":2}}
{"recorded_at":"2026-04-30T0` // truncated
    if err := os.WriteFile(path, []byte(jsonl), 0o644); err != nil {
        t.Fatalf("write: %v", err)
    }
    _, err := LoadLastBaseline(path)
    if err == nil {
        t.Errorf("expected decode error on truncated last line, got nil")
    }
}

// TestAppendBaseline_RoundTrip: append one + read back via
// LoadLastBaseline. Critical for the "first audit-full run on a
// fresh box" path.
func TestAppendBaseline_RoundTrip(t *testing.T) {
    tmp := t.TempDir()
    path := filepath.Join(tmp, "data", "_kb", "audit_baselines.jsonl")
    b := AuditBaseline{
        RecordedAt: "2026-05-01T12:00:00Z",
        GitCommit:  "deadbeef",
        Metrics: map[string]int64{
            "p3_accepted": 42,
            "p4_sft_rows": 17,
        },
    }
    if err := AppendBaseline(path, b); err != nil {
        t.Fatalf("AppendBaseline: %v", err)
    }
    got, err := LoadLastBaseline(path)
    if err != nil {
        t.Fatalf("LoadLastBaseline: %v", err)
    }
    if got == nil {
        t.Fatalf("expected non-nil baseline after append")
    }
    if got.RecordedAt != b.RecordedAt || got.GitCommit != b.GitCommit {
        t.Errorf("baseline header mismatch:\n got: %+v\n want: %+v", got, b)
    }
    if got.Metrics["p3_accepted"] != 42 || got.Metrics["p4_sft_rows"] != 17 {
        t.Errorf("metrics roundtrip: got %+v", got.Metrics)
    }
}

// TestAppendBaseline_AppendsToExisting: existing file gets an
// extra line, prior contents preserved.
func TestAppendBaseline_AppendsToExisting(t *testing.T) {
    tmp := t.TempDir()
    path := filepath.Join(tmp, "baselines.jsonl")
    if err := os.WriteFile(path,
        []byte(`{"recorded_at":"2026-04-30T00:00:00Z","metrics":{"a":1}}`+"\n"),
        0o644); err != nil {
        t.Fatalf("seed: %v", err)
    }
    if err := AppendBaseline(path, AuditBaseline{
        RecordedAt: "2026-05-01T00:00:00Z",
        Metrics:    map[string]int64{"a": 2},
    }); err != nil {
        t.Fatalf("AppendBaseline: %v", err)
    }
    data, err := os.ReadFile(path)
    if err != nil {
        t.Fatalf("read: %v", err)
    }
    lines := strings.Split(strings.TrimRight(string(data), "\n"), "\n")
    if len(lines) != 2 {
        t.Errorf("expected 2 lines after append, got %d", len(lines))
    }
}

// TestAppendBaseline_RejectsEmptyRecordedAt: schema invariant —
// every entry must carry a timestamp for ordering.
func TestAppendBaseline_RejectsEmptyRecordedAt(t *testing.T) {
    tmp := t.TempDir()
    path := filepath.Join(tmp, "baselines.jsonl")
    err := AppendBaseline(path, AuditBaseline{Metrics: map[string]int64{"a": 1}})
    if err == nil {
        t.Errorf("expected error on empty RecordedAt")
    }
}

// TestBuildAuditDriftTable_FirstRun: no prior → every metric flagged
// "first_run", no PctChange.
func TestBuildAuditDriftTable_FirstRun(t *testing.T) {
    rows := BuildAuditDriftTable(nil, map[string]int64{"a": 10, "b": 20}, 0)
    if len(rows) != 2 {
        t.Fatalf("expected 2 rows, got %d", len(rows))
    }
    for _, r := range rows {
        if r.Flag != AuditDriftFlagFirstRun {
            t.Errorf("metric %s: expected first_run flag, got %s", r.Metric, r.Flag)
        }
        if r.PctChange != nil {
            t.Errorf("metric %s: PctChange should be nil on first run", r.Metric)
        }
    }
}

// TestBuildAuditDriftTable_ThresholdBoundary: exactly threshold = OK,
// just over = WARN. Locks the >|threshold| (strict) semantic.
func TestBuildAuditDriftTable_ThresholdBoundary(t *testing.T) {
    prior := &AuditBaseline{Metrics: map[string]int64{"a": 100, "b": 100}}
    current := map[string]int64{
        "a": 120, // +20% — exactly at threshold → OK
        "b": 121, // +21% — over threshold → WARN
    }
    rows := BuildAuditDriftTable(prior, current, 0.20)
    byMetric := map[string]AuditDriftRow{}
    for _, r := range rows {
        byMetric[r.Metric] = r
    }
    if byMetric["a"].Flag != AuditDriftFlagOK {
        t.Errorf("metric a (+20%% exactly): expected OK, got %s", byMetric["a"].Flag)
    }
    if byMetric["b"].Flag != AuditDriftFlagWarn {
        t.Errorf("metric b (+21%%): expected warn, got %s", byMetric["b"].Flag)
    }
}

// TestBuildAuditDriftTable_ZeroBaseline: prior=0 means we can't
// compute pct (div-by-0). PctChange stays nil; current=0 stays
// OK; current>0 escalates to WARN (zero→nonzero is always
// notable).
func TestBuildAuditDriftTable_ZeroBaseline(t *testing.T) {
    prior := &AuditBaseline{Metrics: map[string]int64{"stayed_zero": 0, "went_nonzero": 0}}
    current := map[string]int64{
        "stayed_zero":  0,
        "went_nonzero": 5,
    }
    rows := BuildAuditDriftTable(prior, current, 0.20)
    byMetric := map[string]AuditDriftRow{}
    for _, r := range rows {
        byMetric[r.Metric] = r
    }
    if byMetric["stayed_zero"].Flag != AuditDriftFlagOK {
        t.Errorf("0→0 should be OK, got %s", byMetric["stayed_zero"].Flag)
    }
    if byMetric["went_nonzero"].Flag != AuditDriftFlagWarn {
        t.Errorf("0→5 should be warn, got %s", byMetric["went_nonzero"].Flag)
    }
    if byMetric["stayed_zero"].PctChange != nil || byMetric["went_nonzero"].PctChange != nil {
        t.Errorf("zero-baseline rows must have nil PctChange (no division by zero)")
    }
}

// TestBuildAuditDriftTable_NewMetricInCurrent: a metric present in
// current but not in prior is flagged first_run, not "0%-change".
func TestBuildAuditDriftTable_NewMetricInCurrent(t *testing.T) {
    prior := &AuditBaseline{Metrics: map[string]int64{"old_only": 5}}
    current := map[string]int64{"old_only": 5, "brand_new": 10}
    rows := BuildAuditDriftTable(prior, current, 0)
    byMetric := map[string]AuditDriftRow{}
    for _, r := range rows {
        byMetric[r.Metric] = r
    }
    if byMetric["brand_new"].Flag != AuditDriftFlagFirstRun {
        t.Errorf("new metric should be first_run, got %s", byMetric["brand_new"].Flag)
    }
    if byMetric["old_only"].Flag != AuditDriftFlagOK {
        t.Errorf("unchanged metric should be OK, got %s", byMetric["old_only"].Flag)
    }
}

// TestBuildAuditDriftTable_SortedByMetric: deterministic JSON
// output requires stable sort — drift tables in CI runs need to
// diff cleanly.
func TestBuildAuditDriftTable_SortedByMetric(t *testing.T) {
    prior := &AuditBaseline{Metrics: map[string]int64{"zoo": 1, "alpha": 1, "midway": 1}}
    current := map[string]int64{"zoo": 1, "alpha": 1, "midway": 1}
    rows := BuildAuditDriftTable(prior, current, 0)
    want := []string{"alpha", "midway", "zoo"}
    for i, r := range rows {
        if r.Metric != want[i] {
            t.Errorf("rows[%d]: got %q, want %q", i, r.Metric, want[i])
        }
    }
}

// TestFormatAuditDriftTable_RendersFlags: stdout dump shape — we
// don't pin every byte but verify the metric names + flags appear.
func TestFormatAuditDriftTable_RendersFlags(t *testing.T) {
    rows := []AuditDriftRow{
        {Metric: "p3_accepted", Current: 50, Flag: AuditDriftFlagFirstRun},
        {Metric: "p4_sft_rows", Baseline: int64Ptr(10), Current: 13, PctChange: float64Ptr(0.30), Flag: AuditDriftFlagWarn},
    }
    out := FormatAuditDriftTable(rows)
    for _, want := range []string{"p3_accepted", "p4_sft_rows", "first_run", "warn", "+30.0%"} {
        if !strings.Contains(out, want) {
            t.Errorf("expected %q in output:\n%s", want, out)
        }
    }
}

// TestFormatAuditDriftTable_EmptyHeader: empty rows yields a
// single-line "(no metrics)" — operators see something instead of
// blank output.
func TestFormatAuditDriftTable_EmptyHeader(t *testing.T) {
    out := FormatAuditDriftTable(nil)
    if !strings.Contains(out, "no metrics") {
        t.Errorf("expected 'no metrics' notice on empty input, got %q", out)
    }
}

func int64Ptr(v int64) *int64       { return &v }
func float64Ptr(v float64) *float64 { return &v }