root be65f85f17 F: drift quantification — scorer drift first
PRD's 5-loop substrate names "drift" as loop 5: quantify when
historical decisions stop matching current reality. Distinct from
the rating+distillation loop because drift is MEASUREMENT, not
LEARNING. The learning loop says "this match worked, remember it";
the drift loop says "this 4-month-old playbook entry — does it
still match what the substrate would surface today?"

First-shipped drift shape: SCORER drift. When the deterministic
scorer's ScorerVersion bumps, historical ScoredRuns may no longer
match what the current scorer produces on the same EvidenceRecord.

internal/drift/drift.go:
  - ScorerDriftInput  — (EvidenceRecord, persisted_category) pair
  - ScorerDriftEntry  — one mismatch with current reasons attached
  - CategoryShift     — (from, to, count) cell in the shift matrix
  - ScorerDriftReport — summary + sorted shift matrix + optional entries
  - ComputeScorerDrift(inputs, includeEntries) — pure function;
    re-runs ScoreRecord over each input and reports mismatches

Why this matters: without a drift quantifier, a scorer-rule change
silently invalidates the historical training data feeding the
learning loop. With drift quantification, a rule change surfaces
a concrete number ("847 of 4701 historical ScoredRuns now
disagree") that triggers a re-score-and-retrain cycle rather than
letting the substrate quietly rot.

Tests (6/6 PASS):
  - No-drift: all 3 inputs match → 100% matched
  - Shift detected: 5 inputs, 3 drift cases, drift_rate=0.6,
    shift matrix shows accepted→partially_accepted x3
  - Multiple shifts sorted by count desc
  - includeEntries=false skips the per-mismatch list
  - Empty input → all-zero report (no division-by-zero)
  - ScorerVersion stamped on every report

Future drift shapes (deferred to follow-ups, named in package doc):
  - PLAYBOOK drift: re-run playbook queries through current
    matrix-search; recorded answer not in top-K = drift
  - EMBEDDING drift: KS-test on vector distribution at T1 vs T2
  - AUDIT BASELINE drift: matches Rust audit_baselines.jsonl
    longitudinal signal

Pure compute. Materialization layer (read scored-runs jsonl + their
matching evidence jsonl + feed into ComputeScorerDrift) lands with
the distillation materialization commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 20:06:17 -05:00
..