3-lineage scrum on 434f466..0d4f033 surfaced one convergent finding
(Opus + Kimi) and 3 Opus-only real bugs. All actioned in this
commit. Two false positives (Kimi rollback misreading, Opus stale-
comment claim) verified + rejected — both required manual control-
flow inspection to refute, matching the documented Kimi-truncation
behavior in feedback_cross_lineage_review.md.
Convergent fix — DecodeIndex lost nil-meta items:
- Envelope version bumped 1 → 2.
- New v2 field: IDs []string carries the canonical ID set
explicitly, independent of meta map's nil-vs-{} sparseness.
- DecodeIndex accepts both versions: v2 reads from env.IDs; v1
falls back to meta-key inference (with the documented
limitation that nil-meta items are invisible — preserved for
backward-compat with already-persisted indexes).
- Encode emits v2 going forward.
- 2 new regression tests:
- TestEncodeDecode_NilMetaItemsSurviveRoundTrip: items added
with nil metadata MUST survive Encode → Decode and remain
visible to IDs(). Pre-fix would have yielded IDs() == [].
- TestDecodeIndex_V1BackwardCompat: hand-crafted v1 envelope
still decodes (proves the fallback path).
Opus-only fixes:
- handleMerge: non-ErrIndexNotFound errors at h.reg.Get(name) /
h.reg.Get(req.Dest) now return 500 + log instead of falling
through with nil src/dest pointers (which would panic on the
next deref). Real bug — only the sentinel error was handled.
- internal/drift/drift.go: mathLog wrapper removed; math.Log
inlined. Wrapper added no value (math was already imported).
- internal/distillation/audit_baseline.go: BuildAuditDriftTable's
bubble sort replaced with sort.Slice. Idiomatic + shorter.
Rejected after verification:
- Kimi WARN "missing rollback on partial merge": misread the
control flow. Code at cmd/vectord/main.go:404-414 does NOT
delete from src when dest.Add fails (continue before reaching
src.Delete). Only successful Adds trigger Deletes.
- Opus INFO "TimestampUnixNano comment references missing field":
field exists at scripts/multi_coord_stress/main.go:128. Opus
saw only the diff context, not the full file.
Deferred (no fired trigger):
- Opus WARN "no per-index lock during merge": no concurrent merge
callers today (operators run merge as deliberate one-shot job).
Worth a lock if/when matrixd or chatd start auto-triggering.
Disposition: reports/scrum/_evidence/2026-05-01/verdicts/post_role_gate_v1_disposition.md.
Build + vet + tests green; 2 new regression tests + all prior tests
unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
252 lines
8.5 KiB
Go
252 lines
8.5 KiB
Go
package distillation
|
|
|
|
// Audit-baseline lineage — the longitudinal signal that distinguishes
|
|
// "metrics shifted because the world changed" from "metrics shifted
|
|
// because we broke something." Mirrors the Rust audit_full.ts
|
|
// LoadBaseline/AppendBaseline/buildDriftTable shape so a Go-side
|
|
// audit run can be compared against Rust-side baselines and
|
|
// vice-versa during the migration.
|
|
//
|
|
// Storage: data/_kb/audit_baselines.jsonl, one AuditBaseline per
|
|
// line, append-only. The LAST line is the most recent. New runs
|
|
// read the prior baseline, compute drift vs current metrics, then
|
|
// append a fresh entry.
|
|
//
|
|
// Why generic on metric name (vs Rust's pinned p2_evidence_rows
|
|
// etc.): the Rust phase numbering (p0..p7) doesn't translate to Go
|
|
// directly. Operators with mixed Rust+Go pipelines should use the
|
|
// SAME metric names on both sides so the drift table compares
|
|
// like-for-like. Helper constants below pin the Rust-compat names
|
|
// for callers running both runtimes.
|
|
|
|
import (
|
|
"bufio"
|
|
"bytes"
|
|
"encoding/json"
|
|
"errors"
|
|
"fmt"
|
|
"math"
|
|
"os"
|
|
"path/filepath"
|
|
"sort"
|
|
"strings"
|
|
)
|
|
|
|
// AuditBaseline is one entry in the audit_baselines.jsonl
|
|
// longitudinal log. Schema-stable; new metrics land as new keys
|
|
// in the Metrics map (additive — readers tolerate unknown keys).
|
|
type AuditBaseline struct {
|
|
RecordedAt string `json:"recorded_at"` // ISO 8601 UTC
|
|
GitCommit string `json:"git_commit,omitempty"` // sha of the run's HEAD
|
|
Metrics map[string]int64 `json:"metrics"`
|
|
}
|
|
|
|
// AuditBaselineRustCompat lists the metric names the Rust pipeline
|
|
// emits at audit_full.ts. Go-side callers running an equivalent
|
|
// audit should use these names so drift compares across runtimes.
|
|
// Adding new names here requires the Rust side to mint them too.
|
|
var AuditBaselineRustCompat = []string{
|
|
"p2_evidence_rows",
|
|
"p2_evidence_skips",
|
|
"p3_accepted",
|
|
"p3_partial",
|
|
"p3_rejected",
|
|
"p3_human",
|
|
"p4_rag_rows",
|
|
"p4_sft_rows",
|
|
"p4_pref_pairs",
|
|
"p4_total_quarantined",
|
|
}
|
|
|
|
// DefaultBaselinePath returns the canonical audit baselines path
|
|
// rooted at the lakehouse data dir. Match Rust's BASELINE_PATH_FOR.
|
|
func DefaultBaselinePath(root string) string {
|
|
return filepath.Join(root, "data", "_kb", "audit_baselines.jsonl")
|
|
}
|
|
|
|
// LoadLastBaseline reads audit_baselines.jsonl and returns the
|
|
// most recent entry — i.e. the LAST non-empty JSON line. Missing
|
|
// file or empty file returns (nil, nil), not an error: a fresh
|
|
// pipeline has no baseline yet, and the caller should treat that
|
|
// as "first run" via BuildAuditDriftTable's nil-prior handling.
|
|
//
|
|
// Malformed last line returns an error (rather than silently
|
|
// skipping to the previous line) so operators don't lose drift
|
|
// signal under partial-write corruption.
|
|
func LoadLastBaseline(path string) (*AuditBaseline, error) {
|
|
data, err := os.ReadFile(path)
|
|
if os.IsNotExist(err) {
|
|
return nil, nil
|
|
}
|
|
if err != nil {
|
|
return nil, fmt.Errorf("read baselines: %w", err)
|
|
}
|
|
lines := strings.Split(string(data), "\n")
|
|
// Walk back to the last non-empty line.
|
|
for i := len(lines) - 1; i >= 0; i-- {
|
|
s := strings.TrimSpace(lines[i])
|
|
if s == "" {
|
|
continue
|
|
}
|
|
var b AuditBaseline
|
|
if err := json.Unmarshal([]byte(s), &b); err != nil {
|
|
return nil, fmt.Errorf("decode last baseline (line %d): %w", i+1, err)
|
|
}
|
|
return &b, nil
|
|
}
|
|
return nil, nil
|
|
}
|
|
|
|
// AppendBaseline appends one AuditBaseline as a JSON line to
|
|
// audit_baselines.jsonl. Creates the parent directory if missing.
|
|
// Atomic write at the line level: a partial write on disk-full or
|
|
// crash leaves the file with at most one truncated trailing line,
|
|
// which LoadLastBaseline will surface as a decode error.
|
|
func AppendBaseline(path string, b AuditBaseline) error {
|
|
if b.RecordedAt == "" {
|
|
return errors.New("audit_baseline: RecordedAt required")
|
|
}
|
|
if b.Metrics == nil {
|
|
return errors.New("audit_baseline: Metrics required (use empty map for zero-metric run)")
|
|
}
|
|
if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
|
|
return fmt.Errorf("mkdir baseline dir: %w", err)
|
|
}
|
|
line, err := json.Marshal(b)
|
|
if err != nil {
|
|
return fmt.Errorf("encode baseline: %w", err)
|
|
}
|
|
f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
|
|
if err != nil {
|
|
return fmt.Errorf("open baselines: %w", err)
|
|
}
|
|
defer f.Close()
|
|
w := bufio.NewWriter(f)
|
|
if _, err := w.Write(line); err != nil {
|
|
return fmt.Errorf("write baseline: %w", err)
|
|
}
|
|
if err := w.WriteByte('\n'); err != nil {
|
|
return fmt.Errorf("write newline: %w", err)
|
|
}
|
|
return w.Flush()
|
|
}
|
|
|
|
// AuditDriftFlag categorizes a single metric's drift verdict.
|
|
// Mirrors the Rust DriftRow.flag values exactly.
|
|
type AuditDriftFlag string
|
|
|
|
const (
|
|
AuditDriftFlagFirstRun AuditDriftFlag = "first_run" // no prior baseline → can't compute change
|
|
AuditDriftFlagOK AuditDriftFlag = "ok" // |Δ%| ≤ threshold
|
|
AuditDriftFlagWarn AuditDriftFlag = "warn" // |Δ%| > threshold
|
|
)
|
|
|
|
// DefaultDriftWarnThreshold is 20% — matches Rust's hard-coded
|
|
// `Math.abs(pct) > 0.20`. Operators tuning sensitivity per metric
|
|
// can pass a different value to BuildAuditDriftTable.
|
|
const DefaultDriftWarnThreshold = 0.20
|
|
|
|
// AuditDriftRow is one metric's drift verdict. PctChange is nil
|
|
// when prior baseline was zero (division-by-zero) OR when this is
|
|
// the first run. Encoded as *float64 so JSON omits the field
|
|
// rather than emitting 0.0 for "unknowable" cases.
|
|
type AuditDriftRow struct {
|
|
Metric string `json:"metric"`
|
|
Baseline *int64 `json:"baseline"`
|
|
Current int64 `json:"current"`
|
|
PctChange *float64 `json:"pct_change"`
|
|
Flag AuditDriftFlag `json:"flag"`
|
|
}
|
|
|
|
// BuildAuditDriftTable computes per-metric drift between a prior
|
|
// baseline (nil = first run) and the current metric snapshot. The
|
|
// result is sorted by metric name for stable display.
|
|
//
|
|
// Threshold is the absolute percent-change above which a metric is
|
|
// flagged "warn". Pass DefaultDriftWarnThreshold (0.20 = 20%) to
|
|
// match Rust audit_full.ts. Use a per-metric threshold map by
|
|
// calling BuildAuditDriftTable once per metric subset.
|
|
func BuildAuditDriftTable(prior *AuditBaseline, current map[string]int64, threshold float64) []AuditDriftRow {
|
|
if threshold <= 0 {
|
|
threshold = DefaultDriftWarnThreshold
|
|
}
|
|
// Union of metric names so a metric that disappeared from
|
|
// current still surfaces as "current=0, drifted -100%".
|
|
names := make(map[string]struct{}, len(current))
|
|
for k := range current {
|
|
names[k] = struct{}{}
|
|
}
|
|
if prior != nil {
|
|
for k := range prior.Metrics {
|
|
names[k] = struct{}{}
|
|
}
|
|
}
|
|
rows := make([]AuditDriftRow, 0, len(names))
|
|
for name := range names {
|
|
row := AuditDriftRow{Metric: name, Current: current[name]}
|
|
if prior == nil {
|
|
row.Flag = AuditDriftFlagFirstRun
|
|
rows = append(rows, row)
|
|
continue
|
|
}
|
|
priorVal, hadPrior := prior.Metrics[name]
|
|
if !hadPrior {
|
|
// New metric in current — treat as first-run for THIS metric.
|
|
row.Flag = AuditDriftFlagFirstRun
|
|
rows = append(rows, row)
|
|
continue
|
|
}
|
|
row.Baseline = &priorVal
|
|
if priorVal == 0 {
|
|
// Division-by-zero: leave PctChange nil. If current is
|
|
// also 0 → ok (no change). Otherwise → warn (the metric
|
|
// went from zero to non-zero, which is always notable).
|
|
if current[name] == 0 {
|
|
row.Flag = AuditDriftFlagOK
|
|
} else {
|
|
row.Flag = AuditDriftFlagWarn
|
|
}
|
|
rows = append(rows, row)
|
|
continue
|
|
}
|
|
pct := float64(current[name]-priorVal) / float64(priorVal)
|
|
row.PctChange = &pct
|
|
if math.Abs(pct) > threshold {
|
|
row.Flag = AuditDriftFlagWarn
|
|
} else {
|
|
row.Flag = AuditDriftFlagOK
|
|
}
|
|
rows = append(rows, row)
|
|
}
|
|
// Sort for stable display + deterministic JSON output. Per
|
|
// scrum post_role_gate_v1 (Opus INFO): use sort.Slice — already
|
|
// imported in the package, idiomatic, and shorter.
|
|
sort.Slice(rows, func(i, j int) bool { return rows[i].Metric < rows[j].Metric })
|
|
return rows
|
|
}
|
|
|
|
// FormatAuditDriftTable renders a drift table as a fixed-width
|
|
// text grid — useful for stdout dumps in audit-full runs. Matches
|
|
// the Rust output shape so an operator can grep across runtimes
|
|
// without re-learning the layout.
|
|
func FormatAuditDriftTable(rows []AuditDriftRow) string {
|
|
if len(rows) == 0 {
|
|
return "(no metrics)\n"
|
|
}
|
|
var buf bytes.Buffer
|
|
fmt.Fprintf(&buf, "%-26s %12s %12s %10s %s\n", "metric", "baseline", "current", "Δ%", "flag")
|
|
for _, r := range rows {
|
|
baseline := "-"
|
|
if r.Baseline != nil {
|
|
baseline = fmt.Sprintf("%d", *r.Baseline)
|
|
}
|
|
pct := "-"
|
|
if r.PctChange != nil {
|
|
pct = fmt.Sprintf("%+.1f%%", *r.PctChange*100)
|
|
}
|
|
fmt.Fprintf(&buf, "%-26s %12s %12d %10s %s\n",
|
|
r.Metric, baseline, r.Current, pct, r.Flag)
|
|
}
|
|
return buf.String()
|
|
}
|