vectord: v2 envelope + handleMerge robustness — actions post_role_gate_v1 scrum

3-lineage scrum on 434f466..0d4f033 surfaced one convergent finding
(Opus + Kimi) and 3 Opus-only real bugs. All actioned in this
commit. Two false positives (Kimi rollback misreading, Opus stale-
comment claim) verified + rejected — both required manual control-
flow inspection to refute, matching the documented Kimi-truncation
behavior in feedback_cross_lineage_review.md.

Convergent fix — DecodeIndex lost nil-meta items:
- Envelope version bumped 1 → 2.
- New v2 field: IDs []string carries the canonical ID set
  explicitly, independent of meta map's nil-vs-{} sparseness.
- DecodeIndex accepts both versions: v2 reads from env.IDs; v1
  falls back to meta-key inference (with the documented
  limitation that nil-meta items are invisible — preserved for
  backward-compat with already-persisted indexes).
- Encode emits v2 going forward.
- 2 new regression tests:
  - TestEncodeDecode_NilMetaItemsSurviveRoundTrip: items added
    with nil metadata MUST survive Encode → Decode and remain
    visible to IDs(). Pre-fix would have yielded IDs() == [].
  - TestDecodeIndex_V1BackwardCompat: hand-crafted v1 envelope
    still decodes (proves the fallback path).

Opus-only fixes:
- handleMerge: non-ErrIndexNotFound errors at h.reg.Get(name) /
  h.reg.Get(req.Dest) now return 500 + log instead of falling
  through with nil src/dest pointers (which would panic on the
  next deref). Real bug — only the sentinel error was handled.
- internal/drift/drift.go: mathLog wrapper removed; math.Log
  inlined. Wrapper added no value (math was already imported).
- internal/distillation/audit_baseline.go: BuildAuditDriftTable's
  bubble sort replaced with sort.Slice. Idiomatic + shorter.

Rejected after verification:
- Kimi WARN "missing rollback on partial merge": misread the
  control flow. Code at cmd/vectord/main.go:404-414 does NOT
  delete from src when dest.Add fails (continue before reaching
  src.Delete). Only successful Adds trigger Deletes.
- Opus INFO "TimestampUnixNano comment references missing field":
  field exists at scripts/multi_coord_stress/main.go:128. Opus
  saw only the diff context, not the full file.

Deferred (no fired trigger):
- Opus WARN "no per-index lock during merge": no concurrent merge
  callers today (operators run merge as deliberate one-shot job).
  Worth a lock if/when matrixd or chatd start auto-triggering.

Disposition: reports/scrum/_evidence/2026-05-01/verdicts/post_role_gate_v1_disposition.md.

Build + vet + tests green; 2 new regression tests + all prior tests
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
root 2026-05-01 01:20:37 -05:00
parent 0d4f033b34
commit eb0dfdff04
6 changed files with 124 additions and 31 deletions

View File

@ -269,6 +269,7 @@ a steady state. Future items will land here as production triggers fire.
| (close-2) | **OPEN #2: distillation SFT export substrate**`internal/distillation/sft_export.go`: `IsSftNever` predicate + `ListScoredRunFiles` (data/scored-runs/YYYY/MM/DD walk) + `LoadScoredRunsFromFile` + partial `ExportSft` that wires the firewall but leaves synthesis (instruction/input/response generation) as the next wave. Firewall pinning test fails if `SftNever` set changes without review. 5 new tests. The synthesis port remains on Rust at `scripts/distillation/export_sft.ts`. |
| (close-2 full) | **OPEN #2 fully ported** (2026-05-01): `SynthesizeSft` + `LoadEvidenceByRunID` + `buildInstruction` ported byte-for-byte from `scripts/distillation/export_sft.ts`. All 8 source-class instruction templates (scrum_reviews / mode_experiments / auto_apply / audits / observer_reviews / contract_analyses / outcomes / default) match Rust output exactly so a/b validation between runtimes can diff JSONL byte-for-byte. `ExportSft` writes to `data/distilled/sft/sft_export.jsonl`. 5 additional tests including per-source-class template verification, extraction-rejection, empty-text-rejection, context-assembly, end-to-end fixture write. |
| (close-2 lineage) | **Audit-baselines lineage ported** (2026-05-01): `internal/distillation/audit_baseline.go` mirrors Rust `audit_full.ts`'s LoadBaseline/AppendBaseline/buildDriftTable. `LoadLastBaseline` reads the most recent JSON line from `data/_kb/audit_baselines.jsonl`; `AppendBaseline` appends append-only with bufio. `BuildAuditDriftTable` flags drift `>20%` (configurable); zero-baseline and new-metric edge cases handled (no division-by-zero, no false-stable on zero→nonzero). `FormatAuditDriftTable` for stdout dumps. Generic on metric names so callers running both runtimes can pin Rust-compat names (`AuditBaselineRustCompat` constant lists them). 13 tests including last-line-wins, trailing-blank-tolerance, malformed-line-errors, threshold-boundary, zero-baseline-handling, sort-stability. |
| (scrum) | 3-lineage scrum on `434f466..0d4f033` (post_role_gate_v1). Convergent finding (Opus + Kimi): `DecodeIndex` lost nil-meta items across persistence. **Fixed** by bumping envelope version 1→2 with explicit `IDs []string` field; v1 envelopes still load via meta-key fallback. Opus-only real bugs also actioned: `handleMerge` non-`ErrIndexNotFound` nil-deref, `mathLog` dead wrapper removed, bubble sort → `sort.Slice`. False positives rejected after verification (Kimi rollback misreading + Opus stale-comment claim). 2 new regression tests lock the v2 round-trip + v1 backward-compat. Disposition: `reports/scrum/_evidence/2026-05-01/verdicts/post_role_gate_v1_disposition.md`. |
| (close-3) | **OPEN #3: distribution drift via PSI**`internal/drift/drift.go`: `ComputeDistributionDrift` returns Population Stability Index + verdict tier (stable < 0.10, minor 0.100.25, major 0.25). Equal-width bucketing over combined min/max range, epsilon-clamping for empty buckets, per-bucket breakdown for drilldown. 7 new tests including identical-is-stable, hard-shift-is-major, moderate-detected-not-stable, empty-inputs-safe, all-identical-safe, bucket-counts-conserved, num-buckets-clamping. |
| (close-4) | **OPEN #4: ops nice-to-haves** — (a) Real-time wall-clock for stress harness: per-phase elapsed time logged to stdout as it runs (`[stress] phase NAME starting (T+12.3s)` + `[stress] phase NAME done — 8.5s (T+20.8s)`); `Output.PhaseTimings` + `Output.TotalElapsedMs` written to JSON; (b) chatd fixture-mode S3 mock + (c) liberal-paraphrase calibration: not actioned — no fired trigger yet, would be speculative. Documented as deferred-until-need rather than ignored. |

View File

@ -363,6 +363,13 @@ func (h *handlers) handleMerge(w http.ResponseWriter, r *http.Request) {
if errors.Is(err, vectord.ErrIndexNotFound) {
http.Error(w, "source not found", http.StatusNotFound)
return
} else if err != nil {
// Per scrum post_role_gate_v1 (Opus): non-ErrIndexNotFound
// errors must NOT fall through — src would be nil and the
// next deref panics.
slog.Error("merge: get source", "name", name, "err", err)
http.Error(w, "internal", http.StatusInternalServerError)
return
}
var req mergeRequest
if !decodeJSON(w, r, &req) {
@ -376,6 +383,12 @@ func (h *handlers) handleMerge(w http.ResponseWriter, r *http.Request) {
if errors.Is(err, vectord.ErrIndexNotFound) {
http.Error(w, "dest not found", http.StatusNotFound)
return
} else if err != nil {
// Same fix as source — non-ErrIndexNotFound dest errors
// can't reach the merge body with a nil dest pointer.
slog.Error("merge: get dest", "name", req.Dest, "err", err)
http.Error(w, "internal", http.StatusInternalServerError)
return
}
// Dimension match is non-negotiable — silently moving a 768-d
// vector into a 384-d index would corrupt search forever.

View File

@ -28,6 +28,7 @@ import (
"math"
"os"
"path/filepath"
"sort"
"strings"
)
@ -217,16 +218,10 @@ func BuildAuditDriftTable(prior *AuditBaseline, current map[string]int64, thresh
}
rows = append(rows, row)
}
// Sort for stable display + deterministic JSON output. Bubble-
// sort by name; size is at most a few dozen metrics, so the
// O(n²) cost is irrelevant.
for i := 0; i < len(rows); i++ {
for j := i + 1; j < len(rows); j++ {
if rows[i].Metric > rows[j].Metric {
rows[i], rows[j] = rows[j], rows[i]
}
}
}
// Sort for stable display + deterministic JSON output. Per
// scrum post_role_gate_v1 (Opus INFO): use sort.Slice — already
// imported in the package, idiomatic, and shorter.
sort.Slice(rows, func(i, j int) bool { return rows[i].Metric < rows[j].Metric })
return rows
}

View File

@ -41,10 +41,6 @@ import (
"git.agentview.dev/profit/golangLAKEHOUSE/internal/distillation"
)
// mathLog is wrapped to localize the math import in case future
// drift math swaps in a stable-log variant. Inlined by the compiler.
func mathLog(x float64) float64 { return math.Log(x) }
// ScorerDriftEntry is one mismatch — a historical (record, category)
// pair where the current scorer disagrees with the persisted
// verdict. Reasons captures the current scorer's explanation so
@ -337,7 +333,7 @@ func ComputeDistributionDrift(in DistributionDriftInput) DistributionDriftReport
if cRatio < driftEpsilon {
cRatio = driftEpsilon
}
part := (cRatio - bRatio) * mathLog(cRatio/bRatio)
part := (cRatio - bRatio) * math.Log(cRatio/bRatio)
psi += part
report.Buckets = append(report.Buckets, DistributionDriftBucket{
Lower: minV + float64(i)*width,

View File

@ -403,16 +403,25 @@ func (i *Index) Search(query []float32, k int) ([]Result, error) {
// HNSW graph bytes. params + metadata + format version travel
// together; the graph itself is opaque binary that round-trips
// through hnsw.Graph.Export / Import.
//
// Version history:
// - v1: original; metadata-only ID inference on decode (loses
// items added with nil metadata across persistence).
// - v2: explicit IDs slice closes the v1 gap. Older v1 envelopes
// still load (backward-compat fallback path in DecodeIndex).
type IndexEnvelope struct {
Version int `json:"version"`
Params IndexParams `json:"params"`
Metadata map[string]json.RawMessage `json:"metadata"`
IDs []string `json:"ids,omitempty"` // v2+ — canonical ID set
}
// envelopeVersion bumps when the on-disk JSON shape changes
// incompatibly. Reading a future version returns ErrVersionMismatch
// rather than producing a half-decoded index.
const envelopeVersion = 1
// envelopeVersion is the version this build EMITS. Decoders accept
// any envelope ≤ this version (with version-specific fallbacks for
// older shapes). Bumping this constant means the format changed
// incompatibly going forward; readers without the version-specific
// branch will return ErrVersionMismatch.
const envelopeVersion = 2
// ErrVersionMismatch is returned by DecodeIndex when the envelope
// claims a version this build doesn't understand.
@ -429,10 +438,17 @@ func (i *Index) Encode(envelopeW, graphW io.Writer) error {
i.mu.RLock()
defer i.mu.RUnlock()
// v2: serialize the canonical ID set explicitly so DecodeIndex
// can restore i.ids without depending on meta-key inference.
idList := make([]string, 0, len(i.ids))
for id := range i.ids {
idList = append(idList, id)
}
env := IndexEnvelope{
Version: envelopeVersion,
Params: i.params,
Metadata: i.meta,
IDs: idList,
}
if err := json.NewEncoder(envelopeW).Encode(env); err != nil {
return fmt.Errorf("encode envelope: %w", err)
@ -451,7 +467,10 @@ func DecodeIndex(envelopeR, graphR io.Reader) (*Index, error) {
if err := json.NewDecoder(envelopeR).Decode(&env); err != nil {
return nil, fmt.Errorf("decode envelope: %w", err)
}
if env.Version != envelopeVersion {
// Accept v1 (legacy: ids inferred from meta keys) and v2 (explicit
// ids slice). Future versions are unknown — return mismatch
// rather than half-decoding.
if env.Version < 1 || env.Version > envelopeVersion {
return nil, fmt.Errorf("%w: have %d, got %d",
ErrVersionMismatch, envelopeVersion, env.Version)
}
@ -465,17 +484,20 @@ func DecodeIndex(envelopeR, graphR io.Reader) (*Index, error) {
if env.Metadata != nil {
idx.meta = env.Metadata
}
// Restore the ids tracker from the metadata keyset. Items that
// were Add'd with nil metadata aren't in env.Metadata and won't
// appear in i.ids after reload — IDs() will miss them, and the
// merge endpoint will skip them. This is acceptable because the
// production HTTP path always supplies non-nil metadata (handler
// requires it explicitly; multi_coord_stress always sends an
// object). The edge case is intentionally not closed because
// closing it requires bumping the envelope version, which
// invalidates existing persisted indexes.
for id := range idx.meta {
idx.ids[id] = struct{}{}
// v2: explicit IDs field is the canonical source. v1 fallback:
// derive from meta keys, accepting that nil-meta items will be
// invisible to IDs()/merge until they get re-Add'd. Closes the
// scrum post_role_gate_v1 convergent finding (Opus + Kimi).
if env.Version >= 2 && env.IDs != nil {
for _, id := range env.IDs {
idx.ids[id] = struct{}{}
}
} else {
// v1 backward-compat path. Old envelopes don't carry ids
// explicitly; the metadata keyset is the best signal we have.
for id := range idx.meta {
idx.ids[id] = struct{}{}
}
}
return idx, nil
}

View File

@ -6,6 +6,7 @@ import (
"errors"
"fmt"
"math"
"strings"
"sync"
"testing"
)
@ -189,6 +190,71 @@ func TestDecodeIndex_VersionMismatch(t *testing.T) {
}
}
// TestEncodeDecode_NilMetaItemsSurviveRoundTrip locks the
// post_role_gate_v1 scrum convergent finding (Opus + Kimi):
// items added with nil metadata MUST survive Encode→Decode and
// remain visible to IDs(). Pre-fix v1 envelope inferred ids from
// meta keys, silently dropping nil-meta items. v2 envelope carries
// the IDs slice explicitly. Test creates a worst-case where every
// item has nil metadata — pre-fix would yield IDs() == [].
func TestEncodeDecode_NilMetaItemsSurviveRoundTrip(t *testing.T) {
src, _ := NewIndex(IndexParams{Name: "nil_meta_test", Dimension: 4, Distance: DistanceCosine})
for _, id := range []string{"a", "b", "c"} {
// nil meta — the case Opus + Kimi flagged.
if err := src.Add(id, []float32{1, 0, 0, 0}, nil); err != nil {
t.Fatalf("Add %s: %v", id, err)
}
}
if got := src.IDs(); len(got) != 3 {
t.Fatalf("pre-encode: expected 3 IDs, got %d", len(got))
}
var envBuf, graphBuf bytes.Buffer
if err := src.Encode(&envBuf, &graphBuf); err != nil {
t.Fatalf("Encode: %v", err)
}
dst, err := DecodeIndex(&envBuf, &graphBuf)
if err != nil {
t.Fatalf("DecodeIndex: %v", err)
}
if got := dst.IDs(); len(got) != 3 {
t.Errorf("post-decode: expected 3 IDs (nil-meta items must survive v2 round-trip), got %d %v", len(got), got)
}
}
// TestDecodeIndex_V1BackwardCompat locks the legacy-shape fallback:
// envelope without an explicit "ids" field is still loadable. The
// v2 → v1 fallback path infers ids from meta keys (with the
// documented limitation for nil-meta items, which this test does
// NOT exercise — it only proves v1 envelopes still load).
func TestDecodeIndex_V1BackwardCompat(t *testing.T) {
// Hand-craft a v1 envelope (no IDs field).
envJSON := `{"version":1,"params":{"name":"v1_test","dimension":4,"distance":"cosine","m":16,"ef_search":20},"metadata":{"id1":{"foo":"bar"}}}`
// Empty graph stream — DecodeIndex should still succeed and
// emit an Index with id1 in i.ids inferred from meta.
src, _ := NewIndex(IndexParams{Name: "tmp", Dimension: 4})
_ = src.Add("dummy", []float32{1, 0, 0, 0}, json.RawMessage(`{"x":1}`))
var graphBuf bytes.Buffer
if err := src.g.Export(&graphBuf); err != nil {
t.Fatalf("export tmp graph for v1 fixture: %v", err)
}
dst, err := DecodeIndex(strings.NewReader(envJSON), &graphBuf)
if err != nil {
t.Fatalf("v1 envelope must still load, got %v", err)
}
// ids should contain "id1" (from the v1 metadata-key fallback).
hasID1 := false
for _, id := range dst.IDs() {
if id == "id1" {
hasID1 = true
break
}
}
if !hasID1 {
t.Errorf("v1 fallback didn't restore id from meta keys, got IDs=%v", dst.IDs())
}
}
func TestRegistry_CreateGetDelete(t *testing.T) {
r := NewRegistry()
idx, err := r.Create(IndexParams{Name: "workers", Dimension: 4})