3 Commits

Author SHA1 Message Date
root
eb0dfdff04 vectord: v2 envelope + handleMerge robustness — actions post_role_gate_v1 scrum
3-lineage scrum on 434f466..0d4f033 surfaced one convergent finding
(Opus + Kimi) and 3 Opus-only real bugs. All actioned in this
commit. Two false positives (Kimi rollback misreading, Opus stale-
comment claim) verified + rejected — both required manual control-
flow inspection to refute, matching the documented Kimi-truncation
behavior in feedback_cross_lineage_review.md.

Convergent fix — DecodeIndex lost nil-meta items:
- Envelope version bumped 1 → 2.
- New v2 field: IDs []string carries the canonical ID set
  explicitly, independent of meta map's nil-vs-{} sparseness.
- DecodeIndex accepts both versions: v2 reads from env.IDs; v1
  falls back to meta-key inference (with the documented
  limitation that nil-meta items are invisible — preserved for
  backward-compat with already-persisted indexes).
- Encode emits v2 going forward.
- 2 new regression tests:
  - TestEncodeDecode_NilMetaItemsSurviveRoundTrip: items added
    with nil metadata MUST survive Encode → Decode and remain
    visible to IDs(). Pre-fix would have yielded IDs() == [].
  - TestDecodeIndex_V1BackwardCompat: hand-crafted v1 envelope
    still decodes (proves the fallback path).

Opus-only fixes:
- handleMerge: non-ErrIndexNotFound errors at h.reg.Get(name) /
  h.reg.Get(req.Dest) now return 500 + log instead of falling
  through with nil src/dest pointers (which would panic on the
  next deref). Real bug — only the sentinel error was handled.
- internal/drift/drift.go: mathLog wrapper removed; math.Log
  inlined. Wrapper added no value (math was already imported).
- internal/distillation/audit_baseline.go: BuildAuditDriftTable's
  bubble sort replaced with sort.Slice. Idiomatic + shorter.

Rejected after verification:
- Kimi WARN "missing rollback on partial merge": misread the
  control flow. Code at cmd/vectord/main.go:404-414 does NOT
  delete from src when dest.Add fails (continue before reaching
  src.Delete). Only successful Adds trigger Deletes.
- Opus INFO "TimestampUnixNano comment references missing field":
  field exists at scripts/multi_coord_stress/main.go:128. Opus
  saw only the diff context, not the full file.

Deferred (no fired trigger):
- Opus WARN "no per-index lock during merge": no concurrent merge
  callers today (operators run merge as deliberate one-shot job).
  Worth a lock if/when matrixd or chatd start auto-triggering.

Disposition: reports/scrum/_evidence/2026-05-01/verdicts/post_role_gate_v1_disposition.md.

Build + vet + tests green; 2 new regression tests + all prior tests
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 01:20:37 -05:00
root
b216b7e5b6 fix the other 4: close all OPEN-list items in one wave
Substantial wave addressing all 4 prior OPEN items. Three closed in
full, one partially (the speculative half deliberately deferred).

OPEN #1 — Periodic fresh→main index merge (FULL):
- POST /v1/vectors/index/{src}/merge with {dest, clear_source}
- Idempotent on re-runs (existing-in-dest items skipped)
- internal/vectord/index.go: new Index.IDs() snapshot method +
  i.ids tracker field as canonical ID set, independent of meta
  map's nil-vs-{} sparseness (was a real bug — IDs() backed by meta
  alone missed items added with nil metadata)
- 4 cmd-level integration tests (happy path drain+clear, dim
  mismatch, dest not found, self-merge rejection) + 1 unit test
- DecodeIndex backward-compat: old envelopes restore i.ids from
  meta keys (best effort; new items going forward use the tracker)

OPEN #2 — Distillation SFT export (SUBSTRATE):
- internal/distillation/sft_export.go ports the load-bearing half:
  IsSftNever predicate + ListScoredRunFiles (data/scored-runs/YYYY/
  MM/DD walk) + LoadScoredRunsFromFile + partial ExportSft.
- Synthesis (instruction/input/response generation) deferred to a
  separate wave — too big for this session, but the substrate
  makes the next wave a port-not-design exercise.
- TestSftNever_PinsExpectedSet locks the contamination firewall
  set: if a future commit adds/removes from SftNever, this test
  fails — forcing the change through review.
- 5 new tests; firewall fires end-to-end through the partial port.

OPEN #3 — Distribution drift via PSI (FULL):
- internal/drift/drift.go: ComputeDistributionDrift via Population
  Stability Index. Standard finance/risk metric, well-defined
  verdict tiers (stable < 0.10, minor 0.10–0.25, major ≥ 0.25).
- Equal-width bucketing over combined min/max so neither dist
  falls outside; epsilon-clamping for empty buckets so log doesn't
  blow up. Per-bucket breakdown for drilldown.
- Pairs with the existing ComputeScorerDrift: scorer drift is
  categorical, distribution drift is continuous. Different shapes,
  same package.
- 7 new tests covering identical-is-stable, hard-shift-is-major,
  moderate-detected-not-stable, empty-inputs-safe, all-identical-
  safe, bucket-counts-conserved, num-buckets-clamping.

OPEN #4 — Ops nice-to-haves (PARTIAL — wall-clock done, others
deferred):
- (a) Real-time wall-clock for stress harness: per-phase elapsed
  time logged to stdout as it runs (`[stress] phase NAME starting
  (T+12.3s)` + `[stress] phase NAME done — 8.5s (T+20.8s)`).
  Output.PhaseTimings + Output.TotalElapsedMs in JSON.
- (b) chatd fixture-mode S3 mock + (c) liberal-paraphrase
  calibration: not actioned — no fired trigger, would be
  speculative. Documented as deferred-until-need rather than
  ignored. Per the project's discipline ("don't add features
  beyond what the task requires").

OPEN list now empty / steady-state. Future items will land as
production triggers fire.

Build + vet + tests green; 18 new tests across the 4 closures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 23:42:11 -05:00
root
be65f85f17 F: drift quantification — scorer drift first
PRD's 5-loop substrate names "drift" as loop 5: quantify when
historical decisions stop matching current reality. Distinct from
the rating+distillation loop because drift is MEASUREMENT, not
LEARNING. The learning loop says "this match worked, remember it";
the drift loop says "this 4-month-old playbook entry — does it
still match what the substrate would surface today?"

First-shipped drift shape: SCORER drift. When the deterministic
scorer's ScorerVersion bumps, historical ScoredRuns may no longer
match what the current scorer produces on the same EvidenceRecord.

internal/drift/drift.go:
  - ScorerDriftInput  — (EvidenceRecord, persisted_category) pair
  - ScorerDriftEntry  — one mismatch with current reasons attached
  - CategoryShift     — (from, to, count) cell in the shift matrix
  - ScorerDriftReport — summary + sorted shift matrix + optional entries
  - ComputeScorerDrift(inputs, includeEntries) — pure function;
    re-runs ScoreRecord over each input and reports mismatches

Why this matters: without a drift quantifier, a scorer-rule change
silently invalidates the historical training data feeding the
learning loop. With drift quantification, a rule change surfaces
a concrete number ("847 of 4701 historical ScoredRuns now
disagree") that triggers a re-score-and-retrain cycle rather than
letting the substrate quietly rot.

Tests (6/6 PASS):
  - No-drift: all 3 inputs match → 100% matched
  - Shift detected: 5 inputs, 3 drift cases, drift_rate=0.6,
    shift matrix shows accepted→partially_accepted x3
  - Multiple shifts sorted by count desc
  - includeEntries=false skips the per-mismatch list
  - Empty input → all-zero report (no division-by-zero)
  - ScorerVersion stamped on every report

Future drift shapes (deferred to follow-ups, named in package doc):
  - PLAYBOOK drift: re-run playbook queries through current
    matrix-search; recorded answer not in top-K = drift
  - EMBEDDING drift: KS-test on vector distribution at T1 vs T2
  - AUDIT BASELINE drift: matches Rust audit_baselines.jsonl
    longitudinal signal

Pure compute. Materialization layer (read scored-runs jsonl + their
matching evidence jsonl + feed into ComputeScorerDrift) lands with
the distillation materialization commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 20:06:17 -05:00