Follow-up to b216b7e (which shipped the SFT export substrate). This
commit ports the synthesis logic, completing the migration:
- SynthesizeSft(scored, ev, recordedAt, sftID) → *SftSample
Mirrors the Rust synthesizeSft byte-for-byte. Returns nil for
extraction-class records + empty-text records (same skip
semantics as Rust).
- LoadEvidenceByRunID(scoredPath, cache) reads the paired evidence
JSONL (path derived by /scored-runs/ → /evidence/ replacement).
Per-call cache so multiple scored-runs files in the same dir
don't reload the same evidence.
- buildInstruction maps source_file stem → per-class instruction
template. All 8 templates (scrum_reviews, mode_experiments,
auto_apply, audits, observer_reviews, contract_analyses,
outcomes, default) match Rust output exactly so a/b validation
between runtimes can diff JSONL byte-for-byte.
- stemFromSourceFile strips data/_kb/ prefix + .jsonl suffix.
- ExportSft now writes data/distilled/sft/sft_export.jsonl with
the synthesized samples (DryRun=true skips file write).
Per-class templates verified by 8-case sub-test:
- scrum_reviews → "Review the file '...' against the PRD..."
- mode_experiments → "Run task_class='...' for file..."
- auto_apply → "Auto-apply: emit a 6-line surgical patch..."
- audits with phase: prefix → strips to bare phase name
- observer_reviews → "Observer-review the latest attempt..."
- contract_analyses with permit: prefix → strips to permit ID
- outcomes → "Run scenario; report per-event outcome..."
- unknown source → "Source 'X' run; produce the appropriate output"
Caveat documented inline: contract_analyses uses ev.metadata.contractor
in Rust to produce "Analyze contractor 'X' for permit 'Y'" when
present. Go's EvidenceRecord doesn't carry a free-form metadata bag
yet, so we always emit the no-contractor form. Operators needing
contractor-aware instructions can extend EvidenceRecord with an
explicit Metadata field (separate ADR).
Test additions (5 new):
- TestSynthesizeSft_PerSourceClass: 8 sub-cases, one per template
- TestSynthesizeSft_RejectsExtraction: extraction-role records skipped
- TestSynthesizeSft_RejectsEmptyText: empty/whitespace text skipped
- TestSynthesizeSft_ContextAssembly: matrix + pathway + model
context string formatting matches Rust " · " join
- TestExportSft_FullPort_WritesJSONL: end-to-end fixture, asserts
output contains expected instruction + omits firewalled records
Pre-existing TestExportSft_PartialPort_FirewallFires renamed +
updated to TestExportSft_FirewallFiresBeforeEvidenceLoad — reflects
the new contract that records passing the firewall but lacking
evidence land in "not-instructable" rather than being silently
exported. Honest semantics shift documented in the test.
OPEN #2 now fully closed (was: substrate-only). The synthesis path
no longer requires the Rust pipeline to be invoked — Go-side
operators can run the full distillation export end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Substantial wave addressing all 4 prior OPEN items. Three closed in
full, one partially (the speculative half deliberately deferred).
OPEN #1 — Periodic fresh→main index merge (FULL):
- POST /v1/vectors/index/{src}/merge with {dest, clear_source}
- Idempotent on re-runs (existing-in-dest items skipped)
- internal/vectord/index.go: new Index.IDs() snapshot method +
i.ids tracker field as canonical ID set, independent of meta
map's nil-vs-{} sparseness (was a real bug — IDs() backed by meta
alone missed items added with nil metadata)
- 4 cmd-level integration tests (happy path drain+clear, dim
mismatch, dest not found, self-merge rejection) + 1 unit test
- DecodeIndex backward-compat: old envelopes restore i.ids from
meta keys (best effort; new items going forward use the tracker)
OPEN #2 — Distillation SFT export (SUBSTRATE):
- internal/distillation/sft_export.go ports the load-bearing half:
IsSftNever predicate + ListScoredRunFiles (data/scored-runs/YYYY/
MM/DD walk) + LoadScoredRunsFromFile + partial ExportSft.
- Synthesis (instruction/input/response generation) deferred to a
separate wave — too big for this session, but the substrate
makes the next wave a port-not-design exercise.
- TestSftNever_PinsExpectedSet locks the contamination firewall
set: if a future commit adds/removes from SftNever, this test
fails — forcing the change through review.
- 5 new tests; firewall fires end-to-end through the partial port.
OPEN #3 — Distribution drift via PSI (FULL):
- internal/drift/drift.go: ComputeDistributionDrift via Population
Stability Index. Standard finance/risk metric, well-defined
verdict tiers (stable < 0.10, minor 0.10–0.25, major ≥ 0.25).
- Equal-width bucketing over combined min/max so neither dist
falls outside; epsilon-clamping for empty buckets so log doesn't
blow up. Per-bucket breakdown for drilldown.
- Pairs with the existing ComputeScorerDrift: scorer drift is
categorical, distribution drift is continuous. Different shapes,
same package.
- 7 new tests covering identical-is-stable, hard-shift-is-major,
moderate-detected-not-stable, empty-inputs-safe, all-identical-
safe, bucket-counts-conserved, num-buckets-clamping.
OPEN #4 — Ops nice-to-haves (PARTIAL — wall-clock done, others
deferred):
- (a) Real-time wall-clock for stress harness: per-phase elapsed
time logged to stdout as it runs (`[stress] phase NAME starting
(T+12.3s)` + `[stress] phase NAME done — 8.5s (T+20.8s)`).
Output.PhaseTimings + Output.TotalElapsedMs in JSON.
- (b) chatd fixture-mode S3 mock + (c) liberal-paraphrase
calibration: not actioned — no fired trigger, would be
speculative. Documented as deferred-until-need rather than
ignored. Per the project's discipline ("don't add features
beyond what the task requires").
OPEN list now empty / steady-state. Future items will land as
production triggers fire.
Build + vet + tests green; 18 new tests across the 4 closures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First slice of the Rust v1.0.0 distillation substrate (e7636f2)
ported to Go per ADR-001 #4 (port LOGIC, not bit-identical
reproducibility). This commit lands the LOAD-BEARING pieces named
in project_distillation_substrate.md memory:
- The deterministic Success Scorer (8 sub-scorers + dispatch)
- The contamination firewall on SFT samples (the "non-negotiable"
spec property: rejected/needs_human_review NEVER ship to SFT)
- All on-wire types + validators for ScoredRun, SftSample,
EvidenceRecord with Provenance
Files:
internal/distillation/types.go — types + ScorerVersion + SftNever
+ ValidateScoredRun + ValidateSftSample
internal/distillation/scorer.go — ScoreRecord + 8 class scorers +
BuildScoredRun (deterministic)
internal/distillation/scorer_test.go — ~40 test cases:
- source-class dispatch (verdict / telemetry / extraction)
- scrum_review (4 attempt cases)
- observer_review (5 verdict cases)
- audit (legacy + severity, 9 cases)
- auto_apply (4 cases)
- outcomes / mode_experiment / extraction
- CONTAMINATION FIREWALL: ErrSftContamination sentinel fires
on rejected/needs_human_review, distinct from typo errors
- empty-pair guard (instruction/response trim != "")
- reasons-required ScoredRun validation
- deterministic sig_hash on identical input
- purity check (input not mutated, repeatable output)
Per the 2026-04-29 cross-lineage scrum's discipline: false-positive
findings would be dismissed inline (none in this commit). Real
findings would be addressed before merge — but this is greenfield
port code reviewed against its Rust source line-by-line, which the
test suite encodes as truth tables.
Explicitly DEFERRED to follow-up commits:
- Materialization layer (jsonl read/write, date-partitioned
storage in data/scored-runs/YYYY/MM/DD/, evidence index)
- SFT exporter (file iteration + filtering — the SCORING firewall
is here; the EXPORT firewall is the next layer)
- export_preference, export_rag (other export shapes)
- Acceptance harness (16/16 acceptance gate that locks v1.0.0)
- replay, receipts, build_evidence_index, transforms
The scorer + firewall validator are pure functions — operational
tooling layers on top without changing the deterministic logic the
downstream learning loop depends on. The Go ScorerVersion stays at
v1.0.0 to match the Rust e7636f2 baseline; bumping in the Go
materialization commit is reserved for the next scoring-rule
change, NOT the port itself.
15-smoke regression all green. vet clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>