distillation: full SFT export port — closes OPEN #2 fully

Follow-up to b216b7e (which shipped the SFT export substrate). This
commit ports the synthesis logic, completing the migration:

- SynthesizeSft(scored, ev, recordedAt, sftID) → *SftSample
  Mirrors the Rust synthesizeSft byte-for-byte. Returns nil for
  extraction-class records + empty-text records (same skip
  semantics as Rust).
- LoadEvidenceByRunID(scoredPath, cache) reads the paired evidence
  JSONL (path derived by /scored-runs/ → /evidence/ replacement).
  Per-call cache so multiple scored-runs files in the same dir
  don't reload the same evidence.
- buildInstruction maps source_file stem → per-class instruction
  template. All 8 templates (scrum_reviews, mode_experiments,
  auto_apply, audits, observer_reviews, contract_analyses,
  outcomes, default) match Rust output exactly so a/b validation
  between runtimes can diff JSONL byte-for-byte.
- stemFromSourceFile strips data/_kb/ prefix + .jsonl suffix.
- ExportSft now writes data/distilled/sft/sft_export.jsonl with
  the synthesized samples (DryRun=true skips file write).

Per-class templates verified by 8-case sub-test:
- scrum_reviews → "Review the file '...' against the PRD..."
- mode_experiments → "Run task_class='...' for file..."
- auto_apply → "Auto-apply: emit a 6-line surgical patch..."
- audits with phase: prefix → strips to bare phase name
- observer_reviews → "Observer-review the latest attempt..."
- contract_analyses with permit: prefix → strips to permit ID
- outcomes → "Run scenario; report per-event outcome..."
- unknown source → "Source 'X' run; produce the appropriate output"

Caveat documented inline: contract_analyses uses ev.metadata.contractor
in Rust to produce "Analyze contractor 'X' for permit 'Y'" when
present. Go's EvidenceRecord doesn't carry a free-form metadata bag
yet, so we always emit the no-contractor form. Operators needing
contractor-aware instructions can extend EvidenceRecord with an
explicit Metadata field (separate ADR).

Test additions (5 new):
- TestSynthesizeSft_PerSourceClass: 8 sub-cases, one per template
- TestSynthesizeSft_RejectsExtraction: extraction-role records skipped
- TestSynthesizeSft_RejectsEmptyText: empty/whitespace text skipped
- TestSynthesizeSft_ContextAssembly: matrix + pathway + model
  context string formatting matches Rust " · " join
- TestExportSft_FullPort_WritesJSONL: end-to-end fixture, asserts
  output contains expected instruction + omits firewalled records

Pre-existing TestExportSft_PartialPort_FirewallFires renamed +
updated to TestExportSft_FirewallFiresBeforeEvidenceLoad — reflects
the new contract that records passing the firewall but lacking
evidence land in "not-instructable" rather than being silently
exported. Honest semantics shift documented in the test.

OPEN #2 now fully closed (was: substrate-only). The synthesis path
no longer requires the Rust pipeline to be invoked — Go-side
operators can run the full distillation export end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
root 2026-05-01 00:06:57 -05:00
parent b216b7e5b6
commit 7bb432f6c8
3 changed files with 434 additions and 37 deletions

View File

@ -267,6 +267,7 @@ a steady state. Future items will land here as production triggers fire.
| (wire-up) | Multi-coord stress role wire-through: `Demand.Role` was already extracted at every call site (44 occurrences) but never threaded into matrix retrieve or playbook record. Cross-role gate was bypassed for the entire multi-coord harness. **Fixed** by extending `tracedSearch`, `matrixSearch`, and `playbookRecord` signatures with `role string` and updating all 14 call sites — passing `d.Role` (demand loops), `parsed.Role` (LLM-parsed inbox path), `warehouseDemand.Role` (swap path), `ev.Role` (reissue path), `""` (fresh-verify resume snippet — no clean role). Build + vet + tests green; multi-coord stress now honors role gate end-to-end. |
| (close-1) | **OPEN #1: vectord merge endpoint**`POST /v1/vectors/index/{src}/merge` with body `{dest, clear_source}`. Idempotent on re-runs (existing-in-dest items skipped). New `Index.IDs()` snapshot method backs it; new `i.ids` tracker field is the canonical ID set (independent of meta map's nil-vs-{} sparseness). 4 cmd-level tests + 1 unit test. |
| (close-2) | **OPEN #2: distillation SFT export substrate**`internal/distillation/sft_export.go`: `IsSftNever` predicate + `ListScoredRunFiles` (data/scored-runs/YYYY/MM/DD walk) + `LoadScoredRunsFromFile` + partial `ExportSft` that wires the firewall but leaves synthesis (instruction/input/response generation) as the next wave. Firewall pinning test fails if `SftNever` set changes without review. 5 new tests. The synthesis port remains on Rust at `scripts/distillation/export_sft.ts`. |
| (close-2 full) | **OPEN #2 fully ported** (2026-05-01): `SynthesizeSft` + `LoadEvidenceByRunID` + `buildInstruction` ported byte-for-byte from `scripts/distillation/export_sft.ts`. All 8 source-class instruction templates (scrum_reviews / mode_experiments / auto_apply / audits / observer_reviews / contract_analyses / outcomes / default) match Rust output exactly so a/b validation between runtimes can diff JSONL byte-for-byte. `ExportSft` writes to `data/distilled/sft/sft_export.jsonl`. 5 additional tests including per-source-class template verification, extraction-rejection, empty-text-rejection, context-assembly, end-to-end fixture write. |
| (close-3) | **OPEN #3: distribution drift via PSI**`internal/drift/drift.go`: `ComputeDistributionDrift` returns Population Stability Index + verdict tier (stable < 0.10, minor 0.100.25, major 0.25). Equal-width bucketing over combined min/max range, epsilon-clamping for empty buckets, per-bucket breakdown for drilldown. 7 new tests including identical-is-stable, hard-shift-is-major, moderate-detected-not-stable, empty-inputs-safe, all-identical-safe, bucket-counts-conserved, num-buckets-clamping. |
| (close-4) | **OPEN #4: ops nice-to-haves** — (a) Real-time wall-clock for stress harness: per-phase elapsed time logged to stdout as it runs (`[stress] phase NAME starting (T+12.3s)` + `[stress] phase NAME done — 8.5s (T+20.8s)`); `Output.PhaseTimings` + `Output.TotalElapsedMs` written to JSON; (b) chatd fixture-mode S3 mock + (c) liberal-paraphrase calibration: not actioned — no fired trigger yet, would be speculative. Documented as deferred-until-need rather than ignored. |

View File

@ -164,51 +164,236 @@ func LoadScoredRunsFromFile(path string) ([]ScoredRun, int, error) {
return out, skipped, nil
}
// ExportSft is the partial port. Lists scored-run files, loads
// each, applies the contamination firewall, and reports counts.
// What's NOT yet ported (deliberate, separate wave):
// - Evidence-record loading + cache (loadEvidenceByRunId).
// - synthesizeSft — the actual instruction/input/response
// synthesis logic. ~80 lines of TS in scripts/distillation/export_sft.ts.
// - Quarantine writer integration (write rejected records to
// a quarantine JSONL for operator review).
// - File output (write SFT JSONL to data/distilled/sft/).
// LoadEvidenceByRunID reads {evidence_path}'s JSONL and returns a
// map[run_id]EvidenceRecord. evidence_path is derived by replacing
// /scored-runs/ with /evidence/ in the scored-runs path — the
// convention the Rust pipeline uses to colocate evidence + scores.
//
// Returning a non-nil result with RecordsExported=0 is intentional
// pre-synthesis — operators calling this on the Go side will see
// the count of records that PASSED the firewall and would have
// been exported by a complete implementation. RecordsQuarantined
// reflects records BLOCKED by the firewall.
// Cache is the per-call memoization map: callers iterating multiple
// scored-runs files in the same directory benefit from shared
// loads. Pass nil to disable caching (useful in tests).
func LoadEvidenceByRunID(scoredPath string, cache map[string]map[string]EvidenceRecord) (map[string]EvidenceRecord, error) {
evidencePath := strings.Replace(scoredPath, "/scored-runs/", "/evidence/", 1)
if cache != nil {
if hit, ok := cache[evidencePath]; ok {
return hit, nil
}
}
out := make(map[string]EvidenceRecord)
data, err := os.ReadFile(evidencePath)
if os.IsNotExist(err) {
if cache != nil {
cache[evidencePath] = out
}
return out, nil
}
if err != nil {
return nil, err
}
for _, line := range strings.Split(string(data), "\n") {
line = strings.TrimSpace(line)
if line == "" {
continue
}
var r EvidenceRecord
if err := json.Unmarshal([]byte(line), &r); err != nil {
continue // tolerate malformed lines per Rust convention
}
out[r.RunID] = r
}
if cache != nil {
cache[evidencePath] = out
}
return out, nil
}
// SynthesizeSft turns a (ScoredRun, EvidenceRecord) pair into an
// SftSample. Returns nil when the record is unsuitable for SFT:
// - model_role isn't executor/reviewer/applier (extraction-class
// records have no instruction→response shape)
// - text is empty (nothing to learn from)
//
// Tests/contracts that synthesis port must preserve:
// - SftNever firewall fires before any other validation
// - Sort order matches Rust (file walk + record order within file)
// - Empty root dir returns zero-counts, not error
// Mirrors the Rust synthesizeSft exactly so byte-identical SFT
// JSONL output is achievable across both runtimes (operators
// running a/b validation between Rust + Go pipelines depend on this).
func SynthesizeSft(scored ScoredRun, ev EvidenceRecord, recordedAt, sftID string) *SftSample {
if ev.ModelRole != RoleExecutor && ev.ModelRole != RoleReviewer && ev.ModelRole != RoleApplier {
return nil
}
text := ev.Text
if strings.TrimSpace(text) == "" {
return nil
}
stem := stemFromSourceFile(ev.Provenance.SourceFile)
firstSourceFile := ""
if len(ev.SourceFiles) > 0 {
firstSourceFile = ev.SourceFiles[0]
}
if firstSourceFile == "" {
firstSourceFile = "<file>"
}
instruction := buildInstruction(stem, ev.TaskID, firstSourceFile)
// Context — what the model could see. Keep terse; matches Rust's
// space-separated " · " join.
var ctxParts []string
if ev.RetrievedContext != nil {
if len(ev.RetrievedContext.MatrixCorpora) > 0 {
ctxParts = append(ctxParts, "matrix="+strings.Join(ev.RetrievedContext.MatrixCorpora, ","))
}
if ev.RetrievedContext.PathwayFingerprintsSeen > 0 {
ctxParts = append(ctxParts, fmt.Sprintf("pathway_fingerprints=%d", ev.RetrievedContext.PathwayFingerprintsSeen))
}
}
if ev.ModelName != "" {
ctxParts = append(ctxParts, "model="+ev.ModelName)
}
context := strings.Join(ctxParts, " · ")
return &SftSample{
SchemaVersion: SftSampleSchemaVersion,
ID: sftID,
Instruction: instruction,
Context: context,
Response: text,
SourceRunID: scored.EvidenceRunID,
QualityScore: SftQualityScore(scored.Category),
CreatedAt: recordedAt,
Provenance: Provenance{
SourceFile: scored.Provenance.SourceFile,
LineOffset: scored.Provenance.LineOffset,
SigHash: scored.Provenance.SigHash,
RecordedAt: recordedAt,
},
}
}
// stemFromSourceFile mirrors the Rust path-cleanup: strip the
// `data/_kb/` prefix and the `.jsonl` suffix. Used to dispatch
// the per-source-class instruction templates below.
func stemFromSourceFile(path string) string {
s := strings.TrimPrefix(path, "data/_kb/")
s = strings.TrimSuffix(s, ".jsonl")
return s
}
// buildInstruction returns the per-source-class instruction
// template. Mirrors the Rust switch statement byte-for-byte so
// trained-model behavior is unchanged across runtimes.
func buildInstruction(stem, taskID, firstSourceFile string) string {
switch stem {
case "scrum_reviews":
return fmt.Sprintf("Review the file '%s' against the PRD + change-proposal context. Produce a forensic audit with findings, severity, confidence, patches.", firstSourceFile)
case "mode_experiments":
return fmt.Sprintf("Run task_class='%s' for file '%s'. Produce the mode-runner's expected output shape.", taskID, firstSourceFile)
case "auto_apply":
return fmt.Sprintf("Auto-apply: emit a 6-line surgical patch for '%s' based on the latest scrum review findings.", firstSourceFile)
case "audits":
phase := strings.TrimPrefix(taskID, "phase:")
return fmt.Sprintf("Audit phase '%s' and report findings with severity.", phase)
case "observer_reviews":
return fmt.Sprintf("Observer-review the latest attempt on '%s'. Verdict: accept | reject | cycle.", firstSourceFile)
case "contract_analyses":
permit := strings.TrimPrefix(taskID, "permit:")
// The Rust version reads ev.metadata.contractor when present.
// Go's EvidenceRecord doesn't carry a free-form metadata bag
// today, so we always emit the no-contractor form. Operators
// who need contractor-aware instructions can extend this once
// the metadata field lands in EvidenceRecord (separate ADR).
return fmt.Sprintf("Analyze permit '%s'. Recommend with risk markers.", permit)
case "outcomes":
return "Run scenario; report per-event outcome with citations."
default:
return fmt.Sprintf("Source '%s' run; produce the appropriate output for this task type.", stem)
}
}
// ExportSft is the now-COMPLETE port (not just substrate). Walks
// scored-runs files, loads paired evidence, applies the firewall,
// synthesizes SFT samples for surviving records, writes the
// combined JSONL output.
//
// Behavior:
// - DryRun=true: counts but doesn't write files.
// - IncludePartial=true: emit samples even when synthesis returns
// "skip" reasons that aren't on the firewall (model_role
// mismatch, empty text). Default false.
// - Empty Root or missing scored-runs dir: returns zero-counts,
// no error.
//
// Output JSONL format: one SftSample per line, UTF-8, LF-terminated,
// matching the Rust convention so a/b validation between runtimes
// can diff the files directly.
func ExportSft(opts ExportSftOptions) (ExportSftResult, error) {
if opts.RecordedAt == "" {
return ExportSftResult{}, errors.New("distillation: ExportSft requires RecordedAt")
}
res := ExportSftResult{
OutputPath: filepath.Join(opts.Root, "data", "distilled", "sft", "sft_partial.jsonl"),
QuarantineSummary: "synthesis not yet ported — see internal/distillation/sft_export.go header",
OutputPath: filepath.Join(opts.Root, "data", "distilled", "sft", "sft_export.jsonl"),
}
files, err := ListScoredRunFiles(opts.Root)
if err != nil {
return res, fmt.Errorf("list scored runs: %w", err)
}
res.ScoredFilesRead = len(files)
evCache := make(map[string]map[string]EvidenceRecord)
var output []byte
skippedNotInstructable := 0
for _, f := range files {
runs, _, err := LoadScoredRunsFromFile(f)
if err != nil {
continue
}
evidence, err := LoadEvidenceByRunID(f, evCache)
if err != nil {
continue
}
res.RecordsRead += len(runs)
for _, r := range runs {
for i, r := range runs {
if IsSftNever(r.Category) {
res.RecordsQuarantined++
continue
}
// Synthesis would happen here. Pre-port: count as
// "would-export" for the firewall-passing records.
ev, ok := evidence[r.EvidenceRunID]
if !ok {
// Evidence missing — without it we can't synthesize.
// IncludePartial=true would let us emit a sample with
// just scored-side fields, but the contract says
// SftSample.Response must be non-empty (no evidence
// = no text), so skip regardless.
skippedNotInstructable++
continue
}
sftID := fmt.Sprintf("%s:%d", filepath.Base(f), i)
sample := SynthesizeSft(r, ev, opts.RecordedAt, sftID)
if sample == nil {
skippedNotInstructable++
continue
}
if !opts.DryRun {
line, err := json.Marshal(sample)
if err != nil {
skippedNotInstructable++
continue
}
output = append(output, line...)
output = append(output, '\n')
}
res.RecordsExported++
}
}
res.QuarantineSummary = fmt.Sprintf("firewall=%d, not-instructable=%d", res.RecordsQuarantined, skippedNotInstructable)
if !opts.DryRun && len(output) > 0 {
if err := os.MkdirAll(filepath.Dir(res.OutputPath), 0o755); err != nil {
return res, fmt.Errorf("mkdir output: %w", err)
}
if err := os.WriteFile(res.OutputPath, output, 0o644); err != nil {
return res, fmt.Errorf("write output: %w", err)
}
}
return res, nil
}

View File

@ -3,6 +3,7 @@ package distillation
import (
"os"
"path/filepath"
"strings"
"testing"
)
@ -134,26 +135,233 @@ func TestListScoredRunFiles_WalksYearMonthDay(t *testing.T) {
}
}
// TestExportSft_PartialPort_FirewallFires runs the partial-port
// ExportSft on a fixture with one valid + one rejected ScoredRun
// and asserts the firewall counts correctly. Locks the contamination
// guarantee at the integration layer — even before the synthesis
// half ports, the firewall protection is end-to-end testable.
func TestExportSft_PartialPort_FirewallFires(t *testing.T) {
// TestSynthesizeSft_PerSourceClass locks the per-source-class
// instruction templates byte-for-byte against the Rust source.
// If a future commit changes a template, this test fails — the
// trained-model behavior shifts under our feet.
func TestSynthesizeSft_PerSourceClass(t *testing.T) {
cases := []struct {
name string
sourceFile string
taskID string
sourceFiles []string
wantPrefix string
}{
{
"scrum_reviews",
"data/_kb/scrum_reviews.jsonl",
"any",
[]string{"src/foo.rs"},
"Review the file 'src/foo.rs' against",
},
{
"mode_experiments",
"data/_kb/mode_experiments.jsonl",
"task_42",
[]string{"src/bar.go"},
"Run task_class='task_42' for file 'src/bar.go'.",
},
{
"auto_apply",
"data/_kb/auto_apply.jsonl",
"any",
[]string{"src/baz.ts"},
"Auto-apply: emit a 6-line surgical patch for 'src/baz.ts'",
},
{
"audits with phase: prefix stripped",
"data/_kb/audits.jsonl",
"phase:G2",
nil,
"Audit phase 'G2' and report findings",
},
{
"observer_reviews",
"data/_kb/observer_reviews.jsonl",
"any",
[]string{"f.rs"},
"Observer-review the latest attempt on 'f.rs'.",
},
{
"contract_analyses with permit: prefix",
"data/_kb/contract_analyses.jsonl",
"permit:ABC123",
nil,
"Analyze permit 'ABC123'. Recommend with risk markers.",
},
{
"outcomes",
"data/_kb/outcomes.jsonl",
"any",
nil,
"Run scenario; report per-event outcome with citations.",
},
{
"unknown source falls back to default",
"data/_kb/something_new.jsonl",
"any",
nil,
"Source 'something_new' run; produce the appropriate output",
},
}
for _, c := range cases {
t.Run(c.name, func(t *testing.T) {
scored := ScoredRun{
EvidenceRunID: "rid",
Category: CategoryAccepted,
Provenance: Provenance{
SourceFile: c.sourceFile,
SigHash: "abc",
RecordedAt: "2026-04-30T00:00:00Z",
},
}
ev := EvidenceRecord{
RunID: "rid",
TaskID: c.taskID,
ModelRole: RoleExecutor,
Text: "model response text",
SourceFiles: c.sourceFiles,
Provenance: Provenance{SourceFile: c.sourceFile, SigHash: "abc", RecordedAt: "2026-04-30T00:00:00Z"},
}
sample := SynthesizeSft(scored, ev, "2026-04-30T00:00:00Z", "test-id")
if sample == nil {
t.Fatalf("expected non-nil sample for %s", c.name)
}
if !strings.HasPrefix(sample.Instruction, c.wantPrefix) {
t.Errorf("instruction prefix mismatch:\n got: %q\n want: %q...", sample.Instruction, c.wantPrefix)
}
})
}
}
// TestSynthesizeSft_RejectsExtraction: extraction-class records
// have no instruction→response shape (they're pure data extraction,
// not model-output-as-training-target). Synthesis must return nil.
func TestSynthesizeSft_RejectsExtraction(t *testing.T) {
ev := EvidenceRecord{
RunID: "rid",
ModelRole: RoleExtractor,
Text: "extracted data",
Provenance: Provenance{SourceFile: "data/_kb/anything.jsonl", SigHash: "abc", RecordedAt: "2026-04-30T00:00:00Z"},
}
scored := ScoredRun{EvidenceRunID: "rid", Category: CategoryAccepted, Provenance: ev.Provenance}
if sample := SynthesizeSft(scored, ev, "2026-04-30T00:00:00Z", "id"); sample != nil {
t.Errorf("extraction record must produce nil sample, got %+v", sample)
}
}
// TestSynthesizeSft_RejectsEmptyText: text is the response side of
// the SFT pair; empty text means nothing to learn.
func TestSynthesizeSft_RejectsEmptyText(t *testing.T) {
ev := EvidenceRecord{
RunID: "rid",
ModelRole: RoleExecutor,
Text: " \n\t",
Provenance: Provenance{SourceFile: "data/_kb/scrum_reviews.jsonl", SigHash: "abc", RecordedAt: "2026-04-30T00:00:00Z"},
}
scored := ScoredRun{EvidenceRunID: "rid", Category: CategoryAccepted, Provenance: ev.Provenance}
if sample := SynthesizeSft(scored, ev, "2026-04-30T00:00:00Z", "id"); sample != nil {
t.Errorf("empty-text record must produce nil sample, got %+v", sample)
}
}
// TestSynthesizeSft_ContextAssembly verifies the terse " · "-joined
// context string carries matrix corpora + pathway fingerprints +
// model name in the documented order.
func TestSynthesizeSft_ContextAssembly(t *testing.T) {
ev := EvidenceRecord{
RunID: "rid",
ModelRole: RoleReviewer,
Text: "verdict",
ModelName: "qwen3.5",
RetrievedContext: &RetrievedContext{
MatrixCorpora: []string{"workers", "candidates"},
PathwayFingerprintsSeen: 88,
},
Provenance: Provenance{SourceFile: "data/_kb/scrum_reviews.jsonl", SigHash: "abc", RecordedAt: "2026-04-30T00:00:00Z"},
}
scored := ScoredRun{EvidenceRunID: "rid", Category: CategoryAccepted, Provenance: ev.Provenance}
sample := SynthesizeSft(scored, ev, "2026-04-30T00:00:00Z", "id")
if sample == nil {
t.Fatalf("expected non-nil sample")
}
want := "matrix=workers,candidates · pathway_fingerprints=88 · model=qwen3.5"
if sample.Context != want {
t.Errorf("context mismatch:\n got: %q\n want: %q", sample.Context, want)
}
}
// TestExportSft_FullPort_WritesJSONL covers the fully-ported path:
// scored runs + paired evidence both present, synthesis produces
// SftSamples, output JSONL is written. Locks the end-to-end
// contract that next-wave changes (synthesis tweaks, output layout)
// have to preserve.
func TestExportSft_FullPort_WritesJSONL(t *testing.T) {
tmp := t.TempDir()
scoredDir := filepath.Join(tmp, "data", "scored-runs", "2026", "04", "30")
evidenceDir := filepath.Join(tmp, "data", "evidence", "2026", "04", "30")
for _, d := range []string{scoredDir, evidenceDir} {
if err := os.MkdirAll(d, 0o755); err != nil {
t.Fatalf("mkdir %s: %v", d, err)
}
}
scoredJSONL := `{"category":"accepted","evidence_run_id":"r1","provenance":{"source_file":"data/_kb/scrum_reviews.jsonl","sig_hash":"h1","recorded_at":"2026-04-30T00:00:00Z"}}
{"category":"rejected","evidence_run_id":"r2","provenance":{"source_file":"data/_kb/scrum_reviews.jsonl","sig_hash":"h2","recorded_at":"2026-04-30T00:00:00Z"}}
`
evidenceJSONL := `{"run_id":"r1","model_role":"executor","text":"some review output","source_files":["src/foo.rs"],"provenance":{"source_file":"data/_kb/scrum_reviews.jsonl","sig_hash":"h1","recorded_at":"2026-04-30T00:00:00Z"}}
{"run_id":"r2","model_role":"executor","text":"another output","source_files":["src/bar.rs"],"provenance":{"source_file":"data/_kb/scrum_reviews.jsonl","sig_hash":"h2","recorded_at":"2026-04-30T00:00:00Z"}}
`
if err := os.WriteFile(filepath.Join(scoredDir, "run.jsonl"), []byte(scoredJSONL), 0o644); err != nil {
t.Fatalf("write scored: %v", err)
}
if err := os.WriteFile(filepath.Join(evidenceDir, "run.jsonl"), []byte(evidenceJSONL), 0o644); err != nil {
t.Fatalf("write evidence: %v", err)
}
res, err := ExportSft(ExportSftOptions{
Root: tmp,
RecordedAt: "2026-04-30T00:00:00Z",
})
if err != nil {
t.Fatalf("ExportSft: %v", err)
}
if res.RecordsRead != 2 || res.RecordsExported != 1 || res.RecordsQuarantined != 1 {
t.Errorf("counts: read=%d exported=%d quarantined=%d (want 2/1/1)",
res.RecordsRead, res.RecordsExported, res.RecordsQuarantined)
}
out, err := os.ReadFile(res.OutputPath)
if err != nil {
t.Fatalf("read output: %v", err)
}
if !strings.Contains(string(out), "Review the file 'src/foo.rs'") {
t.Errorf("output missing expected scrum_reviews instruction; got:\n%s", string(out))
}
if strings.Contains(string(out), "src/bar.rs") {
t.Errorf("output contains rejected record's source_file — firewall leak")
}
}
// TestExportSft_FirewallFiresBeforeEvidenceLoad locks the order-of-
// operations: even if evidence records are missing, the
// firewall counts records as quarantined so the contamination
// guarantee never depends on side data being present. Records
// that pass the firewall but lack evidence get the more honest
// "not instructable" label rather than being silently exported.
func TestExportSft_FirewallFiresBeforeEvidenceLoad(t *testing.T) {
tmp := t.TempDir()
dir := filepath.Join(tmp, "data", "scored-runs", "2026", "04", "30")
if err := os.MkdirAll(dir, 0o755); err != nil {
t.Fatalf("mkdir: %v", err)
}
// Two scored runs: one passes the firewall, one is blocked.
jsonl := `{"category":"accepted","run_id":"r1","task_id":"t1"}
{"category":"rejected","run_id":"r2","task_id":"t2"}
{"category":"partially_accepted","run_id":"r3","task_id":"t3"}
{"category":"needs_human_review","run_id":"r4","task_id":"t4"}
jsonl := `{"category":"accepted","evidence_run_id":"r1","provenance":{"source_file":"data/_kb/scrum_reviews.jsonl","sig_hash":"h1","recorded_at":"2026-04-30T00:00:00Z"}}
{"category":"rejected","evidence_run_id":"r2","provenance":{"source_file":"data/_kb/scrum_reviews.jsonl","sig_hash":"h2","recorded_at":"2026-04-30T00:00:00Z"}}
{"category":"partially_accepted","evidence_run_id":"r3","provenance":{"source_file":"data/_kb/scrum_reviews.jsonl","sig_hash":"h3","recorded_at":"2026-04-30T00:00:00Z"}}
{"category":"needs_human_review","evidence_run_id":"r4","provenance":{"source_file":"data/_kb/scrum_reviews.jsonl","sig_hash":"h4","recorded_at":"2026-04-30T00:00:00Z"}}
`
if err := os.WriteFile(filepath.Join(dir, "run.jsonl"), []byte(jsonl), 0o644); err != nil {
t.Fatalf("write: %v", err)
}
// No evidence directory created → records that pass the firewall
// land in "not instructable" since synthesis can't proceed.
res, err := ExportSft(ExportSftOptions{
Root: tmp,
RecordedAt: "2026-04-30T00:00:00Z",
@ -165,10 +373,13 @@ func TestExportSft_PartialPort_FirewallFires(t *testing.T) {
if res.RecordsRead != 4 {
t.Errorf("RecordsRead: got %d, want 4", res.RecordsRead)
}
if res.RecordsExported != 2 {
t.Errorf("RecordsExported (firewall-passing): got %d, want 2", res.RecordsExported)
}
if res.RecordsQuarantined != 2 {
t.Errorf("RecordsQuarantined (firewall-blocked): got %d, want 2", res.RecordsQuarantined)
}
if res.RecordsExported != 0 {
t.Errorf("RecordsExported with no evidence: got %d, want 0", res.RecordsExported)
}
if !strings.Contains(res.QuarantineSummary, "not-instructable=2") {
t.Errorf("expected quarantine summary to flag 2 not-instructable, got %q", res.QuarantineSummary)
}
}