Auto-generated by `./scripts/distill release-freeze` — RELEASE-READY (6/6 gates). Captures the v1.0.0 manifest + the latest acceptance + audit reports re-run during the freeze. reports/distillation/release-freeze.md human-readable manifest reports/distillation/release-manifest.json machine-readable manifest reports/distillation/phase6-acceptance-report.md re-run during freeze (22/22 invariants) reports/distillation/phase8-full-audit-report.md re-run during freeze (16/16 required) Pre-tag state: branch: scrum/auto-apply-19814 head: <prior commit before this one> full pipeline: 145 distillation tests pass · 0 fail acceptance: 22/22 invariants on fixture, bit-identical reproducibility audit-full: 16/16 required across Phases 0-7 Tag command awaiting operator confirmation: git tag -a distillation-v1.0.0 -m "distillation v1.0.0 — 8-phase substrate frozen" git push origin distillation-v1.0.0 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3.4 KiB
Phase 6 — Acceptance Gate Report
Run: 2026-04-27T04:54:32.225Z
Fixture: tests/fixtures/distillation/acceptance/
Temp root: /tmp/distillation_phase6_acceptance
Pipeline run_ids: acceptance-run-1-stable (first) + acceptance-run-2-stable (second / hash reproducibility)
Result: PASS ✓
Pipeline counts (first run)
- collect: 14 records out · 1 skipped
- score: accepted=6 rejected=4 quarantined=4
- export-rag: 7 rows
- export-sft: 5 rows
- export-preference: 2 pairs
Invariant checks (expected vs actual)
| # | Check | Expected | Actual | Status |
|---|---|---|---|---|
| 1 | receipts: all 5 stages emitted | collect,score,export-rag,export-sft,export-preference | all present | ✓ |
| 2 | summary.json exists | exists | exists | ✓ |
| 3 | summary.md exists | exists | exists | ✓ |
| 4 | drift.json exists | exists | exists | ✓ |
| 5 | every StageReceipt validates against schema | 0 invalid | 0 invalid | ✓ |
| 6 | RunSummary validates | valid | valid | ✓ |
| 7 | DriftReport validates | valid | valid | ✓ |
| 8 | SFT: ≥1 accepted record exported | >=1 | 5 | ✓ |
| 9 | SFT contamination firewall: no rejected/needs_human_review | 0 | 0 | ✓ |
| 10 | SFT default mode: 0 partial leaks (no --include-partial used) | 0 | 0 | ✓ |
| 11 | RAG: 0 rejected leaks | 0 | 0 | ✓ |
| 12 | RAG: ≥1 partially_accepted accepted (RAG accepts partial) | >=1 | 2 | ✓ |
| 13 | Preference: ≥1 valid pair exported | >=1 | 2 | ✓ |
| 14 | Preference: 0 self-pairs (chosen_run_id != rejected_run_id) | 0 | 0 | ✓ |
| 15 | Preference: 0 identical-text pairs | 0 | 0 | ✓ |
| 16 | every export row has valid sha256 provenance.sig_hash | 0 missing | 0 missing | ✓ |
| 17 | Phase 2 collect: missing-provenance fixture row skipped to distillation_skips.jsonl | ≥1 skip recorded | 1 skip(s) | ✓ |
| 18 | SFT quarantine: rejected/needs_human caught at unsafe_sft_category gate | ≥1 | 6 | ✓ |
| 19 | scratchpad/tree-split case: fixture row materialized into evidence | found | found | ✓ |
| 20 | PRD drift case: fixture row materialized | found | found | ✓ |
| 21 | hash reproducibility: per-stage output_hash identical across runs | 0 mismatches | all match | ✓ |
| 22 | hash reproducibility: run_hash identical | 3ea12b160ee9099a... | 3ea12b160ee9099a... | ✓ |
Hash reproducibility detail
run 1 run_hash: 3ea12b160ee9099a3c52fe6e7fffd3076de7920d2704d24c789260d63cb1a5a2
run 2 run_hash: 3ea12b160ee9099a3c52fe6e7fffd3076de7920d2704d24c789260d63cb1a5a2
Bit-for-bit identical. Two runs of the entire pipeline on the same fixture with the same recorded_at produce the same outputs. Distillation is deterministic.
Leak prevention confirmation
- SFT rows with rejected/needs_human_review quality_score: 0 (must be 0)
- SFT rows with partially_accepted quality_score (default mode): 0 (must be 0; would only appear with --include-partial)
- RAG rows with rejected success_score: 0 (must be 0)
- Preference self-pairs (chosen_run_id == rejected_run_id): 0 (must be 0)
- Preference identical-text pairs: 0 (must be 0)
What this proves
The distillation pipeline is safe, reproducible, and gated. Accepted data flows through; rejected/needs_human_review data is quarantined with reasons; preference pairs are real, not fabricated; every output traces to source via canonical sha256; running the whole pipeline twice on the same fixture produces byte-identical outputs.