Phase 6 acceptance + Phase 8 full-audit reports re-run; bit-for-bit reproducibility property still holds (run 1 hash == run 2 hash), just at a new value.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 6 — Acceptance Gate Report
Run: 2026-04-27T15:43:37.943Z
Fixture: tests/fixtures/distillation/acceptance/
Temp root: /tmp/distillation_phase6_acceptance
Pipeline run_ids: acceptance-run-1-stable (first) + acceptance-run-2-stable (second / hash reproducibility)
Result: PASS ✓
Pipeline counts (first run)
- collect: 14 records out · 1 skipped
- score: accepted=6 rejected=4 quarantined=4
- export-rag: 7 rows
- export-sft: 5 rows
- export-preference: 2 pairs
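As a sanity check on the counts above, the three scoring buckets sum to the number of collected records (6 + 4 + 4 = 14). The sketch below reconciles them, assuming every collected record lands in exactly one scoring bucket; the report does not state this, and the variable and key names are illustrative rather than the pipeline's schema.

```python
# Counts copied from the first-run figures in this report.
collected_out = 14
score_counts = {"accepted": 6, "rejected": 4, "quarantined": 4}


def reconcile(collected: int, buckets: dict) -> int:
    """Return collected minus the bucket total; 0 means every record is accounted for."""
    return collected - sum(buckets.values())


unaccounted = reconcile(collected_out, score_counts)
print(f"unaccounted records: {unaccounted}")
```

A non-zero result would mean a record was dropped or double-counted between collect and score, under the stated assumption.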
Invariant checks (expected vs actual)
| # | Check | Expected | Actual | Status |
|---|---|---|---|---|
| 1 | receipts: all 5 stages emitted | collect,score,export-rag,export-sft,export-preference | all present | ✓ |
| 2 | summary.json exists | exists | exists | ✓ |
| 3 | summary.md exists | exists | exists | ✓ |
| 4 | drift.json exists | exists | exists | ✓ |
| 5 | every StageReceipt validates against schema | 0 invalid | 0 invalid | ✓ |
| 6 | RunSummary validates | valid | valid | ✓ |
| 7 | DriftReport validates | valid | valid | ✓ |
| 8 | SFT: ≥1 accepted record exported | >=1 | 5 | ✓ |
| 9 | SFT contamination firewall: no rejected/needs_human_review | 0 | 0 | ✓ |
| 10 | SFT default mode: 0 partial leaks (no --include-partial used) | 0 | 0 | ✓ |
| 11 | RAG: 0 rejected leaks | 0 | 0 | ✓ |
| 12 | RAG: ≥1 partially_accepted row exported (RAG accepts partial) | >=1 | 2 | ✓ |
| 13 | Preference: ≥1 valid pair exported | >=1 | 2 | ✓ |
| 14 | Preference: 0 self-pairs (chosen_run_id != rejected_run_id) | 0 | 0 | ✓ |
| 15 | Preference: 0 identical-text pairs | 0 | 0 | ✓ |
| 16 | every export row has valid sha256 provenance.sig_hash | 0 missing | 0 missing | ✓ |
| 17 | Phase 2 collect: missing-provenance fixture row skipped to distillation_skips.jsonl | ≥1 skip recorded | 1 skip(s) | ✓ |
| 18 | SFT quarantine: rejected/needs_human_review caught at unsafe_sft_category gate | ≥1 | 6 | ✓ |
| 19 | scratchpad/tree-split case: fixture row materialized into evidence | found | found | ✓ |
| 20 | PRD drift case: fixture row materialized | found | found | ✓ |
| 21 | hash reproducibility: per-stage output_hash identical across runs | 0 mismatches | all match | ✓ |
| 22 | hash reproducibility: run_hash identical | 8dfdacee62380ec2... | 8dfdacee62380ec2... | ✓ |
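Two of the checks in the table (1 and 16) can be sketched in a few lines of Python. This is a hedged illustration, not the gate's actual implementation: the `stage` key on a receipt, the function names, and the requirement that `sig_hash` be 64 lowercase hex characters are assumptions. Only the stage names and the `provenance.sig_hash` field come from this report.

```python
import re

# Stage names as listed in check 1 of the report.
EXPECTED_STAGES = ["collect", "score", "export-rag", "export-sft", "export-preference"]
# Assumption: a valid sha256 digest is rendered as 64 lowercase hex characters.
SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def missing_stages(receipts):
    """Check 1 (sketch): expected stages that have no receipt. 'stage' key is hypothetical."""
    seen = {r.get("stage") for r in receipts}
    return [s for s in EXPECTED_STAGES if s not in seen]


def rows_missing_sig_hash(rows):
    """Check 16 (sketch): count rows whose provenance.sig_hash is absent or malformed."""
    bad = 0
    for row in rows:
        sig = (row.get("provenance") or {}).get("sig_hash")
        if not isinstance(sig, str) or not SHA256_HEX.fullmatch(sig):
            bad += 1
    return bad


# Hypothetical inputs for illustration only.
receipts = [{"stage": s} for s in EXPECTED_STAGES]
rows = [{"provenance": {"sig_hash": "a" * 64}}, {"provenance": {}}]
print(missing_stages(receipts), rows_missing_sig_hash(rows))
```

Reporting the offending stages or a count, rather than a bare boolean, mirrors the "Expected vs Actual" columns of the table.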
Hash reproducibility detail
run 1 run_hash: 8dfdacee62380ec20b7420d8f8bad3c395822da6eb0b41eeecd356e88fe20bf0
run 2 run_hash: 8dfdacee62380ec20b7420d8f8bad3c395822da6eb0b41eeecd356e88fe20bf0
Bit-for-bit identical. Two runs of the entire pipeline on the same fixture with the same recorded_at produce the same per-stage output_hash values and the same run_hash. For fixed inputs and a fixed recorded_at, distillation is deterministic.
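The report does not describe how `run_hash` is computed. As a minimal sketch of one way to get this property, the Python below hashes a canonical JSON serialization (sorted keys, fixed separators) of each stage's output, then hashes the per-stage digests together with `recorded_at`. The function names, the payload layout, and the sample timestamps are all hypothetical.

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    """Serialize with sorted keys and fixed separators so equal data yields equal bytes."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def output_hash(rows) -> str:
    """Hypothetical per-stage hash: sha256 over the canonical form of the stage's rows."""
    return hashlib.sha256(canonical_bytes(rows)).hexdigest()


def run_hash(stage_hashes: dict, recorded_at: str) -> str:
    """Hypothetical run hash: sha256 over the stage hashes plus the pinned timestamp."""
    payload = {"recorded_at": recorded_at, "stages": stage_hashes}
    return hashlib.sha256(canonical_bytes(payload)).hexdigest()


# The same data with a different key order must hash identically.
rows_a = [{"id": 1, "text": "hello"}]
rows_b = [{"text": "hello", "id": 1}]

h1 = run_hash({"collect": output_hash(rows_a)}, "2026-01-01T00:00:00.000Z")
h2 = run_hash({"collect": output_hash(rows_b)}, "2026-01-01T00:00:00.000Z")
h3 = run_hash({"collect": output_hash(rows_a)}, "2026-01-02T00:00:00.000Z")
print(h1 == h2, h1 == h3)
```

Pinning `recorded_at` matters in this sketch: a wall-clock timestamp in the hashed payload would change the hash on every run, even when the data is identical.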
Leak prevention confirmation
- SFT rows with rejected/needs_human_review quality_score: 0 (must be 0)
- SFT rows with partially_accepted quality_score (default mode): 0 (must be 0; would only appear with --include-partial)
- RAG rows with rejected success_score: 0 (must be 0)
- Preference self-pairs (chosen_run_id == rejected_run_id): 0 (must be 0)
- Preference identical-text pairs: 0 (must be 0)
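The leak checks above reduce to simple filters over the exported rows. The Python sketch below shows one way to express them. The field names `quality_score`, `success_score`, `chosen_run_id`, and `rejected_run_id` are taken from this report, while the `chosen` and `rejected` text fields, the function names, and the sample rows are assumptions for illustration.

```python
def sft_leaks(rows, include_partial=False):
    """SFT rows that should not have been exported under the given mode."""
    blocked = {"rejected", "needs_human_review"}
    if not include_partial:
        # Default mode: partially_accepted rows are also treated as leaks.
        blocked.add("partially_accepted")
    return [r for r in rows if r.get("quality_score") in blocked]


def rag_leaks(rows):
    """RAG rows carrying a rejected success_score."""
    return [r for r in rows if r.get("success_score") == "rejected"]


def bad_preference_pairs(pairs):
    """Return (self_pairs, identical_text_pairs); both lists must be empty to pass."""
    self_pairs = [p for p in pairs if p["chosen_run_id"] == p["rejected_run_id"]]
    # 'chosen' and 'rejected' text fields are hypothetical names.
    identical = [p for p in pairs if p["chosen"] == p["rejected"]]
    return self_pairs, identical


# Hypothetical rows for illustration only.
sft_rows = [{"quality_score": "accepted"}, {"quality_score": "partially_accepted"}]
rag_rows = [{"success_score": "accepted"}, {"success_score": "partially_accepted"}]
pairs = [
    {"chosen_run_id": "a", "rejected_run_id": "b", "chosen": "x", "rejected": "y"},
    {"chosen_run_id": "a", "rejected_run_id": "a", "chosen": "x", "rejected": "x"},
]
self_pairs, identical = bad_preference_pairs(pairs)
print(len(sft_leaks(sft_rows)), len(rag_leaks(rag_rows)), len(self_pairs), len(identical))
```

Returning the offending rows instead of a count lets a failing gate report which records leaked.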
What this proves
On this fixture, the distillation pipeline is safe, reproducible, and gated. Accepted data flows through; rejected and needs_human_review data is quarantined with reasons; preference pairs are real, not fabricated (no self-pairs, no identical-text pairs); every export row traces to its source via a canonical sha256 provenance.sig_hash; and running the whole pipeline twice on the same fixture with the same recorded_at produces byte-identical outputs.