root e7636f202b

lakehouse/auditor 13 blocking issues: cloud: claim not backed — "Phase 8 done-criteria (per spec):"

distillation: regenerate v1.0.0 release artifacts

Auto-generated by `./scripts/distill release-freeze` — RELEASE-READY (6/6 gates).
Captures the v1.0.0 manifest + the latest acceptance + audit reports
re-run during the freeze.

reports/distillation/release-freeze.md       human-readable manifest
reports/distillation/release-manifest.json   machine-readable manifest
reports/distillation/phase6-acceptance-report.md  re-run during freeze (22/22 invariants)
reports/distillation/phase8-full-audit-report.md  re-run during freeze (16/16 required)

Pre-tag state:
  branch: scrum/auto-apply-19814
  head:   <prior commit before this one>
  full pipeline: 145 distillation tests pass · 0 fail
  acceptance:    22/22 invariants on fixture, bit-identical reproducibility
  audit-full:    16/16 required across Phases 0-7

Tag command awaiting operator confirmation:
  git tag -a distillation-v1.0.0 -m "distillation v1.0.0 — 8-phase substrate frozen"
  git push origin distillation-v1.0.0

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-26 23:54:44 -05:00

3.4 KiB


Phase 6 — Acceptance Gate Report

Run: 2026-04-27T04:54:32.225Z
Fixture: tests/fixtures/distillation/acceptance/
Temp root: /tmp/distillation_phase6_acceptance
Pipeline run_ids: acceptance-run-1-stable (first) + acceptance-run-2-stable (second / hash reproducibility)

Result: PASS ✓

Pipeline counts (first run)

collect: 14 records out · 1 skipped
score: accepted=6 rejected=4 quarantined=4
export-rag: 7 rows
export-sft: 5 rows
export-preference: 2 pairs

Invariant checks (expected vs actual)

#	Check	Expected	Actual	Status
1	receipts: all 5 stages emitted	collect,score,export-rag,export-sft,export-preference	all present	✓
2	summary.json exists	exists	exists	✓
3	summary.md exists	exists	exists	✓
4	drift.json exists	exists	exists	✓
5	every StageReceipt validates against schema	0 invalid	0 invalid	✓
6	RunSummary validates	valid	valid	✓
7	DriftReport validates	valid	valid	✓
8	SFT: ≥1 accepted record exported	>=1	5	✓
9	SFT contamination firewall: no rejected/needs_human_review	0	0	✓
10	SFT default mode: 0 partial leaks (no --include-partial used)	0	0	✓
11	RAG: 0 rejected leaks	0	0	✓
12	RAG: ≥1 partially_accepted record exported (RAG accepts partial)	>=1	2	✓
13	Preference: ≥1 valid pair exported	>=1	2	✓
14	Preference: 0 self-pairs (chosen_run_id != rejected_run_id)	0	0	✓
15	Preference: 0 identical-text pairs	0	0	✓
16	every export row has valid sha256 provenance.sig_hash	0 missing	0 missing	✓
17	Phase 2 collect: missing-provenance fixture row skipped to distillation_skips.jsonl	≥1 skip recorded	1 skip(s)	✓
18	SFT quarantine: rejected/needs_human_review caught at unsafe_sft_category gate	≥1	6	✓
19	scratchpad/tree-split case: fixture row materialized into evidence	found	found	✓
20	PRD drift case: fixture row materialized	found	found	✓
21	hash reproducibility: per-stage output_hash identical across runs	0 mismatches	all match	✓
22	hash reproducibility: run_hash identical	3ea12b160ee9099a...	3ea12b160ee9099a...	✓
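
Check 16 can be pictured with a minimal sketch like the one below. It assumes each export row is a dict with a nested `provenance` object and that `sig_hash` is a lowercase 64-character hex digest; the helper name `rows_missing_sig_hash` is hypothetical, and the actual check in the pipeline may differ.

```python
import re

# Assumption: sig_hash is a lowercase, 64-character hex SHA-256 digest.
SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def rows_missing_sig_hash(rows):
    """Return the indexes of rows whose provenance.sig_hash is absent or malformed."""
    bad = []
    for i, row in enumerate(rows):
        # Tolerate a missing or null provenance object instead of raising.
        sig = (row.get("provenance") or {}).get("sig_hash")
        if not isinstance(sig, str) or SHA256_HEX.fullmatch(sig) is None:
            bad.append(i)
    return bad
```

An empty return value corresponds to the "0 missing" result in the table.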

Hash reproducibility detail

run 1 run_hash: 3ea12b160ee9099a3c52fe6e7fffd3076de7920d2704d24c789260d63cb1a5a2

run 2 run_hash: 3ea12b160ee9099a3c52fe6e7fffd3076de7920d2704d24c789260d63cb1a5a2

Bit-for-bit identical. Two runs of the entire pipeline on the same fixture with the same recorded_at produce the same outputs, so distillation is deterministic on this fixture when recorded_at is held fixed.
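
The report does not say how run_hash is derived. As an illustration only, one common way to get a reproducible hash is to serialize each stage's output as canonical JSON (sorted keys, fixed separators) and hash the per-stage digests together with the pinned timestamp. The function names `canonical_hash` and `run_hash` and the input shape below are assumptions, not the pipeline's actual code.

```python
import hashlib
import json


def canonical_hash(obj):
    """SHA-256 over a canonical JSON encoding, independent of dict key order."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_hash(stage_outputs, recorded_at):
    """Combine per-stage output hashes with a fixed recorded_at value.

    stage_outputs: hypothetical mapping of stage name -> list of output rows.
    recorded_at: timestamp string held constant across the runs being compared.
    """
    stage_hashes = {stage: canonical_hash(rows) for stage, rows in stage_outputs.items()}
    return canonical_hash({"recorded_at": recorded_at, "stages": stage_hashes})
```

Under this scheme, a wall-clock timestamp leaking into any row or into recorded_at would change the hash, which is why the comparison only makes sense when recorded_at is held fixed.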

Leak prevention confirmation

SFT rows with rejected/needs_human_review quality_score: 0 (must be 0)
SFT rows with partially_accepted quality_score (default mode): 0 (must be 0; would only appear with --include-partial)
RAG rows with rejected success_score: 0 (must be 0)
Preference self-pairs (chosen_run_id == rejected_run_id): 0 (must be 0)
Preference identical-text pairs: 0 (must be 0)
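
The five counts above can be sketched as a single pass over the export rows. The field names `quality_score`, `success_score`, `chosen_run_id`, and `rejected_run_id` come from this report; the row shapes, the `chosen` and `rejected` text fields, and the function name `leak_counts` are assumptions made for illustration.

```python
def leak_counts(sft_rows, rag_rows, preference_pairs, include_partial=False):
    """Count rows that should never appear in each export; all values must be 0."""
    blocked_sft = {"rejected", "needs_human_review"}
    return {
        # SFT must never contain rejected or needs_human_review records.
        "sft_rejected_or_review": sum(
            r.get("quality_score") in blocked_sft for r in sft_rows
        ),
        # Partial records count as leaks only in default mode.
        "sft_partial": 0 if include_partial else sum(
            r.get("quality_score") == "partially_accepted" for r in sft_rows
        ),
        # RAG accepts partial records but not rejected ones.
        "rag_rejected": sum(r.get("success_score") == "rejected" for r in rag_rows),
        # A pair must compare two different runs with different text.
        "pref_self_pairs": sum(
            p["chosen_run_id"] == p["rejected_run_id"] for p in preference_pairs
        ),
        "pref_identical_text": sum(
            p["chosen"] == p["rejected"] for p in preference_pairs
        ),
    }
```

A gate built on this sketch would fail the run if any value in the returned dict is nonzero.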

What this proves

On this fixture, the distillation pipeline is safe, reproducible, and gated. Accepted data flows through to the exports; rejected/needs_human_review data is quarantined with reasons (checks 9 and 18); no preference pair is a self-pair or an identical-text pair (checks 14 and 15); every export row carries a valid sha256 provenance.sig_hash (check 16); and running the whole pipeline twice with the same recorded_at produces byte-identical outputs (checks 21 and 22).
