lakehouse/reports/distillation/phase6-acceptance-report.md
root 3d068681f5
distillation: regenerated acceptance + audit reports (run_hash refresh)
Phase 6 acceptance + Phase 8 full-audit reports re-run; bit-for-bit
reproducibility property still holds (run 1 hash == run 2 hash),
just at a new value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:13:17 -05:00

Phase 6 — Acceptance Gate Report

  • Run: 2026-04-27T15:43:37.943Z
  • Fixture: tests/fixtures/distillation/acceptance/
  • Temp root: /tmp/distillation_phase6_acceptance
  • Pipeline run_ids: acceptance-run-1-stable (first) + acceptance-run-2-stable (second / hash reproducibility)

Result: PASS

Pipeline counts (first run)

  • collect: 14 records out · 1 skipped
  • score: accepted=6 rejected=4 quarantined=4
  • export-rag: 7 rows
  • export-sft: 5 rows
  • export-preference: 2 pairs

Invariant checks (expected vs actual)

| # | Check | Expected | Actual |
|---|-------|----------|--------|
| 1 | receipts: all 5 stages emitted | collect,score,export-rag,export-sft,export-preference | all present |
| 2 | summary.json exists | exists | exists |
| 3 | summary.md exists | exists | exists |
| 4 | drift.json exists | exists | exists |
| 5 | every StageReceipt validates against schema | 0 invalid | 0 invalid |
| 6 | RunSummary validates | valid | valid |
| 7 | DriftReport validates | valid | valid |
| 8 | SFT: ≥1 accepted record exported | >=1 | 5 |
| 9 | SFT contamination firewall: no rejected/needs_human_review | 0 | 0 |
| 10 | SFT default mode: 0 partial leaks (no --include-partial used) | 0 | 0 |
| 11 | RAG: 0 rejected leaks | 0 | 0 |
| 12 | RAG: ≥1 partially_accepted accepted (RAG accepts partial) | >=1 | 2 |
| 13 | Preference: ≥1 valid pair exported | >=1 | 2 |
| 14 | Preference: 0 self-pairs (chosen_run_id != rejected_run_id) | 0 | 0 |
| 15 | Preference: 0 identical-text pairs | 0 | 0 |
| 16 | every export row has valid sha256 provenance.sig_hash | 0 missing | 0 missing |
| 17 | Phase 2 collect: missing-provenance fixture row skipped to distillation_skips.jsonl | ≥1 skip recorded | 1 skip(s) |
| 18 | SFT quarantine: rejected/needs_human caught at unsafe_sft_category gate | ≥1 | 6 |
| 19 | scratchpad/tree-split case: fixture row materialized into evidence | found | found |
| 20 | PRD drift case: fixture row materialized | found | found |
| 21 | hash reproducibility: per-stage output_hash identical across runs | 0 mismatches | all match |
| 22 | hash reproducibility: run_hash identical | 8dfdacee62380ec2... | 8dfdacee62380ec2... |
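
The expected-versus-actual checks above can be pictured as a list of named predicates, with the gate passing only when every predicate holds. The following is a minimal sketch under stated assumptions, not the pipeline's actual implementation: the `Invariant` class, `count_missing_sig_hash`, and `evaluate` are hypothetical names, and the sketch assumes `provenance.sig_hash` is stored as a 64-character lowercase hex string.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Assumption: sig_hash is the lowercase hex form of a sha256 digest (64 chars).
SHA256_HEX = re.compile(r"[0-9a-f]{64}")


@dataclass
class Invariant:
    """One row of the invariant table (hypothetical structure)."""
    name: str
    expected: str                    # human-readable expectation, e.g. "0" or ">=1"
    actual: int
    passes: Callable[[int], bool]    # predicate applied to the actual value


def count_missing_sig_hash(rows: list[dict]) -> int:
    """Count export rows whose provenance.sig_hash is absent or not sha256 hex."""
    missing = 0
    for row in rows:
        provenance = row.get("provenance") or {}
        sig = provenance.get("sig_hash")
        if not isinstance(sig, str) or SHA256_HEX.fullmatch(sig) is None:
            missing += 1
    return missing


def evaluate(invariants: list[Invariant]) -> bool:
    """Gate result: True only if every invariant's predicate holds."""
    return all(inv.passes(inv.actual) for inv in invariants)


# Illustrative rows only; real export rows carry more fields.
rows = [
    {"provenance": {"sig_hash": "a" * 64}},
    {"provenance": {"sig_hash": "b" * 64}},
]
invariants = [
    Invariant("SFT: >=1 accepted record exported", ">=1", 5, lambda n: n >= 1),
    Invariant("RAG: 0 rejected leaks", "0", 0, lambda n: n == 0),
    Invariant("valid sha256 provenance.sig_hash", "0 missing",
              count_missing_sig_hash(rows), lambda n: n == 0),
]
print("PASS" if evaluate(invariants) else "FAIL")
```

Keeping the expectation as a predicate rather than a literal lets threshold checks (at least one) and exact checks (exactly zero) share one evaluation path.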

Hash reproducibility detail

run 1 run_hash: 8dfdacee62380ec20b7420d8f8bad3c395822da6eb0b41eeecd356e88fe20bf0

run 2 run_hash: 8dfdacee62380ec20b7420d8f8bad3c395822da6eb0b41eeecd356e88fe20bf0

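Bit-for-bit identical. Two runs of the entire pipeline on the same fixture with the same recorded_at produce the same run_hash and the same per-stage output_hash values (checks 21 and 22). For a fixed fixture and a fixed recorded_at, distillation is deterministic.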
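
To illustrate why a fixed recorded_at matters for this property, here is a minimal sketch of one way a reproducible run hash could be built. It is an assumption for illustration, not the pipeline's actual hashing scheme: `canonical_json`, `output_hash`, `run_hash`, and `run_pipeline` are hypothetical helpers, and the single "collect" stage stands in for the real five stages.

```python
import hashlib
import json


def canonical_json(obj) -> bytes:
    """Serialize with sorted keys and fixed separators so key order cannot change the bytes."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def output_hash(records: list[dict]) -> str:
    """sha256 over the canonical form of each record, one record per line."""
    return hashlib.sha256(b"\n".join(canonical_json(r) for r in records)).hexdigest()


def run_hash(stage_hashes: dict[str, str]) -> str:
    """Combine per-stage hashes in sorted stage-name order."""
    digest = hashlib.sha256()
    for stage in sorted(stage_hashes):
        digest.update(f"{stage}:{stage_hashes[stage]}\n".encode("utf-8"))
    return digest.hexdigest()


def run_pipeline(fixture: list[dict], recorded_at: str) -> str:
    """Stand-in pipeline: stamp each row with recorded_at, then hash the result."""
    stamped = [{**row, "recorded_at": recorded_at} for row in fixture]
    return run_hash({"collect": output_hash(stamped)})


# Illustrative fixture and timestamp; both are made up for this sketch.
fixture = [{"id": 1, "text": "a"}, {"text": "b", "id": 2}]
first = run_pipeline(fixture, "2026-01-01T00:00:00Z")
second = run_pipeline(fixture, "2026-01-01T00:00:00Z")
drifted = run_pipeline(fixture, "2026-01-02T00:00:00Z")

print(first == second)   # same fixture, same recorded_at
print(first == drifted)  # same fixture, different recorded_at
```

In this sketch any wall-clock value written into the output changes the hash, so the timestamp has to be an input to the run rather than something read at execution time.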

Leak prevention confirmation

  • SFT rows with rejected/needs_human_review quality_score: 0 (must be 0)
  • SFT rows with partially_accepted quality_score (default mode): 0 (must be 0; would only appear with --include-partial)
  • RAG rows with rejected success_score: 0 (must be 0)
  • Preference self-pairs (chosen_run_id == rejected_run_id): 0 (must be 0)
  • Preference identical-text pairs: 0 (must be 0)
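
The five counts above can be computed directly from the exported rows. The sketch below assumes rows are plain dictionaries; the field names quality_score, success_score, chosen_run_id, and rejected_run_id come from this report, while the `chosen` and `rejected` text fields, the `leak_report` function, and exact string equality as the identical-text test are assumptions made for illustration.

```python
def leak_report(
    sft_rows: list[dict], rag_rows: list[dict], pref_pairs: list[dict]
) -> dict[str, int]:
    """Return one count per leak category; every value must be 0 for the gate to pass."""
    return {
        "sft_rejected_or_needs_review": sum(
            r.get("quality_score") in {"rejected", "needs_human_review"}
            for r in sft_rows
        ),
        # Only a leak in default mode, i.e. when --include-partial was not used.
        "sft_partial": sum(
            r.get("quality_score") == "partially_accepted" for r in sft_rows
        ),
        "rag_rejected": sum(r.get("success_score") == "rejected" for r in rag_rows),
        "pref_self_pairs": sum(
            p["chosen_run_id"] == p["rejected_run_id"] for p in pref_pairs
        ),
        # Assumption: "identical text" means exact string equality.
        "pref_identical_text": sum(p["chosen"] == p["rejected"] for p in pref_pairs),
    }


# Illustrative rows only.
sft_rows = [{"quality_score": "accepted"}]
rag_rows = [{"success_score": "accepted"}, {"success_score": "partially_accepted"}]
pref_pairs = [
    {"chosen_run_id": "run-a", "rejected_run_id": "run-b",
     "chosen": "good answer", "rejected": "bad answer"},
]

report = leak_report(sft_rows, rag_rows, pref_pairs)
print(report)
print(all(count == 0 for count in report.values()))
```

Reporting each category separately, rather than a single total, keeps a failure traceable to the export that leaked.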

What this proves

On this fixture, the distillation pipeline is safe, reproducible, and gated. Accepted data flows through to the exports (checks 8, 12, 13); rejected/needs_human_review data is quarantined with reasons (checks 9, 11, 18); preference pairs are real, not fabricated, with no self-pairs and no identical-text pairs (checks 14, 15); every output traces to source via canonical sha256 (check 16); and running the whole pipeline twice on the same fixture produces byte-identical outputs (checks 21, 22).