2026-05-03 05:16:17 +00:00
2 changed files with 12 additions and 12 deletions
--- a/reports/distillation/phase6-acceptance-report.md
+++ b/reports/distillation/phase6-acceptance-report.md
@@ -1,6 +1,6 @@
 # Phase 6 — Acceptance Gate Report

-**Run:** 2026-04-27T04:54:32.225Z
+**Run:** 2026-04-27T15:43:37.943Z
 **Fixture:** `tests/fixtures/distillation/acceptance/`
 **Temp root:** `/tmp/distillation_phase6_acceptance`
 **Pipeline run_ids:** `acceptance-run-1-stable` (first) + `acceptance-run-2-stable` (second / hash reproducibility)
@@ -40,13 +40,13 @@
 | 19 | scratchpad/tree-split case: fixture row materialized into evidence | found | found | ✓ |
 | 20 | PRD drift case: fixture row materialized | found | found | ✓ |
 | 21 | hash reproducibility: per-stage output_hash identical across runs | 0 mismatches | all match | ✓ |
-| 22 | hash reproducibility: run_hash identical | 3ea12b160ee9099a... | 3ea12b160ee9099a... | ✓ |
+| 22 | hash reproducibility: run_hash identical | 8dfdacee62380ec2... | 8dfdacee62380ec2... | ✓ |

 ## Hash reproducibility detail

-run 1 run_hash: `3ea12b160ee9099a3c52fe6e7fffd3076de7920d2704d24c789260d63cb1a5a2`
+run 1 run_hash: `8dfdacee62380ec20b7420d8f8bad3c395822da6eb0b41eeecd356e88fe20bf0`

-run 2 run_hash: `3ea12b160ee9099a3c52fe6e7fffd3076de7920d2704d24c789260d63cb1a5a2`
+run 2 run_hash: `8dfdacee62380ec20b7420d8f8bad3c395822da6eb0b41eeecd356e88fe20bf0`

 **Bit-for-bit identical.** Two runs of the entire pipeline on the same fixture with the same `recorded_at` produce the same outputs. Distillation is deterministic.

--- a/reports/distillation/phase8-full-audit-report.md
+++ b/reports/distillation/phase8-full-audit-report.md
@@ -1,8 +1,8 @@
 # Phase 8 — Full System Audit Report

-**Run:** 2026-04-27T04:54:32.283Z
-**Git commit:** 73f242e3e41c2aa36b35fe9de54742b248915cb5
-**Baseline:** 2026-04-27T04:53:45.796Z (5bdd159966e6)
+**Run:** 2026-04-27T15:43:38.021Z
+**Git commit:** ca7375ea2b178159a0c61bbf62788a2ffa2390e9
+**Baseline:** 2026-04-27T10:31:44.043Z (d11632a6fae6)

 ## Result: **PASS** ✓

@@ -26,7 +26,7 @@
 | 1 | P0 | recon doc exists | Y | docs/recon/local-distillation-recon.md present | present | ✓ |
 | 2 | P0 | tier-1 source streams present | — | all 4 tier-1 jsonls on disk | all present | ✓ |
 | 3 | P1 | schema validators pass on fixtures | Y | ≥40 tests, 0 fail | 51 pass, 0 fail | ✓ |
-| 4 | P2 | materializer dry-run completes | Y | >=1 row from each tier-1 source | 1073 read · 16 written · 2 skipped | ✓ |
+| 4 | P2 | materializer dry-run completes | Y | >=1 row from each tier-1 source | 1139 read · 82 written · 2 skipped | ✓ |
 | 5 | P2 | tier-1 sources each materialize ≥1 row | — | 4/4: distilled_facts, scrum_reviews, audit_facts, mode_experiments | 1/4 hit (mode_experiments) | ✓ |
 | 6 | P3 | on-disk scored-runs distribution non-empty | Y | >=1 accepted | acc=386 part=132 rej=57 hum=480 | ✓ |
 | 7 | P3 | scored-runs distribution sums positive | — | >0 total | 1055 total | ✓ |
@@ -38,19 +38,19 @@
 | 13 | P5 | latest run (3fa51d66-784c-4c7d-843d-6c48328a608c) has all 5 stage receipts | Y | collect,score,export-rag,export-sft,export-preference | all present | ✓ |
 | 14 | P5 | every stage receipt validates against schema | Y | 0 invalid | 0 invalid | ✓ |
 | 15 | P5 | RunSummary validates | Y | valid | valid | ✓ |
-| 16 | P5 | summary.git_commit is 40-char hex | — | match | 68b6697bcb38... (HEAD: 73f242e3e41c...) | ✓ |
+| 16 | P5 | summary.git_commit is 40-char hex | — | match | 68b6697bcb38... (HEAD: ca7375ea2b17...) | ✓ |
 | 17 | P5 | run_hash is sha256 | Y | /^[0-9a-f]{64}$/ | 2336b96c3638982d... | ✓ |
 | 18 | P6 | acceptance gate passes 22/22 invariants on fixture | Y | PASS — 22/22 | 22/22 (exit=0) | ✓ |
 | 19 | P7 | replay validation passes on 3/3 dry-run sample tasks | Y | 3/3 | 3/3 | ✓ |
 | 20 | P7 | replay retrieval surfaces ≥1 playbook on each task (when corpus present) | — | ≥1 task with retrieval | 3/3 | ✓ |
 | 21 | P7 | escalation loop guard: no path > 2 models | Y | 0 loops | 0 | ✓ |
-| 22 | P7 | replay_runs.jsonl populated by audit run | — | exists with ≥3 rows added | 21 rows total | ✓ |
+| 22 | P7 | replay_runs.jsonl populated by audit run | — | exists with ≥3 rows added | 27 rows total | ✓ |

 ## Drift vs prior baseline

 | Metric | Baseline | Current | Δ% | Flag |
 |---|---|---|---|---|
-| p2_evidence_rows | 15 | 16 | 7% | ok |
+| p2_evidence_rows | 25 | 82 | 228% | warn |
 | p2_evidence_skips | 2 | 2 | 0% | ok |
 | p3_accepted | 386 | 386 | 0% | ok |
 | p3_partial | 132 | 132 | 0% | ok |
@@ -61,7 +61,7 @@
 | p4_pref_pairs | 83 | 83 | 0% | ok |
 | p4_total_quarantined | 1325 | 1325 | 0% | ok |

-All metrics within 20% of baseline — pipeline stable across runs.
+**1 metric(s) drifted >20% from baseline.** Investigate before treating outputs as stable.

 ## System health status