diff --git a/reports/distillation/phase6-acceptance-report.md b/reports/distillation/phase6-acceptance-report.md
index a0665f3..aa9afba 100644
--- a/reports/distillation/phase6-acceptance-report.md
+++ b/reports/distillation/phase6-acceptance-report.md
@@ -1,6 +1,6 @@
 # Phase 6 — Acceptance Gate Report
 
-**Run:** 2026-04-27T04:54:32.225Z
+**Run:** 2026-04-27T15:43:37.943Z
 **Fixture:** `tests/fixtures/distillation/acceptance/`
 **Temp root:** `/tmp/distillation_phase6_acceptance`
 **Pipeline run_ids:** `acceptance-run-1-stable` (first) + `acceptance-run-2-stable` (second / hash reproducibility)
@@ -40,13 +40,13 @@
 | 19 | scratchpad/tree-split case: fixture row materialized into evidence | found | found | ✓ |
 | 20 | PRD drift case: fixture row materialized | found | found | ✓ |
 | 21 | hash reproducibility: per-stage output_hash identical across runs | 0 mismatches | all match | ✓ |
-| 22 | hash reproducibility: run_hash identical | 3ea12b160ee9099a... | 3ea12b160ee9099a... | ✓ |
+| 22 | hash reproducibility: run_hash identical | 8dfdacee62380ec2... | 8dfdacee62380ec2... | ✓ |
 
 ## Hash reproducibility detail
 
-run 1 run_hash: `3ea12b160ee9099a3c52fe6e7fffd3076de7920d2704d24c789260d63cb1a5a2`
+run 1 run_hash: `8dfdacee62380ec20b7420d8f8bad3c395822da6eb0b41eeecd356e88fe20bf0`
 
-run 2 run_hash: `3ea12b160ee9099a3c52fe6e7fffd3076de7920d2704d24c789260d63cb1a5a2`
+run 2 run_hash: `8dfdacee62380ec20b7420d8f8bad3c395822da6eb0b41eeecd356e88fe20bf0`
 
 **Bit-for-bit identical.** Two runs of the entire pipeline on the same fixture with the same `recorded_at` produce the same outputs. Distillation is deterministic.
 
diff --git a/reports/distillation/phase8-full-audit-report.md b/reports/distillation/phase8-full-audit-report.md
index 59f3eb1..7d01c69 100644
--- a/reports/distillation/phase8-full-audit-report.md
+++ b/reports/distillation/phase8-full-audit-report.md
@@ -1,8 +1,8 @@
 # Phase 8 — Full System Audit Report
 
-**Run:** 2026-04-27T04:54:32.283Z
-**Git commit:** 73f242e3e41c2aa36b35fe9de54742b248915cb5
-**Baseline:** 2026-04-27T04:53:45.796Z (5bdd159966e6)
+**Run:** 2026-04-27T15:43:38.021Z
+**Git commit:** ca7375ea2b178159a0c61bbf62788a2ffa2390e9
+**Baseline:** 2026-04-27T10:31:44.043Z (d11632a6fae6)
 
 ## Result: **PASS** ✓
 
@@ -26,7 +26,7 @@
 | 1 | P0 | recon doc exists | Y | docs/recon/local-distillation-recon.md present | present | ✓ |
 | 2 | P0 | tier-1 source streams present | — | all 4 tier-1 jsonls on disk | all present | ✓ |
 | 3 | P1 | schema validators pass on fixtures | Y | ≥40 tests, 0 fail | 51 pass, 0 fail | ✓ |
-| 4 | P2 | materializer dry-run completes | Y | >=1 row from each tier-1 source | 1073 read · 16 written · 2 skipped | ✓ |
+| 4 | P2 | materializer dry-run completes | Y | >=1 row from each tier-1 source | 1139 read · 82 written · 2 skipped | ✓ |
 | 5 | P2 | tier-1 sources each materialize ≥1 row | — | 4/4: distilled_facts, scrum_reviews, audit_facts, mode_experiments | 1/4 hit (mode_experiments) | ✓ |
 | 6 | P3 | on-disk scored-runs distribution non-empty | Y | >=1 accepted | acc=386 part=132 rej=57 hum=480 | ✓ |
 | 7 | P3 | scored-runs distribution sums positive | — | >0 total | 1055 total | ✓ |
@@ -38,19 +38,19 @@
 | 13 | P5 | latest run (3fa51d66-784c-4c7d-843d-6c48328a608c) has all 5 stage receipts | Y | collect,score,export-rag,export-sft,export-preference | all present | ✓ |
 | 14 | P5 | every stage receipt validates against schema | Y | 0 invalid | 0 invalid | ✓ |
 | 15 | P5 | RunSummary validates | Y | valid | valid | ✓ |
-| 16 | P5 | summary.git_commit is 40-char hex | — | match | 68b6697bcb38... (HEAD: 73f242e3e41c...) | ✓ |
+| 16 | P5 | summary.git_commit is 40-char hex | — | match | 68b6697bcb38... (HEAD: ca7375ea2b17...) | ✓ |
 | 17 | P5 | run_hash is sha256 | Y | /^[0-9a-f]{64}$/ | 2336b96c3638982d... | ✓ |
 | 18 | P6 | acceptance gate passes 22/22 invariants on fixture | Y | PASS — 22/22 | 22/22 (exit=0) | ✓ |
 | 19 | P7 | replay validation passes on 3/3 dry-run sample tasks | Y | 3/3 | 3/3 | ✓ |
 | 20 | P7 | replay retrieval surfaces ≥1 playbook on each task (when corpus present) | — | ≥1 task with retrieval | 3/3 | ✓ |
 | 21 | P7 | escalation loop guard: no path > 2 models | Y | 0 loops | 0 | ✓ |
-| 22 | P7 | replay_runs.jsonl populated by audit run | — | exists with ≥3 rows added | 21 rows total | ✓ |
+| 22 | P7 | replay_runs.jsonl populated by audit run | — | exists with ≥3 rows added | 27 rows total | ✓ |
 
 ## Drift vs prior baseline
 
 | Metric | Baseline | Current | Δ% | Flag |
 |---|---|---|---|---|
-| p2_evidence_rows | 15 | 16 | 7% | ok |
+| p2_evidence_rows | 25 | 82 | 228% | warn |
 | p2_evidence_skips | 2 | 2 | 0% | ok |
 | p3_accepted | 386 | 386 | 0% | ok |
 | p3_partial | 132 | 132 | 0% | ok |
@@ -61,7 +61,7 @@
 | p4_pref_pairs | 83 | 83 | 0% | ok |
 | p4_total_quarantined | 1325 | 1325 | 0% | ok |
 
-All metrics within 20% of baseline — pipeline stable across runs.
+**1 metric(s) drifted >20% from baseline.** Investigate before treating outputs as stable.
 
 ## System health status
 