Post-PR-#11 polish: demo UI, staffer console, face pool, icons, contractor profile (24 commits) #12
@@ -1,6 +1,6 @@
 # Phase 6 — Acceptance Gate Report

-**Run:** 2026-04-27T04:54:32.225Z
+**Run:** 2026-04-27T15:43:37.943Z
 **Fixture:** `tests/fixtures/distillation/acceptance/`
 **Temp root:** `/tmp/distillation_phase6_acceptance`
 **Pipeline run_ids:** `acceptance-run-1-stable` (first) + `acceptance-run-2-stable` (second / hash reproducibility)
@@ -40,13 +40,13 @@
 | 19 | scratchpad/tree-split case: fixture row materialized into evidence | found | found | ✓ |
 | 20 | PRD drift case: fixture row materialized | found | found | ✓ |
 | 21 | hash reproducibility: per-stage output_hash identical across runs | 0 mismatches | all match | ✓ |
-| 22 | hash reproducibility: run_hash identical | 3ea12b160ee9099a... | 3ea12b160ee9099a... | ✓ |
+| 22 | hash reproducibility: run_hash identical | 8dfdacee62380ec2... | 8dfdacee62380ec2... | ✓ |

 ## Hash reproducibility detail

-run 1 run_hash: `3ea12b160ee9099a3c52fe6e7fffd3076de7920d2704d24c789260d63cb1a5a2`
+run 1 run_hash: `8dfdacee62380ec20b7420d8f8bad3c395822da6eb0b41eeecd356e88fe20bf0`

-run 2 run_hash: `3ea12b160ee9099a3c52fe6e7fffd3076de7920d2704d24c789260d63cb1a5a2`
+run 2 run_hash: `8dfdacee62380ec20b7420d8f8bad3c395822da6eb0b41eeecd356e88fe20bf0`

 **Bit-for-bit identical.** Two runs of the entire pipeline on the same fixture with the same `recorded_at` produce the same outputs. Distillation is deterministic.

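Invariants 21 and 22 amount to comparing the two runs' per-stage `output_hash` values and their overall `run_hash`. A minimal sketch of that comparison, assuming each run is available as an in-memory record; the `RunRecord` shape and the `compare_runs` helper are hypothetical, not the pipeline's actual API. The stage names come from invariant 13 of the Phase 8 report, the `run_hash` is the one shown above, and the per-stage hashes are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    # Hypothetical shape: the report does not show how runs are stored.
    run_id: str
    run_hash: str
    stage_hashes: dict[str, str] = field(default_factory=dict)  # stage -> output_hash


def compare_runs(a: RunRecord, b: RunRecord) -> tuple[list[str], bool]:
    """Return (stages whose output_hash differs or is missing in one run, run_hash equal?)."""
    stages = sorted(set(a.stage_hashes) | set(b.stage_hashes))
    mismatches = [s for s in stages if a.stage_hashes.get(s) != b.stage_hashes.get(s)]
    return mismatches, a.run_hash == b.run_hash


# Stage names from invariant 13; the per-stage hashes below are placeholders only.
STAGES = ["collect", "score", "export-rag", "export-sft", "export-preference"]
RUN_HASH = "8dfdacee62380ec20b7420d8f8bad3c395822da6eb0b41eeecd356e88fe20bf0"
placeholder_hashes = {s: f"placeholder-{s}" for s in STAGES}

run1 = RunRecord("acceptance-run-1-stable", RUN_HASH, dict(placeholder_hashes))
run2 = RunRecord("acceptance-run-2-stable", RUN_HASH, dict(placeholder_hashes))

mismatches, same_run_hash = compare_runs(run1, run2)
print(f"{len(mismatches)} mismatches, run_hash identical: {same_run_hash}")
# prints: 0 mismatches, run_hash identical: True
```

Taking the union of stage names means a stage that appears in only one run counts as a mismatch instead of being silently skipped.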
@@ -1,8 +1,8 @@
 # Phase 8 — Full System Audit Report

-**Run:** 2026-04-27T04:54:32.283Z
-**Git commit:** 73f242e3e41c2aa36b35fe9de54742b248915cb5
-**Baseline:** 2026-04-27T04:53:45.796Z (5bdd159966e6)
+**Run:** 2026-04-27T15:43:38.021Z
+**Git commit:** ca7375ea2b178159a0c61bbf62788a2ffa2390e9
+**Baseline:** 2026-04-27T10:31:44.043Z (d11632a6fae6)

 ## Result: **PASS** ✓

@@ -26,7 +26,7 @@
 | 1 | P0 | recon doc exists | Y | docs/recon/local-distillation-recon.md present | present | ✓ |
 | 2 | P0 | tier-1 source streams present | — | all 4 tier-1 jsonls on disk | all present | ✓ |
 | 3 | P1 | schema validators pass on fixtures | Y | ≥40 tests, 0 fail | 51 pass, 0 fail | ✓ |
-| 4 | P2 | materializer dry-run completes | Y | >=1 row from each tier-1 source | 1073 read · 16 written · 2 skipped | ✓ |
+| 4 | P2 | materializer dry-run completes | Y | >=1 row from each tier-1 source | 1139 read · 82 written · 2 skipped | ✓ |
 | 5 | P2 | tier-1 sources each materialize ≥1 row | — | 4/4: distilled_facts, scrum_reviews, audit_facts, mode_experiments | 1/4 hit (mode_experiments) | ✓ |
 | 6 | P3 | on-disk scored-runs distribution non-empty | Y | >=1 accepted | acc=386 part=132 rej=57 hum=480 | ✓ |
 | 7 | P3 | scored-runs distribution sums positive | — | >0 total | 1055 total | ✓ |
@@ -38,19 +38,19 @@
 | 13 | P5 | latest run (3fa51d66-784c-4c7d-843d-6c48328a608c) has all 5 stage receipts | Y | collect,score,export-rag,export-sft,export-preference | all present | ✓ |
 | 14 | P5 | every stage receipt validates against schema | Y | 0 invalid | 0 invalid | ✓ |
 | 15 | P5 | RunSummary validates | Y | valid | valid | ✓ |
-| 16 | P5 | summary.git_commit is 40-char hex | — | match | 68b6697bcb38... (HEAD: 73f242e3e41c...) | ✓ |
+| 16 | P5 | summary.git_commit is 40-char hex | — | match | 68b6697bcb38... (HEAD: ca7375ea2b17...) | ✓ |
 | 17 | P5 | run_hash is sha256 | Y | /^[0-9a-f]{64}$/ | 2336b96c3638982d... | ✓ |
 | 18 | P6 | acceptance gate passes 22/22 invariants on fixture | Y | PASS — 22/22 | 22/22 (exit=0) | ✓ |
 | 19 | P7 | replay validation passes on 3/3 dry-run sample tasks | Y | 3/3 | 3/3 | ✓ |
 | 20 | P7 | replay retrieval surfaces ≥1 playbook on each task (when corpus present) | — | ≥1 task with retrieval | 3/3 | ✓ |
 | 21 | P7 | escalation loop guard: no path > 2 models | Y | 0 loops | 0 | ✓ |
-| 22 | P7 | replay_runs.jsonl populated by audit run | — | exists with ≥3 rows added | 21 rows total | ✓ |
+| 22 | P7 | replay_runs.jsonl populated by audit run | — | exists with ≥3 rows added | 27 rows total | ✓ |

 ## Drift vs prior baseline

 | Metric | Baseline | Current | Δ% | Flag |
 |---|---|---|---|---|
-| p2_evidence_rows | 15 | 16 | 7% | ok |
+| p2_evidence_rows | 25 | 82 | 228% | warn |
 | p2_evidence_skips | 2 | 2 | 0% | ok |
 | p3_accepted | 386 | 386 | 0% | ok |
 | p3_partial | 132 | 132 | 0% | ok |
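Invariants 16 and 17 are format checks: `summary.git_commit` must be a 40-character hex commit id (the SHA-1 length), and `run_hash` must match `/^[0-9a-f]{64}$/`, a lowercase hex SHA-256 digest. As the row label says, invariant 16 tests only the format, which is why it passes even though the summary commit (68b6697bcb38...) differs from HEAD. A minimal sketch of both checks; the function names are hypothetical, and requiring lowercase hex for the commit id is an assumption, since the report states the rule only as "40-char hex".

```python
import re

# Assumption: lowercase hex only, matching the report's run_hash pattern.
GIT_COMMIT_RE = re.compile(r"[0-9a-f]{40}")   # SHA-1 commit id length
SHA256_HEX_RE = re.compile(r"[0-9a-f]{64}")   # SHA-256 hex digest length


def is_git_commit(value: str) -> bool:
    # fullmatch anchors at both ends, like ^...$ in /^[0-9a-f]{64}$/.
    return GIT_COMMIT_RE.fullmatch(value) is not None


def is_sha256_hex(value: str) -> bool:
    return SHA256_HEX_RE.fullmatch(value) is not None


# Full values taken from the reports in this diff.
print(is_git_commit("ca7375ea2b178159a0c61bbf62788a2ffa2390e9"))
print(is_sha256_hex("8dfdacee62380ec20b7420d8f8bad3c395822da6eb0b41eeecd356e88fe20bf0"))
# A truncated display value such as "2336b96c3638982d..." is not a full digest.
print(is_sha256_hex("2336b96c3638982d"))
```

`re.fullmatch` is used rather than `re.match` with a trailing `$` because in Python `$` also matches just before a final newline, so `"<64 hex chars>\n"` would otherwise pass.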
@@ -61,7 +61,7 @@
 | p4_pref_pairs | 83 | 83 | 0% | ok |
 | p4_total_quarantined | 1325 | 1325 | 0% | ok |

-All metrics within 20% of baseline — pipeline stable across runs.
+**1 metric(s) drifted >20% from baseline.** Investigate before treating outputs as stable.

 ## System health status

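The drift table's Δ% column is consistent with the relative change |current − baseline| / baseline, rounded to a whole percent: 15 → 16 is 6.7%, shown as 7%, and 25 → 82 is 228%, which crosses the 20% threshold named in the summary line and is flagged `warn`. A minimal sketch of that computation; the `drift` helper, the use of the absolute change, and the treatment of a zero baseline are assumptions, not taken from the report. Only the rows visible in this diff are included.

```python
import math


def drift(baseline: float, current: float, threshold_pct: float = 20.0) -> tuple[float, str]:
    """Return (|Δ%| relative to baseline, 'ok' or 'warn') for one metric."""
    if baseline == 0:
        # Assumption: any change from a zero baseline counts as unbounded drift.
        pct = 0.0 if current == 0 else math.inf
    else:
        pct = abs(current - baseline) / abs(baseline) * 100.0
    return pct, ("warn" if pct > threshold_pct else "ok")


# Baseline/current pairs for the rows visible in the new version of the report.
rows = {
    "p2_evidence_rows": (25, 82),
    "p2_evidence_skips": (2, 2),
    "p3_accepted": (386, 386),
    "p3_partial": (132, 132),
    "p4_pref_pairs": (83, 83),
    "p4_total_quarantined": (1325, 1325),
}

flagged = []
for name, (base, cur) in rows.items():
    pct, flag = drift(base, cur)
    print(f"{name}: {pct:.0f}% {flag}")
    if flag == "warn":
        flagged.append(name)

print(f"{len(flagged)} metric(s) drifted >20% from baseline")
# prints: 1 metric(s) drifted >20% from baseline
```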