Final phase. Adds:
  scripts/distillation/release_freeze.ts   ~330 lines, 6 release gates
  docs/distillation/operator-handoff.md    durable cold-start operator doc
  docs/distillation/recovery-runbook.md    failure-mode runbook by symptom
  scripts/distillation/distill.ts          +release-freeze subcommand
The release_freeze orchestrator runs every gate the system has:
1. Clean git state (tolerates auto-regenerated reports)
2. Full test suite (bun test tests/distillation auditor/schemas/distillation)
3. Phase commit verification (every Phase 0-8 commit resolves)
4. Acceptance gate (22-invariant fixture E2E)
5. audit-full (Phases 0-7 verified + drift detection)
6. Tag availability check (distillation-v1.0.0 not yet existing)
Outputs:
  reports/distillation/release-freeze.md       human-readable manifest
  reports/distillation/release-manifest.json   machine-readable manifest
Manifest captures:
- git_head + git_branch + released_at
- phase→commit map for all 9 commits (Phase 0+1+2 scaffold through Phase 8 audit)
- dataset counts at freeze (RAG/SFT/Preference/evidence/scored/quarantined)
- latest audit baseline row
- per-gate pass/fail with detail
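A minimal sketch of how per-gate results could roll up into the manifest. Only git_head, git_branch, and released_at are field names taken from the list above; GateResult, gates, release_ready, and buildManifest are hypothetical names for illustration and are not taken from release_freeze.ts.

```typescript
// Hypothetical shapes: only git_head, git_branch and released_at come from the manifest list.
interface GateResult {
  name: string;
  passed: boolean;
  detail: string;
}

interface ReleaseManifest {
  git_head: string;
  git_branch: string;
  released_at: string;
  gates: GateResult[]; // assumed field name
  release_ready: boolean; // assumed field name
}

// A gate is any function that reports pass/fail with a detail string.
type Gate = () => GateResult;

// Run every gate without short-circuiting, so the manifest lists all failures at once.
function buildManifest(
  gitHead: string,
  gitBranch: string,
  releasedAt: string,
  gates: Gate[],
): ReleaseManifest {
  const results = gates.map((gate) => gate());
  return {
    git_head: gitHead,
    git_branch: gitBranch,
    released_at: releasedAt,
    gates: results,
    release_ready: results.every((r) => r.passed),
  };
}

// Exit 0 means release-ready; the non-zero value 1 is an assumption.
function exitCode(manifest: ReleaseManifest): number {
  return manifest.release_ready ? 0 : 1;
}
```

Running all gates rather than stopping at the first failure keeps the per-gate pass/fail section complete in a single run.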
Operator handoff doc covers:
- phase map with commits + report locations
- known-good commands
- how to rerun audit-full + inspect drift
- how to restore from last-good (git checkout distillation-v1.0.0)
- how to add future phases without contaminating corpus
- what NOT to modify casually (with file:reason mapping)
- cumulative commits at v1.0.0
Recovery runbook covers, by symptom:
- audit-full exit non-zero (per-phase diagnostics)
- drift table flags warn (intentional vs regression)
- acceptance fail vs audit-full pass divergence
- run-all empty exports (counter-bisection order)
- hash mismatch on identical input (determinism violation; CRITICAL)
- replay logs growing unbounded (rotation guidance)
- nuclear restore via git checkout distillation-v1.0.0
Spec constraints (per now.md Phase 9):
- DO NOT add new intelligence features ✓ (zero new logic)
- DO NOT change scoring/export logic ✓ (zero touches)
- DO NOT weaken gates ✓ (gates only added, never relaxed beyond the
auto-regen tolerance documented in checkCleanGit)
- DO NOT retrain anything ✓ (no model touches)
CLI:
./scripts/distill release-freeze # exit 0 = release-ready
Tag creation deferred to operator confirmation (the release-freeze
report prints the exact `git tag` command). Per CLAUDE.md guidance,
destructive/visible operations like tags require explicit user
authorization.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Distillation System — Recovery Runbook
Version: v1.0.0
Audience: Future operator (or future Claude session) inheriting this system in a broken state.
This is the failure-mode runbook. Read top-down, stop at the first symptom that matches.
Symptom 1: audit-full exits non-zero
A required check failed. The report at reports/distillation/phase8-full-audit-report.md will name the failing check verbatim. Map by phase:
P0 — recon doc missing
Cause: repo state corrupted; the recon doc has never existed at this commit.
Fix:
git checkout distillation-v1.0.0 -- docs/recon/local-distillation-recon.md
P0 — tier-1 source stream missing
Cause: fresh-clone or post-rotation environment without source data.
Severity: informational only (audit-full reports it as required=false). The pipeline will produce 0 rows but won't fail.
Fix: populate the source stream OR accept reduced output and note it in the next report.
P1 — schema validators fail
Cause: somebody modified a Phase 1 schema and broke a validator.
Diagnostic:
bun test auditor/schemas/distillation/ 2>&1 | grep -A2 "fail"
Fix: revert the schema change. Phase 1 schemas are versioned (_SCHEMA_VERSION = 1); they bump deliberately, never silently.
git diff distillation-v1.0.0 -- auditor/schemas/distillation/ # see what changed
git checkout distillation-v1.0.0 -- auditor/schemas/distillation/<changed_file>
P2 — materializer dry-run fails / writes 0
Cause: every source jsonl is empty OR every transform is broken.
Diagnostic:
ls -la data/_kb/*.jsonl # confirm sources have content
bun run scripts/distillation/build_evidence_index.ts --dry-run
# inspect skip reasons in data/_kb/distillation_skips.jsonl
Fix: identify the broken transform via per-source row counts. If a real-data shape changed (e.g. a new field name in a source jsonl), update the matching transform in scripts/distillation/transforms.ts. Add a fixture row to auditor/schemas/distillation/realdata.test.ts covering the new shape.
P3 — scored-runs distribution empty
Cause: data/scored-runs/ is missing OR the score categories are not landing.
Fix:
./scripts/distill score # re-runs scorer if data/evidence/ is populated
If still empty, the scorer rules in scripts/distillation/scorer.ts may have changed. Re-run bun test tests/distillation/scorer.test.ts — if those fail, revert.
P4 — SFT contamination firewall caught a leak (CRITICAL)
Severity: alert. A rejected or needs_human_review row landed in SFT. This is a non-negotiable spec violation.
Stop immediately:
mv exports/sft/instruction_response.jsonl exports/sft/instruction_response.jsonl.QUARANTINED-$(date +%s)
Diagnostic: the audit report's "P4 SFT contamination firewall" check shows the count. The forbidden row is in the file you just renamed:
jq 'select(.quality_score != "accepted" and .quality_score != "partially_accepted")' exports/sft/instruction_response.jsonl.QUARANTINED-*
Root cause is one of:
- SftSample schema validator was loosened: check auditor/schemas/distillation/sft_sample.ts against v1.0.0
- Exporter SFT_NEVER constant was loosened: check scripts/distillation/export_sft.ts
- Schema validation was bypassed (someone called appendFileSync directly): find the offending caller via git log --oneline -p exports/sft/
Recovery: revert the offending change. Re-run ./scripts/distill export-sft. The fresh output should pass the audit's leak check.
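The leak check can also be expressed in TypeScript. This is a minimal sketch that mirrors the jq filter above, not the audit's actual implementation; findSftLeaks and SFT_ALLOWED are hypothetical names, and treating a missing quality_score as a leak is an assumption (it matches what the jq filter would select).

```typescript
// Quality labels allowed in SFT, taken from the jq filter in this section.
const SFT_ALLOWED = new Set(["accepted", "partially_accepted"]);

interface SftRow {
  quality_score?: string;
  [key: string]: unknown;
}

// Parse JSONL text and return every row whose quality_score is not allowed.
// Rows with no quality_score are treated as leaks (assumption: fail closed).
function findSftLeaks(jsonl: string): SftRow[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as SftRow)
    .filter(
      (row) =>
        row.quality_score === undefined || !SFT_ALLOWED.has(row.quality_score),
    );
}
```

An empty result after re-running the export is the expected state; any returned row is a forbidden row.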
P4 — Preference self-pair leaked
Cause: export_preference's pairing logic produced chosen_run_id == rejected_run_id. Schema validator should catch this.
Diagnostic:
jq 'select(.chosen_run_id == .rejected_run_id) | .id' exports/preference/chosen_rejected.jsonl
Fix: check scripts/distillation/export_preference.ts::buildPair — the equality guard at the top must remain. Revert if missing.
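For orientation, an equality guard of the kind described could look like the sketch below. The real buildPair signature is not shown in this runbook, so buildPairSketch, ScoredRun, and the chosen/rejected text fields are hypothetical; only chosen_run_id and rejected_run_id come from the jq diagnostic.

```typescript
// Hypothetical input shape; the real one in export_preference.ts may differ.
interface ScoredRun {
  run_id: string;
  output: string;
}

interface PreferencePair {
  chosen_run_id: string;
  rejected_run_id: string;
  chosen: string; // assumed field name
  rejected: string; // assumed field name
}

// Stand-in for buildPair: returns null instead of emitting a self-pair.
function buildPairSketch(
  chosen: ScoredRun,
  rejected: ScoredRun,
): PreferencePair | null {
  // Equality guard first: a run can never be preferred over itself.
  if (chosen.run_id === rejected.run_id) return null;
  return {
    chosen_run_id: chosen.run_id,
    rejected_run_id: rejected.run_id,
    chosen: chosen.output,
    rejected: rejected.output,
  };
}
```

Keeping the guard as the first statement means no later refactor of the pairing logic can produce a self-pair without visibly deleting it.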
P5 — RunSummary fails to validate
Cause: receipts harness emitted a malformed summary.
Diagnostic:
LATEST=$(ls -1t reports/distillation/*/summary.json | head -1)
bun -e "import {validateRunSummary} from './auditor/schemas/distillation/run_summary'; const s = JSON.parse(await Bun.file('$LATEST').text()); console.log(validateRunSummary(s))"
Fix: the field that failed validation is named in the validator output. Either fix the harness in scripts/distillation/receipts.ts or revert.
P6 — acceptance gate fails an invariant
Cause: something in Phases 1-5 changed in a way the fixture catches.
Diagnostic: run acceptance directly to see all 22 checks:
bun run scripts/distillation/acceptance.ts
# scan output; the failing check names what broke
cat reports/distillation/phase6-acceptance-report.md
Fix: the report's "Failures" section names the invariant. The 22 invariants are documented at the top of scripts/distillation/acceptance.ts. Locate the affected phase code, revert.
P7 — replay validation regressed
Cause: local model output failed the structural validator OR retrieval found 0 playbooks.
Diagnostic:
./scripts/distill replay --task "<task that previously passed>" --local-only
# inspect the validation_result.reasons
Fix:
- If validation reasons mention "filler/hedge phrase": the local model regressed (model swap?); revert LH_REPLAY_LOCAL_MODEL to default
- If retrieval is empty: exports/rag/playbooks.jsonl is empty; re-run export-rag
Symptom 2: drift table flags warn
A metric moved >20% from baseline. This is a SOFT alert — not a failure, but worth investigating before treating outputs as stable.
Diagnostic:
tail -5 data/_kb/audit_baselines.jsonl | jq -c '.metrics'
# see the recent baseline trajectory (last 5 baseline rows, one compact line each)
cat reports/distillation/phase8-full-audit-report.md
# read the drift table
Common causes:
- New source data → record counts grow → expected, not a regression
- Scorer rules changed → category distribution shifted → confirm intentional
- Exporter filter loosened → SFT/RAG counts grow → CHECK contamination firewall first
If the drift is intentional, write a row to data/_kb/audit_baselines.jsonl documenting why (no schema for this — just append a JSON line with a notes field). Future audits will treat the new value as the new baseline.
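The warn rule itself (a metric moving more than 20% from baseline) can be sketched as follows. This is an illustration under stated assumptions, not the audit's code: driftFlags and its types are hypothetical, metrics missing from either side are skipped, and a zero baseline is flagged whenever the current value is non-zero.

```typescript
type Metrics = Record<string, number>;

interface DriftFlag {
  metric: string;
  baseline: number;
  current: number;
  change: number; // relative change, e.g. 0.3 for 30%
}

// Flag metrics whose relative change from baseline exceeds the threshold.
// Assumptions: metrics absent from either side are skipped; with a zero
// baseline any non-zero current value counts as infinite drift.
function driftFlags(
  baseline: Metrics,
  current: Metrics,
  threshold = 0.2,
): DriftFlag[] {
  const flags: DriftFlag[] = [];
  for (const metric of Object.keys(baseline).sort()) {
    const base = baseline[metric];
    const cur = current[metric];
    if (base === undefined || cur === undefined) continue;
    const change =
      base === 0
        ? cur === 0
          ? 0
          : Infinity
        : Math.abs(cur - base) / Math.abs(base);
    if (change > threshold) {
      flags.push({ metric, baseline: base, current: cur, change });
    }
  }
  return flags;
}
```

The check is symmetric, so a 30% drop is flagged the same way as a 30% rise; deciding whether the move is intentional remains a human call.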
Symptom 3: acceptance exits non-zero but audit-full doesn't
This is rare — acceptance is stricter (22 invariants on a fixture vs audit-full's 16 required checks on real data). The failing acceptance check usually points to a broken assumption that real data hides.
Diagnostic:
bun run scripts/distillation/acceptance.ts 2>&1 | head -50
ls /tmp/distillation_phase6_acceptance/data/evidence/ # the failed run leaves its temp root
Fix: the acceptance script keeps the temp root on fail (cleans only on pass). Inspect /tmp/distillation_phase6_acceptance/ to see what the pipeline produced vs expected. The fixture rows themselves are the contract — change the fixture deliberately, never to make a test pass.
Symptom 4: run-all produces empty exports
Cause: Phase 2 or Phase 3 ran on empty input.
Diagnostic order:
1. ls data/_kb/*.jsonl    # sources present?
2. find data/evidence -name "*.jsonl" | xargs wc -l    # any rows materialized?
3. find data/scored-runs -name "*.jsonl" | xargs wc -l    # any rows scored?
4. wc -l data/_kb/distillation_skips.jsonl data/_kb/scoring_skips.jsonl    # anything skipped?
The first counter that's 0 names the broken phase.
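The "first zero counter" rule is a simple ordered scan. A minimal sketch, assuming the row counts have already been collected by hand or by script; StageCount and firstEmptyStage are hypothetical names.

```typescript
interface StageCount {
  stage: string;
  rows: number;
}

// Return the first stage with zero rows, in pipeline order, or null if none is empty.
function firstEmptyStage(counts: StageCount[]): string | null {
  const hit = counts.find((c) => c.rows === 0);
  return hit ? hit.stage : null;
}

// Example: sources have rows and evidence was materialized, but nothing was scored.
const broken = firstEmptyStage([
  { stage: "sources (data/_kb)", rows: 1200 },
  { stage: "evidence (data/evidence)", rows: 950 },
  { stage: "scored (data/scored-runs)", rows: 0 },
]);
console.log(broken); // prints "scored (data/scored-runs)"
```

Order matters: the counts must be listed in pipeline order, because every stage downstream of the broken one will also be empty.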
Symptom 5: hash mismatch on identical input
Cause: determinism violation. Same input → different output. This is a CRITICAL bug.
Diagnostic:
# Wipe outputs but keep sources, run twice with the same recorded_at, compare run_hash.
# NOTE: RA is only a reference value here; nothing below passes it to run-all.
# Pin recorded_at to RA for both runs by whatever means the pipeline supports.
RA="2026-04-27T00:00:00.000Z"
rm -rf data/evidence data/scored-runs exports
./scripts/distill run-all # first run
LATEST_1=$(ls -1t reports/distillation/ | grep -v phase | head -1)
rm -rf data/evidence data/scored-runs exports
./scripts/distill run-all # second run
LATEST_2=$(ls -1t reports/distillation/ | grep -v phase | head -1)
jq .run_hash "reports/distillation/$LATEST_1/summary.json"
jq .run_hash "reports/distillation/$LATEST_2/summary.json"
# These MUST match if recorded_at is fixed.
If they don't match: something in the pipeline introduced non-determinism. Common causes:
- A Date.now() baked into output (other than the explicit recorded_at)
- Math.random() or randomUUID() in a path that should be deterministic
- A Map iteration order issue (rare in V8)
- Concurrent writes to the same file
Bisect against distillation-v1.0.0 — find the commit that introduced the non-determinism, revert.
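One way to rule out ordering as the cause is to serialize records canonically before hashing or comparing them. The sketch below sorts object keys at every level; it is an illustration only and makes no claim about how run_hash is actually computed. canonicalStringify and the Json type are hypothetical names.

```typescript
type Json =
  | null
  | boolean
  | number
  | string
  | Json[]
  | { [key: string]: Json };

// Serialize with object keys sorted at every level, so the result does not
// depend on the order in which keys were inserted.
function canonicalStringify(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    // Array order is meaningful and is preserved.
    return "[" + value.map((v) => canonicalStringify(v)).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  return (
    "{" +
    keys
      .map((k) => JSON.stringify(k) + ":" + canonicalStringify(value[k] ?? null))
      .join(",") +
    "}"
  );
}
```

If two runs produce the same canonical string for a record but different hashes, ordering is not the culprit and the timestamp or randomness causes should be checked next.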
Symptom 6: replay logs growing unbounded
Phase 7 replay appends to data/_kb/replay_runs.jsonl with no rotation. Acceptable until file >100MB, then:
# Move and start fresh
mv data/_kb/replay_runs.jsonl data/_kb/replay_runs.archive.$(date +%s).jsonl
gzip data/_kb/replay_runs.archive.*.jsonl
This is documented as a Phase 7 carry-over. A future Phase 10+ could add rotation.
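If rotation is ever automated, the decision could be as small as the sketch below. It is not part of the current system: rotationTarget is a hypothetical name, and reading 100MB as 100 * 1024 * 1024 bytes is an assumption.

```typescript
// Threshold from the runbook; interpreting "100MB" as MiB is an assumption.
const ROTATE_THRESHOLD_BYTES = 100 * 1024 * 1024;

// Return the archive path the log should be moved to, or null if no rotation is needed.
function rotationTarget(
  path: string,
  sizeBytes: number,
  epochSeconds: number,
): string | null {
  if (sizeBytes <= ROTATE_THRESHOLD_BYTES) return null;
  // Mirrors the mv command above: replay_runs.jsonl -> replay_runs.archive.<epoch>.jsonl
  return path.replace(/\.jsonl$/, `.archive.${epochSeconds}.jsonl`);
}
```

Taking the size and timestamp as arguments keeps the function pure, so the naming rule can be checked without touching the filesystem.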
Last resort: nuclear restore
If nothing works:
git fetch --tags
git stash # save uncommitted work
git checkout distillation-v1.0.0
./scripts/distill audit-full # confirm 16/16 pass at v1.0.0
./scripts/distill acceptance # confirm 22/22 pass
# Now diff against the broken state to see what changed
git diff distillation-v1.0.0..scrum/auto-apply-19814 -- scripts/distillation/ auditor/schemas/distillation/
The diff is your bug.