Final phase. Adds:
  scripts/distillation/release_freeze.ts    ~330 lines, 6 release gates
  docs/distillation/operator-handoff.md     durable cold-start operator doc
  docs/distillation/recovery-runbook.md     failure-mode runbook by symptom
  scripts/distillation/distill.ts           +release-freeze subcommand
The release_freeze orchestrator runs every gate the system has:
1. Clean git state (tolerates auto-regenerated reports)
2. Full test suite (bun test tests/distillation auditor/schemas/distillation)
3. Phase commit verification (every Phase 0-8 commit resolves)
4. Acceptance gate (22-invariant fixture E2E)
5. audit-full (Phases 0-7 verified + drift detection)
6. Tag availability check (distillation-v1.0.0 not yet existing)
Outputs:
  reports/distillation/release-freeze.md        human-readable manifest
  reports/distillation/release-manifest.json    machine-readable manifest
Manifest captures:
- git_head + git_branch + released_at
- phase→commit map for all 9 commits (Phase 0+1+2 scaffold through Phase 8 audit)
- dataset counts at freeze (RAG/SFT/Preference/evidence/scored/quarantined)
- latest audit baseline row
- per-gate pass/fail with detail
Operator handoff doc covers:
- phase map with commits + report locations
- known-good commands
- how to rerun audit-full + inspect drift
- how to restore from last-good (git checkout distillation-v1.0.0)
- how to add future phases without contaminating corpus
- what NOT to modify casually (with file:reason mapping)
- cumulative commits at v1.0.0
Recovery runbook covers, by symptom:
- audit-full exit non-zero (per-phase diagnostics)
- drift table flags warn (intentional vs regression)
- acceptance fail vs audit-full pass divergence
- run-all empty exports (counter-bisection order)
- hash mismatch on identical input (determinism violation; CRITICAL)
- replay logs growing unbounded (rotation guidance)
- nuclear restore via git checkout distillation-v1.0.0
Spec constraints (per now.md Phase 9):
- DO NOT add new intelligence features ✓ (zero new logic)
- DO NOT change scoring/export logic ✓ (zero touches)
- DO NOT weaken gates ✓ (gates only added, never relaxed beyond the
auto-regen tolerance documented in checkCleanGit)
- DO NOT retrain anything ✓ (no model touches)
CLI:
./scripts/distill release-freeze # exit 0 = release-ready
Tag creation deferred to operator confirmation (the release-freeze
report prints the exact `git tag` command). Per CLAUDE.md guidance,
destructive/visible operations like tags require explicit user
authorization.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Distillation System — Recovery Runbook

**Version:** v1.0.0
**Audience:** Future operator (or future Claude session) inheriting this system in a broken state.

This is the failure-mode runbook. Read top-down, stop at the first symptom that matches.

## Symptom 1: `audit-full` exits non-zero

A required check failed. The report at `reports/distillation/phase8-full-audit-report.md` will name the failing check verbatim. Map by phase:

### P0 — recon doc missing
**Cause:** repo state is corrupted, or the checked-out commit predates the recon doc.
**Fix:**
```bash
git checkout distillation-v1.0.0 -- docs/recon/local-distillation-recon.md
```

### P0 — tier-1 source stream missing
**Cause:** fresh-clone or post-rotation environment without source data.
**Severity:** informational only (audit-full reports this check as `required=false`). The pipeline will produce 0 rows but won't fail.
**Fix:** populate the source stream OR accept reduced output and note it in the next report.

### P1 — schema validators fail
**Cause:** somebody modified a Phase 1 schema and broke a validator.
**Diagnostic:**
```bash
bun test auditor/schemas/distillation/ 2>&1 | grep -A2 "fail"
```
**Fix:** revert the schema change. Phase 1 schemas are versioned (`_SCHEMA_VERSION = 1`); they bump deliberately, never silently.
```bash
git diff distillation-v1.0.0 -- auditor/schemas/distillation/   # see what changed
git checkout distillation-v1.0.0 -- auditor/schemas/distillation/<changed_file>
```

### P2 — materializer dry-run fails / writes 0
**Cause:** every source jsonl is empty OR every transform is broken.
**Diagnostic:**
```bash
ls -la data/_kb/*.jsonl   # confirm sources have content
bun run scripts/distillation/build_evidence_index.ts --dry-run
# inspect skip reasons in data/_kb/distillation_skips.jsonl
```
**Fix:** identify the broken transform via per-source row counts. If the shape of the real data changed (e.g. a new field name in a source jsonl), update the matching transform in `scripts/distillation/transforms.ts`, then add a fixture row covering the new shape to `auditor/schemas/distillation/realdata.test.ts`.

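To make "inspect skip reasons" concrete, here is a minimal sketch that tallies skipped rows per source and reason from raw JSONL text. The `source` and `reason` field names and the `tallySkips` helper are assumptions for illustration; check them against the actual rows in `data/_kb/distillation_skips.jsonl` before relying on it.

```typescript
// Hypothetical helper: counts skip rows per "source: reason" pair.
// ASSUMPTION: each row carries `source` and `reason` string fields.
function tallySkips(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const raw of jsonl.split("\n")) {
    const text = raw.trim();
    if (text === "") continue; // ignore blank lines
    let source = "(unknown)";
    let reason = "(none)";
    try {
      const row: unknown = JSON.parse(text);
      if (row !== null && typeof row === "object") {
        const rec = row as Record<string, unknown>;
        if (typeof rec.source === "string") source = rec.source;
        if (typeof rec.reason === "string") reason = rec.reason;
      }
    } catch {
      // Keep unparseable lines visible instead of dropping them.
      source = "(unparseable)";
      reason = "invalid JSON";
    }
    const key = `${source}: ${reason}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Made-up sample rows, only to show the call shape.
const sampleSkips = [
  '{"source":"a.jsonl","reason":"missing field"}',
  '{"source":"a.jsonl","reason":"missing field"}',
  '{"source":"b.jsonl","reason":"bad timestamp"}',
].join("\n");

for (const [key, n] of tallySkips(sampleSkips)) {
  console.log(`${n}\t${key}`);
}
```

A source whose skip count is close to its total row count is the likely home of the broken transform.
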
### P3 — scored-runs distribution empty
**Cause:** `data/scored-runs/` is missing OR the score categories are not landing.
**Fix:**
```bash
./scripts/distill score   # re-runs scorer if data/evidence/ is populated
```
If the distribution is still empty, the scorer rules in `scripts/distillation/scorer.ts` may have changed. Re-run `bun test tests/distillation/scorer.test.ts`; if those tests fail, revert the scorer change.

### P4 — SFT contamination firewall caught a leak (CRITICAL)
**Severity:** alert. A `rejected` or `needs_human_review` row landed in SFT. This is a non-negotiable spec violation.
**Stop immediately:**
```bash
mv exports/sft/instruction_response.jsonl exports/sft/instruction_response.jsonl.QUARANTINED-$(date +%s)
```
**Diagnostic:** the audit report's "P4 SFT contamination firewall" check shows the count. The forbidden row is in the file you just renamed:
```bash
jq 'select(.quality_score != "accepted" and .quality_score != "partially_accepted")' exports/sft/instruction_response.jsonl.QUARANTINED-*
```
**Root cause is one of:**
1. SftSample schema validator was loosened: check `auditor/schemas/distillation/sft_sample.ts` against v1.0.0
2. Exporter `SFT_NEVER` constant was loosened: check `scripts/distillation/export_sft.ts`
3. Schema validation was bypassed (someone called `appendFileSync` directly): find the offending caller via `git log --oneline -p exports/sft/`

**Recovery:** revert the offending change. Re-run `./scripts/distill export-sft`. The fresh output should pass the audit's leak check.

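The jq filter above can also be expressed as a small standalone check, which is handy when jq is not available. This is a sketch under stated assumptions, not the project's firewall: `findSftLeaks` and `SFT_ALLOWED` are hypothetical names, and the allowed categories are taken only from the jq filter in this runbook.

```typescript
// ASSUMPTION: the only categories allowed into SFT are the two named in the jq filter.
const SFT_ALLOWED = new Set<string>(["accepted", "partially_accepted"]);

interface SftLeak {
  line: number;           // 1-based line number within the JSONL text
  quality_score: unknown; // offending value; undefined if missing or unparseable
}

// Hypothetical helper: returns every row whose quality_score is outside the allowed set.
function findSftLeaks(jsonl: string): SftLeak[] {
  const leaks: SftLeak[] = [];
  jsonl.split("\n").forEach((raw, index) => {
    const text = raw.trim();
    if (text === "") return; // ignore blank lines
    let score: unknown = undefined;
    try {
      const row: unknown = JSON.parse(text);
      if (row !== null && typeof row === "object") {
        score = (row as Record<string, unknown>).quality_score;
      }
    } catch {
      // Unparseable lines are reported as leaks rather than silently passed.
    }
    if (typeof score !== "string" || !SFT_ALLOWED.has(score)) {
      leaks.push({ line: index + 1, quality_score: score });
    }
  });
  return leaks;
}

// Made-up sample: the second row should be flagged.
const sampleSft = [
  '{"quality_score":"accepted"}',
  '{"quality_score":"rejected"}',
  '{"quality_score":"partially_accepted"}',
].join("\n");

console.log(JSON.stringify(findSftLeaks(sampleSft)));
```

Like the jq filter, this treats a row with no `quality_score` at all as a leak, which is the conservative reading for a contamination check.
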
### P4 — Preference self-pair leaked
**Cause:** export_preference's pairing logic produced `chosen_run_id == rejected_run_id`. The schema validator should catch this.
**Diagnostic:**
```bash
jq 'select(.chosen_run_id == .rejected_run_id) | .id' exports/preference/chosen_rejected.jsonl
```
**Fix:** check `scripts/distillation/export_preference.ts::buildPair`; the equality guard at the top must remain. Revert the change if the guard is missing.

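For orientation, an equality guard of the kind described above might look like the following. This is a sketch, not the real `buildPair`: the `PreferencePair` type and both helper names are hypothetical, and only the field names (`id`, `chosen_run_id`, `rejected_run_id`) come from the jq diagnostic.

```typescript
// Hypothetical pair shape; field names taken from the jq filter in this runbook.
interface PreferencePair {
  id: string;
  chosen_run_id: string;
  rejected_run_id: string;
}

// Guard: refuse to build a pair whose chosen and rejected runs are the same run.
function assertNotSelfPair(chosenRunId: string, rejectedRunId: string): void {
  if (chosenRunId === rejectedRunId) {
    throw new Error(`self-pair rejected: run ${chosenRunId} cannot be both chosen and rejected`);
  }
}

// Audit helper: list the ids of any self-pairs already present in an export.
function findSelfPairs(pairs: PreferencePair[]): string[] {
  return pairs
    .filter((p) => p.chosen_run_id === p.rejected_run_id)
    .map((p) => p.id);
}

// Made-up sample: only "p2" is a self-pair.
const samplePairs: PreferencePair[] = [
  { id: "p1", chosen_run_id: "run-a", rejected_run_id: "run-b" },
  { id: "p2", chosen_run_id: "run-c", rejected_run_id: "run-c" },
];

console.log(findSelfPairs(samplePairs));
```

Placing the guard first in the pairing function means no later branch can emit a pair without passing through it.
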
### P5 — RunSummary fails to validate
**Cause:** receipts harness emitted a malformed summary.
**Diagnostic:**
```bash
LATEST=$(ls -1t reports/distillation/*/summary.json | head -1)
bun -e "import {validateRunSummary} from './auditor/schemas/distillation/run_summary'; const s = JSON.parse(await Bun.file('$LATEST').text()); console.log(validateRunSummary(s))"
```
**Fix:** the field that failed validation is named in the validator output. Either fix the harness in `scripts/distillation/receipts.ts` or revert.

### P6 — acceptance gate fails an invariant
**Cause:** something in Phases 1-5 changed in a way the fixture catches.
**Diagnostic:** run acceptance directly to see all 22 checks:
```bash
bun run scripts/distillation/acceptance.ts
# scan output; the failing check names what broke
cat reports/distillation/phase6-acceptance-report.md
```
**Fix:** the report's "Failures" section names the invariant. The 22 invariants are documented at the top of `scripts/distillation/acceptance.ts`. Locate the affected phase code and revert it.

### P7 — replay validation regressed
**Cause:** local model output failed the structural validator OR retrieval found 0 playbooks.
**Diagnostic:**
```bash
./scripts/distill replay --task "<task that previously passed>" --local-only
# inspect the validation_result.reasons
```
**Fix:**
- If validation reasons mention "filler/hedge phrase": the local model regressed (possibly a model swap). Revert `LH_REPLAY_LOCAL_MODEL` to its default.
- If retrieval is empty: `exports/rag/playbooks.jsonl` is empty. Re-run export-rag.

## Symptom 2: drift table flags `warn`

A metric moved >20% from baseline. This is a SOFT alert: not a failure, but worth investigating before treating outputs as stable.

**Diagnostic:**
```bash
tail -5 data/_kb/audit_baselines.jsonl | jq -c '.metrics'
# see the recent baseline trajectory (last 5 rows, one compact line each)
cat reports/distillation/phase8-full-audit-report.md
# read the drift table
```

**Common causes:**
- New source data → record counts grow → expected, not a regression
- Scorer rules changed → category distribution shifted → confirm intentional
- Exporter filter loosened → SFT/RAG counts grow → CHECK contamination firewall first

**If the drift is intentional**, write a row to `data/_kb/audit_baselines.jsonl` documenting why. There is no schema for this; append a JSON line with a `notes` field. Future audits will treat the new value as the new baseline.

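As a rough illustration of the 20% rule, the check below flags a metric whose relative change from baseline exceeds a threshold. This is a sketch of the idea, not audit-full's actual drift logic: the `driftFlag` name, the strict `>` comparison, and the handling of a zero baseline are all assumptions.

```typescript
type DriftFlag = "ok" | "warn";

// Hypothetical helper: "warn" when |current - baseline| / |baseline| exceeds the threshold.
// ASSUMPTION: a zero baseline warns on any non-zero current value, since the
// relative change is undefined there.
function driftFlag(baseline: number, current: number, threshold: number = 0.2): DriftFlag {
  if (baseline === 0) {
    return current === 0 ? "ok" : "warn";
  }
  const relativeChange = Math.abs(current - baseline) / Math.abs(baseline);
  return relativeChange > threshold ? "warn" : "ok";
}

// Made-up metric values, only to show the call shape.
const sampleMetrics: Array<[string, number, number]> = [
  ["sft_rows", 1000, 1100],      // +10%
  ["rag_rows", 400, 520],        // +30%
  ["quarantined_rows", 50, 30],  // -40%
];

for (const [name, baseline, current] of sampleMetrics) {
  console.log(`${name}: ${driftFlag(baseline, current)}`);
}
```

Using the absolute value means shrinkage is flagged as readily as growth, which matters because a sudden drop in row counts is as suspicious as a sudden rise.
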
## Symptom 3: `acceptance` exits non-zero but `audit-full` doesn't

This is rare. Acceptance is stricter (22 invariants on a fixture vs audit-full's 16 required checks on real data), so the failing acceptance check usually points to a broken assumption that real data hides.

**Diagnostic:**
```bash
bun run scripts/distillation/acceptance.ts 2>&1 | head -50
ls /tmp/distillation_phase6_acceptance/data/evidence/   # the failed run leaves its temp root
```

**Fix:** the acceptance script keeps the temp root on fail (it cleans up only on pass). Inspect `/tmp/distillation_phase6_acceptance/` to compare what the pipeline produced against what was expected. The fixture rows themselves are the contract; change the fixture deliberately, never just to make a test pass.

## Symptom 4: `run-all` produces empty exports

**Cause:** Phase 2 or Phase 3 ran on empty input.

**Diagnostic order:**
1. `ls data/_kb/*.jsonl`: are sources present?
2. `find data/evidence -name "*.jsonl" | xargs wc -l`: were any rows materialized?
3. `find data/scored-runs -name "*.jsonl" | xargs wc -l`: were any rows scored?
4. `wc -l data/_kb/distillation_skips.jsonl data/_kb/scoring_skips.jsonl`: was anything skipped?

The first counter that's 0 names the broken phase.

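The bisection rule can be sketched as a first-zero search over the stage counters, taken in pipeline order. The `StageCount` type, the `firstEmptyStage` helper, and the stage labels are hypothetical; the row counts would come from the `ls`, `find`, and `wc -l` commands in the list.

```typescript
// Hypothetical record of one counter from the diagnostic list.
interface StageCount {
  stage: string; // label for the pipeline stage
  rows: number;  // row count observed for that stage
}

// Returns the first stage (in pipeline order) with zero rows, or null if none is empty.
function firstEmptyStage(counts: StageCount[]): string | null {
  const hit = counts.find((c) => c.rows === 0);
  return hit ? hit.stage : null;
}

// Made-up counts: sources exist and evidence was materialized, but nothing was scored.
const sampleCounts: StageCount[] = [
  { stage: "sources (data/_kb)", rows: 1200 },
  { stage: "evidence (Phase 2)", rows: 1100 },
  { stage: "scored-runs (Phase 3)", rows: 0 },
];

console.log(firstEmptyStage(sampleCounts));
```

Order matters here: every stage after the first empty one will also be empty, so only the first zero is informative.
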
## Symptom 5: hash mismatch on identical input

**Cause:** determinism violation. Same input → different output. This is a CRITICAL bug.

**Diagnostic:**
```bash
# Wipe outputs but keep sources, run twice with the same recorded_at, compare run_hash
rm -rf data/evidence data/scored-runs exports
RA="2026-04-27T00:00:00.000Z"
# NOTE: nothing below reads $RA as written. Pin recorded_at to this value for both
# runs using whatever mechanism run-all supports, or the comparison is not meaningful.
./scripts/distill run-all   # first run
SUMMARY_1=$(ls -1t reports/distillation/*/summary.json | head -1)

rm -rf data/evidence data/scored-runs exports
./scripts/distill run-all   # second run
SUMMARY_2=$(ls -1t reports/distillation/*/summary.json | head -1)

jq .run_hash "$SUMMARY_1"
jq .run_hash "$SUMMARY_2"
# These MUST match if recorded_at is fixed.
```

**If they don't match:** something in the pipeline introduced non-determinism. Common causes:
- A `Date.now()` baked into output (other than the explicit `recorded_at`)
- `Math.random()` or `randomUUID()` in a path that should be deterministic
- An iteration-order issue (rare: `Map` and `Set` iterate in insertion order, so this usually means the insertion order itself depends on something unordered, such as directory listing order)
- Concurrent writes to the same file

Bisect against `distillation-v1.0.0`: find the commit that introduced the non-determinism and revert it.

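One common defence against order-dependent hashes is to serialize with sorted keys before hashing. The sketch below shows that technique in isolation; it makes no claim about how `run_hash` is actually computed, and `stableStringify` is a hypothetical helper that handles plain JSON data only.

```typescript
// Hypothetical helper: JSON serialization with object keys sorted recursively,
// so two objects with the same content but different key order serialize identically.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null"; // mirrors JSON.stringify for array holes
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    // Array order is meaningful and is preserved as-is.
    return "[" + value.map((item) => stableStringify(item)).join(",") + "]";
  }
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj)
    .filter((k) => obj[k] !== undefined) // JSON.stringify omits undefined properties
    .sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k])).join(",") + "}";
}

// Same content, different insertion order.
const first = { b: 1, a: { d: [1, 2], c: "x" } };
const second = { a: { c: "x", d: [1, 2] }, b: 1 };

console.log(stableStringify(first) === stableStringify(second)); // true
console.log(JSON.stringify(first) === JSON.stringify(second));   // false
```

If the two hashes in the diagnostic differ while a canonical serialization of the two summaries is identical, the non-determinism is in ordering rather than in content, which narrows the bisect considerably.
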
## Symptom 6: replay logs growing unbounded

Phase 7 replay appends to `data/_kb/replay_runs.jsonl` with no rotation. This is acceptable until the file exceeds 100MB; after that, rotate it manually:

```bash
# Move and start fresh
mv data/_kb/replay_runs.jsonl data/_kb/replay_runs.archive.$(date +%s).jsonl
gzip data/_kb/replay_runs.archive.*.jsonl
```

This is documented as a Phase 7 carry-over. A future Phase 10+ could add rotation.

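If rotation is ever automated, the decision and naming logic could be as small as the sketch below. Both helpers are hypothetical, reading "100MB" as 100 MiB is an assumption, and the archive name simply mirrors the `mv` command above.

```typescript
// ASSUMPTION: "100MB" is interpreted as 100 MiB.
const ROTATE_AT_BYTES = 100 * 1024 * 1024;

// Hypothetical helper: true once the log has grown past the limit.
function shouldRotate(sizeBytes: number, limitBytes: number = ROTATE_AT_BYTES): boolean {
  return sizeBytes > limitBytes;
}

// Hypothetical helper: data/_kb/replay_runs.jsonl -> data/_kb/replay_runs.archive.<epoch>.jsonl
function archiveName(path: string, epochSeconds: number): string {
  return path.replace(/\.jsonl$/, `.archive.${epochSeconds}.jsonl`);
}

const logPath = "data/_kb/replay_runs.jsonl";
const epochNow = Math.floor(Date.now() / 1000);

console.log(shouldRotate(150 * 1024 * 1024)); // true: 150 MiB exceeds the limit
console.log(archiveName(logPath, epochNow));
```

Keeping the epoch in the archive name means repeated rotations never overwrite each other.
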
## Last resort: nuclear restore

If nothing works:

```bash
git fetch --tags
git stash                       # save uncommitted work
git checkout distillation-v1.0.0
./scripts/distill audit-full    # confirm 16/16 pass at v1.0.0
./scripts/distill acceptance    # confirm 22/22 pass

# Now diff against the broken state to see what changed
git diff distillation-v1.0.0..scrum/auto-apply-19814 -- scripts/distillation/ auditor/schemas/distillation/
```

The diff is your bug.