lakehouse/docs/distillation/operator-handoff.md

# Distillation System — Operator Handoff

**Version:** v1.0.0
**Branch:** `scrum/auto-apply-19814`
**Tag:** `distillation-v1.0.0`
**Audit baseline:** `data/_kb/audit_baselines.jsonl` (auto-grown per audit-full run)

This is the operator-level handoff for the distillation system. If you are picking this up cold, **read this doc first**, then `docs/recon/local-distillation-recon.md`. Skim the per-phase reports under `reports/distillation/` only when you need detail.

## What this system does

Turns real Lakehouse execution traces (1052 records sampled at v1.0.0 freeze) into clean, gated training datasets:

- **RAG corpus** — 446 grounded examples for retrieval-augmentation
- **SFT corpus** — 351 instruction→response pairs (strict accepted-only)
- **Preference corpus** — 83 chosen/rejected pairs (zero self-pairs, zero identical-text)
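
The two preference-corpus constraints in the list above (no self-pairs, no identical text) can be illustrated with a minimal sketch. The row shape, the field names, and the whitespace normalization are assumptions made for illustration; the real export schema and exporter may differ.

```typescript
// Hypothetical row shape; the real preference export may use different field names.
interface PreferencePair {
  chosen_run_id: string;
  rejected_run_id: string;
  chosen: string;
  rejected: string;
}

// Returns the reasons a pair should be excluded; an empty list means it passes.
function preferencePairProblems(pair: PreferencePair): string[] {
  const problems: string[] = [];

  // Assumption: a "self-pair" is a pair whose two sides come from the same run.
  if (pair.chosen_run_id === pair.rejected_run_id) {
    problems.push("self-pair: chosen and rejected come from the same run");
  }

  // Assumption: texts are compared after trimming and collapsing whitespace.
  const norm = (s: string): string => s.trim().replace(/\s+/g, " ");
  if (norm(pair.chosen) === norm(pair.rejected)) {
    problems.push("identical-text: chosen and rejected responses match");
  }

  return problems;
}
```

A pair would be written to the export only when the returned list is empty; anything else would go to quarantine with the listed reasons.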

It is **NOT** a model trainer. It is a **knowledge refinery** that produces training-safe substrate. The local-model "replay" runtime (Phase 7) demonstrates that retrieval against this substrate makes a 7B-class model behave like the system instead of fabricating audit verdicts.

## Phase map

| Phase | What it does | Commit | Report |
|---|---|---|---|
| 0 | Recon doc — inventory of source streams + integration plan | 27b1d27 | `docs/recon/local-distillation-recon.md` |
| 1 | 9 schemas + 51 schema tests + foundation types | 27b1d27 | (in commit body) |
| 2 | Materializer: 12 source jsonls → unified EvidenceRecord at `data/evidence/YYYY/MM/DD/` | 1ea8029 | (in commit body) |
| 3 | Deterministic Success Scorer: EvidenceRecord → ScoredRun (4 categories, no LLM) | c989253 | (in commit body) |
| 4 | RAG/SFT/Preference exports + quarantine system | 68b6697 | `reports/distillation/phase4-export-report.md` |
| 5 | Receipts harness — per-stage StageReceipt + RunSummary + DriftReport | 2cf359a | `reports/distillation/phase5-receipts-report.md` |
| 6 | Acceptance gate — fixture-driven 22-invariant E2E test | 1b433a9 | `reports/distillation/phase6-acceptance-report.md` |
| 7 | Replay layer — retrieval-driven local-model bootstrap | 681f39d | `reports/distillation/phase7-replay-report.md` |
| 8 | Full system audit + drift baseline | 5bdd159 | `reports/distillation/phase8-full-audit-report.md` |

The auditor rebuild (commit 20a039c) is wired to use the Phase 5 substrate: it now calls `lakehouse_answers_v1` matrix retrieval instead of tree-split shard summaries. Relative to the tree-split approach, each audit makes 50× fewer cloud calls and finishes 17× faster in wall-clock time.

## Known-good commands

All commands run from `/home/profit/lakehouse`. Use `bun run scripts/distillation/distill.ts <command>` or `./scripts/distill <command>` if symlinked.

```bash
# Build everything end-to-end with structured receipts
./scripts/distill run-all

# Read a specific run's summary + drift
./scripts/distill receipts --run-id <uuid>

# Verify the system end-to-end on a deterministic fixture
./scripts/distill acceptance

# Audit Phases 0-7 + drift detection vs prior baseline
./scripts/distill audit-full

# Test a task through the replay layer (local model with retrieval)
./scripts/distill replay --task "<input>"
./scripts/distill replay --task "<input>" --no-retrieval     # baseline / A/B
./scripts/distill replay --task "<input>" --allow-escalation # try deepseek if local fails

# Per-stage one-shot (rare — prefer run-all for receipts)
./scripts/distill build-evidence
./scripts/distill score
./scripts/distill export-rag
./scripts/distill export-sft        # strict accepted-only
./scripts/distill export-sft --include-partial   # opens to partially_accepted
./scripts/distill export-preference
./scripts/distill export-all
```

## How to rerun the full audit

```bash
./scripts/distill audit-full
```

Reads:
- on-disk `data/evidence/`, `data/scored-runs/`, `exports/{rag,sft,preference}/*`
- the most recent run_id under `reports/distillation/`
- the prior audit baseline at `data/_kb/audit_baselines.jsonl`

Writes:
- `reports/distillation/phase8-full-audit-report.md`
- a new row to `data/_kb/audit_baselines.jsonl` (auto-grown — never overwrite)

Exit code 0 = pass (every required check held). Non-zero = at least one required check failed.
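
The exit-code rule above can be sketched as a small function. The `CheckResult` shape is hypothetical, and the choice of `1` for failure is an assumption; the text only guarantees that a failure is non-zero.

```typescript
// Hypothetical shape for one audit check; the real audit code may model this differently.
interface CheckResult {
  name: string;
  required: boolean;
  passed: boolean;
}

// 0 when every required check held; non-zero (1 in this sketch) otherwise.
// Failed optional checks do not affect the exit code.
function auditExitCode(checks: CheckResult[]): number {
  const failedRequired = checks.filter((c) => c.required && !c.passed);
  return failedRequired.length === 0 ? 0 : 1;
}
```

Under this reading, a failing optional check would appear in the report but leave the exit code at 0, which is what lets a CI step gate on required checks alone.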

## How to inspect drift

Two levels:

1. **Per-run drift** — every `run-all` writes `reports/distillation/<run_id>/drift.json`. Compares to the most recent prior run. Severity `ok | warn | alert`.

2. **Cross-run baseline drift** — `audit-full` reads the latest baseline row from `data/_kb/audit_baselines.jsonl` and compares 10 tracked metrics (record counts, category distribution, export sizes, quarantine totals). Drift table appears in `reports/distillation/phase8-full-audit-report.md` with `>20%` flagged as `warn`.

The baseline file is **append-only**. Don't truncate it — its value grows with the longitudinal record. If a metric flips `warn` after a code change, the row before that change is the diagnostic anchor.
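
As a rough illustration of the cross-run comparison, the sketch below reads the last row of a JSONL baseline and flags any metric that moved by more than 20%. It assumes each baseline row is a flat object of numeric metrics, which may not match the real row layout; the function names are hypothetical.

```typescript
// Assumption: a baseline row is a flat map of metric name to number.
type Metrics = Record<string, number>;

interface DriftRow {
  metric: string;
  prior: number;
  current: number;
  pctChange: number;
  status: "ok" | "warn";
}

// The file is append-only, so the latest baseline is the last non-empty line.
function latestBaseline(jsonl: string): Metrics | null {
  const lines = jsonl.split("\n").filter((l) => l.trim() !== "");
  if (lines.length === 0) return null;
  return JSON.parse(lines[lines.length - 1]) as Metrics;
}

// Compare each current metric to its prior value; flag changes above warnPct.
function driftTable(prior: Metrics, current: Metrics, warnPct = 20): DriftRow[] {
  return Object.keys(current).map((metric) => {
    const p = prior[metric] ?? 0;
    const c = current[metric];
    // A zero prior has no meaningful percentage; any move away from zero counts as infinite drift.
    const pctChange = p === 0 ? (c === 0 ? 0 : Infinity) : (Math.abs(c - p) * 100) / p;
    return { metric, prior: p, current: c, pctChange, status: pctChange > warnPct ? "warn" : "ok" };
  });
}
```

For example, a `p4_sft_rows` value that falls from 351 to 200 is a change of about 43% and would be flagged `warn`, while a fall to 300 (about 15%) would stay `ok`.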

## How to restore from last good state

```bash
git fetch --tags
git checkout distillation-v1.0.0
./scripts/distill audit-full   # confirm 16/16 required pass at v1.0.0
```

If you've made changes that broke the system, hard reset to v1.0.0:

```bash
git reset --hard distillation-v1.0.0   # destructive — loses uncommitted work
./scripts/distill acceptance           # confirm 22/22 fixture invariants
./scripts/distill audit-full           # confirm baseline match
```

## How to add future phases without contaminating the corpus

The corpus = `exports/rag/playbooks.jsonl` + `exports/sft/instruction_response.jsonl` + `exports/preference/chosen_rejected.jsonl`. These are training-safe **only if** every gate held. To add Phase 10+:

1. Add code under `scripts/distillation/<your_phase>.ts`. Do NOT modify Phases 0-8.
2. If your phase produces evidence, append to `data/_kb/<your_stream>.jsonl` and add a transform in `scripts/distillation/transforms.ts`. The materializer picks it up automatically.
3. If your phase needs a new schema, create `auditor/schemas/distillation/<your_schema>.ts` with a `_SCHEMA_VERSION = 1` constant and a validator, and add tests with positive and negative fixtures to `auditor/schemas/distillation/schemas.test.ts`.
4. Run `./scripts/distill audit-full` BEFORE merging. Confirm 16/16 still passes.
5. Run `./scripts/distill acceptance`. Confirm 22/22 still passes.
6. Re-run `./scripts/distill run-all`. Inspect drift in the new run's `drift.json`. Anything `>20%` in record counts means your phase moved the corpus — explain it in the commit.
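
Step 3 above asks for a versioned schema with a validator. The sketch below shows one possible shape for such a module. The `LatencySample` stream, its fields, and the `ValidationResult` type are invented for illustration and do not describe any existing schema in the repo; a real module would also export these names.

```typescript
// Hypothetical schema for an imaginary future stream; every name here is illustrative.
const LATENCY_SAMPLE_SCHEMA_VERSION = 1;

interface LatencySample {
  schema_version: number;
  run_id: string;
  latency_ms: number;
}

type ValidationResult =
  | { ok: true; value: LatencySample }
  | { ok: false; errors: string[] };

// Collects every problem instead of stopping at the first, so fixtures can assert on all of them.
function validateLatencySample(input: unknown): ValidationResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, errors: ["expected an object"] };
  }
  const row = input as Record<string, unknown>;
  const errors: string[] = [];

  if (row.schema_version !== LATENCY_SAMPLE_SCHEMA_VERSION) {
    errors.push(`schema_version must be ${LATENCY_SAMPLE_SCHEMA_VERSION}`);
  }
  if (typeof row.run_id !== "string" || row.run_id.length === 0) {
    errors.push("run_id must be a non-empty string");
  }
  if (typeof row.latency_ms !== "number" || !Number.isFinite(row.latency_ms) || row.latency_ms < 0) {
    errors.push("latency_ms must be a finite, non-negative number");
  }

  return errors.length === 0
    ? { ok: true, value: row as unknown as LatencySample }
    : { ok: false, errors };
}
```

A positive fixture would be a row that returns `ok: true`; a negative fixture would be a row with, say, a missing `run_id`, asserting that `errors` names the field.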

## What NOT to modify casually

These have explicit firewalls. Changing any of them can weaken contamination prevention:

| File | Why fragile |
|---|---|
| `auditor/schemas/distillation/sft_sample.ts` | The `quality_score` enum literally enforces "no rejected/needs_human_review in SFT". Loosening it = silent leak |
| `scripts/distillation/export_sft.ts` `SFT_NEVER` constant | Second-layer defense. If schema fails, this catches it |
| `scripts/distillation/export_sft.ts` re-read validation | Third layer — re-reads on-disk SFT and fails LOUD if forbidden quality_score appears |
| `scripts/distillation/scorer.ts` category mapping | Changing rules → silent corpus shift. Run `audit-full` after any change to see drift |
| `tests/fixtures/distillation/acceptance/` | The fixture is the gate. Changing it = changing the bar |
| `data/_kb/audit_baselines.jsonl` | Append-only. Truncating loses longitudinal drift signal |

If you must change one of these, run `audit-full` BEFORE and AFTER. The drift table will tell you exactly what your change moved.
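
To make the layered SFT defense in the table concrete, here is a minimal sketch of how an eligibility filter, a deny list, and a re-read check could fit together. The category names and the `SFT_NEVER` name come from this document; the row shape, function names, and logic are assumptions and are not the contents of `export_sft.ts`.

```typescript
// Category names are taken from this handoff; the row shape is hypothetical.
type QualityScore = "accepted" | "partially_accepted" | "rejected" | "needs_human_review";

interface ScoredRow {
  id: string;
  quality_score: QualityScore;
}

// Deny list modeled on the SFT_NEVER constant named in the table above.
const SFT_NEVER: ReadonlySet<string> = new Set(["rejected", "needs_human_review"]);

// Eligibility filter plus deny list: a row must be allowed AND not forbidden.
function selectSftRows(rows: ScoredRow[], includePartial = false): ScoredRow[] {
  const allowed: QualityScore[] = includePartial ? ["accepted", "partially_accepted"] : ["accepted"];
  return rows.filter((r) => allowed.includes(r.quality_score) && !SFT_NEVER.has(r.quality_score));
}

// Re-read check: parse the written JSONL again and fail loudly on any forbidden score.
function assertNoForbidden(jsonl: string): void {
  const lines = jsonl.split("\n");
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].trim() === "") continue;
    const row = JSON.parse(lines[i]) as { quality_score?: string };
    if (row.quality_score !== undefined && SFT_NEVER.has(row.quality_score)) {
      throw new Error(`forbidden quality_score "${row.quality_score}" at line ${i + 1}`);
    }
  }
}
```

The deny list looks redundant next to the allow list, and that is the point of the design: if someone widens the allow list by mistake, the deny list and the re-read check still reject the forbidden categories.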

## Receipt-vs-drift quick reference

If `audit-full` flags a metric:
- `>20%` swing in `p3_accepted` → scorer rules changed OR source data shifted
- `>20%` swing in `p4_sft_rows` → SFT eligibility changed (check exporter filter)
- `>20%` swing in `p4_total_quarantined` → either source data is dirtier OR a gate got tighter
- Hash mismatch on identical input → determinism violation; revert immediately

If `acceptance` fails:
- 22 invariants are pinned in `scripts/distillation/acceptance.ts`. The failing one names what broke.
- Spec invariants (1-22) are documented in `reports/distillation/phase6-acceptance-report.md`.

## Pointers to non-distillation systems

The auditor (`auditor/`) and the gateway (`crates/gateway/`) are the consumers of the distillation substrate. They use it but are not part of it:

- The auditor's `pr_audit` mode (`crates/gateway/src/v1/mode.rs`) retrieves from `lakehouse_answers_v1`. If you regenerate the RAG export, the auditor's context improves automatically on its next call.
- The gateway's `/v1/chat` is the entry point all model calls flow through. Receipts capture provider, model, latency, prompt+completion tokens.

## Provenance

Every export row traces to `data/scored-runs/.../<source>.jsonl` line N, which traces to `data/evidence/.../<source>.jsonl` line N, which in turn traces to `data/_kb/<source>.jsonl` line N. The `provenance.sig_hash` field (canonical sha256 of the source row, sorted keys) is the join key.

If a downstream consumer asks "where did this SFT row come from", run:

```bash
jq 'select(.id == "<sft_id>") | .provenance' exports/sft/instruction_response.jsonl
# returns {source_file, line_offset, sig_hash, recorded_at}
# Then:
sed -n "$((<line_offset> + 1))p" data/scored-runs/<source_file>
# And so on back to data/_kb/<original_source>.jsonl
```
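
The `sig_hash` described above is a canonical sha256 over the source row with sorted keys. The sketch below shows one way to compute such a hash. The exact canonical form the pipeline uses (separators, number formatting, nested-key handling) is not specified here, so hashes from this sketch should not be expected to match on-disk values.

```typescript
import { createHash } from "node:crypto";

// Serialize with object keys sorted at every level, so key order cannot change the output.
// Intended for values that came from JSON.parse (no undefined, functions, or cycles).
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// Hex-encoded sha256 of the canonical serialization.
function sigHash(row: unknown): string {
  return createHash("sha256").update(canonicalize(row)).digest("hex");
}
```

Because the keys are sorted before hashing, two rows with the same content but different key order produce the same hash, which is the property a join key needs. It is also why a hash mismatch on identical input signals a determinism violation and not a formatting difference.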

## Test discipline

```bash
bun test tests/distillation/ auditor/schemas/distillation/
```

At v1.0.0: **145 tests, 0 fail, 372 expect() calls, ~600ms.** Any new phase must keep this at 0 fail.

## Cumulative commits at v1.0.0

```
27b1d27  distillation: Phase 0 recon + Phase 1 schemas + Phase 2 transforms scaffold
1ea8029  distillation: Phase 2 — Evidence View materializer + health audit
c989253  distillation: Phase 3 — deterministic Success Scorer
68b6697  distillation: Phase 4 — dataset export layer
2cf359a  distillation: Phase 5 — receipts harness (system-level observability)
1b433a9  distillation: Phase 6 — acceptance gate suite
20a039c  auditor: rebuild on mode runner + drop tree-split (use distillation substrate)
681f39d  distillation: Phase 7 — replay-driven local model bootstrapping
5bdd159  distillation: Phase 8 — full system audit
<this>   distillation: Phase 9 — release freeze and operator handoff
```