Build the contamination firewall: RAG, SFT, and Preference exporters
that turn scored evidence into clean training datasets without
leaking rejected, unvalidated, hallucinated, or provenance-free
records.
Files (8 new + 4 schema updates):
  scripts/distillation/quarantine.ts            shared QuarantineWriter, 11-reason taxonomy
  scripts/distillation/export_rag.ts            RAG exporter (--include-review opt-in)
  scripts/distillation/export_sft.ts            SFT exporter (--include-partial opt-in, SFT_NEVER constant)
  scripts/distillation/export_preference.ts     preference exporter, same task_id pairing
  scripts/distillation/distill.ts               CLI dispatcher (build-evidence/score/export-*)
  tests/distillation/exports.test.ts            15 contamination-firewall tests
  reports/distillation/phase4-export-report.md  acceptance report
Schema field-name alignment with now.md:
  rag_sample.ts         +source_category, exported_at→created_at
  sft_sample.ts         +id, exported_at→created_at, partially_accepted allowed at schema level (CLI gates it)
  preference_sample.ts  +id, source_run_ids→chosen_run_id+rejected_run_id, +created_at
Test metrics: 117 distillation tests pass · 0 fail · 315 expects · 327ms
Real-data export run (1052 scored input rows):
RAG: 446 exported (351 acc + 95 partial), 606 quarantined
SFT: 351 exported (all 'accepted'), 701 quarantined
Preference: 83 pairs exported, 16 quarantined
CONTAMINATION FIREWALL — verified held on real data:
- SFT output: 351/351 quality_score='accepted' (ZERO leaked)
- RAG output: 351 acc + 95 partial (ZERO rejected leaked)
- Preference: 0 self-pairs (chosen_run_id != rejected_run_id)
- 536 rejected+needs_human_review records caught at unsafe_sft_category
gate, exact match to scored-runs forbidden-category total
Defense in depth (the firewall is two layers, not one):
1. Schema layer (Phase 1): SftSample.quality_score enum forbids
rejected and needs_human_review at write time
2. Exporter layer: SFT_NEVER constant in export_sft.ts checks
category before synthesis. Even if synthesis produced a row
with quality_score=rejected, validateSftSample would reject it.
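
The two layers described above can be sketched roughly as follows. This is an illustration, not the code in export_sft.ts: only SFT_NEVER, validateSftSample, the category names, and the quarantine reason names come from this message. The record shapes, the gateSft helper, and the choice of category_disallowed for a partial row without --include-partial are assumptions.

```typescript
// Sketch only. Record shapes and gateSft are hypothetical.
type Category = "accepted" | "partially_accepted" | "rejected" | "needs_human_review";

interface ScoredRun { id: string; category: Category; text: string; }
interface SftSample { id: string; quality_score: "accepted" | "partially_accepted"; text: string; }

// Exporter layer: categories that must never reach SFT synthesis.
const SFT_NEVER: ReadonlySet<Category> = new Set<Category>(["rejected", "needs_human_review"]);

// Schema layer: runtime check mirroring the quality_score enum restriction.
function validateSftSample(row: { quality_score: string }): boolean {
  return row.quality_score === "accepted" || row.quality_score === "partially_accepted";
}

type GateResult =
  | { ok: true; sample: SftSample }
  | { ok: false; reason: "unsafe_sft_category" | "category_disallowed" | "schema_violation" };

function gateSft(run: ScoredRun, includePartial: boolean): GateResult {
  // First layer to fire: the category check runs before any synthesis.
  if (SFT_NEVER.has(run.category)) return { ok: false, reason: "unsafe_sft_category" };
  // Assumed mapping: partial rows need the opt-in flag.
  if (run.category === "partially_accepted" && !includePartial) {
    return { ok: false, reason: "category_disallowed" };
  }
  const candidate = { id: run.id, quality_score: run.category, text: run.text };
  // Second layer: even a row that slipped past the gate fails schema validation.
  if (!validateSftSample(candidate)) return { ok: false, reason: "schema_violation" };
  return { ok: true, sample: candidate as SftSample };
}

// Usage: a rejected run is blocked even when partial rows are opted in.
const blocked = gateSft({ id: "run-1", category: "rejected", text: "..." }, true);
const kept = gateSft({ id: "run-2", category: "accepted", text: "..." }, false);
```

Keeping the category check ahead of synthesis means forbidden rows are quarantined with a specific reason instead of surfacing later as a generic schema failure.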
Quarantine reasons (11): missing_provenance, missing_source_run_id,
empty_content, schema_violation, unsafe_sft_category,
unsafe_rag_category, invalid_preference_pairing,
hallucinated_file_path, duplicate_id, self_pairing,
category_disallowed.
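
One way to keep the taxonomy closed is a literal union derived from a single constant list, so a typo in a reason name fails at compile time. The sketch below is an assumption about shape only: the 11 reason strings are from this message, while QuarantineSketch, its methods, and the summary format are hypothetical and not the QuarantineWriter in quarantine.ts.

```typescript
// The 11 reason names listed above, as one source of truth.
const QUARANTINE_REASONS = [
  "missing_provenance",
  "missing_source_run_id",
  "empty_content",
  "schema_violation",
  "unsafe_sft_category",
  "unsafe_rag_category",
  "invalid_preference_pairing",
  "hallucinated_file_path",
  "duplicate_id",
  "self_pairing",
  "category_disallowed",
] as const;

type QuarantineReason = (typeof QUARANTINE_REASONS)[number];

interface QuarantineEntry { record_id: string; reason: QuarantineReason; detail: string; }

// Hypothetical in-memory collector; a real writer would persist entries to disk.
class QuarantineSketch {
  private readonly entries: QuarantineEntry[] = [];

  add(record_id: string, reason: QuarantineReason, detail: string = ""): void {
    this.entries.push({ record_id, reason, detail });
  }

  // Per-reason counts, in first-seen order.
  counts(): Map<QuarantineReason, number> {
    const out = new Map<QuarantineReason, number>();
    for (const e of this.entries) out.set(e.reason, (out.get(e.reason) ?? 0) + 1);
    return out;
  }

  // One-line summary suitable for a CLI log line (format is an assumption).
  summary(): string {
    const parts: string[] = [`quarantined=${this.entries.length}`];
    this.counts().forEach((n, reason) => parts.push(`${reason}=${n}`));
    return parts.join(" ");
  }

  // One JSON object per line.
  toJsonl(): string {
    return this.entries.map((e) => JSON.stringify(e)).join("\n");
  }
}

// Usage
const q = new QuarantineSketch();
q.add("run-1", "unsafe_sft_category");
q.add("run-2", "unsafe_sft_category");
q.add("run-3", "empty_content", "text field was blank");
const summaryLine = q.summary(); // "quarantined=3 unsafe_sft_category=2 empty_content=1"
```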
Bug surfaced + fixed during testing: a module-level evidenceCache
shared state across test runs (tests wipe TMP, but the cache still held
a stale empty Map). Moved the cache to per-call scope. The Phase 2
materializer would have hit the same pattern if its tests had multiple
runs sharing state, so the fix is also preventive.
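
The failure mode can be reproduced in a few lines. In this sketch the `disk` Map stands in for files under TMP, and every function name except evidenceCache is hypothetical; it shows the pattern, not the exporter's actual loader.

```typescript
// Stand-in for files on disk (the tests' TMP directory).
const disk = new Map<string, string[]>();

// BUGGY pattern: the cache lives at module level, so it outlives the data it mirrors.
const evidenceCache = new Map<string, string[]>();

function loadEvidenceCached(path: string): string[] {
  let rows = evidenceCache.get(path);
  if (rows === undefined) {
    rows = disk.get(path) ?? [];
    evidenceCache.set(path, rows); // an empty result is cached too
  }
  return rows;
}

// FIXED pattern: the caller owns the cache, and it lives for one export call only.
function loadEvidence(path: string, cache: Map<string, string[]>): string[] {
  let rows = cache.get(path);
  if (rows === undefined) {
    rows = disk.get(path) ?? [];
    cache.set(path, rows);
  }
  return rows;
}

function exportOnce(paths: string[]): number {
  const cache = new Map<string, string[]>(); // per-call scope
  let total = 0;
  for (const p of paths) total += loadEvidence(p, cache).length;
  return total;
}

// Usage: the first read happens before the fixture exists, then the fixture is written.
loadEvidenceCached("evidence.jsonl");               // caches []
disk.set("evidence.jsonl", ["row-1", "row-2"]);
const staleCount = loadEvidenceCached("evidence.jsonl").length; // still 0
const freshCount = exportOnce(["evidence.jsonl"]);              // 2
```

The per-call cache still deduplicates reads within a single export, which is usually the only reuse that is safe when tests rewrite the underlying files.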
Pairing logic v1: same task_id with category gap. accepted×rejected
preferred, accepted×partially_accepted as fallback. MAX_PAIRS_PER_TASK=5
cap prevents one hot task from dominating. Future: cross-source
pairing (scrum_reviews chosen vs observer_reviews rejected on same
file) to grow dataset beyond 83.
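
A minimal sketch of the v1 pairing rule, under stated assumptions: the field names on ScoredRun, both function names, and the reading of "fallback" as "use partially_accepted only when the task has no rejected runs" are guesses for illustration. MAX_PAIRS_PER_TASK, the category names, and the chosen_run_id/rejected_run_id fields are from this message.

```typescript
type Category = "accepted" | "partially_accepted" | "rejected" | "needs_human_review";

interface ScoredRun { run_id: string; task_id: string; category: Category; }
interface PreferencePair { task_id: string; chosen_run_id: string; rejected_run_id: string; }

const MAX_PAIRS_PER_TASK = 5;

// Pair runs within one task: accepted x rejected first, accepted x partial as fallback.
function pairTask(task_id: string, group: ScoredRun[]): PreferencePair[] {
  const accepted = group.filter((r) => r.category === "accepted");
  const rejected = group.filter((r) => r.category === "rejected");
  const partial = group.filter((r) => r.category === "partially_accepted");
  const losers = rejected.length > 0 ? rejected : partial; // assumed fallback rule
  const pairs: PreferencePair[] = [];
  for (const chosen of accepted) {
    for (const loser of losers) {
      if (pairs.length >= MAX_PAIRS_PER_TASK) return pairs; // cap hot tasks
      if (chosen.run_id === loser.run_id) continue;         // never self-pair
      pairs.push({ task_id, chosen_run_id: chosen.run_id, rejected_run_id: loser.run_id });
    }
  }
  return pairs;
}

// Group by task_id, then pair inside each group.
function buildPreferencePairs(runs: ScoredRun[]): PreferencePair[] {
  const byTask = new Map<string, ScoredRun[]>();
  for (const r of runs) {
    const group = byTask.get(r.task_id);
    if (group) group.push(r);
    else byTask.set(r.task_id, [r]);
  }
  const out: PreferencePair[] = [];
  byTask.forEach((group, task_id) => {
    out.push(...pairTask(task_id, group));
  });
  return out;
}
```

With three accepted and three rejected runs on one task, nine pairs are possible but only five are emitted, which is how the cap keeps a single task from dominating the dataset.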
CLI: bun run scripts/distillation/distill.ts {build-evidence|score|export-rag|export-sft|export-preference|export-all|health}
Flags: --dry-run, --include-partial (SFT only), --include-review (RAG only)
Carry-overs to Phase 5 (Receipts Harness):
- Each exporter currently writes results but no per-stage receipt.json.
Phase 5 wraps build_evidence_index + score_runs + export_* in a
withReceipt() helper that captures git_sha + sha256 of inputs/outputs
+ record_counts + validation_pass.
- reports/distillation/latest.md aggregating most-recent run of each stage.
Carry-overs to Phase 3 v2:
- mode_experiments scoring (168 needs_human_review): derive markers from
validation_results.grounded_fraction
- extraction-class JOIN: distilled_*/audit_facts/observer_escalations
→ JOIN to verdict-bearing parent by task_id
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
101 lines
4.9 KiB
TypeScript
// distill.ts: single-entry CLI dispatcher for the distillation
// pipeline. Mirrors the spec's `./scripts/distill <command>` shape.
//
// USAGE
//   bun run scripts/distillation/distill.ts <command> [flags]
//
// COMMANDS
//   build-evidence     materialize EvidenceRecord rows from data/_kb/*.jsonl
//   score              run deterministic Success Scorer
//   export-rag         RAG export (--include-review opt-in)
//   export-sft         SFT export (--include-partial opt-in)
//   export-preference  preference export
//   export-all         RAG + SFT + preference (no opt-ins by default)
//   health             evidence health audit
//
// All commands accept --dry-run.

import { materializeAll } from "./build_evidence_index";
import { scoreAll } from "./score_runs";
import { exportRag } from "./export_rag";
import { exportSft } from "./export_sft";
import { exportPreference } from "./export_preference";
import { TRANSFORMS } from "./transforms";

const DEFAULT_ROOT = process.env.LH_DISTILL_ROOT ?? "/home/profit/lakehouse";

async function main() {
  const cmd = process.argv[2];
  const dry_run = process.argv.includes("--dry-run");
  const include_partial = process.argv.includes("--include-partial");
  const include_review = process.argv.includes("--include-review");
  const recorded_at = new Date().toISOString();

  switch (cmd) {
    case "build-evidence": {
      const r = await materializeAll({ root: DEFAULT_ROOT, transforms: TRANSFORMS, recorded_at, dry_run });
      console.log(`[build-evidence] in=${r.totals.rows_read} out=${r.totals.rows_written} skip=${r.totals.rows_skipped} dedup=${r.totals.rows_deduped}`);
      if (!dry_run) console.log(`[build-evidence] receipt: ${r.receipt_path}`);
      if (!r.receipt.validation_pass) process.exit(1);
      break;
    }
    case "score": {
      const r = await scoreAll({ root: DEFAULT_ROOT, recorded_at, dry_run });
      const c = r.totals.by_category;
      console.log(`[score] in=${r.totals.rows_read} out=${r.totals.rows_written} acc=${c.accepted ?? 0} part=${c.partially_accepted ?? 0} rej=${c.rejected ?? 0} hum=${c.needs_human_review ?? 0}`);
      if (!dry_run) console.log(`[score] receipt: ${r.receipt_path}`);
      break;
    }
    case "export-rag": {
      const r = await exportRag({ root: DEFAULT_ROOT, recorded_at, include_review, dry_run });
      console.log(`[export-rag] in=${r.records_read} out=${r.records_exported} ${r.quarantine_summary}`);
      console.log(`[export-rag] output: ${r.output_path}${include_review ? " (review included)" : ""}`);
      break;
    }
    case "export-sft": {
      const r = await exportSft({ root: DEFAULT_ROOT, recorded_at, include_partial, dry_run });
      console.log(`[export-sft] in=${r.records_read} out=${r.records_exported} ${r.quarantine_summary}`);
      console.log(`[export-sft] output: ${r.output_path}${include_partial ? " (partial included)" : ""}`);
      break;
    }
    case "export-preference": {
      const r = await exportPreference({ root: DEFAULT_ROOT, recorded_at, dry_run });
      console.log(`[export-preference] in=${r.records_read} pairs=${r.pairs_exported} task_ids_paired=${r.task_ids_with_pairs} ${r.quarantine_summary}`);
      console.log(`[export-preference] output: ${r.output_path}`);
      break;
    }
    case "export-all": {
      const rRag = await exportRag({ root: DEFAULT_ROOT, recorded_at, include_review, dry_run });
      const rSft = await exportSft({ root: DEFAULT_ROOT, recorded_at, include_partial, dry_run });
      const rPref = await exportPreference({ root: DEFAULT_ROOT, recorded_at, dry_run });
      console.log("");
      console.log("─── export-all summary ───");
      console.log(` RAG: in=${rRag.records_read} out=${rRag.records_exported} ${rRag.quarantine_summary}`);
      console.log(` SFT: in=${rSft.records_read} out=${rSft.records_exported} ${rSft.quarantine_summary}`);
      console.log(` Preference: in=${rPref.records_read} pairs=${rPref.pairs_exported} ${rPref.quarantine_summary}`);
      break;
    }
    case "health":
    case "help":
    case undefined: {
      console.log("Usage: bun run scripts/distillation/distill.ts <command> [flags]");
      console.log("");
      console.log("Commands:");
      console.log(" build-evidence materialize EvidenceRecord rows");
      console.log(" score run deterministic Success Scorer");
      console.log(" export-rag RAG export (--include-review opt-in)");
      console.log(" export-sft SFT export (--include-partial opt-in)");
      console.log(" export-preference preference export");
      console.log(" export-all RAG + SFT + preference");
      console.log("");
      console.log("Flags: --dry-run, --include-partial, --include-review");
      break;
    }
    default:
      console.error(`unknown command: ${cmd}. Try 'help'.`);
      process.exit(2);
  }
}

main().catch(e => { console.error(e); process.exit(1); });