2 Commits

Author SHA1 Message Date
root
1b433a9308 distillation: Phase 6 — acceptance gate suite
End-to-end fixture-driven gate. Runs the entire pipeline (collect →
score → export-rag → export-sft → export-preference) on a deterministic
fixture, asserts 22 invariants, runs a SECOND time with the same
recorded_at, and verifies hash reproducibility. Exits non-zero on any
failure. Pure observability — no scoring/filtering/schema changes.

Files (3 new + 1 modified + 6 fixture jsonls):
  scripts/distillation/acceptance.ts                    330 lines, runner + 22 checks
  reports/distillation/phase6-acceptance-report.md       autogenerated by run
  scripts/distillation/distill.ts                        +run-all, +receipts, +acceptance subcommands

  tests/fixtures/distillation/acceptance/data/_kb/
    scrum_reviews.jsonl    5 rows (accepted/partial/needs_human/scratchpad/missing-provenance)
    audits.jsonl           3 rows (info/high+PRD-drift/medium severity)
    auto_apply.jsonl       2 rows (committed, build_red_reverted)
    contract_analyses.jsonl 2 rows (accept, reject)
    observer_reviews.jsonl 2 rows (accept, reject — pair candidates)
    distilled_facts.jsonl  1 extraction-class row

Spec cases covered (now.md Phase 6):
  ✓ accepted          — Row #1 scrum, #6 audit-info, #11 contract-accept, #14 obs-accept
  ✓ partially_accepted — Row #2 scrum (3 attempts), #8 audit-medium
  ✓ rejected           — #7 audit-high, #10 auto_apply build_red, #12 contract-reject, #15 obs-reject
  ✓ needs_human_review — #3 scrum (no markers), #13 distilled extraction-class
  ✓ missing provenance — Row #5 scrum (no reviewed_at) → routed to skips
  ✓ valid preference pair — observer_reviews accept+reject on same file
  ✓ invalid preference pair — quarantine reasons populated when generated
  ✓ scratchpad / tree-split — Row #4 scrum tree_split_fired=true with multi-shard text
  ✓ PRD drift — Row #7 audit severity=high, topic="PRD drift: circuit breaker shipped claim"

Acceptance run results (run_id: acceptance-run-1-stable):
  22/22 invariants PASS
  Pipeline counts:
    collect:           14 records out, 1 skipped (missing-provenance fixture)
    score:             accepted=6 rejected=4 quarantined=4
    export-rag:        7 rows (5 acc + 2 partial, ZERO rejected)
    export-sft:        5 rows (all 'accepted', ZERO partial without --include-partial)
    export-preference: 2 pairs (zero self-pairs, zero identical-text)

Hash reproducibility — bit-for-bit identical:
  run_hash: 3ea12b160ee9099a3c52fe6e7fffd3076de7920d2704d24c789260d63cb1a5a2
  Two runs of the entire pipeline on the same fixture with the same
  recorded_at produce byte-identical outputs.

The 22 invariants:
  1-4.  Receipts + summary.json + summary.md + drift.json exist
  5-7.  StageReceipt + RunSummary + DriftReport schemas all valid
  8-10. SFT contains accepted only — no rejected/needs_human/partial leak
  11-12. RAG contains accepted+partial — zero rejected
  13-15. Preference: ≥1 pair, zero self-pairs, zero identical text
  16.   Every export row has 64-char hex provenance.sig_hash
  17.   Phase 2 missing-provenance row routed to distillation_skips.jsonl
  18.   SFT quarantine populated (6 unsafe_sft_category entries)
  19.   Scratchpad/tree-split fixture row materialized
  20.   PRD drift fixture row materialized
  21.   Per-stage output_hash identical across runs (0 mismatches)
  22.   run_hash identical across runs (bit-for-bit)

CLI:
  ./scripts/distill.ts acceptance     # exits 0 on pass, 1 on fail
  ./scripts/distill.ts run-all        # full pipeline with receipts
  ./scripts/distill.ts receipts --run-id <id>

Cumulative test metrics:
  135 distillation tests pass · 0 fail · 353 expect() calls · 1411ms
  (Phase 6 adds the runtime acceptance gate, not new unit tests —
   the acceptance script IS the integration test, callable from CI.)

What this proves:
- Distillation pipeline is SAFE (contamination firewall held under
  adversarial fixture)
- Distillation pipeline is REPRODUCIBLE (identical input → bit-identical
  output across two runs)
- Distillation pipeline is GATED (every now.md invariant has a
  deterministic assertion that exits non-zero on failure)

The 6-phase distillation substrate is now training-safe. RAG (446),
SFT (351 strict-accepted), and Preference (83 paired) datasets on
real lakehouse data each carry full provenance back to source rows
through the verified Phase 2 → Phase 3 → Phase 4 chain, with Phase 5
receipts capturing every input/output sha256 + per-stage validation,
and Phase 6 proving the whole chain is gate-tight on a deterministic
fixture.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:19:56 -05:00
root
68b6697bcb distillation: Phase 4 — dataset export layer
Some checks failed
lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts
Build the contamination firewall: RAG, SFT, and Preference exporters
that turn scored evidence into clean training datasets without
leaking rejected, unvalidated, hallucinated, or provenance-free
records.

Files (8 new + 4 schema updates):
  scripts/distillation/quarantine.ts      shared QuarantineWriter, 11-reason taxonomy
  scripts/distillation/export_rag.ts      RAG exporter (--include-review opt-in)
  scripts/distillation/export_sft.ts      SFT exporter (--include-partial opt-in, SFT_NEVER constant)
  scripts/distillation/export_preference.ts preference exporter, same task_id pairing
  scripts/distillation/distill.ts         CLI dispatcher (build-evidence/score/export-*)
  tests/distillation/exports.test.ts      15 contamination-firewall tests
  reports/distillation/phase4-export-report.md  acceptance report

Schema field-name alignment with now.md:
  rag_sample.ts        +source_category, exported_at→created_at
  sft_sample.ts        +id, exported_at→created_at, partially_accepted at schema (CLI gates)
  preference_sample.ts +id, source_run_ids→chosen_run_id+rejected_run_id, +created_at

Test metrics: 117 distillation tests pass · 0 fail · 315 expects · 327ms

Real-data export run (1052 scored input rows):
  RAG:        446 exported (351 acc + 95 partial), 606 quarantined
  SFT:        351 exported (all 'accepted'),       701 quarantined
  Preference:  83 pairs exported,                   16 quarantined

CONTAMINATION FIREWALL — verified held on real data:
  - SFT output: 351/351 quality_score='accepted' (ZERO leaked)
  - RAG output: 351 acc + 95 partial (ZERO rejected leaked)
  - Preference: 0 self-pairs (chosen_run_id != rejected_run_id)
  - 536 rejected+needs_human_review records caught at unsafe_sft_category
    gate, exact match to scored-runs forbidden-category total

Defense in depth (the firewall is two layers, not one):
  1. Schema layer (Phase 1): SftSample.quality_score enum forbids
     rejected/needs_human at write time
  2. Exporter layer: SFT_NEVER constant in export_sft.ts checks
     category before synthesis. Even if synthesis produced a row
     with quality_score=rejected, validateSftSample would reject it.

Quarantine reasons (11): missing_provenance, missing_source_run_id,
empty_content, schema_violation, unsafe_sft_category,
unsafe_rag_category, invalid_preference_pairing,
hallucinated_file_path, duplicate_id, self_pairing,
category_disallowed.

Bug surfaced + fixed during testing: module-level evidenceCache
shared state across test runs (tests wipe TMP, cache holds stale
empty Map). Moved cache to per-call scope. Same pattern bit Phase 2
materializer would have hit if its tests had multiple runs sharing
state — preventive fix.

Pairing logic v1: same task_id with category gap. accepted×rejected
preferred, accepted×partially_accepted as fallback. MAX_PAIRS_PER_TASK=5
cap prevents one hot task from dominating. Future: cross-source
pairing (scrum_reviews chosen vs observer_reviews rejected on same
file) to grow dataset beyond 83.

CLI: ./scripts/distill.ts {build-evidence|score|export-rag|export-sft|export-preference|export-all|health}
Flags: --dry-run, --include-partial (SFT only), --include-review (RAG only)

Carry-overs to Phase 5 (Receipts Harness):
- Each exporter currently writes results but no per-stage receipt.json.
  Phase 5 wraps build_evidence_index + score_runs + export_* in a
  withReceipt() helper that captures git_sha + sha256 of inputs/outputs
  + record_counts + validation_pass.
- reports/distillation/latest.md aggregating most-recent run of each stage.

Carry-overs to Phase 3 v2:
- mode_experiments scoring (168 needs_human_review): derive markers from
  validation_results.grounded_fraction
- extraction-class JOIN: distilled_*/audit_facts/observer_escalations
  → JOIN to verdict-bearing parent by task_id

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 22:57:40 -05:00