diff --git a/STATE_OF_PLAY.md b/STATE_OF_PLAY.md index bda8764..62ba215 100644 --- a/STATE_OF_PLAY.md +++ b/STATE_OF_PLAY.md @@ -270,6 +270,7 @@ a steady state. Future items will land here as production triggers fire. | (close-2 full) | **OPEN #2 fully ported** (2026-05-01): `SynthesizeSft` + `LoadEvidenceByRunID` + `buildInstruction` ported byte-for-byte from `scripts/distillation/export_sft.ts`. All 8 source-class instruction templates (scrum_reviews / mode_experiments / auto_apply / audits / observer_reviews / contract_analyses / outcomes / default) match Rust output exactly so a/b validation between runtimes can diff JSONL byte-for-byte. `ExportSft` writes to `data/distilled/sft/sft_export.jsonl`. 5 additional tests including per-source-class template verification, extraction-rejection, empty-text-rejection, context-assembly, end-to-end fixture write. | | (close-2 lineage) | **Audit-baselines lineage ported** (2026-05-01): `internal/distillation/audit_baseline.go` mirrors Rust `audit_full.ts`'s LoadBaseline/AppendBaseline/buildDriftTable. `LoadLastBaseline` reads the most recent JSON line from `data/_kb/audit_baselines.jsonl`; `AppendBaseline` appends append-only with bufio. `BuildAuditDriftTable` flags drift `>20%` (configurable); zero-baseline and new-metric edge cases handled (no division-by-zero, no false-stable on zero→nonzero). `FormatAuditDriftTable` for stdout dumps. Generic on metric names so callers running both runtimes can pin Rust-compat names (`AuditBaselineRustCompat` constant lists them). 13 tests including last-line-wins, trailing-blank-tolerance, malformed-line-errors, threshold-boundary, zero-baseline-handling, sort-stability. | | (scrum) | 3-lineage scrum on `434f466..0d4f033` (post_role_gate_v1). Convergent finding (Opus + Kimi): `DecodeIndex` lost nil-meta items across persistence. **Fixed** by bumping envelope version 1→2 with explicit `IDs []string` field; v1 envelopes still load via meta-key fallback. 
Opus-only real bugs also actioned: `handleMerge` non-`ErrIndexNotFound` nil-deref, `mathLog` dead wrapper removed, bubble sort → `sort.Slice`. False positives rejected after verification (Kimi rollback misreading + Opus stale-comment claim). 2 new regression tests lock the v2 round-trip + v1 backward-compat. Disposition: `reports/scrum/_evidence/2026-05-01/verdicts/post_role_gate_v1_disposition.md`. | +| (audit-full port) | **Audit-FULL pipeline** (phases 0/3/4) ported from `scripts/distillation/audit_full.ts`. `internal/distillation/audit_full.go` + `cmd/audit_full` CLI. 6 ported required-check classes; 4 phases (2, 5, 6, 7) deferred — they depend on broader Rust pieces (materializer / replay / run-summaries) not yet ported; phase 1 (schema validators) is skipped by design since `just verify` already covers it. **Cross-runtime byte-equal verdict on live data**: Go-side audit-full against `/home/profit/lakehouse` produced p3_*/p4_* metrics IDENTICAL to the last Rust-emitted `audit_baselines.jsonl` entry (all 8 metrics match: p3_accepted=386, p3_partial=132, p3_rejected=57, p3_human=480, p4_sft_rows=353, p4_rag_rows=448, p4_pref_pairs=83, p4_total_quarantined=1325). 6 new tests + the live-data probe captured in `reports/cutover/audit_full_go_vs_rust.md`. | | (close-3) | **OPEN #3: distribution drift via PSI** — `internal/drift/drift.go`: `ComputeDistributionDrift` returns Population Stability Index + verdict tier (stable < 0.10, minor 0.10–0.25, major ≥ 0.25). Equal-width bucketing over combined min/max range, epsilon-clamping for empty buckets, per-bucket breakdown for drilldown. 7 new tests including identical-is-stable, hard-shift-is-major, moderate-detected-not-stable, empty-inputs-safe, all-identical-safe, bucket-counts-conserved, num-buckets-clamping.
| | (close-4) | **OPEN #4: ops nice-to-haves** — (a) Real-time wall-clock for stress harness: per-phase elapsed time logged to stdout as it runs (`[stress] phase NAME starting (T+12.3s)` + `[stress] phase NAME done — 8.5s (T+20.8s)`); `Output.PhaseTimings` + `Output.TotalElapsedMs` written to JSON; (b) chatd fixture-mode S3 mock + (c) liberal-paraphrase calibration: not actioned — no fired trigger yet, would be speculative. Documented as deferred-until-need rather than ignored. | diff --git a/cmd/audit_full/main.go b/cmd/audit_full/main.go new file mode 100644 index 0000000..ed6cd59 --- /dev/null +++ b/cmd/audit_full/main.go @@ -0,0 +1,105 @@ +// audit_full — Go-side audit-full runner. Calls into +// internal/distillation.RunAuditFull, dumps the Markdown report to +// stdout (or a file), and optionally appends an AuditBaseline entry +// to data/_kb/audit_baselines.jsonl for the longitudinal log. +// +// Usage: +// audit_full # report only +// audit_full -root /home/profit/lakehouse # custom root +// audit_full -append-baseline # also append to audit_baselines.jsonl +// audit_full -out reports/distillation/run.md # write report file +// +// Designed to live alongside the Rust scripts/distillation/audit_full.ts +// — operators can run either runtime against the same root and the +// audit_baselines.jsonl entries are interchangeable. 
+package main + +import ( + "encoding/json" + "flag" + "fmt" + "log" + "os" + "os/exec" + "strings" + "time" + + "git.agentview.dev/profit/golangLAKEHOUSE/internal/distillation" +) + +func main() { + root := flag.String("root", "", "lakehouse data root (defaults to $LH_DISTILL_ROOT or /home/profit/lakehouse)") + out := flag.String("out", "", "write Markdown report to this path (default: stdout)") + appendBaseline := flag.Bool("append-baseline", false, "append an AuditBaseline entry to data/_kb/audit_baselines.jsonl after the run") + jsonOut := flag.Bool("json", false, "emit the full PhaseCheckReport as JSON instead of Markdown") + flag.Parse() + + gitHEAD := resolveGitHEAD() + report := distillation.RunAuditFull(distillation.AuditFullOptions{ + Root: *root, + GitHEAD: gitHEAD, + }) + + var body []byte + if *jsonOut { + body = mustJSON(report) + } else { + body = []byte(distillation.FormatAuditFullReport(report)) + } + + if *out == "" { + _, _ = os.Stdout.Write(body) + } else { + if err := os.WriteFile(*out, body, 0o644); err != nil { + log.Fatalf("write %s: %v", *out, err) + } + fmt.Fprintf(os.Stderr, "wrote %s (%d bytes)\n", *out, len(body)) + } + + if *appendBaseline { + // Resolve the same path the Rust pipeline uses so both + // runtimes share the audit_baselines.jsonl log. + resolvedRoot := *root + if resolvedRoot == "" { + if env := os.Getenv("LH_DISTILL_ROOT"); env != "" { + resolvedRoot = env + } else { + resolvedRoot = "/home/profit/lakehouse" + } + } + bp := distillation.DefaultBaselinePath(resolvedRoot) + err := distillation.AppendBaseline(bp, distillation.AuditBaseline{ + RecordedAt: time.Now().UTC().Format(time.RFC3339), + GitCommit: gitHEAD, + Metrics: report.Metrics, + }) + if err != nil { + log.Fatalf("append baseline: %v", err) + } + fmt.Fprintf(os.Stderr, "appended baseline to %s\n", bp) + } + + if report.Failed > 0 { + os.Exit(1) + } +} + +// resolveGitHEAD returns the current commit SHA if the Go repo is a +// git checkout. 
Falls back to "" rather than failing — the audit +// runs even on a fresh clone without git. +func resolveGitHEAD() string { + cmd := exec.Command("git", "rev-parse", "HEAD") + bs, err := cmd.Output() + if err != nil { + return "" + } + return strings.TrimSpace(string(bs)) +} + +func mustJSON(v any) []byte { + bs, err := json.MarshalIndent(v, "", " ") + if err != nil { + log.Fatalf("json marshal: %v", err) + } + return append(bs, '\n') +} diff --git a/internal/distillation/audit_full.go b/internal/distillation/audit_full.go new file mode 100644 index 0000000..0fa1807 --- /dev/null +++ b/internal/distillation/audit_full.go @@ -0,0 +1,445 @@ +package distillation + +// Audit-FULL pipeline — Go port of scripts/distillation/audit_full.ts +// (Rust legacy). Runs the metric-collection passes that produce +// audit_baselines.jsonl entries. Pure observability: never modifies +// pipeline data, only reads and tallies. +// +// Phase coverage in this port: +// - Phase 0 (file presence) ✓ ported +// - Phase 1 (schema validators) ✗ skipped — Go's `go test` +// equivalent runs as part of +// `just verify`, no need to +// re-invoke from here. +// - Phase 2 (materializer dry-run) ✗ deferred — depends on the +// Go-side materializer port +// (transforms + +// build_evidence_index), +// which isn't yet done. +// Surfaces as TODO. +// - Phase 3 (scored-runs distribution) ✓ ported +// - Phase 4 (contamination firewall) ✓ ported +// - Phase 5 (receipts validation) ✗ deferred — depends on the +// Go pipeline emitting +// run-summary JSON, not yet. +// - Phase 6 (replay sanity) ✗ deferred — Go-side replay +// tool not ported. +// - Phase 7 (run summary lineage) ✗ deferred — same. +// +// The phases that ARE ported are sufficient to produce the +// AuditBaseline metrics (p3_*, p4_*) that drift across runs. p2_* +// metrics are simply absent until the materializer port lands. +// +// Output: a structured PhaseCheckReport plus a Markdown summary.
+// Operators run this from cmd/audit_full to validate that a Go-side +// distillation pipeline run produced sane outputs. + +import ( + "encoding/json" + "fmt" + "os" + "path/filepath" + "regexp" + "strings" +) + +// PhaseCheck is one observable check within a phase. Mirrors the +// Rust shape exactly — Markdown rendering uses the same column +// layout so cross-runtime diff'ing is meaningful. +type PhaseCheck struct { + Phase int `json:"phase"` + Name string `json:"name"` + Expected string `json:"expected"` + Actual string `json:"actual"` + Passed bool `json:"passed"` + Required bool `json:"required"` // false → informational only, doesn't fail audit + Notes []string `json:"notes,omitempty"` +} + +// PhaseCheckReport is the aggregate result of one audit-full run. +// Metrics is the AuditBaseline-shape metric snapshot that the +// caller can pass to AppendBaseline to grow the longitudinal log. +type PhaseCheckReport struct { + Checks []PhaseCheck `json:"checks"` + Metrics map[string]int64 `json:"metrics"` + Failed int `json:"failed"` // count of REQUIRED checks that failed + Skipped int `json:"deferred_phases"` // deferred phases (2, 5, 6, 7); phase 1 is skipped by design + GitHEAD string `json:"git_head,omitempty"` +} + +// AuditFullOptions controls a single audit-full run. Root is the +// data dir (defaults to LH_DISTILL_ROOT or /home/profit/lakehouse +// so operators running both runtimes hit the same paths). +type AuditFullOptions struct { + Root string + GitHEAD string // optional — caller resolves and passes through +} + +// RunAuditFull orchestrates the ported phases (0, 3, 4) and +// returns the aggregated report. Each phase is independent; a +// phase that errors is recorded as a failed check rather than +// aborting the run, matching Rust's "always emit a report" stance.
+func RunAuditFull(opts AuditFullOptions) PhaseCheckReport { + if opts.Root == "" { + if env := os.Getenv("LH_DISTILL_ROOT"); env != "" { + opts.Root = env + } else { + opts.Root = "/home/profit/lakehouse" + } + } + report := PhaseCheckReport{ + Metrics: make(map[string]int64), + GitHEAD: opts.GitHEAD, + Skipped: 4, // phases 2, 5, 6, 7 deferred; phase 1 skipped by design — see header comment + } + auditPhase0(opts.Root, &report) + auditPhase3(opts.Root, &report) + auditPhase4(opts.Root, &report) + for _, c := range report.Checks { + if c.Required && !c.Passed { + report.Failed++ + } + } + return report +} + +// ── Phase 0: file presence ───────────────────────────────────────── + +func auditPhase0(root string, report *PhaseCheckReport) { + // The recon doc is Rust-specific (docs/recon/local-distillation- + // recon.md); a Go-side equivalent would live in the + // golangLAKEHOUSE repo. For audit-full's purposes, we treat its + // presence as informational rather than required when running + // against a non-Rust root.
+ reconPath := filepath.Join(root, "docs", "recon", "local-distillation-recon.md") + exists := fileExists(reconPath) + report.Checks = append(report.Checks, PhaseCheck{ + Phase: 0, Name: "recon doc exists", + Expected: "docs/recon/local-distillation-recon.md present", + Actual: fmt.Sprintf("%v", exists), + Passed: exists, Required: false, // informational on Go-side runs + }) + + tier1 := []string{ + "data/_kb/distilled_facts.jsonl", + "data/_kb/scrum_reviews.jsonl", + "data/_kb/audit_facts.jsonl", + "data/_kb/mode_experiments.jsonl", + } + missing := []string{} + for _, p := range tier1 { + if !fileExists(filepath.Join(root, p)) { + missing = append(missing, p) + } + } + notes := []string{} + if len(missing) > 0 { + notes = append(notes, "fresh-clone or post-rotation environment — Phase 2 will tally as rows_present=false; not a hard fail") + } + report.Checks = append(report.Checks, PhaseCheck{ + Phase: 0, Name: "tier-1 source streams present", + Expected: "all 4 tier-1 jsonls on disk", + Actual: func() string { + if len(missing) == 0 { + return "all present" + } + return "missing: " + strings.Join(missing, ", ") + }(), + Passed: len(missing) == 0, Required: false, + Notes: notes, + }) +} + +// ── Phase 3: scored-runs distribution ────────────────────────────── + +func auditPhase3(root string, report *PhaseCheckReport) { + scoredDir := filepath.Join(root, "data", "scored-runs") + if !fileExists(scoredDir) { + report.Checks = append(report.Checks, PhaseCheck{ + Phase: 3, Name: "scored-runs on disk", + Expected: "data/scored-runs/ populated", + Actual: "missing", + Passed: false, Required: true, + Notes: []string{"run scoring before audit-full (Go: scripts/distillation/score; Rust: ./scripts/distill score)"}, + }) + return + } + + counts := map[string]int64{ + "accepted": 0, + "partially_accepted": 0, + "rejected": 0, + "needs_human_review": 0, + } + files, err := ListScoredRunFiles(root) + if err != nil { + report.Checks = append(report.Checks, PhaseCheck{ + Phase: 
3, Name: "scored-runs walk", + Expected: "no error", Actual: err.Error(), + Passed: false, Required: true, + }) + return + } + for _, f := range files { + runs, _, err := LoadScoredRunsFromFile(f) + if err != nil { + continue + } + for _, r := range runs { + if _, ok := counts[string(r.Category)]; ok { + counts[string(r.Category)]++ + } + } + } + total := counts["accepted"] + counts["partially_accepted"] + counts["rejected"] + counts["needs_human_review"] + + report.Metrics["p3_accepted"] = counts["accepted"] + report.Metrics["p3_partial"] = counts["partially_accepted"] + report.Metrics["p3_rejected"] = counts["rejected"] + report.Metrics["p3_human"] = counts["needs_human_review"] + + report.Checks = append(report.Checks, PhaseCheck{ + Phase: 3, Name: "on-disk scored-runs distribution non-empty", + Expected: ">=1 accepted", + Actual: fmt.Sprintf("acc=%d part=%d rej=%d hum=%d", counts["accepted"], counts["partially_accepted"], counts["rejected"], counts["needs_human_review"]), + Passed: counts["accepted"] >= 1, Required: true, + }) + report.Checks = append(report.Checks, PhaseCheck{ + Phase: 3, Name: "scored-runs distribution sums positive", + Expected: ">0 total", Actual: fmt.Sprintf("%d total", total), + Passed: total > 0, Required: false, + }) +} + +// ── Phase 4: contamination firewall + provenance ─────────────────── + +// sigHashRe pre-compiled match for the canonical sig_hash shape: +// 64 lowercase hex characters (sha256 hex). Used per-row in the +// provenance check. 
+var sigHashRe = regexp.MustCompile(`^[0-9a-f]{64}$`) + +func auditPhase4(root string, report *PhaseCheckReport) { + sftPath := filepath.Join(root, "exports", "sft", "instruction_response.jsonl") + ragPath := filepath.Join(root, "exports", "rag", "playbooks.jsonl") + prefPath := filepath.Join(root, "exports", "preference", "chosen_rejected.jsonl") + + sftRows := readJSONLLines(sftPath) + ragRows := readJSONLLines(ragPath) + prefRows := readJSONLLines(prefPath) + + report.Metrics["p4_sft_rows"] = int64(len(sftRows)) + report.Metrics["p4_rag_rows"] = int64(len(ragRows)) + report.Metrics["p4_pref_pairs"] = int64(len(prefRows)) + + // SFT contamination firewall: 0 forbidden quality_scores. The + // only legal SFT quality scores are accepted + partially_accepted. + sftForbidden := 0 + for _, line := range sftRows { + var r struct { + QualityScore string `json:"quality_score"` + } + if err := json.Unmarshal([]byte(line), &r); err != nil { + continue // tolerate malformed (matches Rust) + } + if r.QualityScore != "accepted" && r.QualityScore != "partially_accepted" { + sftForbidden++ + } + } + report.Checks = append(report.Checks, PhaseCheck{ + Phase: 4, Name: "SFT contamination firewall: 0 forbidden quality_scores", + Expected: "0", Actual: fmt.Sprintf("%d", sftForbidden), + Passed: sftForbidden == 0, Required: true, + Notes: []string{"this is the spec non-negotiable — rejected/needs_human_review must NEVER appear in SFT"}, + }) + + // RAG firewall: 0 rejected leaks + ragRejected := 0 + for _, line := range ragRows { + var r struct { + SuccessScore string `json:"success_score"` + } + if err := json.Unmarshal([]byte(line), &r); err != nil { + continue + } + if r.SuccessScore == "rejected" { + ragRejected++ + } + } + report.Checks = append(report.Checks, PhaseCheck{ + Phase: 4, Name: "RAG firewall: 0 rejected leaks", + Expected: "0", Actual: fmt.Sprintf("%d", ragRejected), + Passed: ragRejected == 0, Required: true, + }) + + // Preference: 0 self-pairs + 0 identical-text 
pairs. + prefSelfPairs, prefIdenticalText := 0, 0 + for _, line := range prefRows { + var r struct { + ChosenRunID string `json:"chosen_run_id"` + RejectedRunID string `json:"rejected_run_id"` + Chosen string `json:"chosen"` + Rejected string `json:"rejected"` + } + if err := json.Unmarshal([]byte(line), &r); err != nil { + continue + } + if r.ChosenRunID == r.RejectedRunID { + prefSelfPairs++ + } + if r.Chosen == r.Rejected { + prefIdenticalText++ + } + } + report.Checks = append(report.Checks, PhaseCheck{ + Phase: 4, Name: "Preference: 0 self-pairs (chosen_run_id != rejected_run_id)", + Expected: "0", Actual: fmt.Sprintf("%d", prefSelfPairs), + Passed: prefSelfPairs == 0, Required: true, + }) + report.Checks = append(report.Checks, PhaseCheck{ + Phase: 4, Name: "Preference: 0 identical-text pairs", + Expected: "0", Actual: fmt.Sprintf("%d", prefIdenticalText), + Passed: prefIdenticalText == 0, Required: true, + }) + + // Provenance check: every export row must carry a 64-char hex + // sig_hash. Walks sft + rag + pref together since the contract + // is uniform across all three. + noProv := 0 + checkProv := func(line string) { + var r struct { + Provenance struct { + SigHash string `json:"sig_hash"` + } `json:"provenance"` + } + if err := json.Unmarshal([]byte(line), &r); err != nil { + return + } + if r.Provenance.SigHash == "" || !sigHashRe.MatchString(r.Provenance.SigHash) { + noProv++ + } + } + for _, line := range sftRows { + checkProv(line) + } + for _, line := range ragRows { + checkProv(line) + } + for _, line := range prefRows { + checkProv(line) + } + report.Checks = append(report.Checks, PhaseCheck{ + Phase: 4, Name: "every export row carries valid sha256 provenance.sig_hash", + Expected: "0 missing", Actual: fmt.Sprintf("%d missing", noProv), + Passed: noProv == 0, Required: true, + }) + + // Quarantine totals (informational — feeds the p4_total_quarantined + // metric used by the longitudinal drift signal). 
+ totalQuar := int64(0) + for _, qp := range []string{ + "exports/quarantine/sft.jsonl", + "exports/quarantine/rag.jsonl", + "exports/quarantine/preference.jsonl", + } { + totalQuar += int64(len(readJSONLLines(filepath.Join(root, qp)))) + } + report.Metrics["p4_total_quarantined"] = totalQuar +} + +// ── helpers ──────────────────────────────────────────────────────── + +func fileExists(p string) bool { + _, err := os.Stat(p) + return err == nil +} + +// readJSONLLines reads a JSONL file and returns non-empty lines. +// Returns nil on missing file (matches Rust's existsSync ? read : []). +func readJSONLLines(path string) []string { + data, err := os.ReadFile(path) + if err != nil { + return nil + } + out := make([]string, 0) + for _, line := range strings.Split(string(data), "\n") { + if strings.TrimSpace(line) != "" { + out = append(out, line) + } + } + return out +} + +// FormatAuditFullReport renders a Markdown report mirroring the +// Rust phase8-full-audit-report.md shape so operators reading +// across runtimes don't have to re-learn the layout. 
+func FormatAuditFullReport(report PhaseCheckReport) string { + var b strings.Builder + fmt.Fprintln(&b, "# Audit-FULL report (Go)") + fmt.Fprintln(&b) + if report.GitHEAD != "" { + fmt.Fprintf(&b, "**git HEAD:** `%s`\n\n", report.GitHEAD) + } + failed := report.Failed + total := 0 + for _, c := range report.Checks { + if c.Required { + total++ + } + } + verdict := "PASS" + if failed > 0 { + verdict = "FAIL" + } + fmt.Fprintf(&b, "**Verdict:** %s — %d/%d required checks passed; %d phase(s) deferred.\n\n", + verdict, total-failed, total, report.Skipped) + + fmt.Fprintln(&b, "## Checks") + fmt.Fprintln(&b) + fmt.Fprintln(&b, "| phase | name | expected | actual | required | passed |") + fmt.Fprintln(&b, "|---|---|---|---|---|---|") + for _, c := range report.Checks { + req := "no" + if c.Required { + req = "**yes**" + } + passed := "✗" + if c.Passed { + passed = "✓" + } + fmt.Fprintf(&b, "| %d | %s | %s | %s | %s | %s |\n", + c.Phase, c.Name, c.Expected, c.Actual, req, passed) + for _, n := range c.Notes { + fmt.Fprintf(&b, "| | _note_ | %s | | | |\n", n) + } + } + + if len(report.Metrics) > 0 { + fmt.Fprintln(&b) + fmt.Fprintln(&b, "## Metrics") + fmt.Fprintln(&b) + fmt.Fprintln(&b, "| metric | value |") + fmt.Fprintln(&b, "|---|---:|") + // Stable order for diffs; sortStrings is package-local, so no + // sort import is needed here. + names := make([]string, 0, len(report.Metrics)) + for k := range report.Metrics { + names = append(names, k) + } + sortStrings(names) + for _, k := range names { + fmt.Fprintf(&b, "| %s | %d |\n", k, report.Metrics[k]) + } + } + return b.String() +} + +// sortStrings sorts s in place without pulling in the sort package; +// both audit_baseline.go and audit_full.go need a stable metric-name +// order for diffable output. +func sortStrings(s []string) { + // Insertion sort — N is at most a dozen metric names.
+ for i := 1; i < len(s); i++ { + for j := i; j > 0 && s[j-1] > s[j]; j-- { + s[j-1], s[j] = s[j], s[j-1] + } + } +} diff --git a/internal/distillation/audit_full_test.go b/internal/distillation/audit_full_test.go new file mode 100644 index 0000000..9f3967e --- /dev/null +++ b/internal/distillation/audit_full_test.go @@ -0,0 +1,218 @@ +package distillation + +import ( + "os" + "path/filepath" + "strings" + "testing" +) + +// TestRunAuditFull_EmptyRoot: missing data directories yield +// failures on required checks but don't error out the run. +// An operator running on a fresh box sees the report with the +// expected "missing" actuals. +func TestRunAuditFull_EmptyRoot(t *testing.T) { + tmp := t.TempDir() + report := RunAuditFull(AuditFullOptions{Root: tmp}) + if len(report.Checks) == 0 { + t.Fatalf("expected check rows even on empty root, got %d", len(report.Checks)) + } + // Phase 3's "scored-runs on disk" must fail (required); the + // failure count rises by at least 1. + if report.Failed < 1 { + t.Errorf("expected ≥1 required failure on empty root, got %d", report.Failed) + } +} + +// TestRunAuditFull_FullFixtureFlow seeds a complete data layout +// and verifies the ported phases produce the expected metrics + a +// clean PASS verdict. Locks the end-to-end orchestration.
+func TestRunAuditFull_FullFixtureFlow(t *testing.T) { + tmp := t.TempDir() + // scored-runs: one accepted record (passes phase 3 required check) + scoredDir := filepath.Join(tmp, "data", "scored-runs", "2026", "05", "01") + if err := os.MkdirAll(scoredDir, 0o755); err != nil { + t.Fatalf("mkdir scored: %v", err) + } + scoredJSONL := `{"category":"accepted","evidence_run_id":"r1","provenance":{"source_file":"data/_kb/scrum_reviews.jsonl","sig_hash":"a1b2c3d4e5f60718293a4b5c6d7e8f900112233445566778899aabbccddeeff0","recorded_at":"2026-05-01T00:00:00Z"}} +{"category":"partially_accepted","evidence_run_id":"r2","provenance":{"source_file":"data/_kb/scrum_reviews.jsonl","sig_hash":"a1b2c3d4e5f60718293a4b5c6d7e8f900112233445566778899aabbccddeeff1","recorded_at":"2026-05-01T00:00:00Z"}} +{"category":"rejected","evidence_run_id":"r3","provenance":{"source_file":"data/_kb/scrum_reviews.jsonl","sig_hash":"a1b2c3d4e5f60718293a4b5c6d7e8f900112233445566778899aabbccddeeff2","recorded_at":"2026-05-01T00:00:00Z"}} +` + if err := os.WriteFile(filepath.Join(scoredDir, "run.jsonl"), []byte(scoredJSONL), 0o644); err != nil { + t.Fatalf("write scored: %v", err) + } + + // SFT export: only legal quality scores, valid sig_hash on every row. 
+ sftDir := filepath.Join(tmp, "exports", "sft") + if err := os.MkdirAll(sftDir, 0o755); err != nil { + t.Fatalf("mkdir sft: %v", err) + } + sftJSONL := `{"quality_score":"accepted","provenance":{"sig_hash":"a1b2c3d4e5f60718293a4b5c6d7e8f900112233445566778899aabbccddeeff0"}} +{"quality_score":"partially_accepted","provenance":{"sig_hash":"a1b2c3d4e5f60718293a4b5c6d7e8f900112233445566778899aabbccddeeff1"}} +` + if err := os.WriteFile(filepath.Join(sftDir, "instruction_response.jsonl"), []byte(sftJSONL), 0o644); err != nil { + t.Fatalf("write sft: %v", err) + } + + // RAG: no rejected leaks + ragDir := filepath.Join(tmp, "exports", "rag") + if err := os.MkdirAll(ragDir, 0o755); err != nil { + t.Fatalf("mkdir rag: %v", err) + } + ragJSONL := `{"success_score":"accepted","provenance":{"sig_hash":"a1b2c3d4e5f60718293a4b5c6d7e8f900112233445566778899aabbccddeeff0"}} +` + if err := os.WriteFile(filepath.Join(ragDir, "playbooks.jsonl"), []byte(ragJSONL), 0o644); err != nil { + t.Fatalf("write rag: %v", err) + } + + // Preference: distinct chosen vs rejected, no self-pairs + prefDir := filepath.Join(tmp, "exports", "preference") + if err := os.MkdirAll(prefDir, 0o755); err != nil { + t.Fatalf("mkdir pref: %v", err) + } + prefJSONL := `{"chosen_run_id":"a","rejected_run_id":"b","chosen":"good","rejected":"bad","provenance":{"sig_hash":"a1b2c3d4e5f60718293a4b5c6d7e8f900112233445566778899aabbccddeeff0"}} +` + if err := os.WriteFile(filepath.Join(prefDir, "chosen_rejected.jsonl"), []byte(prefJSONL), 0o644); err != nil { + t.Fatalf("write pref: %v", err) + } + + report := RunAuditFull(AuditFullOptions{Root: tmp}) + if report.Failed != 0 { + t.Errorf("clean fixture should have 0 required failures, got %d", report.Failed) + for _, c := range report.Checks { + if c.Required && !c.Passed { + t.Logf(" failed: phase=%d name=%q actual=%q", c.Phase, c.Name, c.Actual) + } + } + } + // Metrics populated correctly + if report.Metrics["p3_accepted"] != 1 { + t.Errorf("p3_accepted: got %d, 
want 1", report.Metrics["p3_accepted"]) + } + if report.Metrics["p3_partial"] != 1 { + t.Errorf("p3_partial: got %d, want 1", report.Metrics["p3_partial"]) + } + if report.Metrics["p3_rejected"] != 1 { + t.Errorf("p3_rejected: got %d, want 1", report.Metrics["p3_rejected"]) + } + if report.Metrics["p4_sft_rows"] != 2 { + t.Errorf("p4_sft_rows: got %d, want 2", report.Metrics["p4_sft_rows"]) + } + if report.Metrics["p4_rag_rows"] != 1 { + t.Errorf("p4_rag_rows: got %d, want 1", report.Metrics["p4_rag_rows"]) + } + if report.Metrics["p4_pref_pairs"] != 1 { + t.Errorf("p4_pref_pairs: got %d, want 1", report.Metrics["p4_pref_pairs"]) + } +} + +// TestPhase4_SftFirewallCatchesRejected: contamination must never +// leak into SFT export. Test seeds a row with a forbidden +// quality_score and asserts the firewall flags it. +func TestPhase4_SftFirewallCatchesRejected(t *testing.T) { + tmp := t.TempDir() + sftDir := filepath.Join(tmp, "exports", "sft") + if err := os.MkdirAll(sftDir, 0o755); err != nil { + t.Fatalf("mkdir: %v", err) + } + bad := `{"quality_score":"rejected","provenance":{"sig_hash":"a1b2c3d4e5f60718293a4b5c6d7e8f900112233445566778899aabbccddeeff0"}} +` + if err := os.WriteFile(filepath.Join(sftDir, "instruction_response.jsonl"), []byte(bad), 0o644); err != nil { + t.Fatalf("write: %v", err) + } + report := RunAuditFull(AuditFullOptions{Root: tmp}) + found := false + for _, c := range report.Checks { + if c.Phase == 4 && strings.Contains(c.Name, "SFT contamination firewall") { + if c.Passed { + t.Errorf("firewall should fail on rejected SFT row, but check passed") + } + if c.Actual != "1" { + t.Errorf("firewall actual: got %q, want '1'", c.Actual) + } + found = true + } + } + if !found { + t.Errorf("firewall check not present in report") + } +} + +// TestPhase4_PreferenceSelfPairCaught: same chosen + rejected run_id +// is structural noise and must be flagged. 
+func TestPhase4_PreferenceSelfPairCaught(t *testing.T) { + tmp := t.TempDir() + prefDir := filepath.Join(tmp, "exports", "preference") + if err := os.MkdirAll(prefDir, 0o755); err != nil { + t.Fatalf("mkdir: %v", err) + } + bad := `{"chosen_run_id":"X","rejected_run_id":"X","chosen":"a","rejected":"b","provenance":{"sig_hash":"a1b2c3d4e5f60718293a4b5c6d7e8f900112233445566778899aabbccddeeff0"}} +` + if err := os.WriteFile(filepath.Join(prefDir, "chosen_rejected.jsonl"), []byte(bad), 0o644); err != nil { + t.Fatalf("write: %v", err) + } + report := RunAuditFull(AuditFullOptions{Root: tmp}) + found := false + for _, c := range report.Checks { + if c.Phase == 4 && strings.Contains(c.Name, "self-pairs") { + if c.Passed { + t.Errorf("self-pair check should fail, but passed") + } + found = true + } + } + if !found { + t.Errorf("self-pair check not present in report") + } +} + +// TestPhase4_ProvenanceRequiresValidSha256: bad sig_hash must be +// flagged. Locks the regex shape — only 64-char lowercase hex. +func TestPhase4_ProvenanceRequiresValidSha256(t *testing.T) { + tmp := t.TempDir() + sftDir := filepath.Join(tmp, "exports", "sft") + if err := os.MkdirAll(sftDir, 0o755); err != nil { + t.Fatalf("mkdir: %v", err) + } + // Three rows: one valid, one wrong-length, one wrong-charset (uppercase). 
+ bad := `{"quality_score":"accepted","provenance":{"sig_hash":"a1b2c3d4e5f60718293a4b5c6d7e8f900112233445566778899aabbccddeeff0"}} +{"quality_score":"accepted","provenance":{"sig_hash":"too_short"}} +{"quality_score":"accepted","provenance":{"sig_hash":"A1B2C3D4E5F60718293A4B5C6D7E8F900112233445566778899AABBCCDDEEFF0"}} +` + if err := os.WriteFile(filepath.Join(sftDir, "instruction_response.jsonl"), []byte(bad), 0o644); err != nil { + t.Fatalf("write: %v", err) + } + report := RunAuditFull(AuditFullOptions{Root: tmp}) + for _, c := range report.Checks { + if c.Phase == 4 && strings.Contains(c.Name, "sig_hash") { + if c.Actual != "2 missing" { + t.Errorf("provenance check: got actual=%q, want '2 missing'", c.Actual) + } + if c.Passed { + t.Errorf("provenance check should fail with 2 bad sig_hashes") + } + } + } +} + +// TestFormatAuditFullReport_RendersCheckTable: smoke-test the +// Markdown formatter — operators should see the right verdict + +// per-phase rows. +func TestFormatAuditFullReport_RendersCheckTable(t *testing.T) { + report := PhaseCheckReport{ + GitHEAD: "deadbeef", + Checks: []PhaseCheck{ + {Phase: 0, Name: "test check", Expected: "x", Actual: "x", Passed: true, Required: true}, + {Phase: 4, Name: "fail check", Expected: "0", Actual: "5", Passed: false, Required: true}, + }, + Metrics: map[string]int64{"p3_accepted": 42, "p4_sft_rows": 17}, + Failed: 1, + Skipped: 4, + } + out := FormatAuditFullReport(report) + for _, want := range []string{"FAIL", "deadbeef", "test check", "fail check", "p3_accepted", "42", "deferred"} { + if !strings.Contains(out, want) { + t.Errorf("expected %q in formatted report:\n%s", want, out) + } + } +} diff --git a/reports/cutover/SUMMARY.md b/reports/cutover/SUMMARY.md index 3f4f588..81041d3 100644 --- a/reports/cutover/SUMMARY.md +++ b/reports/cutover/SUMMARY.md @@ -8,6 +8,7 @@ what's safe to flip. Append a row when a new endpoint clears parity. 
| `embed` (forced v1) | 2026-04-30 | `/ai/embed` | `/v1/embed` | ✅ PASS 5/5 cos=1.000 | bit-identical with `model=nomic-embed-text` forced both sides | | `embed` (forced v2-moe) | 2026-04-30 | `/ai/embed` | `/v1/embed` | ✅ PASS 5/5 cos=1.000 | bit-identical with `model=nomic-embed-text-v2-moe` forced both sides — both Ollamas have the model | | `audit_baselines.jsonl` | 2026-05-01 | `data/_kb/audit_baselines.jsonl` | `internal/distillation` `LoadLastBaseline` / `AppendBaseline` / `BuildAuditDriftTable` | ✅ PASS round-trip | Live Rust file (7 entries) parses + round-trips byte-equal; lineage drift table fires correctly on zero-baseline metrics. See `audit_baselines_roundtrip.md`. | +| `audit-FULL` (phases 0/3/4) | 2026-05-01 | `scripts/distillation/audit_full.ts` | `cmd/audit_full` + `internal/distillation` `RunAuditFull` | ✅ PASS metric-equal | Go-side run against live Rust root: all 8 ported metrics (p3_*, p4_*) byte-equal to the last Rust-emitted `audit_baselines.jsonl` entry. 6/6 required checks pass. 4 phases (2, 5, 6, 7) deferred — they depend on broader Rust-side pieces (materializer / replay / run-summaries) not yet ported; phase 1 skipped by design. See `audit_full_go_vs_rust.md`. | ## Wire-format drift catalog diff --git a/reports/cutover/audit_full_go_vs_rust.md b/reports/cutover/audit_full_go_vs_rust.md new file mode 100644 index 0000000..f64ba35 --- /dev/null +++ b/reports/cutover/audit_full_go_vs_rust.md @@ -0,0 +1,33 @@ +# Audit-FULL report (Go) + +**git HEAD:** `eb0dfdff047e34439896552d483abbee673d5a47` + +**Verdict:** PASS — 6/6 required checks passed; 4 phase(s) deferred.
+ +## Checks + +| phase | name | expected | actual | required | passed | +|---|---|---|---|---|---| +| 0 | recon doc exists | docs/recon/local-distillation-recon.md present | true | no | ✓ | +| 0 | tier-1 source streams present | all 4 tier-1 jsonls on disk | all present | no | ✓ | +| 3 | on-disk scored-runs distribution non-empty | >=1 accepted | acc=386 part=132 rej=57 hum=480 | **yes** | ✓ | +| 3 | scored-runs distribution sums positive | >0 total | 1055 total | no | ✓ | +| 4 | SFT contamination firewall: 0 forbidden quality_scores | 0 | 0 | **yes** | ✓ | +| | _note_ | this is the spec non-negotiable — rejected/needs_human_review must NEVER appear in SFT | | | | +| 4 | RAG firewall: 0 rejected leaks | 0 | 0 | **yes** | ✓ | +| 4 | Preference: 0 self-pairs (chosen_run_id != rejected_run_id) | 0 | 0 | **yes** | ✓ | +| 4 | Preference: 0 identical-text pairs | 0 | 0 | **yes** | ✓ | +| 4 | every export row carries valid sha256 provenance.sig_hash | 0 missing | 0 missing | **yes** | ✓ | + +## Metrics + +| metric | value | +|---|---:| +| p3_accepted | 386 | +| p3_human | 480 | +| p3_partial | 132 | +| p3_rejected | 57 | +| p4_pref_pairs | 83 | +| p4_rag_rows | 448 | +| p4_sft_rows | 353 | +| p4_total_quarantined | 1325 |
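The close-3 row in STATE_OF_PLAY.md describes `ComputeDistributionDrift` in prose only — the PSI implementation in `internal/drift/drift.go` is not part of this changeset. As a standalone illustration of the same mechanics (equal-width buckets over the combined min/max range, epsilon-clamped bucket proportions, and the stable/minor/major tiers), here is a minimal Go sketch; the function names, bucket count, and epsilon value are illustrative assumptions, not the internal API:

```go
package main

import (
	"fmt"
	"math"
)

// psiVerdict maps a PSI value onto the tiers the close-3 row names:
// stable < 0.10, minor 0.10–0.25, major ≥ 0.25.
func psiVerdict(psi float64) string {
	switch {
	case psi < 0.10:
		return "stable"
	case psi < 0.25:
		return "minor"
	default:
		return "major"
	}
}

// psi computes a Population Stability Index between a baseline and a
// current sample: equal-width buckets over the combined min/max range,
// with an epsilon clamp so empty buckets never produce ±Inf terms.
func psi(baseline, current []float64, numBuckets int) float64 {
	const eps = 1e-4 // illustrative clamp, not the internal constant
	if len(baseline) == 0 || len(current) == 0 {
		return 0 // empty inputs: nothing to compare
	}
	lo, hi := math.Inf(1), math.Inf(-1)
	for _, v := range append(append([]float64{}, baseline...), current...) {
		lo, hi = math.Min(lo, v), math.Max(hi, v)
	}
	width := (hi - lo) / float64(numBuckets)
	if width == 0 {
		return 0 // all values identical — no drift by construction
	}
	bucket := func(xs []float64) []float64 {
		counts := make([]float64, numBuckets)
		for _, v := range xs {
			i := int((v - lo) / width)
			if i >= numBuckets { // hi itself lands in the last bucket
				i = numBuckets - 1
			}
			counts[i]++
		}
		for i := range counts {
			counts[i] = math.Max(counts[i]/float64(len(xs)), eps)
		}
		return counts
	}
	exp, act := bucket(baseline), bucket(current)
	total := 0.0
	for i := range exp {
		total += (act[i] - exp[i]) * math.Log(act[i]/exp[i])
	}
	return total
}

func main() {
	base := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	shifted := []float64{7, 8, 8, 9, 9, 9, 10, 10, 10, 10}
	fmt.Println(psiVerdict(psi(base, base, 5)))    // identical → stable
	fmt.Println(psiVerdict(psi(base, shifted, 5))) // hard shift → major
}
```

With identical samples every bucket's actual and expected proportions coincide, so each PSI term is exactly zero; pushing all mass into the top buckets drives the sum well past 0.25, matching the identical-is-stable / hard-shift-is-major test names in the row.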
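Similarly, the close-2 lineage row names `BuildAuditDriftTable`'s `>20%` threshold and its zero-baseline edge cases only in prose (that code lives in `internal/distillation/audit_baseline.go`, outside this diff). A hedged sketch of that verdict logic — `driftVerdict` and its signature are invented for illustration:

```go
package main

import "fmt"

// driftVerdict mirrors the behaviour the lineage row describes: flag
// relative drift above the threshold, avoid division by zero on a
// zero baseline, and never report a zero→nonzero move as stable.
func driftVerdict(baseline, current int64, thresholdPct float64) string {
	if baseline == 0 {
		if current == 0 {
			return "stable"
		}
		return "drift (new from zero)" // zero→nonzero: never "stable"
	}
	pct := 100 * float64(current-baseline) / float64(baseline)
	abs := pct
	if abs < 0 {
		abs = -abs
	}
	if abs > thresholdPct {
		return fmt.Sprintf("drift (%+.1f%%)", pct)
	}
	return "stable"
}

func main() {
	fmt.Println(driftVerdict(386, 386, 20)) // unchanged metric → stable
	fmt.Println(driftVerdict(386, 500, 20)) // +29.5% → over threshold
	fmt.Println(driftVerdict(0, 83, 20))    // zero baseline, nonzero now
}
```

The zero-baseline branch is the point of the sketch: guarding it first means the percentage division can never divide by zero, and a metric appearing for the first time is surfaced as drift rather than silently passing.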