golangLAKEHOUSE

Author	SHA1	Message	Date
root	b0c8a3f227	parity probes: materializer + extract_json (caught + fixed real bug) Two new cross-runtime parity probes joining the validator probe from the gauntlet wave. Pattern: feed identical input through Rust and Go; diff outputs. Each probe surfaced a different signal. ## Materializer parity probe scripts/cutover/parity/materializer_parity.sh runs Bun + Go materializer against an identical synthetic data/_kb/ root, diffs the resulting evidence/ JSONL byte-equivalent (modulo provenance.recorded_at). First run: 0/2 match. Real finding: Go's Provenance.LineOffset had `json:"line_offset,omitempty"` which strips the field when value is 0. Line offset 0 is the FIRST ROW of every source file — a real semantic value, not absent. Bun side always emits it. Fix: drop `omitempty` on Provenance.LineOffset. Updated comment explaining why. Re-run: 2/2 match. On-wire JSON parity holds. ## extract_json parity probe scripts/cutover/parity/extract_json_parity.sh feeds 12 fixture strings through both runtimes' extract_json: - fenced ```json``` blocks - unfenced ``` blocks - bare braces with prose around - first-balanced-of-many - nested objects - unicode in string values - escaped quotes - empty object - top-level array (both return first inner object) - no JSON - depth-balanced but invalid syntax - trailing garbage Substrate gate: cargo test -p gateway extract_json PASS before probe. Result: 12/12 match. Algorithms genuinely equivalent. ## scripts/cutover/parity/extract_json_helper/main.go Tiny Go binary that reads stdin, calls validator.ExtractJSON, prints {matched, value} JSON. Counterpart to the Rust parity_extract_json binary in golangLAKEHOUSE's sibling lakehouse repo (separate commit). ## Pattern crystallized Every cross-runtime port should land with a parity probe. 
Three probes now exist: - validator (5/6 wire-format gap captured 2026-05-02) - materializer (caught + fixed real bug 2026-05-02) - extract_json (12/12 match 2026-05-02) The instrument is reusable — each new shared HTTP/CLI surface gets a probe row added. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 04:43:54 -05:00
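The `omitempty` finding in b0c8a3f227 can be reproduced in isolation. The sketch below is a minimal illustration, not the repository's code: the two struct types and the `source_file` field are hypothetical stand-ins, and only the `line_offset` tag is taken from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// provenanceOmit mirrors the pre-fix tag: omitempty drops the field when it is 0.
// The type name and SourceFile field are assumptions for illustration.
type provenanceOmit struct {
	SourceFile string `json:"source_file"`
	LineOffset int    `json:"line_offset,omitempty"`
}

// provenanceKeep mirrors the post-fix tag: offset 0 is always emitted.
type provenanceKeep struct {
	SourceFile string `json:"source_file"`
	LineOffset int    `json:"line_offset"`
}

// encode marshals v and returns the JSON text, or the error text on failure.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// First row of a source file: offset 0 is a real value, not "absent".
	fmt.Println(encode(provenanceOmit{SourceFile: "a.jsonl", LineOffset: 0}))
	fmt.Println(encode(provenanceKeep{SourceFile: "a.jsonl", LineOffset: 0}))
}
```

The first line printed lacks `line_offset` entirely, which is the byte-level difference the materializer probe reported against the Bun output.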
root	89ca72d471	materializer + replay ports + vectord substrate fix verified at scale Two threads landing together — the doc edits interleave so they ship in a single commit. 1. vectord substrate fix verified at original scale (closes the 2026-05-01 thread). Re-ran multitier 5min @ conc=50: 132,211 scenarios at 438/sec, 6/6 classes at 0% failure (was 4/6 pre-fix). Throughput dropped 1,115 → 438/sec because previously-broken scenarios now do real HNSW Add work — honest cost of correctness. The fix (i.vectors side-store + safeGraphAdd recover wrappers + smallIndexRebuildThreshold=32 + saveTask coalescing) holds at the footprint that originally surfaced the bug. 2. Materializer port — internal/materializer + cmd/materializer + scripts/materializer_smoke.sh. Ports scripts/distillation/transforms.ts (12 transforms) + build_evidence_index.ts (idempotency, day-partition, receipt). On-wire JSON shape matches TS so Bun and Go runs are interchangeable. 14 tests green. 3. Replay port — internal/replay + cmd/replay + scripts/replay_smoke.sh. Ports scripts/distillation/replay.ts (retrieve → bundle → /v1/chat → validate → log). Closes audit-FULL phase 7 live invocation on the Go side. Both runtimes append to the same data/_kb/replay_runs.jsonl (schema=replay_run.v1). 14 tests green. Side effect on internal/distillation/types.go: EvidenceRecord gained prompt_tokens, completion_tokens, and metadata fields to mirror the TS shape the materializer transforms produce. STATE_OF_PLAY refreshed to 2026-05-02; ARCHITECTURE_COMPARISON decisions tracker moves the materializer + replay items from _open_ to DONE and adds the substrate-fix scale verification row. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 03:31:02 -05:00
root	ee2a40c505	audit-FULL: port phases 1/2/5/7 — only acceptance.ts (TS-only) remains skipped Closes 4 of the 5 phases the initial audit-FULL port left as deferred. The pattern: most "deferred" phases didn't actually need the un-ported Rust pieces — they were observer-mode by design and just needed to read existing on-disk artifacts. Phase 1 (schema validators) → ported via exec.Command: Invokes `go test ./internal/distillation/...` — the Go equivalent of Rust's `bun test auditor/schemas/distillation/`. New GoTestModule field on AuditFullOptions controls the package pattern; empty disables the invocation (test mode, prevents recursion when audit-full is invoked from inside `go test`). Phase 2 (evidence materialization) → ported as observer: Reads data/evidence/ directly and tallies rows + tier-1 source hits. Doesn't re-run the materializer (which is Rust-side TS). Emits p2_evidence_rows + p2_evidence_skips metrics matching Rust shape — drop-in audit_baselines.jsonl entries possible. Phase 5 (run summary) → ported as observer: Reads reports/distillation/{run_id}/summary.json + 5 stage receipts. Validates schema_version=1, run_hash sha256, git_commit 40-char hex, all stage receipts decode as JSON. Full schema validation (StageReceipt schema) is intentionally NOT ported — it would require porting the TS schemas/distillation/ validators in full; basic shape checks catch the load-bearing invariants. Phase 7 (replay log) → ported as observer: Reads data/_kb/replay_runs.jsonl, validates last 50 rows parse as JSON. Skips the live-replay invocation that Rust's phase 7 also does — porting Rust replay.ts is substantial and not in scope. The "log shape sanity" check is what audit-full actually needs; the live invocation is a separate concern. Phase 6 (acceptance gate) — STILL SKIPPED: Rust acceptance.ts is a TS-only fixture harness with bun-specific deps. 
Porting the fixtures (tests/fixtures/distillation/acceptance/) + the 22-invariant runner to Go is an ADR-worth undertaking. Documented in the header comment. Live-data probe (against /home/profit/lakehouse): Skips count: 4 → 1 (only phase 6). Required checks: 6/6 → 12/12 PASS. New metric: p2_evidence_rows=1055, BYTE-EQUAL to the Rust pipeline's collect.records_out from the latest summary.json. Cross-runtime parity now extends across phases 0/1/2/3/4/5/7. 6 new tests: - TestPhase2_EvidenceTallyFromOnDisk: row + tier-1-hit tallying - TestPhase5_FullSummaryFlow: complete run-summary fixture passes - TestPhase5_ShortRunHashCaught: bad run_hash fails required check - TestPhase7_ReplayLogReadsFromDisk: row-count reporting - TestPhase7_MalformedTailRowsCaught: structural parse failure - TestRunAuditFull_FullFixtureFlow updated to seed evidence/ + reports/distillation/ for the phases now wired. Cleanup: removed local sortStrings helper (replaced with sort.Strings now that `sort` is imported for phase 5's mtime-sort). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 02:35:13 -05:00
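The phase 5 "basic shape checks" described in ee2a40c505 could look roughly like the sketch below. The field names `schema_version`, `run_hash` and `git_commit` come from the commit message. The flat JSON layout, the lowercase-hex requirement, the reading of "sha256" as 64 hex characters, and every Go identifier here are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

var (
	sha256Hex = regexp.MustCompile(`^[0-9a-f]{64}$`) // assumed: lowercase hex digest
	commitHex = regexp.MustCompile(`^[0-9a-f]{40}$`) // 40-char hex git commit
)

// runSummary holds only the fields the shape check looks at (layout assumed).
type runSummary struct {
	SchemaVersion int    `json:"schema_version"`
	RunHash       string `json:"run_hash"`
	GitCommit     string `json:"git_commit"`
}

// checkSummary returns one message per violated invariant; empty means PASS.
func checkSummary(raw []byte) []string {
	var s runSummary
	if err := json.Unmarshal(raw, &s); err != nil {
		return []string{"summary does not decode as JSON: " + err.Error()}
	}
	var problems []string
	if s.SchemaVersion != 1 {
		problems = append(problems, fmt.Sprintf("schema_version=%d, want 1", s.SchemaVersion))
	}
	if !sha256Hex.MatchString(s.RunHash) {
		problems = append(problems, "run_hash is not 64 hex characters")
	}
	if !commitHex.MatchString(s.GitCommit) {
		problems = append(problems, "git_commit is not 40 hex characters")
	}
	return problems
}

// sampleSummary builds a fixture whose run_hash has the given length.
func sampleSummary(hashLen int) []byte {
	return []byte(fmt.Sprintf(`{"schema_version":1,"run_hash":%q,"git_commit":%q}`,
		strings.Repeat("a", hashLen), strings.Repeat("b", 40)))
}

func main() {
	fmt.Println(checkSummary(sampleSummary(64))) // well-formed fixture
	fmt.Println(checkSummary(sampleSummary(12))) // short run_hash is reported
}
```

Returning a list of problems rather than the first error keeps a single audit run informative when several invariants break at once.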
root	55b8c76a8c	distillation: audit-FULL pipeline port (phases 0/3/4) — cross-runtime metric parity verified Ports the metric-collection passes from scripts/distillation/audit_full.ts. The substrate that PRODUCES audit_baselines.jsonl entries — the half OPEN #2 left as "deferred to next wave" after the read/write substrate landed in ca142b9. Phase coverage: Phase 0 (file presence) ported Phase 1 (schema validators) skipped (Go's `go test` covers it) Phase 2 (materializer dry-run) deferred (Go materializer not yet ported) Phase 3 (scored-runs distribution) ported Phase 4 (contamination firewall) ported Phase 5 (receipts validation) deferred (Go run-summary JSON not yet emitted) Phase 6 (replay sanity) deferred (Go replay tool not ported) Phase 7 (run summary lineage) deferred (same) Cross-runtime parity verified end-to-end: Go-side audit-full against /home/profit/lakehouse produced metrics IDENTICAL to the last Rust-emitted audit_baselines.jsonl entry. All 8 ported metrics match byte-for-byte: p3_accepted=386, p3_partial=132, p3_rejected=57, p3_human=480, p4_sft_rows=353, p4_rag_rows=448, p4_pref_pairs=83, p4_total_quarantined=1325 6/6 required checks pass on live data. Components: - internal/distillation/audit_full.go: PhaseCheck struct (mirrors Rust shape), PhaseCheckReport aggregation, RunAuditFull orchestrator, auditPhase0/3/4 implementations, FormatAuditFullReport Markdown writer. - cmd/audit_full/main.go: CLI binary with -root, -out, -json, -append-baseline flags. Operators run "./bin/audit_full -append-baseline" to grow the longitudinal log alongside the Rust pipeline (entries are interchangeable — same envelope shape). - 6 new tests: empty-root failure handling, full-fixture clean PASS (locks all 8 metrics + all 6 required checks), SFT firewall contamination detection, preference self-pair detection, sig_hash regex correctness (rejects wrong-length + uppercase), Markdown formatter smoke. 
Live-data probe captured at reports/cutover/audit_full_go_vs_rust.md (linked from reports/cutover/SUMMARY.md). Same shape as the audit_baselines round-trip evidence — both Go-side ports of the distillation surface are now validated against real Rust data, not just fixtures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 01:30:23 -05:00
root	eb0dfdff04	vectord: v2 envelope + handleMerge robustness — actions post_role_gate_v1 scrum 3-lineage scrum on 434f466..0d4f033 surfaced one convergent finding (Opus + Kimi) and 3 Opus-only real bugs. All actioned in this commit. Two false positives (Kimi rollback misreading, Opus stale- comment claim) verified + rejected — both required manual control- flow inspection to refute, matching the documented Kimi-truncation behavior in feedback_cross_lineage_review.md. Convergent fix — DecodeIndex lost nil-meta items: - Envelope version bumped 1 → 2. - New v2 field: IDs []string carries the canonical ID set explicitly, independent of meta map's nil-vs-{} sparseness. - DecodeIndex accepts both versions: v2 reads from env.IDs; v1 falls back to meta-key inference (with the documented limitation that nil-meta items are invisible — preserved for backward-compat with already-persisted indexes). - Encode emits v2 going forward. - 2 new regression tests: - TestEncodeDecode_NilMetaItemsSurviveRoundTrip: items added with nil metadata MUST survive Encode → Decode and remain visible to IDs(). Pre-fix would have yielded IDs() == []. - TestDecodeIndex_V1BackwardCompat: hand-crafted v1 envelope still decodes (proves the fallback path). Opus-only fixes: - handleMerge: non-ErrIndexNotFound errors at h.reg.Get(name) / h.reg.Get(req.Dest) now return 500 + log instead of falling through with nil src/dest pointers (which would panic on the next deref). Real bug — only the sentinel error was handled. - internal/drift/drift.go: mathLog wrapper removed; math.Log inlined. Wrapper added no value (math was already imported). - internal/distillation/audit_baseline.go: BuildAuditDriftTable's bubble sort replaced with sort.Slice. Idiomatic + shorter. Rejected after verification: - Kimi WARN "missing rollback on partial merge": misread the control flow. Code at cmd/vectord/main.go:404-414 does NOT delete from src when dest.Add fails (continue before reaching src.Delete). 
Only successful Adds trigger Deletes. - Opus INFO "TimestampUnixNano comment references missing field": field exists at scripts/multi_coord_stress/main.go:128. Opus saw only the diff context, not the full file. Deferred (no fired trigger): - Opus WARN "no per-index lock during merge": no concurrent merge callers today (operators run merge as deliberate one-shot job). Worth a lock if/when matrixd or chatd start auto-triggering. Disposition: reports/scrum/_evidence/2026-05-01/verdicts/post_role_gate_v1_disposition.md. Build + vet + tests green; 2 new regression tests + all prior tests unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 01:20:37 -05:00
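The v1/v2 decode rule from eb0dfdff04 (v2 reads the explicit ID list, v1 falls back to inferring IDs from meta keys) can be sketched as follows. The JSON field names, the `envelope` type and `decodeIDs` are hypothetical; the real envelope carries vectors and other fields not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// envelope is a reduced, assumed shape of the persisted index envelope.
type envelope struct {
	Version int                        `json:"version"`
	IDs     []string                   `json:"ids,omitempty"`
	Meta    map[string]json.RawMessage `json:"meta,omitempty"`
}

// decodeIDs returns the ID set for either envelope version.
func decodeIDs(raw []byte) ([]string, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return nil, err
	}
	switch env.Version {
	case 2:
		// v2 carries the canonical ID set explicitly.
		return env.IDs, nil
	case 1:
		// v1 fallback: infer IDs from meta keys. Items stored without
		// metadata have no key here and stay invisible.
		ids := make([]string, 0, len(env.Meta))
		for id := range env.Meta {
			ids = append(ids, id)
		}
		sort.Strings(ids) // map order is random; sort for stable output
		return ids, nil
	default:
		return nil, fmt.Errorf("unsupported envelope version %d", env.Version)
	}
}

func main() {
	v1 := []byte(`{"version":1,"meta":{"a":{}}}`)
	v2 := []byte(`{"version":2,"ids":["a","b"],"meta":{"a":{}}}`)
	fmt.Println(decodeIDs(v1))
	fmt.Println(decodeIDs(v2))
}
```

In the v2 sample, item `b` has no metadata yet still appears in the result, which is the round-trip property the new regression test is described as locking in.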
root	ca142b9271	distillation: audit-baselines lineage port — fully closes the OPEN #2 surface The original OPEN #2 line called for "SFT export pipeline + audit_baselines lineage." Commit 7bb432f shipped the SFT export. This commit ports the audit_baselines half — the longitudinal drift signal that distinguishes "metrics shifted because the world changed" from "metrics shifted because we broke something." Mirrors Rust scripts/distillation/audit_full.ts's substrate: - LoadLastBaseline(path) reads the most recent entry from data/_kb/audit_baselines.jsonl. Returns (nil, nil) on missing file (first run), errors on truncated last line (partial-write detection — operators don't lose drift signal silently). - AppendBaseline(path, baseline) appends one entry as a JSON line. Atomic at the line level via bufio + O_APPEND. Creates the parent directory if missing. - BuildAuditDriftTable(prior, current, threshold) computes per-metric drift. flag values mirror Rust exactly: first_run, ok, warn. DefaultDriftWarnThreshold = 0.20 = Rust's 20%. - FormatAuditDriftTable renders a fixed-width text grid for stdout dumps in audit-full runs. Edge cases handled: - Zero-baseline: prior=0 means no division — PctChange stays nil. current=0 → ok (no change). current>0 → warn (zero→nonzero is always notable, never silently fine). - New metric in current: flagged first_run, not "0%-change". Operators see "this is a new signal we haven't tracked before." - Sort: stable by metric name for deterministic JSON output and clean CI diffs. Generic on metric name (vs Rust's pinned p2_evidence_rows etc.): the Rust phase numbering doesn't translate to Go directly. The AuditBaselineRustCompat constant pins the Rust names so operators running both runtimes use the same labels, which makes drift comparison meaningful across the two pipelines. 
13 new tests covering: missing file, last-line-wins, blank-line tolerance, malformed-line errors, append round-trip, append-to- existing, schema validation, first-run, threshold boundary, zero-baseline, new-metric-in-current, sort-by-metric stability, formatter output rendering. OPEN #2's "audit_baselines lineage" half now closed. The distillation package surface is at parity with the Rust pipeline: scorer, scored runs, SFT export, audit baselines all available on the Go side. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 00:11:47 -05:00
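The drift-flag rules listed in ca142b9271 (first_run for new metrics, the zero-baseline special case, and the 20% threshold) can be written as one small classifier. This is a sketch under assumptions: the `driftRow` shape and function names are hypothetical, and a strict `>` at the threshold is a guess, since the commit message does not say which side the boundary falls on.

```go
package main

import (
	"fmt"
	"math"
)

// driftRow is an assumed per-metric row; PctChange stays nil when undefined.
type driftRow struct {
	Metric    string
	Current   float64
	PctChange *float64
	Flag      string // "first_run", "ok" or "warn"
}

// ptr is a small helper for building optional prior values.
func ptr(v float64) *float64 { return &v }

// classify applies the rules from the commit message to one metric.
// prior == nil means the metric did not exist in the previous baseline.
func classify(metric string, prior *float64, current, threshold float64) driftRow {
	row := driftRow{Metric: metric, Current: current}
	switch {
	case prior == nil:
		row.Flag = "first_run" // new signal, not a 0% change
	case *prior == 0:
		// No division by zero: PctChange stays nil.
		if current == 0 {
			row.Flag = "ok"
		} else {
			row.Flag = "warn" // zero to nonzero is always notable
		}
	default:
		pct := (current - *prior) / *prior
		row.PctChange = &pct
		if math.Abs(pct) > threshold { // assumption: strict comparison
			row.Flag = "warn"
		} else {
			row.Flag = "ok"
		}
	}
	return row
}

func main() {
	const threshold = 0.20 // matches DefaultDriftWarnThreshold in the commit
	fmt.Println(classify("p3_accepted", ptr(100), 130, threshold).Flag)
	fmt.Println(classify("p3_partial", ptr(0), 5, threshold).Flag)
	fmt.Println(classify("new_metric", nil, 7, threshold).Flag)
}
```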
root	7bb432f6c8	distillation: full SFT export port — closes OPEN #2 fully Follow-up to b216b7e (which shipped the SFT export substrate). This commit ports the synthesis logic, completing the migration: - SynthesizeSft(scored, ev, recordedAt, sftID) → *SftSample Mirrors the Rust synthesizeSft byte-for-byte. Returns nil for extraction-class records + empty-text records (same skip semantics as Rust). - LoadEvidenceByRunID(scoredPath, cache) reads the paired evidence JSONL (path derived by /scored-runs/ → /evidence/ replacement). Per-call cache so multiple scored-runs files in the same dir don't reload the same evidence. - buildInstruction maps source_file stem → per-class instruction template. All 8 templates (scrum_reviews, mode_experiments, auto_apply, audits, observer_reviews, contract_analyses, outcomes, default) match Rust output exactly so a/b validation between runtimes can diff JSONL byte-for-byte. - stemFromSourceFile strips data/_kb/ prefix + .jsonl suffix. - ExportSft now writes data/distilled/sft/sft_export.jsonl with the synthesized samples (DryRun=true skips file write). Per-class templates verified by 8-case sub-test: - scrum_reviews → "Review the file '...' against the PRD..." - mode_experiments → "Run task_class='...' for file..." - auto_apply → "Auto-apply: emit a 6-line surgical patch..." - audits with phase: prefix → strips to bare phase name - observer_reviews → "Observer-review the latest attempt..." - contract_analyses with permit: prefix → strips to permit ID - outcomes → "Run scenario; report per-event outcome..." - unknown source → "Source 'X' run; produce the appropriate output" Caveat documented inline: contract_analyses uses ev.metadata.contractor in Rust to produce "Analyze contractor 'X' for permit 'Y'" when present. Go's EvidenceRecord doesn't carry a free-form metadata bag yet, so we always emit the no-contractor form. 
Operators needing contractor-aware instructions can extend EvidenceRecord with an explicit Metadata field (separate ADR). Test additions (5 new): - TestSynthesizeSft_PerSourceClass: 8 sub-cases, one per template - TestSynthesizeSft_RejectsExtraction: extraction-role records skipped - TestSynthesizeSft_RejectsEmptyText: empty/whitespace text skipped - TestSynthesizeSft_ContextAssembly: matrix + pathway + model context string formatting matches Rust " · " join - TestExportSft_FullPort_WritesJSONL: end-to-end fixture, asserts output contains expected instruction + omits firewalled records Pre-existing TestExportSft_PartialPort_FirewallFires renamed + updated to TestExportSft_FirewallFiresBeforeEvidenceLoad — reflects the new contract that records passing the firewall but lacking evidence land in "not-instructable" rather than being silently exported. Honest semantics shift documented in the test. OPEN #2 now fully closed (was: substrate-only). The synthesis path no longer requires the Rust pipeline to be invoked — Go-side operators can run the full distillation export end-to-end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 00:06:57 -05:00
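The stem derivation that 7bb432f6c8 describes for `stemFromSourceFile` (strip the `data/_kb/` prefix and the `.jsonl` suffix) is small enough to sketch directly. `stemSketch` below is a hypothetical stand-in, not the repository function, and it assumes forward-slash relative paths.

```go
package main

import (
	"fmt"
	"strings"
)

// stemSketch reduces a source_file path to the stem used to pick a template.
// Inputs without the prefix or suffix pass through with only the part that
// is present removed.
func stemSketch(sourceFile string) string {
	s := strings.TrimPrefix(sourceFile, "data/_kb/")
	return strings.TrimSuffix(s, ".jsonl")
}

func main() {
	fmt.Println(stemSketch("data/_kb/scrum_reviews.jsonl"))
	fmt.Println(stemSketch("outcomes.jsonl"))
}
```

The resulting stem (for example `scrum_reviews`) is what the per-class template lookup would key on, with unknown stems falling through to the default template.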
root	b216b7e5b6	fix the other 4: close all OPEN-list items in one wave Substantial wave addressing all 4 prior OPEN items. Three closed in full, one partially (the speculative half deliberately deferred). OPEN #1 — Periodic fresh→main index merge (FULL): - POST /v1/vectors/index/{src}/merge with {dest, clear_source} - Idempotent on re-runs (existing-in-dest items skipped) - internal/vectord/index.go: new Index.IDs() snapshot method + i.ids tracker field as canonical ID set, independent of meta map's nil-vs-{} sparseness (was a real bug — IDs() backed by meta alone missed items added with nil metadata) - 4 cmd-level integration tests (happy path drain+clear, dim mismatch, dest not found, self-merge rejection) + 1 unit test - DecodeIndex backward-compat: old envelopes restore i.ids from meta keys (best effort; new items going forward use the tracker) OPEN #2 — Distillation SFT export (SUBSTRATE): - internal/distillation/sft_export.go ports the load-bearing half: IsSftNever predicate + ListScoredRunFiles (data/scored-runs/YYYY/ MM/DD walk) + LoadScoredRunsFromFile + partial ExportSft. - Synthesis (instruction/input/response generation) deferred to a separate wave — too big for this session, but the substrate makes the next wave a port-not-design exercise. - TestSftNever_PinsExpectedSet locks the contamination firewall set: if a future commit adds/removes from SftNever, this test fails — forcing the change through review. - 5 new tests; firewall fires end-to-end through the partial port. OPEN #3 — Distribution drift via PSI (FULL): - internal/drift/drift.go: ComputeDistributionDrift via Population Stability Index. Standard finance/risk metric, well-defined verdict tiers (stable < 0.10, minor 0.10–0.25, major ≥ 0.25). - Equal-width bucketing over combined min/max so neither dist falls outside; epsilon-clamping for empty buckets so log doesn't blow up. Per-bucket breakdown for drilldown. 
- Pairs with the existing ComputeScorerDrift: scorer drift is categorical, distribution drift is continuous. Different shapes, same package. - 7 new tests covering identical-is-stable, hard-shift-is-major, moderate-detected-not-stable, empty-inputs-safe, all-identical- safe, bucket-counts-conserved, num-buckets-clamping. OPEN #4 — Ops nice-to-haves (PARTIAL — wall-clock done, others deferred): - (a) Real-time wall-clock for stress harness: per-phase elapsed time logged to stdout as it runs (`[stress] phase NAME starting (T+12.3s)` + `[stress] phase NAME done — 8.5s (T+20.8s)`). Output.PhaseTimings + Output.TotalElapsedMs in JSON. - (b) chatd fixture-mode S3 mock + (c) liberal-paraphrase calibration: not actioned — no fired trigger, would be speculative. Documented as deferred-until-need rather than ignored. Per the project's discipline ("don't add features beyond what the task requires"). OPEN list now empty / steady-state. Future items will land as production triggers fire. Build + vet + tests green; 18 new tests across the 4 closures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 23:42:11 -05:00
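The PSI computation summarized under OPEN #3 in b216b7e5b6 follows the standard formula PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ) over bucket shares. The sketch below is an independent illustration, not `ComputeDistributionDrift` itself: the function names, the epsilon value, and returning 0 for empty or all-identical inputs are assumptions. The verdict tiers are the ones quoted in the commit message.

```go
package main

import (
	"fmt"
	"math"
)

const epsilon = 1e-6 // assumed clamp so empty buckets do not produce log(0)

// psi computes the Population Stability Index using equal-width buckets
// over the combined min/max of both samples.
func psi(expected, actual []float64, buckets int) float64 {
	if len(expected) == 0 || len(actual) == 0 || buckets < 1 {
		return 0 // assumption: degenerate input is treated as no drift
	}
	lo, hi := expected[0], expected[0]
	for _, xs := range [][]float64{expected, actual} {
		for _, v := range xs {
			lo, hi = math.Min(lo, v), math.Max(hi, v)
		}
	}
	if hi == lo {
		return 0 // every value identical in both samples
	}
	width := (hi - lo) / float64(buckets)
	shares := func(xs []float64) []float64 {
		out := make([]float64, buckets)
		for _, v := range xs {
			i := int((v - lo) / width)
			if i >= buckets {
				i = buckets - 1 // the maximum lands in the last bucket
			}
			out[i]++
		}
		for i := range out {
			out[i] = math.Max(out[i]/float64(len(xs)), epsilon)
		}
		return out
	}
	e, a := shares(expected), shares(actual)
	total := 0.0
	for i := range e {
		total += (a[i] - e[i]) * math.Log(a[i]/e[i])
	}
	return total
}

// verdict maps a PSI value onto the tiers named in the commit message.
func verdict(p float64) string {
	switch {
	case p < 0.10:
		return "stable"
	case p < 0.25:
		return "minor"
	default:
		return "major"
	}
}

func main() {
	base := []float64{0, 1, 0, 1}
	fmt.Println(verdict(psi(base, base, 10)))
	fmt.Println(verdict(psi(base, []float64{9, 10, 9, 10}, 10)))
}
```

Bucketing over the combined range means neither sample can fall outside the grid, at the cost of bucket edges moving whenever either sample's extremes change.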
root	57d0df125d	E (partial): distillation port — scorer + contamination firewall First slice of the Rust v1.0.0 distillation substrate (e7636f2) ported to Go per ADR-001 #4 (port LOGIC, not bit-identical reproducibility). This commit lands the LOAD-BEARING pieces named in project_distillation_substrate.md memory: - The deterministic Success Scorer (8 sub-scorers + dispatch) - The contamination firewall on SFT samples (the "non-negotiable" spec property: rejected/needs_human_review NEVER ship to SFT) - All on-wire types + validators for ScoredRun, SftSample, EvidenceRecord with Provenance Files: internal/distillation/types.go — types + ScorerVersion + SftNever + ValidateScoredRun + ValidateSftSample internal/distillation/scorer.go — ScoreRecord + 8 class scorers + BuildScoredRun (deterministic) internal/distillation/scorer_test.go — ~40 test cases: - source-class dispatch (verdict / telemetry / extraction) - scrum_review (4 attempt cases) - observer_review (5 verdict cases) - audit (legacy + severity, 9 cases) - auto_apply (4 cases) - outcomes / mode_experiment / extraction - CONTAMINATION FIREWALL: ErrSftContamination sentinel fires on rejected/needs_human_review, distinct from typo errors - empty-pair guard (instruction/response trim != "") - reasons-required ScoredRun validation - deterministic sig_hash on identical input - purity check (input not mutated, repeatable output) Per the 2026-04-29 cross-lineage scrum's discipline: false-positive findings would be dismissed inline (none in this commit). Real findings would be addressed before merge — but this is greenfield port code reviewed against its Rust source line-by-line, which the test suite encodes as truth tables. 
Explicitly DEFERRED to follow-up commits: - Materialization layer (jsonl read/write, date-partitioned storage in data/scored-runs/YYYY/MM/DD/, evidence index) - SFT exporter (file iteration + filtering — the SCORING firewall is here; the EXPORT firewall is the next layer) - export_preference, export_rag (other export shapes) - Acceptance harness (16/16 acceptance gate that locks v1.0.0) - replay, receipts, build_evidence_index, transforms The scorer + firewall validator are pure functions — operational tooling layers on top without changing the deterministic logic the downstream learning loop depends on. The Go ScorerVersion stays at v1.0.0 to match the Rust e7636f2 baseline; bumping in the Go materialization commit is reserved for the next scoring-rule change, NOT the port itself. 15-smoke regression all green. vet clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 20:04:29 -05:00
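The firewall property in 57d0df125d, where `ErrSftContamination` fires on rejected/needs_human_review and stays distinguishable from typo errors, maps naturally onto a wrapped sentinel error. The sketch below is illustrative only: the two blocked categories come from the commit message, while the allowed category names (`accepted`, `partially_accepted`) and all function names are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSftContamination marks a record that must never reach SFT output.
var ErrSftContamination = errors.New("sft contamination")

// sftNever holds the categories named in the commit message.
var sftNever = map[string]bool{"rejected": true, "needs_human_review": true}

// allowed holds hypothetical categories that may ship; names are assumed.
var allowed = map[string]bool{"accepted": true, "partially_accepted": true}

// checkCategory returns nil for shippable categories, a wrapped sentinel for
// firewalled ones, and a plain error for unknown (for example misspelled) ones.
func checkCategory(category string) error {
	switch {
	case sftNever[category]:
		return fmt.Errorf("%w: category %q", ErrSftContamination, category)
	case !allowed[category]:
		return fmt.Errorf("unknown category %q", category)
	}
	return nil
}

// isContamination reports whether err is, or wraps, the firewall sentinel.
func isContamination(err error) bool {
	return errors.Is(err, ErrSftContamination)
}

func main() {
	for _, c := range []string{"accepted", "rejected", "rejcted"} {
		err := checkCategory(c)
		fmt.Printf("%-10s err=%v contamination=%v\n", c, err, isContamination(err))
	}
}
```

Wrapping with `%w` lets callers branch on `errors.Is` while the message still names the offending category, so a typo never masquerades as a firewall hit or the reverse.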

9 Commits