From b0c8a3f227de158d3690efc5d328f7591669c5f8 Mon Sep 17 00:00:00 2001 From: root Date: Sat, 2 May 2026 04:43:54 -0500 Subject: [PATCH] parity probes: materializer + extract_json (caught + fixed real bug) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two new cross-runtime parity probes joining the validator probe from the gauntlet wave. Pattern: feed identical input through Rust and Go; diff outputs. Each probe surfaced a different signal. ## Materializer parity probe scripts/cutover/parity/materializer_parity.sh runs Bun + Go materializer against an identical synthetic data/_kb/ root, diffs the resulting evidence/ JSONL byte-equivalent (modulo provenance.recorded_at). **First run: 0/2 match.** Real finding: Go's Provenance.LineOffset had `json:"line_offset,omitempty"` which strips the field when value is 0. Line offset 0 is the FIRST ROW of every source file — a real semantic value, not absent. Bun side always emits it. Fix: drop `omitempty` on Provenance.LineOffset. Updated comment explaining why. **Re-run: 2/2 match.** On-wire JSON parity holds. ## extract_json parity probe scripts/cutover/parity/extract_json_parity.sh feeds 12 fixture strings through both runtimes' extract_json: - fenced ```json``` blocks - unfenced ``` blocks - bare braces with prose around - first-balanced-of-many - nested objects - unicode in string values - escaped quotes - empty object - top-level array (both return first inner object) - no JSON - depth-balanced but invalid syntax - trailing garbage Substrate gate: cargo test -p gateway extract_json PASS before probe. **Result: 12/12 match.** Algorithms genuinely equivalent. ## scripts/cutover/parity/extract_json_helper/main.go Tiny Go binary that reads stdin, calls validator.ExtractJSON, prints {matched, value} JSON. Counterpart to the Rust parity_extract_json binary in golangLAKEHOUSE's sibling lakehouse repo (separate commit). ## Pattern crystallized Every cross-runtime port should land with a parity probe. Three probes now exist: - validator (5/6 wire-format gap captured 2026-05-02) - materializer (caught + fixed real bug 2026-05-02) - extract_json (12/12 match 2026-05-02) The instrument is reusable — each new shared HTTP/CLI surface gets a probe row added. Co-Authored-By: Claude Opus 4.7 (1M context) --- STATE_OF_PLAY.md | 4 +- docs/ARCHITECTURE_COMPARISON.md | 2 + internal/distillation/types.go | 14 +- .../parity/extract_json_parity.md | 14 ++ .../parity/materializer_parity.md | 13 ++ .../parity/extract_json_helper/main.go | 36 ++++ scripts/cutover/parity/extract_json_parity.sh | 143 ++++++++++++++ scripts/cutover/parity/materializer_parity.sh | 180 ++++++++++++++++++ 8 files changed, 400 insertions(+), 6 deletions(-) create mode 100644 reports/cutover/gauntlet_2026-05-02/parity/extract_json_parity.md create mode 100644 reports/cutover/gauntlet_2026-05-02/parity/materializer_parity.md create mode 100644 scripts/cutover/parity/extract_json_helper/main.go create mode 100755 scripts/cutover/parity/extract_json_parity.sh create mode 100755 scripts/cutover/parity/materializer_parity.sh diff --git a/STATE_OF_PLAY.md b/STATE_OF_PLAY.md index d3c87c8..1ddbdf7 100644 --- a/STATE_OF_PLAY.md +++ b/STATE_OF_PLAY.md @@ -1,7 +1,7 @@ # STATE OF PLAY — Lakehouse-Go -**Last verified:** 2026-05-02 ~05:00 CDT -**Verified by:** **production-readiness gauntlet** — 21/21 smoke chain green in ~60s, per-component scrum across 4 bundles (no convergent findings, no real bugs), cross-runtime validator parity probe (6/6 status match, 5/6 body shape divergence captured as known gap). Disposition: `reports/cutover/gauntlet_2026-05-02/disposition.md`. +**Last verified:** 2026-05-02 ~05:30 CDT +**Verified by:** **production-readiness gauntlet** — 21/21 smoke chain green in ~60s, per-component scrum across 4 bundles, **3 cross-runtime parity probes** (validator: 6/6 status match + 5/6 body-shape gap captured; materializer: caught a real omitempty bug, 2/2 match post-fix; extract_json: 12/12 match including unicode + escaped quotes). Disposition: `reports/cutover/gauntlet_2026-05-02/disposition.md`. > **Read this FIRST.** When the user says "we're working on lakehouse," default to the Go rewrite (this repo); the Rust legacy at `/home/profit/lakehouse/` is maintenance-only. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes. diff --git a/docs/ARCHITECTURE_COMPARISON.md b/docs/ARCHITECTURE_COMPARISON.md index 7e78fbd..97d6c17 100644 --- a/docs/ARCHITECTURE_COMPARISON.md +++ b/docs/ARCHITECTURE_COMPARISON.md @@ -53,6 +53,8 @@ Don't: | 2026-05-02 | **Port Rust replay tool to Go — DONE** | `internal/replay` + `cmd/replay` + `replay_smoke.sh`. Ports `replay.ts` retrieve → bundle → /v1/chat → validate → log. Closes audit-FULL phase 7 live invocation on Go side. 14 tests green; same `data/_kb/replay_runs.jsonl` shape (schema=replay_run.v1) as TS. | | 2026-05-02 | **`/v1/validate` + `/v1/iterate` HTTP surface — DONE** | `cmd/validatord` (port 3221) hosts both endpoints. `internal/validator` gains `PlaybookValidator` (3rd kind), JSONL roster loader, and the `Iterate` orchestrator + `ExtractJSON` helper. Gateway proxies `/v1/validate` + `/v1/iterate` to validatord. Closes the last "Go-primary" backlog item (architecture_comparison.md item #7). 30+ tests + `validatord_smoke.sh` 5/5 PASS. | | 2026-05-02 | **Cross-runtime validator parity probe — surfaced wire-format gap** | New `scripts/cutover/parity/validator_parity.sh` runs 6 identical /v1/validate cases against Rust :3100 AND Go :4110, compares status + body. Result: **6/6 status codes match (logic-level equivalence holds), 5/6 body shapes diverge.** Rust returns serde-tagged enum `{"Schema":{"field":"x","reason":"y"}}`; Go returns flat struct `{"Kind":"schema","Field":"x","Reason":"y"}`. Any caller parsing the error envelope would break in cutover. **Open**: pick a target shape (Go matching Rust is the cutover-friendly direction) and align via custom `MarshalJSON` on `ValidationError`. | +| 2026-05-02 | **Materializer parity probe — caught + fixed real bug** | New `scripts/cutover/parity/materializer_parity.sh` runs Bun + Go materializer on identical synthetic root, diffs output JSONL. Result on first run: **0/2 match** — Go's `Provenance.LineOffset` had `json:",omitempty"` and stripped the field on first-row records (line_offset=0 is semantically meaningful, not absent). 1-line fix (drop `omitempty` + comment explaining why). Re-run: **2/2 match**. Real cross-runtime gap surfaced + closed in same wave. | +| 2026-05-02 | **extract_json parity probe — 12/12 match across edge cases** | New `scripts/cutover/parity/extract_json_parity.sh` runs identical model-output strings through Rust `gateway::v1::iterate::extract_json` AND Go `validator.ExtractJSON`. 12 fixtures: fenced/unfenced blocks, nested objects, unicode, escaped quotes, top-level array, malformed JSON. Substrate gate: `cargo test -p gateway extract_json` PASS before probe. Result: **12/12 match.** Algorithms genuinely equivalent. Rust side gained `pub` on `extract_json` + new `bin/parity_extract_json` (~30 LOC). | | _open_ | **Validator wire-format alignment** | Surfaced by 2026-05-02 parity probe. Choose canonical error JSON shape, align both runtimes. ~50 LOC custom `MarshalJSON` either side. | | _open_ | Decide on Lance vector backend | Defer until corpus exceeds ~5M rows. | | _open_ | Pick Go primary vs Rust primary | Both viable. Go has perf edge after today; Rust has production deploy + producer-side completeness. | diff --git a/internal/distillation/types.go b/internal/distillation/types.go index 2a1f317..5a1ee37 100644 --- a/internal/distillation/types.go +++ b/internal/distillation/types.go @@ -125,11 +125,17 @@ const ( // Provenance is the source-linkage every distillation record carries. // SourceFile is required (no record without source linkage); other // fields are best-effort for de-duplication and trace-back. +// +// LineOffset is intentionally NOT `omitempty`: line offset 0 is the +// first row of a source file — a real, semantically-meaningful value. +// Stripping it on first-row records would silently diverge from the +// Bun materializer (which always emits the field). Surfaced by the +// 2026-05-02 cross-runtime parity probe. type Provenance struct { - SourceFile string `json:"source_file"` - LineOffset int64 `json:"line_offset,omitempty"` - SigHash string `json:"sig_hash"` - RecordedAt string `json:"recorded_at"` // ISO 8601 + SourceFile string `json:"source_file"` + LineOffset int64 `json:"line_offset"` + SigHash string `json:"sig_hash"` + RecordedAt string `json:"recorded_at"` // ISO 8601 } // ObserverVerdict is what an observer returned for an executor's diff --git a/reports/cutover/gauntlet_2026-05-02/parity/extract_json_parity.md b/reports/cutover/gauntlet_2026-05-02/parity/extract_json_parity.md new file mode 100644 index 0000000..aba21d7 --- /dev/null +++ b/reports/cutover/gauntlet_2026-05-02/parity/extract_json_parity.md @@ -0,0 +1,14 @@ +# extract_json parity probe — Rust vs Go + +**Date:** 2026-05-02T09:42:37Z +**Rust helper:** `/home/profit/lakehouse/target/release/parity_extract_json` (links live `gateway::v1::iterate::extract_json`) +**Go helper:** `./bin/parity_extract_json_go` (links live `internal/validator.ExtractJSON`) + +Identical model-output strings → both runtimes' `extract_json`. +Match = identical `{matched, value}` JSON output. + +**Substrate gate:** `cargo test -p gateway extract_json` PASS before probe. + +**Tally:** 12 match · 0 diff (out of 12 fixtures) + +_No divergences — extract_json parity holds across all fixtures._ diff --git a/reports/cutover/gauntlet_2026-05-02/parity/materializer_parity.md b/reports/cutover/gauntlet_2026-05-02/parity/materializer_parity.md new file mode 100644 index 0000000..39bd92d --- /dev/null +++ b/reports/cutover/gauntlet_2026-05-02/parity/materializer_parity.md @@ -0,0 +1,13 @@ +# Materializer parity probe — Bun vs Go + +**Date:** 2026-05-02T09:38:32Z +**Bun:** `/home/profit/lakehouse/scripts/distillation/build_evidence_index.ts` +**Go:** `./bin/materializer` + +Identical `data/_kb/` source → both runtimes' materializer. +Match = JSONL byte-equal after normalizing `provenance.recorded_at` +(per-run wall clock) + sorted line order (dedup ordering). + +**Tally:** 2 match · 0 diff (out of 2 stems) + +_No divergences — on-wire JSON parity holds._ diff --git a/scripts/cutover/parity/extract_json_helper/main.go b/scripts/cutover/parity/extract_json_helper/main.go new file mode 100644 index 0000000..4654cb6 --- /dev/null +++ b/scripts/cutover/parity/extract_json_helper/main.go @@ -0,0 +1,36 @@ +// extract_json_helper — Go-side counterpart to the Rust +// parity_extract_json binary. Reads stdin, runs +// validator.ExtractJSON, prints {matched, value} JSON to stdout. +// +// Used exclusively by +// `scripts/cutover/parity/extract_json_parity.sh` to verify +// cross-runtime equivalence of extract_json across Rust and Go. +package main + +import ( + "encoding/json" + "fmt" + "io" + "os" + + "git.agentview.dev/profit/golangLAKEHOUSE/internal/validator" +) + +func main() { + buf, err := io.ReadAll(os.Stdin) + if err != nil { + fmt.Fprintf(os.Stderr, "read stdin: %v\n", err) + os.Exit(1) + } + v := validator.ExtractJSON(string(buf)) + out := map[string]any{ + "matched": v != nil, + "value": v, + } + body, err := json.Marshal(out) + if err != nil { + fmt.Fprintf(os.Stderr, "marshal: %v\n", err) + os.Exit(1) + } + fmt.Println(string(body)) +} diff --git a/scripts/cutover/parity/extract_json_parity.sh b/scripts/cutover/parity/extract_json_parity.sh new file mode 100755 index 0000000..6175326 --- /dev/null +++ b/scripts/cutover/parity/extract_json_parity.sh @@ -0,0 +1,143 @@ +#!/usr/bin/env bash +# extract_json_parity — feed identical model-output strings through +# both Rust extract_json AND Go ExtractJSON; diff outputs. +# +# Why: the iteration loop's correctness hinges on extract_json finding +# the same JSON object in the same model output regardless of runtime. +# A divergence here means a model output that one runtime accepts and +# the other rejects (or worse, both accept but parse differently). +# +# Approach: +# 1. Run cargo test -p gateway extract_json to assert the LIVE Rust +# function still passes its own unit tests (substrate gate) +# 2. For each fixture (input, label) tuple: +# Rust: ./target/release/parity_extract_json < fixture +# Go: ./bin/parity_extract_json_go < fixture +# Compare {matched, value} JSON outputs +# 3. Emit a markdown report with per-fixture matches/diffs +# +# Outputs: reports/cutover/gauntlet_2026-05-02/parity/extract_json_parity.md +# +# Env overrides: +# RUST_REPO=/home/profit/lakehouse +# RUST_BIN=$RUST_REPO/target/release/parity_extract_json + +set -uo pipefail +cd "$(dirname "$0")/../../.." + +RUST_REPO="${RUST_REPO:-/home/profit/lakehouse}" +RUST_BIN="${RUST_BIN:-$RUST_REPO/target/release/parity_extract_json}" +GO_BIN="${GO_BIN:-./bin/parity_extract_json_go}" +OUT_DIR="reports/cutover/gauntlet_2026-05-02/parity" +mkdir -p "$OUT_DIR" +OUT="$OUT_DIR/extract_json_parity.md" + +export PATH="$PATH:/usr/local/go/bin" + +# ── Build / verify both sides ─────────────────────────────────────── +if [ ! -x "$RUST_BIN" ]; then + echo "[extract-json-parity] building Rust helper..." + (cd "$RUST_REPO" && cargo build -p gateway --bin parity_extract_json --release 2>&1 | tail -3) +fi +if [ ! -x "$RUST_BIN" ]; then + echo "[extract-json-parity] SKIP: $RUST_BIN missing" + exit 0 +fi + +# Run live Rust unit tests (substrate gate) — ensures our helper +# matches the production extract_json behavior. +echo "[extract-json-parity] running cargo test extract_json (substrate gate)..." +(cd "$RUST_REPO" && cargo test -p gateway --release extract_json 2>&1 | tail -8) > /tmp/rust_extract_test.log +if ! grep -q "test result: ok" /tmp/rust_extract_test.log; then + echo "[extract-json-parity] live Rust tests FAILED — aborting probe" + cat /tmp/rust_extract_test.log + exit 1 +fi +echo " ✓ live Rust extract_json tests PASS" + +# Build Go-side helper from internal/validator.ExtractJSON. +go build -o "$GO_BIN" ./scripts/cutover/parity/extract_json_helper + +# ── Fixture set ───────────────────────────────────────────────────── +# Inline as label||raw pairs. Curated to exercise every documented +# branch: +# - fenced ```json``` block +# - fenced unlabeled ``` block +# - bare-braces with stray prose +# - first-balanced-of-many +# - nested object +# - unicode in string values +# - escaped quotes +# - empty object +# - top-level array (both runtimes return first inner object) +# - no JSON at all +# - malformed JSON-shaped text (depth balanced but invalid syntax) +# - very-large input (~10KB of prose around a tiny object) +declare -a FIXTURES=( + "fenced_json_block||Here's my answer: +\`\`\`json +{\"fills\":[{\"candidate_id\":\"W-1\"}]} +\`\`\` +Done." + "fenced_unlabeled||result: +\`\`\` +{\"k\":\"v\"} +\`\`\`" + "bare_braces||Here you go: {\"fills\":[{\"candidate_id\":\"W-2\"}]}" + "first_of_many||{\"a\":1} then {\"b\":2}" + "nested||prefix {\"outer\":{\"inner\":[1,2,3]},\"x\":\"y\"} suffix" + "unicode||{\"name\":\"Café résumé\",\"emoji\":\"⭐\"}" + "escaped_quotes||{\"msg\":\"she said \\\"hello\\\"\"}" + "empty_object||{}" + "array_of_objects||[{\"a\":1},{\"b\":2}]" + "no_json||just prose, no json" + "depth_balanced_invalid||{not a key: still not}" + "trailing_garbage||{\"k\":\"v\"} and then 5} more } stuff" +) + +TOTAL=0; MATCH=0; DIFF=0 +DIFF_DETAIL="" + +for entry in "${FIXTURES[@]}"; do + IFS='||' read -r label raw <<<"$entry" + TOTAL=$((TOTAL+1)) + rust_out=$(printf '%s' "$raw" | "$RUST_BIN" 2>/dev/null || echo "RUST_ERROR") + go_out=$(printf '%s' "$raw" | "$GO_BIN" 2>/dev/null || echo "GO_ERROR") + # Normalize JSON serialization (key order) before comparing. + rust_norm=$(echo "$rust_out" | jq -cS . 2>/dev/null || echo "$rust_out") + go_norm=$(echo "$go_out" | jq -cS . 2>/dev/null || echo "$go_out") + if [ "$rust_norm" = "$go_norm" ]; then + MATCH=$((MATCH+1)) + else + DIFF=$((DIFF+1)) + raw_short=$(printf '%s' "$raw" | head -c 120 | tr '\n' ' ') + DIFF_DETAIL="$DIFF_DETAIL"$'\n\n'"### $label"$'\n''**Input (first 120 chars):** `'"$raw_short"'`'$'\n\n''**Rust:**'$'\n''```json'$'\n'"$rust_norm"$'\n''```'$'\n\n''**Go:**'$'\n''```json'$'\n'"$go_norm"$'\n''```' + fi +done + +# ── Report ────────────────────────────────────────────────────────── +{ + echo "# extract_json parity probe — Rust vs Go" + echo + echo "**Date:** $(date -u +%Y-%m-%dT%H:%M:%SZ)" + echo "**Rust helper:** \`$RUST_BIN\` (links live \`gateway::v1::iterate::extract_json\`)" + echo "**Go helper:** \`$GO_BIN\` (links live \`internal/validator.ExtractJSON\`)" + echo + echo "Identical model-output strings → both runtimes' \`extract_json\`." + echo "Match = identical \`{matched, value}\` JSON output." + echo + echo "**Substrate gate:** \`cargo test -p gateway extract_json\` PASS before probe." + echo + echo "**Tally:** $MATCH match · $DIFF diff (out of $TOTAL fixtures)" + if [ -n "$DIFF_DETAIL" ]; then + echo + echo "## Divergences" + echo "$DIFF_DETAIL" + else + echo + echo "_No divergences — extract_json parity holds across all fixtures._" + fi +} > "$OUT" + +echo "[parity] extract_json: $MATCH match / $DIFF diff (out of $TOTAL) → $OUT" +[ "$DIFF" -eq 0 ] diff --git a/scripts/cutover/parity/materializer_parity.sh b/scripts/cutover/parity/materializer_parity.sh new file mode 100755 index 0000000..6fc5369 --- /dev/null +++ b/scripts/cutover/parity/materializer_parity.sh @@ -0,0 +1,180 @@ +#!/usr/bin/env bash +# materializer_parity — run Bun + Go materializer against an identical +# synthetic root, diff the resulting data/evidence/ JSONL files. +# +# This validates the parity claim from the 2026-05-02 port: "on-wire +# JSON shape matches TS so Bun and Go runs are interchangeable." Any +# divergence here is a finding the architecture comparison should +# record (precedent: 2026-05-02 validator parity probe surfaced the +# Rust serde-tagged-enum vs Go flat-struct error envelope gap). +# +# Approach: +# 1. Set up a temp ROOT with a fixed source data/_kb/distilled_facts.jsonl +# + observer_escalations.jsonl (small, every transform field exercised) +# 2. Run Bun materializer: +# bun run /home/profit/lakehouse/scripts/distillation/build_evidence_index.ts +# against TS-side ROOT (TS expects a real lakehouse repo layout) +# 3. Run Go materializer: +# ./bin/materializer -root +# against the same ROOT (after wiping evidence/ between runs) +# 4. Diff the output JSONL files, normalized for non-deterministic +# fields (provenance.recorded_at, ordering). +# +# Outputs: reports/cutover/gauntlet_2026-05-02/parity/materializer_parity.md +# +# Exit 0 = byte-equal (modulo timestamps); exit non-zero = drift. +# +# Env overrides: +# RUST_REPO=/home/profit/lakehouse # Rust legacy repo +# GO_BIN=./bin/materializer # Go binary (built per-call) + +set -uo pipefail +cd "$(dirname "$0")/../../.." + +RUST_REPO="${RUST_REPO:-/home/profit/lakehouse}" +GO_BIN="${GO_BIN:-./bin/materializer}" +OUT_DIR="reports/cutover/gauntlet_2026-05-02/parity" +mkdir -p "$OUT_DIR" +OUT="$OUT_DIR/materializer_parity.md" + +# Build Go materializer fresh. +export PATH="$PATH:/usr/local/go/bin" +go build -o "$GO_BIN" ./cmd/materializer + +# Locate Bun. Skip with an explicit message if it's missing (CI without bun). +if ! command -v bun >/dev/null 2>&1; then + echo "[materializer-parity] SKIP: bun not on PATH" + exit 0 +fi + +# Confirm Rust-side materializer is present. +TS_MAT="$RUST_REPO/scripts/distillation/build_evidence_index.ts" +if [ ! -f "$TS_MAT" ]; then + echo "[materializer-parity] SKIP: $TS_MAT not found" + exit 0 +fi + +ROOT="$(mktemp -d)" +trap 'rm -rf "$ROOT"' EXIT INT TERM + +mkdir -p "$ROOT/data/_kb" +# Synthetic distilled_facts — exercises the simplest transform shape. +cat > "$ROOT/data/_kb/distilled_facts.jsonl" < "$ROOT/data/_kb/observer_escalations.jsonl" < /tmp/bun_mat.log 2>&1 || { + echo "[materializer-parity] bun run failed:" + tail -30 /tmp/bun_mat.log + exit 1 + } + +# ── Go run ───────────────────────────────────────────────────────── +GO_ROOT="$ROOT/go_side" +mkdir -p "$GO_ROOT/data/_kb" +cp "$ROOT/data/_kb/"* "$GO_ROOT/data/_kb/" + +echo "[materializer-parity] running Go materializer..." +"$GO_BIN" -root "$GO_ROOT" > /tmp/go_mat.log 2>&1 || { + echo "[materializer-parity] go run failed:" + tail -30 /tmp/go_mat.log + exit 1 +} + +# ── Find output day-partition ────────────────────────────────────── +# Both runs use today's UTC date. Look up the partition. +TODAY="$(date -u +%Y/%m/%d)" +BUN_OUT="$BUN_ROOT/data/evidence/$TODAY" +GO_OUT="$GO_ROOT/data/evidence/$TODAY" + +if [ ! -d "$BUN_OUT" ]; then + echo "[materializer-parity] no Bun output dir: $BUN_OUT" + ls -la "$BUN_ROOT/data/evidence" 2>/dev/null || true + exit 1 +fi +if [ ! -d "$GO_OUT" ]; then + echo "[materializer-parity] no Go output dir: $GO_OUT" + exit 1 +fi + +# ── Normalize + diff per source-stem ─────────────────────────────── +# Stripped fields: +# provenance.recorded_at — different per-run wall clock +# +# Sorted by sig_hash so dedup ordering can't matter. +normalize() { + jq -c -S 'del(.provenance.recorded_at)' "$1" 2>/dev/null \ + | sort +} + +TOTAL=0; MATCH=0; DIFF=0 +DIFF_DETAIL="" +for f in "$BUN_OUT"/*.jsonl; do + stem=$(basename "$f" .jsonl) + go_f="$GO_OUT/$stem.jsonl" + TOTAL=$((TOTAL+1)) + if [ ! -f "$go_f" ]; then + DIFF=$((DIFF+1)) + DIFF_DETAIL="$DIFF_DETAIL"$'\n'"- $stem: present in Bun, missing in Go" + continue + fi + bun_norm=$(normalize "$f") + go_norm=$(normalize "$go_f") + if [ "$bun_norm" = "$go_norm" ]; then + MATCH=$((MATCH+1)) + else + DIFF=$((DIFF+1)) + # Capture a small diff for the report. + diff_block="$(diff <(echo "$bun_norm") <(echo "$go_norm") | head -40)" + DIFF_DETAIL="$DIFF_DETAIL"$'\n\n'"### $stem"$'\n''```diff'$'\n'"$diff_block"$'\n''```' + fi +done + +# ── Write report ─────────────────────────────────────────────────── +{ + echo "# Materializer parity probe — Bun vs Go" + echo + echo "**Date:** $(date -u +%Y-%m-%dT%H:%M:%SZ)" + echo "**Bun:** \`$TS_MAT\`" + echo "**Go:** \`$GO_BIN\`" + echo + echo "Identical \`data/_kb/\` source → both runtimes' materializer." + echo "Match = JSONL byte-equal after normalizing \`provenance.recorded_at\`" + echo "(per-run wall clock) + sorted line order (dedup ordering)." + echo + echo "**Tally:** $MATCH match · $DIFF diff (out of $TOTAL stems)" + if [ -n "$DIFF_DETAIL" ]; then + echo + echo "## Divergences" + echo "$DIFF_DETAIL" + else + echo + echo "_No divergences — on-wire JSON parity holds._" + fi +} > "$OUT" + +echo "[parity] materializer: $MATCH match / $DIFF diff (out of $TOTAL) → $OUT" +[ "$DIFF" -eq 0 ]