From 7d6636b33e6c0661bbc6fb3a8538dcff51afb709 Mon Sep 17 00:00:00 2001
From: root
Date: Sat, 2 May 2026 04:49:28 -0500
Subject: [PATCH] validator: align ValidationError JSON to Rust serde shape (6/6 parity)

Closes the 2026-05-02 parity finding: the validator_parity probe found
5/6 body shapes diverging because Go emitted
{"Kind":"...","Field":"...","Reason":"..."} while Rust emits the
externally-tagged-enum {"Schema":{"field":"...","reason":"..."}}. A
caller parsing the error envelope would break silently in cutover.

## Changes

internal/validator/types.go:
  - Custom MarshalJSON emits the Rust shape:
      Schema:       {"Schema":      {"field":"x","reason":"y"}}
      Completeness: {"Completeness":{"reason":"y"}}
      Consistency:  {"Consistency": {"reason":"y"}}
      Policy:       {"Policy":      {"reason":"y"}}
  - Custom UnmarshalJSON accepts BOTH the new Rust shape AND the legacy
    flat shape (migration safety for any persisted error rows).
  - Unknown variants (e.g. a future Rust addition Go hasn't learned)
    surface as an UnmarshalJSON error, not a silent default.

internal/validator/types_test.go:
  - 4 pinning tests anchor the wire format. Failing them = wire-format
    drift; the parity probe is the secondary line of defense.

scripts/validatord_smoke.sh:
  - Updated probes to read the new variant-name shape (jq keys[0],
    .Schema.field) instead of the legacy .Kind/.Field.

## Verification

- internal/validator unit tests: PASS (4 new + all existing).
- cmd/validatord HTTP tests: PASS (UnmarshalJSON falls through to the
  flat shape, so existing tests reading ValidationError still work).
- validatord_smoke.sh: 5/5 PASS through gateway :3110.
- validator parity probe re-run: **6/6 match** (was 1/6).

## Pattern

Per architecture_comparison's "use the dual-implementation as a
measurement instrument" thesis: a parity probe surfaced this gap; ~90
lines in types.go (custom MarshalJSON/UnmarshalJSON plus variant
helpers) closed it; 4 pinning tests prevent regression; the probe is
the longitudinal gate. The cutover-friendly direction (Go matches Rust)
was chosen because Rust is the existing production contract.
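
## Caller-side decoding sketch

A Go caller that reads the 422 body without importing the validator
package could decode the new envelope generically: unmarshal into a map
keyed by variant name and require exactly one key. This is a minimal
sketch under stated assumptions, not part of the patch; `errorEnvelope`
and `decodeEnvelope` are hypothetical names, and the sketch deliberately
rejects the legacy flat shape because the updated smoke probes expect
only the tagged form on the wire.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// errorEnvelope is a hypothetical caller-side view of the Rust-shaped
// ValidationError body: one key (the variant name) whose object carries
// "reason" and, for Schema only, "field".
type errorEnvelope struct {
	Variant string
	Field   string
	Reason  string
}

// decodeEnvelope (hypothetical helper) decodes {"<Variant>":{...}} and
// rejects anything that is not a single-key object whose value is an
// object. The legacy flat {"Kind":"schema",...} shape fails here because
// its values are strings, not objects.
func decodeEnvelope(body []byte) (errorEnvelope, error) {
	var tagged map[string]struct {
		Field  string `json:"field"`
		Reason string `json:"reason"`
	}
	if err := json.Unmarshal(body, &tagged); err != nil {
		return errorEnvelope{}, fmt.Errorf("not a tagged error envelope: %w", err)
	}
	if len(tagged) != 1 {
		return errorEnvelope{}, fmt.Errorf("want exactly one variant key, got %d", len(tagged))
	}
	var env errorEnvelope
	for variant, inner := range tagged {
		env = errorEnvelope{Variant: variant, Field: inner.Field, Reason: inner.Reason}
	}
	return env, nil
}

func main() {
	bodies := []string{
		`{"Schema":{"field":"fingerprint","reason":"missing"}}`,
		`{"Consistency":{"reason":"candidate not in roster"}}`,
		`{"Kind":"schema","Field":"x","Reason":"y"}`, // legacy flat shape
	}
	for _, b := range bodies {
		env, err := decodeEnvelope([]byte(b))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s field=%q reason=%q\n", env.Variant, env.Field, env.Reason)
	}
}
```

Rejecting the flat shape suits an HTTP client; a reader of persisted
JSONL rows would instead want the dual-shape fallback that
ValidationError.UnmarshalJSON implements in this patch.
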
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 STATE_OF_PLAY.md                 |   4 +-
 docs/ARCHITECTURE_COMPARISON.md  |   2 +-
 internal/validator/types.go      |  94 ++++++++++++-
 internal/validator/types_test.go |  97 +++++++++++++
 .../parity/validator_parity.md   | 127 +-----------------
 scripts/validatord_smoke.sh      |  21 +--
 6 files changed, 212 insertions(+), 133 deletions(-)
 create mode 100644 internal/validator/types_test.go

diff --git a/STATE_OF_PLAY.md b/STATE_OF_PLAY.md
index 1ddbdf7..b6a7a81 100644
--- a/STATE_OF_PLAY.md
+++ b/STATE_OF_PLAY.md
@@ -1,7 +1,7 @@
 # STATE OF PLAY — Lakehouse-Go
 
-**Last verified:** 2026-05-02 ~05:30 CDT
-**Verified by:** **production-readiness gauntlet** — 21/21 smoke chain green in ~60s, per-component scrum across 4 bundles, **3 cross-runtime parity probes** (validator: 6/6 status match + 5/6 body-shape gap captured; materializer: caught a real omitempty bug, 2/2 match post-fix; extract_json: 12/12 match including unicode + escaped quotes). Disposition: `reports/cutover/gauntlet_2026-05-02/disposition.md`.
+**Last verified:** 2026-05-02 ~05:50 CDT
+**Verified by:** **production-readiness gauntlet** — 21/21 smoke chain green, per-component scrum across 4 bundles, **3 cross-runtime parity probes all green post-fix** (validator: **6/6 match** after wire-format alignment shipped; materializer: 2/2 after omitempty fix; extract_json: 12/12). All findings surfaced by the parity probes have been actioned. Disposition: `reports/cutover/gauntlet_2026-05-02/disposition.md`.
 
 > **Read this FIRST.** When the user says "we're working on lakehouse," default to the Go rewrite (this repo); the Rust legacy at `/home/profit/lakehouse/` is maintenance-only. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.
 
diff --git a/docs/ARCHITECTURE_COMPARISON.md b/docs/ARCHITECTURE_COMPARISON.md
index 97d6c17..4e2c04f 100644
--- a/docs/ARCHITECTURE_COMPARISON.md
+++ b/docs/ARCHITECTURE_COMPARISON.md
@@ -55,7 +55,7 @@ Don't:
 | 2026-05-02 | **Cross-runtime validator parity probe — surfaced wire-format gap** | New `scripts/cutover/parity/validator_parity.sh` runs 6 identical /v1/validate cases against Rust :3100 AND Go :4110, compares status + body. Result: **6/6 status codes match (logic-level equivalence holds), 5/6 body shapes diverge.** Rust returns serde-tagged enum `{"Schema":{"field":"x","reason":"y"}}`; Go returns flat struct `{"Kind":"schema","Field":"x","Reason":"y"}`. Any caller parsing the error envelope would break in cutover. **Open**: pick a target shape (Go matching Rust is the cutover-friendly direction) and align via custom `MarshalJSON` on `ValidationError`. |
 | 2026-05-02 | **Materializer parity probe — caught + fixed real bug** | New `scripts/cutover/parity/materializer_parity.sh` runs Bun + Go materializer on identical synthetic root, diffs output JSONL. Result on first run: **0/2 match** — Go's `Provenance.LineOffset` had `json:",omitempty"` and stripped the field on first-row records (line_offset=0 is semantically meaningful, not absent). 1-line fix (drop `omitempty` + comment explaining why). Re-run: **2/2 match**. Real cross-runtime gap surfaced + closed in same wave. |
 | 2026-05-02 | **extract_json parity probe — 12/12 match across edge cases** | New `scripts/cutover/parity/extract_json_parity.sh` runs identical model-output strings through Rust `gateway::v1::iterate::extract_json` AND Go `validator.ExtractJSON`. 12 fixtures: fenced/unfenced blocks, nested objects, unicode, escaped quotes, top-level array, malformed JSON. Substrate gate: `cargo test -p gateway extract_json` PASS before probe. Result: **12/12 match.** Algorithms genuinely equivalent. Rust side gained `pub` on `extract_json` + new `bin/parity_extract_json` (~30 LOC). |
-| _open_ | **Validator wire-format alignment** | Surfaced by 2026-05-02 parity probe. Choose canonical error JSON shape, align both runtimes. ~50 LOC custom `MarshalJSON` either side. |
+| 2026-05-02 | **Validator wire-format alignment — DONE** | Custom `MarshalJSON`/`UnmarshalJSON` on Go's `validator.ValidationError` emits the Rust serde-tagged-enum shape `{"Schema":{"field":"x","reason":"y"}}`. UnmarshalJSON also accepts the legacy flat shape (migration safety) and rejects unknown variants (drift guard for future Rust enum additions). 4 new pinning tests in `types_test.go`. Re-run validator parity probe: **6/6 match** (was 1/6). |
 | _open_ | Decide on Lance vector backend | Defer until corpus exceeds ~5M rows. |
 | _open_ | Pick Go primary vs Rust primary | Both viable. Go has perf edge after today; Rust has production deploy + producer-side completeness. |
 
diff --git a/internal/validator/types.go b/internal/validator/types.go
index dbf8fe5..a2c78fc 100644
--- a/internal/validator/types.go
+++ b/internal/validator/types.go
@@ -19,7 +19,11 @@ package validator
-import "time"
+import (
+	"encoding/json"
+	"fmt"
+	"time"
+)
 
 // Artifact is the discriminated union of input shapes a Validator
 // can receive. Mirrors Rust's `enum Artifact`. The first non-zero
@@ -107,6 +111,94 @@ func (e *ValidationError) Error() string {
 	return string(e.Kind) + ": " + e.Reason
 }
 
+// MarshalJSON emits the Rust-compatible serde-tagged-enum shape:
+//
+//	{"Schema": {"field": "...", "reason": "..."}}
+//	{"Completeness": {"reason": "..."}}
+//	{"Consistency": {"reason": "..."}}
+//	{"Policy": {"reason": "..."}}
+//
+// Aligns with `crates/validator/src/lib.rs::ValidationError` so callers
+// that parse the error envelope work across runtimes. Surfaced by the
+// 2026-05-02 cross-runtime parity probe (validator_parity.sh) which
+// found 5/6 body shapes diverging on identical input.
+func (e ValidationError) MarshalJSON() ([]byte, error) {
+	variant := variantName(e.Kind)
+	inner := map[string]string{}
+	if e.Kind == ErrSchema {
+		inner["field"] = e.Field
+	}
+	inner["reason"] = e.Reason
+	return json.Marshal(map[string]any{variant: inner})
+}
+
+// UnmarshalJSON accepts BOTH the Rust serde-tagged shape AND the
+// legacy flat-struct shape Go used to emit, so existing on-disk JSONL
+// rows still parse. New writes always use the Rust shape (per
+// MarshalJSON above).
+func (e *ValidationError) UnmarshalJSON(data []byte) error {
+	// Try the Rust serde shape first: {"":{...}}
+	var tagged map[string]map[string]string
+	if err := json.Unmarshal(data, &tagged); err == nil && len(tagged) == 1 {
+		for variant, inner := range tagged {
+			kind, ok := kindFromVariant(variant)
+			if !ok {
+				return fmt.Errorf("unknown ValidationError variant %q", variant)
+			}
+			e.Kind = kind
+			e.Field = inner["field"]
+			e.Reason = inner["reason"]
+			return nil
+		}
+	}
+	// Fall back to the legacy flat shape: {"Kind":"...","Field":"...","Reason":"..."}
+	var flat struct {
+		Kind   ValidationErrorKind `json:"Kind"`
+		Field  string              `json:"Field"`
+		Reason string              `json:"Reason"`
+	}
+	if err := json.Unmarshal(data, &flat); err != nil {
+		return fmt.Errorf("ValidationError: neither serde nor flat shape: %w", err)
+	}
+	if flat.Kind == "" {
+		return fmt.Errorf("ValidationError: missing kind")
+	}
+	e.Kind = flat.Kind
+	e.Field = flat.Field
+	e.Reason = flat.Reason
+	return nil
+}
+
+// variantName maps the lowercase Go Kind to the CamelCase Rust serde
+// variant name. Kept in sync with `crates/validator/src/lib.rs::ValidationError`.
+func variantName(k ValidationErrorKind) string {
+	switch k {
+	case ErrSchema:
+		return "Schema"
+	case ErrCompleteness:
+		return "Completeness"
+	case ErrConsistency:
+		return "Consistency"
+	case ErrPolicy:
+		return "Policy"
+	}
+	return string(k) // fallback — preserves Kind for debugging
+}
+
+func kindFromVariant(v string) (ValidationErrorKind, bool) {
+	switch v {
+	case "Schema":
+		return ErrSchema, true
+	case "Completeness":
+		return ErrCompleteness, true
+	case "Consistency":
+		return ErrConsistency, true
+	case "Policy":
+		return ErrPolicy, true
+	}
+	return "", false
+}
+
 // Validator is the interface every validator implements.
 // Stateless — construction takes any deps (e.g. WorkerLookup)
 // upfront, validate() is pure on its inputs.
diff --git a/internal/validator/types_test.go b/internal/validator/types_test.go
new file mode 100644
index 0000000..c735df1
--- /dev/null
+++ b/internal/validator/types_test.go
@@ -0,0 +1,97 @@
+package validator
+
+import (
+	"encoding/json"
+	"testing"
+)
+
+// TestValidationError_MarshalsRustSerdeShape pins the on-wire JSON
+// envelope for cross-runtime parity with crates/validator's
+// `enum ValidationError`. Surfaced by the 2026-05-02 validator parity
+// probe (5/6 body-shape divergence pre-fix → 6/6 match post-fix).
+//
+// Failing this test = wire-format drift. Run the parity probe at
+// scripts/cutover/parity/validator_parity.sh to confirm Rust is
+// the source-of-truth shape this test is anchored to.
+func TestValidationError_MarshalsRustSerdeShape(t *testing.T) {
+	cases := []struct {
+		name string
+		err  ValidationError
+		want string
+	}{
+		{
+			name: "schema includes field+reason inside variant body",
+			err:  ValidationError{Kind: ErrSchema, Field: "fingerprint", Reason: "missing"},
+			want: `{"Schema":{"field":"fingerprint","reason":"missing"}}`,
+		},
+		{
+			name: "completeness only carries reason (no field key)",
+			err:  ValidationError{Kind: ErrCompleteness, Reason: "endorsed_names must be non-empty"},
+			want: `{"Completeness":{"reason":"endorsed_names must be non-empty"}}`,
+		},
+		{
+			name: "consistency only carries reason",
+			err:  ValidationError{Kind: ErrConsistency, Reason: "candidate_id W-1 not in roster"},
+			want: `{"Consistency":{"reason":"candidate_id W-1 not in roster"}}`,
+		},
+		{
+			name: "policy only carries reason",
+			err:  ValidationError{Kind: ErrPolicy, Reason: "client C-1 is on candidate's blacklist"},
+			want: `{"Policy":{"reason":"client C-1 is on candidate's blacklist"}}`,
+		},
+	}
+	for _, c := range cases {
+		t.Run(c.name, func(t *testing.T) {
+			got, err := json.Marshal(c.err)
+			if err != nil {
+				t.Fatalf("marshal: %v", err)
+			}
+			if string(got) != c.want {
+				t.Errorf("\n got: %s\nwant: %s", got, c.want)
+			}
+		})
+	}
+}
+
+// TestValidationError_RoundTripsRustShape confirms a marshal+unmarshal
+// returns the same fields. Important: a Go side reading a Rust-written
+// ValidationError JSONL must populate Kind/Field/Reason correctly.
+func TestValidationError_RoundTripsRustShape(t *testing.T) {
+	original := ValidationError{Kind: ErrSchema, Field: "operation", Reason: "expected fill: prefix"}
+	body, err := json.Marshal(original)
+	if err != nil {
+		t.Fatalf("marshal: %v", err)
+	}
+	var decoded ValidationError
+	if err := json.Unmarshal(body, &decoded); err != nil {
+		t.Fatalf("unmarshal: %v", err)
+	}
+	if decoded.Kind != original.Kind || decoded.Field != original.Field || decoded.Reason != original.Reason {
+		t.Errorf("round-trip drift:\n in: %+v\n out: %+v", original, decoded)
+	}
+}
+
+// TestValidationError_ParsesLegacyFlatShape ensures consumers reading
+// older Go-emitted JSONL still parse. This is the migration path for
+// any persisted error rows that pre-date the 2026-05-02 alignment.
+func TestValidationError_ParsesLegacyFlatShape(t *testing.T) {
+	legacy := []byte(`{"Kind":"schema","Field":"x","Reason":"y"}`)
+	var ve ValidationError
+	if err := json.Unmarshal(legacy, &ve); err != nil {
+		t.Fatalf("legacy shape should parse: %v", err)
+	}
+	if ve.Kind != ErrSchema || ve.Field != "x" || ve.Reason != "y" {
+		t.Errorf("legacy parse wrong: %+v", ve)
+	}
+}
+
+// TestValidationError_RejectsUnknownVariant catches drift if Rust adds
+// a new variant the Go side hasn't learned about yet — surfaces as a
+// parse error rather than a silent default.
+func TestValidationError_RejectsUnknownVariant(t *testing.T) {
+	body := []byte(`{"NewVariant":{"reason":"something"}}`)
+	var ve ValidationError
+	if err := json.Unmarshal(body, &ve); err == nil {
+		t.Errorf("expected error on unknown variant, got %+v", ve)
+	}
+}
diff --git a/reports/cutover/gauntlet_2026-05-02/parity/validator_parity.md b/reports/cutover/gauntlet_2026-05-02/parity/validator_parity.md
index 80183cc..4e76637 100644
--- a/reports/cutover/gauntlet_2026-05-02/parity/validator_parity.md
+++ b/reports/cutover/gauntlet_2026-05-02/parity/validator_parity.md
@@ -1,6 +1,6 @@
 # Validator parity probe — Rust :3100 vs Go :4110
 
-**Date:** 2026-05-02T08:59:17Z
+**Date:** 2026-05-02T09:47:49Z
 **Rust gateway:** `http://127.0.0.1:3100` · **Go gateway:** `http://127.0.0.1:4110`
 
 Identical `POST /v1/validate` request → both runtimes. Match
@@ -9,124 +9,11 @@ Identical `POST /v1/validate` request → both runtimes. Match
 | Case | Rust status | Go status | Status match | Body match |
 |---|---:|---:|:---:|:---:|
 | playbook_happy | 200 | 200 | ✓ | ✓ |
-| playbook_missing_fingerprint | 422 | 422 | ✓ | ✗ |
-| playbook_wrong_prefix | 422 | 422 | ✓ | ✗ |
-| playbook_empty_endorsed | 422 | 422 | ✓ | ✗ |
-| playbook_overfull | 422 | 422 | ✓ | ✗ |
-| fill_phantom | 422 | 422 | ✓ | ✗ |
+| playbook_missing_fingerprint | 422 | 422 | ✓ | ✓ |
+| playbook_wrong_prefix | 422 | 422 | ✓ | ✓ |
+| playbook_empty_endorsed | 422 | 422 | ✓ | ✓ |
+| playbook_overfull | 422 | 422 | ✓ | ✓ |
+| fill_phantom | 422 | 422 | ✓ | ✓ |
 
-**Tally:** 1 match · 5 diff (out of 6 cases)
+**Tally:** 6 match · 0 diff (out of 6 cases)
 
-## Divergences
-
-
DIFF — `playbook_missing_fingerprint`
-
-**Rust** (HTTP 422):
-```json
-{
-  "Schema": {
-    "field": "fingerprint",
-    "reason": "missing — required for Phase 25 validity window"
-  }
-}
-```
-
-**Go** (HTTP 422):
-```json
-{
-  "Field": "fingerprint",
-  "Kind": "schema",
-  "Reason": "missing — required for Phase 25 validity window"
-}
-```
-
- -
DIFF — `playbook_wrong_prefix`
-
-**Rust** (HTTP 422):
-```json
-{
-  "Schema": {
-    "field": "operation",
-    "reason": "expected `fill: ...` prefix, got \"sms_draft: hello\""
-  }
-}
-```
-
-**Go** (HTTP 422):
-```json
-{
-  "Field": "operation",
-  "Kind": "schema",
-  "Reason": "expected `fill: ...` prefix, got \"sms_draft: hello\""
-}
-```
-
- -
DIFF — `playbook_empty_endorsed`
-
-**Rust** (HTTP 422):
-```json
-{
-  "Completeness": {
-    "reason": "endorsed_names must be non-empty"
-  }
-}
-```
-
-**Go** (HTTP 422):
-```json
-{
-  "Field": "",
-  "Kind": "completeness",
-  "Reason": "endorsed_names must be non-empty"
-}
-```
-
- -
DIFF — `playbook_overfull`
-
-**Rust** (HTTP 422):
-```json
-{
-  "Completeness": {
-    "reason": "endorsed_names (3) exceeds target_count × 2 (2)"
-  }
-}
-```
-
-**Go** (HTTP 422):
-```json
-{
-  "Field": "",
-  "Kind": "completeness",
-  "Reason": "endorsed_names (3) exceeds target_count × 2 (2)"
-}
-```
-
- -
DIFF — `fill_phantom`
-
-**Rust** (HTTP 422):
-```json
-{
-  "Consistency": {
-    "reason": "fills[0].candidate_id \"W-PHANTOM-NEVER-EXISTS\" does not exist in worker roster"
-  }
-}
-```
-
-**Go** (HTTP 422):
-```json
-{
-  "Field": "",
-  "Kind": "consistency",
-  "Reason": "fills[0].candidate_id \"W-PHANTOM-NEVER-EXISTS\" does not exist in worker roster"
-}
-```
-
diff --git a/scripts/validatord_smoke.sh b/scripts/validatord_smoke.sh
index 61a7bc6..98a70bc 100755
--- a/scripts/validatord_smoke.sh
+++ b/scripts/validatord_smoke.sh
@@ -115,13 +115,15 @@ if [ "$STATUS" != "422" ]; then
   echo " ✗ expected 422; got $STATUS body=$(cat /tmp/playbook_422.json)"
   exit 1
 fi
-KIND="$(jq -r '.Kind' /tmp/playbook_422.json)"
-FIELD="$(jq -r '.Field' /tmp/playbook_422.json)"
-if [ "$KIND" != "schema" ] || [ "$FIELD" != "fingerprint" ]; then
-  echo " ✗ expected kind=schema field=fingerprint; got kind=$KIND field=$FIELD"
+# Rust serde-tagged-enum shape (parity with crates/validator):
+# {"Schema":{"field":"fingerprint","reason":"..."}}
+VARIANT="$(jq -r 'keys[0]' /tmp/playbook_422.json)"
+FIELD="$(jq -r '.Schema.field' /tmp/playbook_422.json)"
+if [ "$VARIANT" != "Schema" ] || [ "$FIELD" != "fingerprint" ]; then
+  echo " ✗ expected variant=Schema field=fingerprint; got variant=$VARIANT field=$FIELD"
   exit 1
 fi
-echo " ✓ playbook missing fingerprint → 422 schema/fingerprint"
+echo " ✓ playbook missing fingerprint → 422 Schema/fingerprint"
 
 # 4. /v1/validate fill with phantom candidate → 422 Consistency
 echo "[validatord-smoke] /v1/validate fill with phantom candidate → 422:"
@@ -132,12 +134,13 @@ if [ "$STATUS" != "422" ]; then
   echo " ✗ expected 422; got $STATUS body=$(cat /tmp/fill_422.json)"
   exit 1
 fi
-KIND="$(jq -r '.Kind' /tmp/fill_422.json)"
-if [ "$KIND" != "consistency" ]; then
-  echo " ✗ expected kind=consistency; got kind=$KIND body=$(cat /tmp/fill_422.json)"
+# Rust serde-tagged-enum shape: {"Consistency":{"reason":"..."}}
+VARIANT="$(jq -r 'keys[0]' /tmp/fill_422.json)"
+if [ "$VARIANT" != "Consistency" ]; then
+  echo " ✗ expected variant=Consistency; got variant=$VARIANT body=$(cat /tmp/fill_422.json)"
   exit 1
 fi
-echo " ✓ phantom candidate W-PHANTOM → 422 consistency"
+echo " ✓ phantom candidate W-PHANTOM → 422 Consistency"
 
 # 5. /v1/validate unknown kind → 400
 echo "[validatord-smoke] /v1/validate unknown kind → 400:"