validator: align ValidationError JSON to Rust serde shape (6/6 parity)

Closes the 2026-05-02 parity finding: the validator_parity probe found
5/6 body shapes diverging because Go emitted the flat struct
{"Kind":"...","Field":"...","Reason":"..."} while Rust emits the
externally tagged enum {"Schema":{"field":"...","reason":"..."}}.
A caller parsing the error envelope would break silently at cutover.
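
To make the "silent" part concrete, here is a minimal sketch of a caller that still decodes into the legacy flat struct. The `flatError` type and `decodeFlat` helper are hypothetical names for illustration, not code from this repo. Because `encoding/json` ignores unknown keys by default, decoding the Rust-shaped body returns no error and leaves every field empty.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flatError mirrors the legacy flat shape described in the commit message.
// Hypothetical caller-side type, for illustration only.
type flatError struct {
	Kind   string `json:"Kind"`
	Field  string `json:"Field"`
	Reason string `json:"Reason"`
}

// decodeFlat decodes body into the legacy flat struct.
func decodeFlat(body []byte) (flatError, error) {
	var fe flatError
	err := json.Unmarshal(body, &fe)
	return fe, err
}

func main() {
	rustBody := []byte(`{"Schema":{"field":"fingerprint","reason":"missing"}}`)
	fe, err := decodeFlat(rustBody)
	// The "Schema" key matches no struct field, so it is skipped:
	// err is nil and Kind, Field, Reason all stay at their zero values.
	fmt.Printf("err=%v kind=%q field=%q reason=%q\n", err, fe.Kind, fe.Field, fe.Reason)
}
```

A caller written this way would see an empty kind rather than a decode failure, which is why the mismatch is easy to miss without a body-level probe.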

## Changes

internal/validator/types.go:
- Custom MarshalJSON emits the Rust shape:
    Schema:       {"Schema":      {"field":"x","reason":"y"}}
    Completeness: {"Completeness":{"reason":"y"}}
    Consistency:  {"Consistency": {"reason":"y"}}
    Policy:       {"Policy":      {"reason":"y"}}
- Custom UnmarshalJSON accepts BOTH the new Rust shape AND the legacy
  flat shape (migration safety for any persisted error rows).
- Unknown variants (e.g. a future Rust addition Go hasn't learned)
  surface as an Unmarshal error, not a silent default.
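
The dual-shape acceptance relies on the two shapes being distinguishable by type alone. A small standalone sketch of that check follows; `isTaggedShape` is a hypothetical helper name, and the real logic lives inside `UnmarshalJSON` in the diff. The legacy flat body has string values, so it cannot decode into a map of maps and falls through to the legacy path.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isTaggedShape reports whether body decodes as a single-key object whose
// value is itself an object of strings, i.e. {"<Variant>":{...}}.
// Hypothetical helper mirroring the first branch of UnmarshalJSON.
func isTaggedShape(body []byte) bool {
	var tagged map[string]map[string]string
	if err := json.Unmarshal(body, &tagged); err != nil {
		// Flat bodies land here: "schema" is a string, not an object.
		return false
	}
	return len(tagged) == 1
}

func main() {
	fmt.Println(isTaggedShape([]byte(`{"Schema":{"field":"x","reason":"y"}}`)))
	fmt.Println(isTaggedShape([]byte(`{"Kind":"schema","Field":"x","Reason":"y"}`)))
}
```

Note that this check only classifies the shape. Rejecting an unknown variant name is a separate step, done after the shape has been recognized as tagged.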

internal/validator/types_test.go:
- 4 pinning tests anchor the wire format. Failing them = wire-format
  drift; the parity probe is the secondary line of defense.

scripts/validatord_smoke.sh:
- Updated probes to read the new variant-name shape (jq keys[0],
  .Schema.field) instead of legacy .Kind/.Field.

## Verification

- internal/validator unit tests: PASS (4 new + all existing).
- cmd/validatord HTTP tests: PASS (UnmarshalJSON falls through to flat
  shape so existing tests reading ValidationError still work).
- validatord_smoke.sh: 5/5 PASS through gateway :3110.
- validator parity probe re-run: **6/6 match** (was 1/6).

## Pattern

Per architecture_comparison's "use the dual-implementation as a
measurement instrument" thesis: a parity probe surfaced this gap;
50 LOC of MarshalJSON closed it; 4 pinning tests prevent regression;
the probe is the longitudinal gate. Cutover-friendly direction (Go
matches Rust) chosen because Rust is the existing production
contract.
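
As a rough illustration of what a body-level parity check involves, the sketch below compares two JSON bodies structurally, ignoring key order and whitespace. It is an assumption about the general technique, not a transcription of validator_parity.sh, which is a shell script whose comparison logic is not shown in this commit. `bodiesMatch` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// bodiesMatch reports whether two JSON documents are structurally equal,
// ignoring key order and insignificant whitespace.
func bodiesMatch(a, b []byte) (bool, error) {
	var va, vb any
	if err := json.Unmarshal(a, &va); err != nil {
		return false, fmt.Errorf("first body: %w", err)
	}
	if err := json.Unmarshal(b, &vb); err != nil {
		return false, fmt.Errorf("second body: %w", err)
	}
	return reflect.DeepEqual(va, vb), nil
}

func main() {
	rust := []byte(`{"Schema":{"field":"fingerprint","reason":"missing"}}`)
	goOld := []byte(`{"Kind":"schema","Field":"fingerprint","Reason":"missing"}`)
	goNew := []byte(`{"Schema": {"reason": "missing", "field": "fingerprint"}}`)

	before, _ := bodiesMatch(rust, goOld)
	after, _ := bodiesMatch(rust, goNew)
	fmt.Println("pre-fix match:", before)
	fmt.Println("post-fix match:", after)
}
```

Comparing decoded values rather than raw bytes keeps the probe from flagging harmless differences such as key ordering between the two runtimes' serializers.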

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
root 2026-05-02 04:49:28 -05:00
parent b0c8a3f227
commit 7d6636b33e
6 changed files with 212 additions and 133 deletions


@@ -1,7 +1,7 @@
# STATE OF PLAY — Lakehouse-Go
**Last verified:** 2026-05-02 ~05:30 CDT
**Verified by:** **production-readiness gauntlet** — 21/21 smoke chain green in ~60s, per-component scrum across 4 bundles, **3 cross-runtime parity probes** (validator: 6/6 status match + 5/6 body-shape gap captured; materializer: caught a real omitempty bug, 2/2 match post-fix; extract_json: 12/12 match including unicode + escaped quotes). Disposition: `reports/cutover/gauntlet_2026-05-02/disposition.md`.
**Last verified:** 2026-05-02 ~05:50 CDT
**Verified by:** **production-readiness gauntlet** — 21/21 smoke chain green, per-component scrum across 4 bundles, **3 cross-runtime parity probes all green post-fix** (validator: **6/6 match** after wire-format alignment shipped; materializer: 2/2 after omitempty fix; extract_json: 12/12). All findings surfaced by the parity probes have been actioned. Disposition: `reports/cutover/gauntlet_2026-05-02/disposition.md`.
> **Read this FIRST.** When the user says "we're working on lakehouse," default to the Go rewrite (this repo); the Rust legacy at `/home/profit/lakehouse/` is maintenance-only. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.


@@ -55,7 +55,7 @@ Don't:
| 2026-05-02 | **Cross-runtime validator parity probe — surfaced wire-format gap** | New `scripts/cutover/parity/validator_parity.sh` runs 6 identical /v1/validate cases against Rust :3100 AND Go :4110, compares status + body. Result: **6/6 status codes match (logic-level equivalence holds), 5/6 body shapes diverge.** Rust returns serde-tagged enum `{"Schema":{"field":"x","reason":"y"}}`; Go returns flat struct `{"Kind":"schema","Field":"x","Reason":"y"}`. Any caller parsing the error envelope would break in cutover. **Open**: pick a target shape (Go matching Rust is the cutover-friendly direction) and align via custom `MarshalJSON` on `ValidationError`. |
| 2026-05-02 | **Materializer parity probe — caught + fixed real bug** | New `scripts/cutover/parity/materializer_parity.sh` runs Bun + Go materializer on identical synthetic root, diffs output JSONL. Result on first run: **0/2 match** — Go's `Provenance.LineOffset` had `json:",omitempty"` and stripped the field on first-row records (line_offset=0 is semantically meaningful, not absent). 1-line fix (drop `omitempty` + comment explaining why). Re-run: **2/2 match**. Real cross-runtime gap surfaced + closed in same wave. |
| 2026-05-02 | **extract_json parity probe — 12/12 match across edge cases** | New `scripts/cutover/parity/extract_json_parity.sh` runs identical model-output strings through Rust `gateway::v1::iterate::extract_json` AND Go `validator.ExtractJSON`. 12 fixtures: fenced/unfenced blocks, nested objects, unicode, escaped quotes, top-level array, malformed JSON. Substrate gate: `cargo test -p gateway extract_json` PASS before probe. Result: **12/12 match.** Algorithms genuinely equivalent. Rust side gained `pub` on `extract_json` + new `bin/parity_extract_json` (~30 LOC). |
| _open_ | **Validator wire-format alignment** | Surfaced by 2026-05-02 parity probe. Choose canonical error JSON shape, align both runtimes. ~50 LOC custom `MarshalJSON` either side. |
| 2026-05-02 | **Validator wire-format alignment — DONE** | Custom `MarshalJSON`/`UnmarshalJSON` on Go's `validator.ValidationError` emits the Rust serde-tagged-enum shape `{"Schema":{"field":"x","reason":"y"}}`. UnmarshalJSON also accepts the legacy flat shape (migration safety) and rejects unknown variants (drift guard for future Rust enum additions). 4 new pinning tests in `types_test.go`. Re-run validator parity probe: **6/6 match** (was 1/6). |
| _open_ | Decide on Lance vector backend | Defer until corpus exceeds ~5M rows. |
| _open_ | Pick Go primary vs Rust primary | Both viable. Go has perf edge after today; Rust has production deploy + producer-side completeness. |
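
The materializer row above turns on how `omitempty` treats zero values. A minimal sketch of that behavior follows, using hypothetical stand-in types rather than the real `Provenance` struct. The `line_offset` key name is an assumption taken from the prose in the table.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withOmit and withoutOmit are hypothetical stand-ins for the
// materializer's Provenance type, before and after the fix.
type withOmit struct {
	LineOffset int `json:"line_offset,omitempty"`
}

type withoutOmit struct {
	LineOffset int `json:"line_offset"`
}

// marshalString marshals v and returns the JSON as a string.
func marshalString(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// A first-row record has offset 0, which omitempty treats as "empty".
	fmt.Println(marshalString(withOmit{LineOffset: 0}))    // {}
	fmt.Println(marshalString(withoutOmit{LineOffset: 0})) // {"line_offset":0}
}
```

When zero is a meaningful value, dropping `omitempty` (or switching to a pointer field) keeps the key present on the wire.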


@@ -19,7 +19,11 @@
package validator
import "time"
import (
	"encoding/json"
	"fmt"
	"time"
)
// Artifact is the discriminated union of input shapes a Validator
// can receive. Mirrors Rust's `enum Artifact`. The first non-zero
@@ -107,6 +111,94 @@ func (e *ValidationError) Error() string {
return string(e.Kind) + ": " + e.Reason
}
// MarshalJSON emits the Rust-compatible serde-tagged-enum shape:
//
//	{"Schema": {"field": "...", "reason": "..."}}
//	{"Completeness": {"reason": "..."}}
//	{"Consistency": {"reason": "..."}}
//	{"Policy": {"reason": "..."}}
//
// Aligns with `crates/validator/src/lib.rs::ValidationError` so callers
// that parse the error envelope work across runtimes. Surfaced by the
// 2026-05-02 cross-runtime parity probe (validator_parity.sh) which
// found 5/6 body shapes diverging on identical input.
func (e ValidationError) MarshalJSON() ([]byte, error) {
	variant := variantName(e.Kind)
	inner := map[string]string{}
	// Only the Schema variant carries a "field" key.
	if e.Kind == ErrSchema {
		inner["field"] = e.Field
	}
	inner["reason"] = e.Reason
	return json.Marshal(map[string]any{variant: inner})
}
// UnmarshalJSON accepts BOTH the Rust serde-tagged shape AND the
// legacy flat-struct shape Go used to emit, so existing on-disk JSONL
// rows still parse. New writes always use the Rust shape (per
// MarshalJSON above).
func (e *ValidationError) UnmarshalJSON(data []byte) error {
	// Try the Rust serde shape first: {"<Variant>":{...}}
	// The legacy flat shape has string values, so it fails this decode
	// and falls through to the flat branch below.
	var tagged map[string]map[string]string
	if err := json.Unmarshal(data, &tagged); err == nil && len(tagged) == 1 {
		for variant, inner := range tagged {
			kind, ok := kindFromVariant(variant)
			if !ok {
				return fmt.Errorf("unknown ValidationError variant %q", variant)
			}
			e.Kind = kind
			e.Field = inner["field"]
			e.Reason = inner["reason"]
			return nil
		}
	}
	// Fall back to the legacy flat shape: {"Kind":"...","Field":"...","Reason":"..."}
	var flat struct {
		Kind   ValidationErrorKind `json:"Kind"`
		Field  string              `json:"Field"`
		Reason string              `json:"Reason"`
	}
	if err := json.Unmarshal(data, &flat); err != nil {
		return fmt.Errorf("ValidationError: neither serde nor flat shape: %w", err)
	}
	if flat.Kind == "" {
		return fmt.Errorf("ValidationError: missing kind")
	}
	e.Kind = flat.Kind
	e.Field = flat.Field
	e.Reason = flat.Reason
	return nil
}
// variantName maps the lowercase Go Kind to the CamelCase Rust serde
// variant name. Kept in sync with `crates/validator/src/lib.rs::ValidationError`.
func variantName(k ValidationErrorKind) string {
	switch k {
	case ErrSchema:
		return "Schema"
	case ErrCompleteness:
		return "Completeness"
	case ErrConsistency:
		return "Consistency"
	case ErrPolicy:
		return "Policy"
	}
	return string(k) // fallback: preserves Kind for debugging
}
// kindFromVariant is the inverse of variantName. The boolean is false
// for variant names the Go side does not know about.
func kindFromVariant(v string) (ValidationErrorKind, bool) {
	switch v {
	case "Schema":
		return ErrSchema, true
	case "Completeness":
		return ErrCompleteness, true
	case "Consistency":
		return ErrConsistency, true
	case "Policy":
		return ErrPolicy, true
	}
	return "", false
}
// Validator is the interface every validator implements.
// Stateless — construction takes any deps (e.g. WorkerLookup)
// upfront, validate() is pure on its inputs.


@@ -0,0 +1,97 @@
package validator

import (
	"encoding/json"
	"testing"
)
// TestValidationError_MarshalsRustSerdeShape pins the on-wire JSON
// envelope for cross-runtime parity with crates/validator's
// `enum ValidationError`. Surfaced by the 2026-05-02 validator parity
// probe (5/6 body-shape divergence pre-fix → 6/6 match post-fix).
//
// Failing this test = wire-format drift. Run the parity probe at
// scripts/cutover/parity/validator_parity.sh to confirm Rust is
// the source-of-truth shape this test is anchored to.
func TestValidationError_MarshalsRustSerdeShape(t *testing.T) {
	cases := []struct {
		name string
		err  ValidationError
		want string
	}{
		{
			name: "schema includes field+reason inside variant body",
			err:  ValidationError{Kind: ErrSchema, Field: "fingerprint", Reason: "missing"},
			want: `{"Schema":{"field":"fingerprint","reason":"missing"}}`,
		},
		{
			name: "completeness only carries reason (no field key)",
			err:  ValidationError{Kind: ErrCompleteness, Reason: "endorsed_names must be non-empty"},
			want: `{"Completeness":{"reason":"endorsed_names must be non-empty"}}`,
		},
		{
			name: "consistency only carries reason",
			err:  ValidationError{Kind: ErrConsistency, Reason: "candidate_id W-1 not in roster"},
			want: `{"Consistency":{"reason":"candidate_id W-1 not in roster"}}`,
		},
		{
			name: "policy only carries reason",
			err:  ValidationError{Kind: ErrPolicy, Reason: "client C-1 is on candidate's blacklist"},
			want: `{"Policy":{"reason":"client C-1 is on candidate's blacklist"}}`,
		},
	}
	for _, c := range cases {
		t.Run(c.name, func(t *testing.T) {
			got, err := json.Marshal(c.err)
			if err != nil {
				t.Fatalf("marshal: %v", err)
			}
			if string(got) != c.want {
				t.Errorf("\n got: %s\nwant: %s", got, c.want)
			}
		})
	}
}
// TestValidationError_RoundTripsRustShape confirms a marshal+unmarshal
// returns the same fields. Important: a Go side reading a Rust-written
// ValidationError JSONL must populate Kind/Field/Reason correctly.
func TestValidationError_RoundTripsRustShape(t *testing.T) {
	original := ValidationError{Kind: ErrSchema, Field: "operation", Reason: "expected fill: prefix"}
	body, err := json.Marshal(original)
	if err != nil {
		t.Fatalf("marshal: %v", err)
	}
	var decoded ValidationError
	if err := json.Unmarshal(body, &decoded); err != nil {
		t.Fatalf("unmarshal: %v", err)
	}
	if decoded.Kind != original.Kind || decoded.Field != original.Field || decoded.Reason != original.Reason {
		t.Errorf("round-trip drift:\n in: %+v\n out: %+v", original, decoded)
	}
}
// TestValidationError_ParsesLegacyFlatShape ensures consumers reading
// older Go-emitted JSONL still parse. This is the migration path for
// any persisted error rows that pre-date the 2026-05-02 alignment.
func TestValidationError_ParsesLegacyFlatShape(t *testing.T) {
	legacy := []byte(`{"Kind":"schema","Field":"x","Reason":"y"}`)
	var ve ValidationError
	if err := json.Unmarshal(legacy, &ve); err != nil {
		t.Fatalf("legacy shape should parse: %v", err)
	}
	if ve.Kind != ErrSchema || ve.Field != "x" || ve.Reason != "y" {
		t.Errorf("legacy parse wrong: %+v", ve)
	}
}
// TestValidationError_RejectsUnknownVariant catches drift if Rust adds
// a new variant the Go side hasn't learned about yet: it surfaces as a
// parse error rather than a silent default.
func TestValidationError_RejectsUnknownVariant(t *testing.T) {
	body := []byte(`{"NewVariant":{"reason":"something"}}`)
	var ve ValidationError
	if err := json.Unmarshal(body, &ve); err == nil {
		t.Errorf("expected error on unknown variant, got %+v", ve)
	}
}


@@ -1,6 +1,6 @@
# Validator parity probe — Rust :3100 vs Go :4110
**Date:** 2026-05-02T08:59:17Z
**Date:** 2026-05-02T09:47:49Z
**Rust gateway:** `http://127.0.0.1:3100` · **Go gateway:** `http://127.0.0.1:4110`
Identical `POST /v1/validate` request → both runtimes. Match
@@ -9,124 +9,11 @@ Identical `POST /v1/validate` request → both runtimes. Match
| Case | Rust status | Go status | Status match | Body match |
|---|---:|---:|:---:|:---:|
| playbook_happy | 200 | 200 | ✓ | ✓ |
| playbook_missing_fingerprint | 422 | 422 | ✓ | |
| playbook_wrong_prefix | 422 | 422 | ✓ | |
| playbook_empty_endorsed | 422 | 422 | ✓ | |
| playbook_overfull | 422 | 422 | ✓ | |
| fill_phantom | 422 | 422 | ✓ | |
| playbook_missing_fingerprint | 422 | 422 | ✓ | ✓ |
| playbook_wrong_prefix | 422 | 422 | ✓ | ✓ |
| playbook_empty_endorsed | 422 | 422 | ✓ | ✓ |
| playbook_overfull | 422 | 422 | ✓ | ✓ |
| fill_phantom | 422 | 422 | ✓ | ✓ |
**Tally:** 1 match · 5 diff (out of 6 cases)
**Tally:** 6 match · 0 diff (out of 6 cases)
## Divergences
<details><summary>DIFF — `playbook_missing_fingerprint`</summary>
**Rust** (HTTP 422):
```json
{
"Schema": {
"field": "fingerprint",
"reason": "missing — required for Phase 25 validity window"
}
}
```
**Go** (HTTP 422):
```json
{
"Field": "fingerprint",
"Kind": "schema",
"Reason": "missing — required for Phase 25 validity window"
}
```
</details>
<details><summary>DIFF — `playbook_wrong_prefix`</summary>
**Rust** (HTTP 422):
```json
{
"Schema": {
"field": "operation",
"reason": "expected `fill: ...` prefix, got \"sms_draft: hello\""
}
}
```
**Go** (HTTP 422):
```json
{
"Field": "operation",
"Kind": "schema",
"Reason": "expected `fill: ...` prefix, got \"sms_draft: hello\""
}
```
</details>
<details><summary>DIFF — `playbook_empty_endorsed`</summary>
**Rust** (HTTP 422):
```json
{
"Completeness": {
"reason": "endorsed_names must be non-empty"
}
}
```
**Go** (HTTP 422):
```json
{
"Field": "",
"Kind": "completeness",
"Reason": "endorsed_names must be non-empty"
}
```
</details>
<details><summary>DIFF — `playbook_overfull`</summary>
**Rust** (HTTP 422):
```json
{
"Completeness": {
"reason": "endorsed_names (3) exceeds target_count × 2 (2)"
}
}
```
**Go** (HTTP 422):
```json
{
"Field": "",
"Kind": "completeness",
"Reason": "endorsed_names (3) exceeds target_count × 2 (2)"
}
```
</details>
<details><summary>DIFF — `fill_phantom`</summary>
**Rust** (HTTP 422):
```json
{
"Consistency": {
"reason": "fills[0].candidate_id \"W-PHANTOM-NEVER-EXISTS\" does not exist in worker roster"
}
}
```
**Go** (HTTP 422):
```json
{
"Field": "",
"Kind": "consistency",
"Reason": "fills[0].candidate_id \"W-PHANTOM-NEVER-EXISTS\" does not exist in worker roster"
}
```
</details>


@@ -115,13 +115,15 @@ if [ "$STATUS" != "422" ]; then
echo " ✗ expected 422; got $STATUS body=$(cat /tmp/playbook_422.json)"
exit 1
fi
KIND="$(jq -r '.Kind' /tmp/playbook_422.json)"
FIELD="$(jq -r '.Field' /tmp/playbook_422.json)"
if [ "$KIND" != "schema" ] || [ "$FIELD" != "fingerprint" ]; then
echo " ✗ expected kind=schema field=fingerprint; got kind=$KIND field=$FIELD"
# Rust serde-tagged-enum shape (parity with crates/validator):
# {"Schema":{"field":"fingerprint","reason":"..."}}
VARIANT="$(jq -r 'keys[0]' /tmp/playbook_422.json)"
FIELD="$(jq -r '.Schema.field' /tmp/playbook_422.json)"
if [ "$VARIANT" != "Schema" ] || [ "$FIELD" != "fingerprint" ]; then
echo " ✗ expected variant=Schema field=fingerprint; got variant=$VARIANT field=$FIELD"
exit 1
fi
echo " ✓ playbook missing fingerprint → 422 schema/fingerprint"
echo " ✓ playbook missing fingerprint → 422 Schema/fingerprint"
# 4. /v1/validate fill with phantom candidate → 422 Consistency
echo "[validatord-smoke] /v1/validate fill with phantom candidate → 422:"
@@ -132,12 +134,13 @@ if [ "$STATUS" != "422" ]; then
echo " ✗ expected 422; got $STATUS body=$(cat /tmp/fill_422.json)"
exit 1
fi
KIND="$(jq -r '.Kind' /tmp/fill_422.json)"
if [ "$KIND" != "consistency" ]; then
echo " ✗ expected kind=consistency; got kind=$KIND body=$(cat /tmp/fill_422.json)"
# Rust serde-tagged-enum shape: {"Consistency":{"reason":"..."}}
VARIANT="$(jq -r 'keys[0]' /tmp/fill_422.json)"
if [ "$VARIANT" != "Consistency" ]; then
echo " ✗ expected variant=Consistency; got variant=$VARIANT body=$(cat /tmp/fill_422.json)"
exit 1
fi
echo " ✓ phantom candidate W-PHANTOM → 422 consistency"
echo " ✓ phantom candidate W-PHANTOM → 422 Consistency"
# 5. /v1/validate unknown kind → 400
echo "[validatord-smoke] /v1/validate unknown kind → 400:"