validator: align ValidationError JSON to Rust serde shape (6/6 parity)
Closes the 2026-05-02 parity finding: the validator_parity probe found
5/6 body shapes diverging because Go emitted the flat struct
{"Kind":"...","Field":"...","Reason":"..."} while Rust emits the
externally tagged enum {"Schema":{"field":"...","reason":"..."}}.
A caller parsing the error envelope would break silently at cutover.
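To make the "break silently" risk concrete, here is a minimal sketch of a caller that still decodes into the legacy flat shape. The `legacyError` type and `decodeLegacy` helper are hypothetical and not part of this repo; the sketch relies only on the fact that `encoding/json` ignores unknown object keys by default.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// legacyError mirrors the flat shape Go emitted before this commit.
// Hypothetical caller-side type, for illustration only.
type legacyError struct {
	Kind   string `json:"Kind"`
	Field  string `json:"Field"`
	Reason string `json:"Reason"`
}

// decodeLegacy parses body as the flat shape. Unknown keys are ignored
// by encoding/json, so a Rust-shaped body decodes without error and
// leaves every field at its zero value.
func decodeLegacy(body []byte) (legacyError, error) {
	var e legacyError
	err := json.Unmarshal(body, &e)
	return e, err
}

func main() {
	rustBody := []byte(`{"Schema":{"field":"fingerprint","reason":"missing"}}`)
	e, err := decodeLegacy(rustBody)
	// No error is reported, yet Kind, Field and Reason are all empty.
	fmt.Printf("err=%v kind=%q field=%q reason=%q\n", err, e.Kind, e.Field, e.Reason)
}
```

The absence of an error is the problem: nothing fails loudly, the caller just sees an empty error record.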
## Changes
internal/validator/types.go:
  - Custom MarshalJSON emits the Rust shape:
      Schema:       {"Schema":{"field":"x","reason":"y"}}
      Completeness: {"Completeness":{"reason":"y"}}
      Consistency:  {"Consistency":{"reason":"y"}}
      Policy:       {"Policy":{"reason":"y"}}
  - Custom UnmarshalJSON accepts both the new Rust shape and the legacy
    flat shape (migration safety for any persisted error rows).
  - Unknown variants (e.g. a future Rust addition Go hasn't learned)
    surface as an Unmarshal error, not a silent default.
internal/validator/types_test.go:
- 4 pinning tests anchor the wire format. Failing them = wire-format
drift; the parity probe is the secondary line of defense.
scripts/validatord_smoke.sh:
- Updated probes to read the new variant-name shape (jq keys[0],
.Schema.field) instead of legacy .Kind/.Field.
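For Go callers, a rough equivalent of the smoke script's `jq keys[0]` probe could look like the sketch below. `variantOf` is a hypothetical helper, not something this commit adds; it assumes the envelope always has exactly one top-level key whose body is a string-to-string object.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// variantOf returns the single top-level variant name and its body.
// Hypothetical helper: it assumes the externally tagged shape
// {"<Variant>":{"field":"...","reason":"..."}}.
func variantOf(body []byte) (string, map[string]string, error) {
	var tagged map[string]map[string]string
	if err := json.Unmarshal(body, &tagged); err != nil {
		return "", nil, fmt.Errorf("not a tagged envelope: %w", err)
	}
	if len(tagged) != 1 {
		return "", nil, fmt.Errorf("expected exactly one variant key, got %d", len(tagged))
	}
	for name, inner := range tagged {
		return name, inner, nil
	}
	return "", nil, errors.New("unreachable: map had one entry")
}

func main() {
	body := []byte(`{"Schema":{"field":"fingerprint","reason":"missing"}}`)
	name, inner, err := variantOf(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name, inner["field"])
}
```

A legacy flat body fails the first decode step here, so a caller using this helper gets an explicit error rather than empty fields.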
## Verification
- internal/validator unit tests: PASS (4 new + all existing).
- cmd/validatord HTTP tests: PASS (UnmarshalJSON falls through to flat
shape so existing tests reading ValidationError still work).
- validatord_smoke.sh: 5/5 PASS through gateway :3110.
- validator parity probe re-run: **6/6 match** (was 1/6).
## Pattern
Per architecture_comparison's "use the dual-implementation as a
measurement instrument" thesis: a parity probe surfaced this gap;
50 LOC of MarshalJSON closed it; 4 pinning tests prevent regression;
the probe is the longitudinal gate. Cutover-friendly direction (Go
matches Rust) chosen because Rust is the existing production
contract.
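
As an illustration of what a "body match" check can mean, the sketch below compares two JSON bodies structurally, ignoring key order and whitespace. This is an assumption about one reasonable way to do it in Go, not a description of how validator_parity.sh is implemented; `sameJSON` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// sameJSON reports whether two JSON documents decode to the same value.
// Key order and insignificant whitespace do not affect the result.
func sameJSON(a, b []byte) (bool, error) {
	var va, vb any
	if err := json.Unmarshal(a, &va); err != nil {
		return false, fmt.Errorf("left body: %w", err)
	}
	if err := json.Unmarshal(b, &vb); err != nil {
		return false, fmt.Errorf("right body: %w", err)
	}
	return reflect.DeepEqual(va, vb), nil
}

func main() {
	rust := []byte(`{"Schema":{"field":"x","reason":"y"}}`)
	legacyGo := []byte(`{"Kind":"schema","Field":"x","Reason":"y"}`)
	alignedGo := []byte(`{ "Schema": { "reason": "y", "field": "x" } }`)

	before, _ := sameJSON(rust, legacyGo)
	after, _ := sameJSON(rust, alignedGo)
	fmt.Println("legacy match:", before)
	fmt.Println("aligned match:", after)
}
```

A structural comparison like this avoids false diffs from key ordering while still flagging the shape difference this commit closes.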
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent b0c8a3f227
commit 7d6636b33e
@@ -1,7 +1,7 @@
 # STATE OF PLAY — Lakehouse-Go
 
-**Last verified:** 2026-05-02 ~05:30 CDT
-**Verified by:** **production-readiness gauntlet** — 21/21 smoke chain green in ~60s, per-component scrum across 4 bundles, **3 cross-runtime parity probes** (validator: 6/6 status match + 5/6 body-shape gap captured; materializer: caught a real omitempty bug, 2/2 match post-fix; extract_json: 12/12 match including unicode + escaped quotes). Disposition: `reports/cutover/gauntlet_2026-05-02/disposition.md`.
+**Last verified:** 2026-05-02 ~05:50 CDT
+**Verified by:** **production-readiness gauntlet** — 21/21 smoke chain green, per-component scrum across 4 bundles, **3 cross-runtime parity probes all green post-fix** (validator: **6/6 match** after wire-format alignment shipped; materializer: 2/2 after omitempty fix; extract_json: 12/12). All findings surfaced by the parity probes have been actioned. Disposition: `reports/cutover/gauntlet_2026-05-02/disposition.md`.
 
 > **Read this FIRST.** When the user says "we're working on lakehouse," default to the Go rewrite (this repo); the Rust legacy at `/home/profit/lakehouse/` is maintenance-only. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.
 
@@ -55,7 +55,7 @@ Don't:
 | 2026-05-02 | **Cross-runtime validator parity probe — surfaced wire-format gap** | New `scripts/cutover/parity/validator_parity.sh` runs 6 identical /v1/validate cases against Rust :3100 AND Go :4110, compares status + body. Result: **6/6 status codes match (logic-level equivalence holds), 5/6 body shapes diverge.** Rust returns serde-tagged enum `{"Schema":{"field":"x","reason":"y"}}`; Go returns flat struct `{"Kind":"schema","Field":"x","Reason":"y"}`. Any caller parsing the error envelope would break in cutover. **Open**: pick a target shape (Go matching Rust is the cutover-friendly direction) and align via custom `MarshalJSON` on `ValidationError`. |
 | 2026-05-02 | **Materializer parity probe — caught + fixed real bug** | New `scripts/cutover/parity/materializer_parity.sh` runs Bun + Go materializer on identical synthetic root, diffs output JSONL. Result on first run: **0/2 match** — Go's `Provenance.LineOffset` had `json:",omitempty"` and stripped the field on first-row records (line_offset=0 is semantically meaningful, not absent). 1-line fix (drop `omitempty` + comment explaining why). Re-run: **2/2 match**. Real cross-runtime gap surfaced + closed in same wave. |
 | 2026-05-02 | **extract_json parity probe — 12/12 match across edge cases** | New `scripts/cutover/parity/extract_json_parity.sh` runs identical model-output strings through Rust `gateway::v1::iterate::extract_json` AND Go `validator.ExtractJSON`. 12 fixtures: fenced/unfenced blocks, nested objects, unicode, escaped quotes, top-level array, malformed JSON. Substrate gate: `cargo test -p gateway extract_json` PASS before probe. Result: **12/12 match.** Algorithms genuinely equivalent. Rust side gained `pub` on `extract_json` + new `bin/parity_extract_json` (~30 LOC). |
-| _open_ | **Validator wire-format alignment** | Surfaced by 2026-05-02 parity probe. Choose canonical error JSON shape, align both runtimes. ~50 LOC custom `MarshalJSON` either side. |
+| 2026-05-02 | **Validator wire-format alignment — DONE** | Custom `MarshalJSON`/`UnmarshalJSON` on Go's `validator.ValidationError` emits the Rust serde-tagged-enum shape `{"Schema":{"field":"x","reason":"y"}}`. UnmarshalJSON also accepts the legacy flat shape (migration safety) and rejects unknown variants (drift guard for future Rust enum additions). 4 new pinning tests in `types_test.go`. Re-run validator parity probe: **6/6 match** (was 1/6). |
 | _open_ | Decide on Lance vector backend | Defer until corpus exceeds ~5M rows. |
 | _open_ | Pick Go primary vs Rust primary | Both viable. Go has perf edge after today; Rust has production deploy + producer-side completeness. |
 
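The materializer row in the hunk above turns on a specific `encoding/json` behavior: `omitempty` drops an integer field whose value is 0. A minimal sketch of that effect follows. The two struct types and the `line_offset` key name are hypothetical stand-ins, not the real `Provenance` definition.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// provenanceOmit is a hypothetical stand-in with omitempty on the offset.
type provenanceOmit struct {
	LineOffset int `json:"line_offset,omitempty"`
}

// provenanceKeep is the same field without omitempty.
type provenanceKeep struct {
	LineOffset int `json:"line_offset"`
}

// encode marshals v and returns the JSON text, or the error text on failure.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// A first-row record has offset 0, which omitempty treats as empty.
	fmt.Println(encode(provenanceOmit{LineOffset: 0})) // {}
	fmt.Println(encode(provenanceKeep{LineOffset: 0})) // {"line_offset":0}
}
```

Dropping `omitempty` keeps the zero offset on the wire, which matches the fix described in that row.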
@@ -19,7 +19,11 @@
 
 package validator
 
-import "time"
+import (
+	"encoding/json"
+	"fmt"
+	"time"
+)
 
 // Artifact is the discriminated union of input shapes a Validator
 // can receive. Mirrors Rust's `enum Artifact`. The first non-zero
@@ -107,6 +111,94 @@ func (e *ValidationError) Error() string {
 	return string(e.Kind) + ": " + e.Reason
 }
 
+// MarshalJSON emits the Rust-compatible serde-tagged-enum shape:
+//
+//	{"Schema": {"field": "...", "reason": "..."}}
+//	{"Completeness": {"reason": "..."}}
+//	{"Consistency": {"reason": "..."}}
+//	{"Policy": {"reason": "..."}}
+//
+// Aligns with `crates/validator/src/lib.rs::ValidationError` so callers
+// that parse the error envelope work across runtimes. Surfaced by the
+// 2026-05-02 cross-runtime parity probe (validator_parity.sh) which
+// found 5/6 body shapes diverging on identical input.
+func (e ValidationError) MarshalJSON() ([]byte, error) {
+	variant := variantName(e.Kind)
+	inner := map[string]string{}
+	if e.Kind == ErrSchema {
+		inner["field"] = e.Field
+	}
+	inner["reason"] = e.Reason
+	return json.Marshal(map[string]any{variant: inner})
+}
+
+// UnmarshalJSON accepts BOTH the Rust serde-tagged shape AND the
+// legacy flat-struct shape Go used to emit, so existing on-disk JSONL
+// rows still parse. New writes always use the Rust shape (per
+// MarshalJSON above).
+func (e *ValidationError) UnmarshalJSON(data []byte) error {
+	// Try the Rust serde shape first: {"<Variant>":{...}}
+	var tagged map[string]map[string]string
+	if err := json.Unmarshal(data, &tagged); err == nil && len(tagged) == 1 {
+		for variant, inner := range tagged {
+			kind, ok := kindFromVariant(variant)
+			if !ok {
+				return fmt.Errorf("unknown ValidationError variant %q", variant)
+			}
+			e.Kind = kind
+			e.Field = inner["field"]
+			e.Reason = inner["reason"]
+			return nil
+		}
+	}
+	// Fall back to the legacy flat shape: {"Kind":"...","Field":"...","Reason":"..."}
+	var flat struct {
+		Kind   ValidationErrorKind `json:"Kind"`
+		Field  string              `json:"Field"`
+		Reason string              `json:"Reason"`
+	}
+	if err := json.Unmarshal(data, &flat); err != nil {
+		return fmt.Errorf("ValidationError: neither serde nor flat shape: %w", err)
+	}
+	if flat.Kind == "" {
+		return fmt.Errorf("ValidationError: missing kind")
+	}
+	e.Kind = flat.Kind
+	e.Field = flat.Field
+	e.Reason = flat.Reason
+	return nil
+}
+
+// variantName maps the lowercase Go Kind to the CamelCase Rust serde
+// variant name. Kept in sync with `crates/validator/src/lib.rs::ValidationError`.
+func variantName(k ValidationErrorKind) string {
+	switch k {
+	case ErrSchema:
+		return "Schema"
+	case ErrCompleteness:
+		return "Completeness"
+	case ErrConsistency:
+		return "Consistency"
+	case ErrPolicy:
+		return "Policy"
+	}
+	return string(k) // fallback — preserves Kind for debugging
+}
+
+func kindFromVariant(v string) (ValidationErrorKind, bool) {
+	switch v {
+	case "Schema":
+		return ErrSchema, true
+	case "Completeness":
+		return ErrCompleteness, true
+	case "Consistency":
+		return ErrConsistency, true
+	case "Policy":
+		return ErrPolicy, true
+	}
+	return "", false
+}
+
 // Validator is the interface every validator implements.
 // Stateless — construction takes any deps (e.g. WorkerLookup)
 // upfront, validate() is pure on its inputs.
internal/validator/types_test.go (new file, 97 lines)
@@ -0,0 +1,97 @@
+package validator
+
+import (
+	"encoding/json"
+	"testing"
+)
+
+// TestValidationError_MarshalsRustSerdeShape pins the on-wire JSON
+// envelope for cross-runtime parity with crates/validator's
+// `enum ValidationError`. Surfaced by the 2026-05-02 validator parity
+// probe (5/6 body-shape divergence pre-fix → 6/6 match post-fix).
+//
+// Failing this test = wire-format drift. Run the parity probe at
+// scripts/cutover/parity/validator_parity.sh to confirm Rust is
+// the source-of-truth shape this test is anchored to.
+func TestValidationError_MarshalsRustSerdeShape(t *testing.T) {
+	cases := []struct {
+		name string
+		err  ValidationError
+		want string
+	}{
+		{
+			name: "schema includes field+reason inside variant body",
+			err:  ValidationError{Kind: ErrSchema, Field: "fingerprint", Reason: "missing"},
+			want: `{"Schema":{"field":"fingerprint","reason":"missing"}}`,
+		},
+		{
+			name: "completeness only carries reason (no field key)",
+			err:  ValidationError{Kind: ErrCompleteness, Reason: "endorsed_names must be non-empty"},
+			want: `{"Completeness":{"reason":"endorsed_names must be non-empty"}}`,
+		},
+		{
+			name: "consistency only carries reason",
+			err:  ValidationError{Kind: ErrConsistency, Reason: "candidate_id W-1 not in roster"},
+			want: `{"Consistency":{"reason":"candidate_id W-1 not in roster"}}`,
+		},
+		{
+			name: "policy only carries reason",
+			err:  ValidationError{Kind: ErrPolicy, Reason: "client C-1 is on candidate's blacklist"},
+			want: `{"Policy":{"reason":"client C-1 is on candidate's blacklist"}}`,
+		},
+	}
+	for _, c := range cases {
+		t.Run(c.name, func(t *testing.T) {
+			got, err := json.Marshal(c.err)
+			if err != nil {
+				t.Fatalf("marshal: %v", err)
+			}
+			if string(got) != c.want {
+				t.Errorf("\n got: %s\nwant: %s", got, c.want)
+			}
+		})
+	}
+}
+
+// TestValidationError_RoundTripsRustShape confirms a marshal+unmarshal
+// returns the same fields. Important: a Go side reading a Rust-written
+// ValidationError JSONL must populate Kind/Field/Reason correctly.
+func TestValidationError_RoundTripsRustShape(t *testing.T) {
+	original := ValidationError{Kind: ErrSchema, Field: "operation", Reason: "expected fill: prefix"}
+	body, err := json.Marshal(original)
+	if err != nil {
+		t.Fatalf("marshal: %v", err)
+	}
+	var decoded ValidationError
+	if err := json.Unmarshal(body, &decoded); err != nil {
+		t.Fatalf("unmarshal: %v", err)
+	}
+	if decoded.Kind != original.Kind || decoded.Field != original.Field || decoded.Reason != original.Reason {
+		t.Errorf("round-trip drift:\n in: %+v\n out: %+v", original, decoded)
+	}
+}
+
+// TestValidationError_ParsesLegacyFlatShape ensures consumers reading
+// older Go-emitted JSONL still parse. This is the migration path for
+// any persisted error rows that pre-date the 2026-05-02 alignment.
+func TestValidationError_ParsesLegacyFlatShape(t *testing.T) {
+	legacy := []byte(`{"Kind":"schema","Field":"x","Reason":"y"}`)
+	var ve ValidationError
+	if err := json.Unmarshal(legacy, &ve); err != nil {
+		t.Fatalf("legacy shape should parse: %v", err)
+	}
+	if ve.Kind != ErrSchema || ve.Field != "x" || ve.Reason != "y" {
+		t.Errorf("legacy parse wrong: %+v", ve)
+	}
+}
+
+// TestValidationError_RejectsUnknownVariant catches drift if Rust adds
+// a new variant the Go side hasn't learned about yet — surfaces as a
+// parse error rather than a silent default.
+func TestValidationError_RejectsUnknownVariant(t *testing.T) {
+	body := []byte(`{"NewVariant":{"reason":"something"}}`)
+	var ve ValidationError
+	if err := json.Unmarshal(body, &ve); err == nil {
+		t.Errorf("expected error on unknown variant, got %+v", ve)
+	}
+}
@@ -1,6 +1,6 @@
 # Validator parity probe — Rust :3100 vs Go :4110
 
-**Date:** 2026-05-02T08:59:17Z
+**Date:** 2026-05-02T09:47:49Z
 **Rust gateway:** `http://127.0.0.1:3100` · **Go gateway:** `http://127.0.0.1:4110`
 
 Identical `POST /v1/validate` request → both runtimes. Match
@@ -9,124 +9,11 @@ Identical `POST /v1/validate` request → both runtimes. Match
 | Case | Rust status | Go status | Status match | Body match |
 |---|---:|---:|:---:|:---:|
 | playbook_happy | 200 | 200 | ✓ | ✓ |
-| playbook_missing_fingerprint | 422 | 422 | ✓ | ✗ |
-| playbook_wrong_prefix | 422 | 422 | ✓ | ✗ |
-| playbook_empty_endorsed | 422 | 422 | ✓ | ✗ |
-| playbook_overfull | 422 | 422 | ✓ | ✗ |
-| fill_phantom | 422 | 422 | ✓ | ✗ |
+| playbook_missing_fingerprint | 422 | 422 | ✓ | ✓ |
+| playbook_wrong_prefix | 422 | 422 | ✓ | ✓ |
+| playbook_empty_endorsed | 422 | 422 | ✓ | ✓ |
+| playbook_overfull | 422 | 422 | ✓ | ✓ |
+| fill_phantom | 422 | 422 | ✓ | ✓ |
 
-**Tally:** 1 match · 5 diff (out of 6 cases)
+**Tally:** 6 match · 0 diff (out of 6 cases)
 
-## Divergences
-
-<details><summary>DIFF — `playbook_missing_fingerprint`</summary>
-
-**Rust** (HTTP 422):
-```json
-{
-  "Schema": {
-    "field": "fingerprint",
-    "reason": "missing — required for Phase 25 validity window"
-  }
-}
-```
-
-**Go** (HTTP 422):
-```json
-{
-  "Field": "fingerprint",
-  "Kind": "schema",
-  "Reason": "missing — required for Phase 25 validity window"
-}
-```
-
-</details>
-
-<details><summary>DIFF — `playbook_wrong_prefix`</summary>
-
-**Rust** (HTTP 422):
-```json
-{
-  "Schema": {
-    "field": "operation",
-    "reason": "expected `fill: ...` prefix, got \"sms_draft: hello\""
-  }
-}
-```
-
-**Go** (HTTP 422):
-```json
-{
-  "Field": "operation",
-  "Kind": "schema",
-  "Reason": "expected `fill: ...` prefix, got \"sms_draft: hello\""
-}
-```
-
-</details>
-
-<details><summary>DIFF — `playbook_empty_endorsed`</summary>
-
-**Rust** (HTTP 422):
-```json
-{
-  "Completeness": {
-    "reason": "endorsed_names must be non-empty"
-  }
-}
-```
-
-**Go** (HTTP 422):
-```json
-{
-  "Field": "",
-  "Kind": "completeness",
-  "Reason": "endorsed_names must be non-empty"
-}
-```
-
-</details>
-
-<details><summary>DIFF — `playbook_overfull`</summary>
-
-**Rust** (HTTP 422):
-```json
-{
-  "Completeness": {
-    "reason": "endorsed_names (3) exceeds target_count × 2 (2)"
-  }
-}
-```
-
-**Go** (HTTP 422):
-```json
-{
-  "Field": "",
-  "Kind": "completeness",
-  "Reason": "endorsed_names (3) exceeds target_count × 2 (2)"
-}
-```
-
-</details>
-
-<details><summary>DIFF — `fill_phantom`</summary>
-
-**Rust** (HTTP 422):
-```json
-{
-  "Consistency": {
-    "reason": "fills[0].candidate_id \"W-PHANTOM-NEVER-EXISTS\" does not exist in worker roster"
-  }
-}
-```
-
-**Go** (HTTP 422):
-```json
-{
-  "Field": "",
-  "Kind": "consistency",
-  "Reason": "fills[0].candidate_id \"W-PHANTOM-NEVER-EXISTS\" does not exist in worker roster"
-}
-```
-
-</details>
@@ -115,13 +115,15 @@ if [ "$STATUS" != "422" ]; then
     echo " ✗ expected 422; got $STATUS body=$(cat /tmp/playbook_422.json)"
     exit 1
 fi
-KIND="$(jq -r '.Kind' /tmp/playbook_422.json)"
-FIELD="$(jq -r '.Field' /tmp/playbook_422.json)"
-if [ "$KIND" != "schema" ] || [ "$FIELD" != "fingerprint" ]; then
-    echo " ✗ expected kind=schema field=fingerprint; got kind=$KIND field=$FIELD"
+# Rust serde-tagged-enum shape (parity with crates/validator):
+# {"Schema":{"field":"fingerprint","reason":"..."}}
+VARIANT="$(jq -r 'keys[0]' /tmp/playbook_422.json)"
+FIELD="$(jq -r '.Schema.field' /tmp/playbook_422.json)"
+if [ "$VARIANT" != "Schema" ] || [ "$FIELD" != "fingerprint" ]; then
+    echo " ✗ expected variant=Schema field=fingerprint; got variant=$VARIANT field=$FIELD"
     exit 1
 fi
-echo " ✓ playbook missing fingerprint → 422 schema/fingerprint"
+echo " ✓ playbook missing fingerprint → 422 Schema/fingerprint"
 
 # 4. /v1/validate fill with phantom candidate → 422 Consistency
 echo "[validatord-smoke] /v1/validate fill with phantom candidate → 422:"
@@ -132,12 +134,13 @@ if [ "$STATUS" != "422" ]; then
     echo " ✗ expected 422; got $STATUS body=$(cat /tmp/fill_422.json)"
     exit 1
 fi
-KIND="$(jq -r '.Kind' /tmp/fill_422.json)"
-if [ "$KIND" != "consistency" ]; then
-    echo " ✗ expected kind=consistency; got kind=$KIND body=$(cat /tmp/fill_422.json)"
+# Rust serde-tagged-enum shape: {"Consistency":{"reason":"..."}}
+VARIANT="$(jq -r 'keys[0]' /tmp/fill_422.json)"
+if [ "$VARIANT" != "Consistency" ]; then
+    echo " ✗ expected variant=Consistency; got variant=$VARIANT body=$(cat /tmp/fill_422.json)"
     exit 1
 fi
-echo " ✓ phantom candidate W-PHANTOM → 422 consistency"
+echo " ✓ phantom candidate W-PHANTOM → 422 Consistency"
 
 # 5. /v1/validate unknown kind → 400
 echo "[validatord-smoke] /v1/validate unknown kind → 400:"