golangLAKEHOUSE

profit/golangLAKEHOUSE

Commit Graph

Author

SHA1

Message

Date

root

fa4e1b4e16

parity: session_log probe + Rust observability parity recorded

Companion to lakehouse commit 57bde63 (Rust gateway gains
trace-id propagation + coordinator session JSONL). The
cross-runtime parity probe is the regression gate that prevents
silent schema drift between the two runtimes.

scripts/cutover/parity/session_log_parity.sh:
  - 4 fixtures (accepted_grounded, max_iter_exhausted, infra_error,
    unicode_in_prompt) feed identical input to both helpers
  - jq -e validity gate + non-trivial-equal guard prevents the
    "both sides fail identically → spurious match" failure mode
    (caught one IFS='||' bug during initial authoring — recorded
    in the script comment)
  - normalize() strips timestamp + daemon (legitimate per-producer
    differences); everything else must be byte-equal
  - Result: 4/4 fixtures match, including unicode
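
The normalize-then-compare step and the non-trivial-equal guard can be sketched in Go as follows. This is a minimal illustration, not the contents of session_log_parity.sh (which is a shell script using jq): the JSON key names `timestamp` and `daemon` are assumed from the description above, the `outcome` and `prompt` fields are hypothetical, and equality is checked on a canonical re-encoding rather than on the raw bytes.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// normalize decodes one session-log record, drops the two fields described
// as legitimate per-producer differences, and re-encodes it. json.Marshal
// sorts map keys, so the result is canonical.
// Assumption: the fields are top-level keys named "timestamp" and "daemon".
func normalize(raw []byte) ([]byte, error) {
	var rec map[string]any
	if err := json.Unmarshal(raw, &rec); err != nil {
		return nil, fmt.Errorf("invalid JSON: %w", err)
	}
	delete(rec, "timestamp")
	delete(rec, "daemon")
	return json.Marshal(rec)
}

// parityMatch reports a match only when both outputs are valid JSON,
// non-trivial after normalization, and byte-equal. The trivial check guards
// against "both sides fail identically" being counted as a match.
func parityMatch(rustOut, goOut []byte) (bool, error) {
	a, err := normalize(rustOut)
	if err != nil {
		return false, fmt.Errorf("rust side: %w", err)
	}
	b, err := normalize(goOut)
	if err != nil {
		return false, fmt.Errorf("go side: %w", err)
	}
	if string(a) == "{}" || string(a) == "null" {
		return false, fmt.Errorf("trivial output, refusing to count as a match")
	}
	return bytes.Equal(a, b), nil
}

func main() {
	// Hypothetical records; only timestamp and daemon differ.
	rust := []byte(`{"timestamp":"2026-05-02T05:00:00Z","daemon":"gateway","outcome":"accepted_grounded","prompt":"héllo"}`)
	goOut := []byte(`{"daemon":"validatord","outcome":"accepted_grounded","prompt":"héllo","timestamp":"2026-05-02T05:00:01Z"}`)
	ok, err := parityMatch(rust, goOut)
	fmt.Println(ok, err) // true <nil>
}
```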

scripts/cutover/parity/session_log_helper/main.go:
  - Tiny stdin/stdout Go helper that round-trips a fixture
    through validator.SessionRecord serde
  - Counterpart to crates/gateway/src/bin/parity_session_log.rs

docs/ARCHITECTURE_COMPARISON.md decisions tracker:
  - "Rust observability parity" row added (DONE 2026-05-02)
  - Cross-runtime probe documented as reusable gate

STATE_OF_PLAY refreshed.

Both observability pieces (trace-id propagation, session JSONL)
now exist on both runtimes. Operators who point Rust gateway and
Go validatord at the same session-log path get a unified
longitudinal stream queryable via DuckDB.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-02 05:39:49 -05:00

root

b0c8a3f227

parity probes: materializer + extract_json (caught + fixed real bug)

Two new cross-runtime parity probes join the validator probe from
the gauntlet wave. Pattern: feed identical input through both runtimes;
diff outputs. Each probe surfaced a different signal.

## Materializer parity probe
scripts/cutover/parity/materializer_parity.sh runs the Bun and Go
materializers against an identical synthetic data/_kb/ root, then diffs
the resulting evidence/ JSONL for byte equivalence (modulo
provenance.recorded_at).

**First run: 0/2 match.** Real finding: Go's Provenance.LineOffset
had `json:"line_offset,omitempty"` which strips the field when value
is 0. Line offset 0 is the FIRST ROW of every source file — a real
semantic value, not absent. Bun side always emits it.

Fix: drop `omitempty` on Provenance.LineOffset. Updated the comment
to explain why.

**Re-run: 2/2 match.** On-wire JSON parity holds.
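
The `omitempty` behaviour behind this finding is easy to reproduce in isolation. The sketch below uses two hypothetical stand-in structs rather than the real Provenance type, and assumes the field is a plain `int`; only the struct tag comes from the commit text.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withOmit mirrors the tag described as the bug: the zero value is dropped.
type withOmit struct {
	LineOffset int `json:"line_offset,omitempty"`
}

// withoutOmit mirrors the fix: the zero value is always emitted.
type withoutOmit struct {
	LineOffset int `json:"line_offset"`
}

// encode marshals v and returns the JSON text, or the error text on failure.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(withOmit{LineOffset: 0}))    // {}
	fmt.Println(encode(withoutOmit{LineOffset: 0})) // {"line_offset":0}
	fmt.Println(encode(withOmit{LineOffset: 7}))    // {"line_offset":7}
}
```

With `omitempty`, offset 0 is indistinguishable on the wire from "field absent", which is why the first row of every source file diverged from the Bun output.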

## extract_json parity probe
scripts/cutover/parity/extract_json_parity.sh feeds 12 fixture
strings through both runtimes' extract_json:
  - fenced ```json``` blocks
  - unfenced ``` blocks
  - bare braces with prose around
  - first-balanced-of-many
  - nested objects
  - unicode in string values
  - escaped quotes
  - empty object
  - top-level array (both return first inner object)
  - no JSON
  - depth-balanced but invalid syntax
  - trailing garbage

Substrate gate: cargo test -p gateway extract_json PASS before probe.

**Result: 12/12 match.** Algorithms genuinely equivalent.
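
For readers unfamiliar with the technique, a first-balanced-object scan can be sketched as below. This is an illustrative stand-in, not the code of validator.ExtractJSON or its Rust counterpart: the function names are hypothetical, and the handling of "depth-balanced but invalid syntax" (skip the candidate and keep scanning) is an assumption, since the commit does not say what either runtime returns for that fixture.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// balancedEnd returns the index of the '}' that closes the '{' at start,
// using depth counting and ignoring braces inside JSON string literals.
func balancedEnd(s string, start int) (int, bool) {
	depth, inString, escaped := 0, false, false
	for i := start; i < len(s); i++ {
		c := s[i]
		switch {
		case escaped:
			escaped = false
		case inString && c == '\\':
			escaped = true
		case c == '"':
			inString = !inString
		case inString:
			// Braces inside a string literal do not affect depth.
		case c == '{':
			depth++
		case c == '}':
			depth--
			if depth == 0 {
				return i, true
			}
		}
	}
	return 0, false
}

// firstBalancedObject returns the first balanced {...} span that also
// parses as JSON. Assumption: invalid candidates are skipped.
func firstBalancedObject(s string) (string, bool) {
	for start := 0; start < len(s); start++ {
		if s[start] != '{' {
			continue
		}
		end, ok := balancedEnd(s, start)
		if !ok {
			continue
		}
		if cand := s[start : end+1]; json.Valid([]byte(cand)) {
			return cand, true
		}
	}
	return "", false
}

func main() {
	value, matched := firstBalancedObject(`Sure: {"a": {"b": "}"}} and {"c": 2}`)
	fmt.Println(matched, value) // true {"a": {"b": "}"}}
}
```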

## scripts/cutover/parity/extract_json_helper/main.go
Tiny Go binary that reads stdin, calls validator.ExtractJSON, prints
{matched, value} JSON. Counterpart to the Rust parity_extract_json
binary in golangLAKEHOUSE's sibling lakehouse repo (separate commit).

## Pattern crystallized
Every cross-runtime port should land with a parity probe. Three
probes now exist:
  - validator (5/6 wire-format gap captured 2026-05-02)
  - materializer (caught + fixed real bug 2026-05-02)
  - extract_json (12/12 match 2026-05-02)

The instrument is reusable — each new shared HTTP/CLI surface gets
a probe row added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-02 04:43:54 -05:00

root

e8cf113af8

gauntlet 2026-05-02: smoke chain + per-component scrum + parity probe

Production-readiness gauntlet exploiting the dual Rust/Go
implementation as a measurement instrument.

## Phase 1 — Full smoke chain
21/21 PASS in ~60s. Substrate intact across the full service surface.

## Phase 2 — Per-component scrum (token-volume fix)
Prior wave (165KB diff): Kimi 62 tokens out, Qwen 297 → no useful
analysis. This wave splits today's commits into 4 focused bundles
(36-71KB each):
  c1 validatord (46KB) → 0 convergent / 11 distinct
  c2 vectord substrate (36KB) → 0 convergent / 10 distinct
  c3 materializer (71KB) → 0 convergent / 6 distinct (Opus emitted
                           a BLOCK then self-retracted in same response)
  c4 replay (45KB) → 0 convergent / 10 distinct

Reviewer engagement vs prior wave: Kimi went 62 → ~250 tokens out
once bundles dropped below 60KB.

scripts/scrum_review.sh hardening:
  * Diff-size guard (warn >60KB, hard-fail >100KB,
    SCRUM_FORCE_OVERSIZE=1 override)
  * Tightened prompt — file path must appear EXACTLY as in diff
    so post-processor can grep WHERE: lines reliably
  * Auto-tally step dedupes by (reviewer, location); convergence
    counts distinct lineages (closes the prior `opus+opus+opus`
    false-convergence bug)
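
The guard and the tally can be sketched in Go as follows. scrum_review.sh itself is a shell script, so this is only an illustration of the logic. Several details are assumptions: KB is taken as 1024 bytes, a reviewer name stands in for a lineage, and a location counts as convergent when at least two distinct reviewers flag it.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	warnBytes = 60 * 1024  // warn above this size (KB = 1024 is an assumption)
	failBytes = 100 * 1024 // hard-fail above this size unless forced
)

var errOversize = errors.New("diff exceeds hard limit")

// checkDiffSize returns whether to warn, and an error when the diff is over
// the hard limit and the override (SCRUM_FORCE_OVERSIZE=1 in the script) is
// not set.
func checkDiffSize(n int, force bool) (bool, error) {
	if n > failBytes && !force {
		return true, errOversize
	}
	return n > warnBytes, nil
}

// finding is one reviewer comment tied to a location (hypothetical shape).
type finding struct {
	Reviewer string
	Location string
}

// convergentCount counts locations flagged by at least two distinct
// reviewers. Using a set per location dedupes by (reviewer, location), so
// one reviewer repeating itself cannot produce a convergence.
func convergentCount(fs []finding) int {
	byLoc := map[string]map[string]bool{}
	for _, f := range fs {
		if byLoc[f.Location] == nil {
			byLoc[f.Location] = map[string]bool{}
		}
		byLoc[f.Location][f.Reviewer] = true
	}
	n := 0
	for _, reviewers := range byLoc {
		if len(reviewers) >= 2 {
			n++
		}
	}
	return n
}

func main() {
	fs := []finding{
		{Reviewer: "opus", Location: "a.go:10"},
		{Reviewer: "opus", Location: "a.go:10"},
		{Reviewer: "opus", Location: "a.go:10"},
	}
	fmt.Println(convergentCount(fs)) // 0
	warn, err := checkDiffSize(71*1024, false)
	fmt.Println(warn, err) // true <nil>
}
```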

## Phase 3 — Cross-runtime validator parity probe (the headline finding)
scripts/cutover/parity/validator_parity.sh sends 6 identical
/v1/validate cases to Rust :3100 AND Go :4110, compares status+body.

Result: **6/6 status codes match · 5/6 body shapes diverge.**

Rust returns serde-tagged enum:   {"Schema":{"field":"x","reason":"y"}}
Go returns flat exported-fields:  {"Kind":"schema","Field":"x","Reason":"y"}

Both round-trip inside their own runtime; a caller swapping one for
the other would break parsing silently. Captured as new _open_ row
in docs/ARCHITECTURE_COMPARISON.md decisions tracker.
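
The status-plus-body comparison can be sketched with two in-process stub servers. This is a hedged illustration, not validator_parity.sh: the stubs stand in for the Rust and Go daemons, the 422 status code and the empty request payload are assumptions, and bodies are compared after JSON decoding rather than byte for byte.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"reflect"
)

// probeResult records whether the two runtimes agreed on status and body.
type probeResult struct {
	StatusMatch bool
	BodyMatch   bool
}

// post sends payload to base+/v1/validate and returns status and decoded body.
func post(base string, payload []byte) (int, any, error) {
	resp, err := http.Post(base+"/v1/validate", "application/json", bytes.NewReader(payload))
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, nil, err
	}
	var body any
	if err := json.Unmarshal(raw, &body); err != nil {
		return resp.StatusCode, nil, err
	}
	return resp.StatusCode, body, nil
}

// compare sends the same payload to both endpoints and diffs the answers.
func compare(baseA, baseB string, payload []byte) (probeResult, error) {
	sa, ba, err := post(baseA, payload)
	if err != nil {
		return probeResult{}, err
	}
	sb, bb, err := post(baseB, payload)
	if err != nil {
		return probeResult{}, err
	}
	return probeResult{StatusMatch: sa == sb, BodyMatch: reflect.DeepEqual(ba, bb)}, nil
}

// stub returns a test server that always answers with status and body.
func stub(status int, body string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(status)
		io.WriteString(w, body)
	}))
}

func main() {
	rust := stub(422, `{"Schema":{"field":"x","reason":"y"}}`)
	defer rust.Close()
	goSrv := stub(422, `{"Kind":"schema","Field":"x","Reason":"y"}`)
	defer goSrv.Close()

	res, err := compare(rust.URL, goSrv.URL, []byte(`{}`))
	fmt.Println(res.StatusMatch, res.BodyMatch, err) // true false <nil>
}
```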

This is the payoff of using the dual implementation as a measurement
instrument: single-repo scrums can't catch this class of cross-runtime
drift.

## Phase 4 — Production assessment
ship-with-known-gap. Validator wire-format gap is documented, not
regressed. ~50 LOC future fix on Go side (custom MarshalJSON on
ValidationError to match Rust's serde shape).

Persistent stack config (/tmp/lakehouse-persistent.toml) gains
validatord on :3221 + persistent-validatord binary so operators
bringing up the persistent stack get the new daemon automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-02 04:05:18 -05:00

3 Commits