Compare commits
No commits in common. "5d3996b51d855286760566bd02e48c3a22dacf1e" and "b314ed1c9475b1fc84b5df844e9efb925725e820" have entirely different histories.
5d3996b51d...b314ed1c94
@@ -3,7 +3,7 @@
**Last verified:** 2026-05-02 ~11:30 CDT
**Verified by:** **production-readiness gauntlet** — 21/21 smoke chain green, per-component scrum across 4 bundles, **3 cross-runtime parity probes all green post-fix** (validator: **6/6 match** after wire-format alignment shipped; materializer: 2/2 after omitempty fix; extract_json: 12/12). All findings surfaced by the parity probes have been actioned. Disposition: `reports/cutover/gauntlet_2026-05-02/disposition.md`.
> **Read this FIRST.** When the user says "we're working on lakehouse," default to the Go rewrite (this repo) but ASK if ambiguous — the Rust system at `/home/profit/lakehouse/` is also receiving real work as of 2026-05-02 (Lance backend hardening, observability parity, sidecar drop — see `lakehouse/STATE_OF_PLAY.md`). The cross-runtime decisions tracker at `docs/ARCHITECTURE_COMPARISON.md` is the single source of truth for what's open across both. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.
> **Read this FIRST.** When the user says "we're working on lakehouse," default to the Go rewrite (this repo); the Rust legacy at `/home/profit/lakehouse/` is maintenance-only. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.
---
@@ -198,7 +198,7 @@ Verbatim verdicts at `reports/scrum/_evidence/2026-04-30/verdicts/`. Disposition
### Session frame (don't redo)
- The Rust system is **NOT maintenance-only** as of 2026-05-02 — it's receiving parity work + infrastructure (Lance gauntlet, sidecar drop, observability parity). Don't propose ports of components already shipped on the Go side, but do expect Rust commits to land in parallel for cross-runtime fabric.
- The Rust legacy is **maintenance-only** until Go reaches feature parity. Don't propose ports of components already shipped here.
- The matrix indexer **5/5 components** are shipped. Don't propose to "build the matrix indexer" — it's done.
- The 5-loop substrate's load-bearing gate is **PASSED**. v3 (`154a72e`) showed 6/6 paraphrase recovery via Shape B. Don't propose to "test if the playbook learns" — it does.
- **Shape B is the playbook stance now.** When `use_playbook=true`, both `ApplyPlaybookBoost` (re-rank in place) AND `InjectPlaybookMisses` (insert recorded answers not in regular top-K) run. Don't revert to boost-only; v2 proved that path can't recover paraphrase queries.
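
The Shape B flow described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the repo's implementation: the `Hit` type, the lowercase function names, their signatures, and the scoring of injected entries are all assumptions made for the example. Only the two-step order (boost in place, then inject recorded answers missing from the regular top-K) comes from the text above.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is a hypothetical search result; the real result type is not shown in this diff.
type Hit struct {
	ID    string
	Score float64
}

// applyPlaybookBoost re-ranks in place: hits that the playbook has recorded
// get their score raised, then the slice is re-sorted by score (descending).
func applyPlaybookBoost(hits []Hit, recorded map[string]float64) {
	for i := range hits {
		if boost, ok := recorded[hits[i].ID]; ok {
			hits[i].Score += boost
		}
	}
	sort.SliceStable(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
}

// injectPlaybookMisses adds recorded answers that the regular top-K missed.
// Using the recorded value as the injected score is an assumption.
func injectPlaybookMisses(hits []Hit, recorded map[string]float64) []Hit {
	seen := make(map[string]bool, len(hits))
	for _, h := range hits {
		seen[h.ID] = true
	}
	missing := make([]string, 0, len(recorded))
	for id := range recorded {
		if !seen[id] {
			missing = append(missing, id)
		}
	}
	sort.Strings(missing) // deterministic order despite map iteration
	for _, id := range missing {
		hits = append(hits, Hit{ID: id, Score: recorded[id]})
	}
	sort.SliceStable(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
	return hits
}

// search runs both steps when usePlaybook is true (Shape B), never boost-only.
func search(hits []Hit, recorded map[string]float64, usePlaybook bool) []Hit {
	if !usePlaybook {
		return hits
	}
	applyPlaybookBoost(hits, recorded)
	return injectPlaybookMisses(hits, recorded)
}

func main() {
	hits := []Hit{{ID: "a", Score: 0.9}, {ID: "b", Score: 0.5}}
	recorded := map[string]float64{"b": 0.6, "c": 1.0}
	for _, h := range search(hits, recorded, true) {
		fmt.Println(h.ID)
	}
}
```

In this sketch, a boost-only path could never surface `c`, because `c` was not in the regular top-K to begin with; that is the gap the injection step is meant to close.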
@@ -4,7 +4,7 @@
> comparison.
> **Owner**: J. Update this when either side ships a fix that changes
> the table values, or when a new architectural axis surfaces.
> **Last meaningful refresh**: 2026-05-02 (post-Lance-gauntlet + observability parity)
> **Last meaningful refresh**: 2026-05-01 (post-Rust-cache + Go-validator-port)
This document compares the two parallel implementations of the lakehouse
substrate — Rust at `/home/profit/lakehouse/` (production today), Go at
@@ -58,8 +58,7 @@ Don't:
| 2026-05-02 | **Materializer parity probe — caught + fixed real bug** | New `scripts/cutover/parity/materializer_parity.sh` runs Bun + Go materializer on identical synthetic root, diffs output JSONL. Result on first run: **0/2 match** — Go's `Provenance.LineOffset` had `json:",omitempty"` and stripped the field on first-row records (line_offset=0 is semantically meaningful, not absent). 1-line fix (drop `omitempty` + comment explaining why). Re-run: **2/2 match**. Real cross-runtime gap surfaced + closed in same wave. |
| 2026-05-02 | **extract_json parity probe — 12/12 match across edge cases** | New `scripts/cutover/parity/extract_json_parity.sh` runs identical model-output strings through Rust `gateway::v1::iterate::extract_json` AND Go `validator.ExtractJSON`. 12 fixtures: fenced/unfenced blocks, nested objects, unicode, escaped quotes, top-level array, malformed JSON. Substrate gate: `cargo test -p gateway extract_json` PASS before probe. Result: **12/12 match.** Algorithms genuinely equivalent. Rust side gained `pub` on `extract_json` + new `bin/parity_extract_json` (~30 LOC). |
| 2026-05-02 | **Validator wire-format alignment — DONE** | Custom `MarshalJSON`/`UnmarshalJSON` on Go's `validator.ValidationError` emits the Rust serde-tagged-enum shape `{"Schema":{"field":"x","reason":"y"}}`. UnmarshalJSON also accepts the legacy flat shape (migration safety) and rejects unknown variants (drift guard for future Rust enum additions). 4 new pinning tests in `types_test.go`. Re-run validator parity probe: **6/6 match** (was 1/6). |
| 2026-05-02 | **Lance backend gauntlet (4-pack + root-cause fix) — DONE** | Lance crate had zero tests + no smoke when audited this morning. Shipped: (a) `sanitize_lance_err` over all 5 routes (search/doc/index/append/migrate) — missing-index now 404 not 500, no `/home/` or `/root/.cargo/` paths leaked; (b) 7 unit tests in `crates/vectord-lance` with synth Parquet helper; (c) 9-probe `scripts/lance_smoke.sh` against live `:3100`; (d) 10M re-bench (`reports/lance_10m_rebench_2026-05-02.md`) — search warm ~20ms, search cold ~46ms median. Bench surfaced doc-fetch p50 ~100ms (300x slower than ADR-019 100K projection); root-caused to lance-bench bypassing IndexMeta → warming auto-build never fired → no `doc_id` btree. **Fix shipped (commit `5d30b3d`)**: `lance_migrate` HTTP handler now auto-builds the btree inline (1.2s on 10M, +269MB), drops doc-fetch to ~5ms (20x). Live verified 9/9 smoke + post-restart doc-fetch 4-15ms. |
| _open_ | Decide Lance vs Parquet+HNSW for primary | Lance verified production-ready at 10M (this morning's gauntlet). HNSW at 10M doesn't fit RAM (~60GB for vectors+graph), so the comparison is between Lance and Parquet+HNSW-with-spilling. Decide once we have a 10M ingest scenario where the Parquet path is bottlenecked. |
| _open_ | Decide on Lance vector backend | Defer until corpus exceeds ~5M rows. |
| _open_ | Pick Go primary vs Rust primary | Both viable. Go has perf edge after today; Rust has production deploy + producer-side completeness. |
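
The materializer row in the table above describes an `omitempty` bug on `Provenance.LineOffset`. The sketch below reproduces that class of bug with Go's standard `encoding/json`. The struct names, the `SourceFile` field, and the JSON key names are assumptions chosen for illustration; only the behaviour of `omitempty` on a zero-valued integer is standard library fact.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// provenanceOmit mirrors the buggy shape: omitempty drops a zero offset,
// so a first-row record loses the field entirely.
type provenanceOmit struct {
	SourceFile string `json:"source_file"`
	LineOffset int64  `json:"line_offset,omitempty"`
}

// provenanceKeep mirrors the fixed shape.
type provenanceKeep struct {
	SourceFile string `json:"source_file"`
	// No omitempty: offset 0 means "first row", not "absent".
	LineOffset int64 `json:"line_offset"`
}

// mustJSON marshals v and panics on error (acceptable for a demo).
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustJSON(provenanceOmit{SourceFile: "a.jsonl", LineOffset: 0}))
	// prints {"source_file":"a.jsonl"}
	fmt.Println(mustJSON(provenanceKeep{SourceFile: "a.jsonl", LineOffset: 0}))
	// prints {"source_file":"a.jsonl","line_offset":0}
}
```

A byte-for-byte JSONL diff against a runtime that always emits the field will flag every first-row record in the `omitempty` variant, which is consistent with the 0/2 first-run result reported above.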
---
@@ -1,6 +1,6 @@
# /v1/embed cross-runtime parity probe
**Date:** 2026-05-03T03:13:12Z
**Date:** 2026-05-02T11:28:19Z
**Rust:** `http://127.0.0.1:3100/ai/embed` · **Go:** `http://127.0.0.1:4110/v1/embed`
**Model:** `nomic-embed-text` (forced — overrides each side's default)
**Match metric:** cosine similarity ≥ `0.99999`
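
The match metric above can be illustrated with a small helper. This is a sketch, not the probe's actual code: the function names and the use of `float64` slices are assumptions, and only the `0.99999` threshold comes from the report header.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// matchThreshold is the cosine-similarity cutoff stated in the report header.
const matchThreshold = 0.99999

// cosine returns the cosine similarity of two equal-length, non-zero vectors.
func cosine(a, b []float64) (float64, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, errors.New("vectors must be non-empty and the same length")
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0, errors.New("cosine similarity is undefined for a zero vector")
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB)), nil
}

// embeddingsMatch reports whether two embeddings agree under the threshold.
func embeddingsMatch(a, b []float64) (bool, error) {
	sim, err := cosine(a, b)
	if err != nil {
		return false, err
	}
	return sim >= matchThreshold, nil
}

func main() {
	ok, err := embeddingsMatch([]float64{1, 2, 3}, []float64{2, 4, 6})
	fmt.Println(ok, err)
}
```

Cosine similarity ignores vector magnitude, so two runtimes that differ only in output scaling would still match under this metric; a stricter probe would also compare norms.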
@@ -1,6 +1,6 @@
# extract_json parity probe — Rust vs Go
**Date:** 2026-05-03T03:13:10Z
**Date:** 2026-05-02T11:24:21Z
**Rust helper:** `/home/profit/lakehouse/target/release/parity_extract_json` (links live `gateway::v1::iterate::extract_json`)
**Go helper:** `./bin/parity_extract_json_go` (links live `internal/validator.ExtractJSON`)
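
For readers unfamiliar with what an `extract_json`-style helper does, here is one possible approach, sketched under assumptions. It is not the algorithm used by `gateway::v1::iterate::extract_json` or `validator.ExtractJSON`, whose internals are not shown in this diff. The hypothetical `extractJSON` below scans for the first balanced object or array, tracking string and escape state, and accepts it only if `json.Valid` agrees.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractJSON returns the first balanced, valid JSON object or array found
// in s, ignoring surrounding prose or code fences. Hypothetical helper.
func extractJSON(s string) (string, bool) {
outer:
	for start := 0; start < len(s); start++ {
		if s[start] != '{' && s[start] != '[' {
			continue
		}
		depth, inString, escaped := 0, false, false
		for i := start; i < len(s); i++ {
			c := s[i]
			switch {
			case escaped: // character after a backslash inside a string
				escaped = false
			case inString && c == '\\':
				escaped = true
			case c == '"':
				inString = !inString
			case inString: // brackets inside strings do not count
			case c == '{' || c == '[':
				depth++
			case c == '}' || c == ']':
				depth--
				if depth == 0 {
					candidate := s[start : i+1]
					if json.Valid([]byte(candidate)) {
						return candidate, true
					}
					continue outer // balanced but invalid: try the next opener
				}
			}
		}
	}
	return "", false
}

func main() {
	out, ok := extractJSON("Here you go:\n```json\n{\"a\": [1, 2]}\n```")
	fmt.Println(out, ok)
}
```

Byte-wise scanning is safe for UTF-8 input here because every structural character is ASCII and UTF-8 continuation bytes never collide with ASCII values.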
@@ -1,6 +1,6 @@
# Materializer parity probe — Bun vs Go
**Date:** 2026-05-03T03:13:11Z
**Date:** 2026-05-02T11:24:21Z
**Bun:** `/home/profit/lakehouse/scripts/distillation/build_evidence_index.ts`
**Go:** `./bin/materializer`
@@ -1,6 +1,6 @@
# session_log parity probe — Rust gateway vs Go validatord
**Date:** 2026-05-03T03:13:10Z
**Date:** 2026-05-02T11:24:21Z
**Rust helper:** `/home/profit/lakehouse/target/release/parity_session_log`
**Go helper:** `./bin/parity_session_log_go`
@@ -1,6 +1,6 @@
# Validator parity probe — Rust :3100 vs Go :4110
**Date:** 2026-05-03T03:13:05Z
**Date:** 2026-05-02T11:24:15Z
**Rust gateway:** `http://127.0.0.1:3100` · **Go gateway:** `http://127.0.0.1:4110`
Identical `POST /v1/validate` request → both runtimes. Match