golangLAKEHOUSE

profit/golangLAKEHOUSE


Commit Graph


Author

SHA1

Message

Date

root

0d4f033b34

audit_baselines: round-trip validation against live Rust data

Same shape of proof as embed_parity.sh for the embed endpoint:
take the just-shipped Go port (ca142b9) and validate it against
the actual production data the legacy Rust pipeline emits, not just
unit-test fixtures. Locks in the cross-runtime parity that operators
running mixed pipelines depend on.

scripts/cutover/audit_baselines_validate.go:
- Reads /home/profit/lakehouse/data/_kb/audit_baselines.jsonl
- Parses every entry via the Go AuditBaseline struct
- Round-trips the last entry: encode → decode → field-by-field
  equality check (catches any silently-dropped JSON keys)
- Calls LoadLastBaseline against the live file (proves the public
  API works on real shapes, not just inline parsing)
- Computes BuildAuditDriftTable(first → last) — full-window
  lineage drift over the captured baselines
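
A minimal sketch of the round-trip check described above, assuming a much simpler struct than the real one. The `AuditBaseline` fields, the JSON key names, and the helpers `droppedKeys` and `roundTripEqual` are hypothetical; the commit names the struct but does not show its shape.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
)

// AuditBaseline is a hypothetical stand-in for the ported struct.
// The real field set is not shown in the commit message.
type AuditBaseline struct {
	CapturedAt string             `json:"captured_at"`
	Metrics    map[string]float64 `json:"metrics"`
}

// droppedKeys decodes raw into the struct, re-encodes it, and reports
// top-level JSON keys that were present in raw but are missing afterwards.
func droppedKeys(raw []byte) ([]string, error) {
	var b AuditBaseline
	if err := json.Unmarshal(raw, &b); err != nil {
		return nil, err
	}
	out, err := json.Marshal(b)
	if err != nil {
		return nil, err
	}
	var before, after map[string]json.RawMessage
	if err := json.Unmarshal(raw, &before); err != nil {
		return nil, err
	}
	if err := json.Unmarshal(out, &after); err != nil {
		return nil, err
	}
	var missing []string
	for k := range before {
		if _, ok := after[k]; !ok {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing, nil
}

// roundTripEqual encodes b, decodes the result, and compares all fields.
func roundTripEqual(b AuditBaseline) (bool, error) {
	enc, err := json.Marshal(b)
	if err != nil {
		return false, err
	}
	var got AuditBaseline
	if err := json.Unmarshal(enc, &got); err != nil {
		return false, err
	}
	return reflect.DeepEqual(b, got), nil
}

func main() {
	// Illustrative line only, not taken from the live file.
	line := []byte(`{"captured_at":"2026-05-01","metrics":{"p2_evidence_rows":82},"extra_key":1}`)
	missing, err := droppedKeys(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("silently dropped keys:", missing)
}
```

Comparing key sets before and after the round trip is one way to surface fields the struct does not model; the actual script may implement the equality check differently.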

Live-data probe results (reports/cutover/audit_baselines_roundtrip.md):
- 7 entries parse without error
- Round-trip is byte-equal on every metric + every header field
- Drift table fires the expected verdicts:
  - p2_evidence_rows 12→82 (+583%) → warn (above 20% threshold)
  - p3_accepted/partial/rejected/human 0→non-zero → warn (the
    zero-baseline edge case TestBuildAuditDriftTable_ZeroBaseline
    was designed to lock — verified now firing on real history)
  - p4_* metrics +0% → ok (stable across the window)

What this does NOT prove (documented in the report): the Go-side
audit-FULL pipeline that PRODUCES baselines doesn't exist yet.
Only the load/append/drift substrate is ported. Operators running
audit-full from Go would still need a metric-collection pass —
that's a separate port deliberately not in this wave.

reports/cutover/SUMMARY.md gains a new row alongside the embed
parity entries; cutover-prep verification log keeps the
discipline of "verified against real data, not just fixtures."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 00:20:18 -05:00

root

5687ec65c2

G5 cutover prep: embed parity probe — Rust /ai/embed ↔ Go /v1/embed verified

First concrete cutover artifact: scripts/cutover/embed_parity.sh
brings up Go embedd + gateway alongside the live Rust gateway,
hits both /ai/embed and /v1/embed with the same forced model, and
emits a per-date verdict report under reports/cutover/.

Why embed first: the parity invariant is a single math identity (the
cosine similarity of the two vectors returned for the same input
should be 1). Retrieve has thousands of edge cases. If embed parity
holds, all downstream vector consumers inherit confidence; if it
doesn't, we catch it in 30s instead of after a flip.

Verdict 2026-04-30: 5/5 samples cosine=1.000000 with model forced
to nomic-embed-text (v1). Same result with nomic-embed-text-v2-moe
(both Ollama instances have it loaded). On these samples the math
is equivalent across the gateway plumbing.
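
For reference, the parity check reduces to a cosine similarity between the two returned vectors. A minimal sketch follows; the function name `cosine` and the sample vectors are illustrative and not taken from embed_parity.sh.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// cosine returns the cosine similarity of a and b.
// It errors on empty, mismatched, or zero-length (all-zero) vectors.
func cosine(a, b []float64) (float64, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, errors.New("vectors must be non-empty and the same length")
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0, errors.New("cosine is undefined for a zero vector")
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb)), nil
}

func main() {
	// Same direction, different magnitude: a unit vector and a scaled copy.
	unit := []float64{0.6, 0.8}
	raw := []float64{12, 16}
	c, err := cosine(unit, raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("cosine=%.6f\n", c)
}
```

Because cosine similarity ignores magnitude, a unit vector and an unnormalized vector pointing the same way still score 1, which is why the check can pass even when the two services return vectors of different norms.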

Drift catalog (reports/cutover/SUMMARY.md):
- URL: Rust /ai/embed vs Go /v1/embed
- Wire: Rust {embeddings, dimensions} (plural) vs Go {vectors,
  dimension} (singular). Wire-format adapter is the only real
  cutover work for this endpoint.
- L2 norm: Rust returns unit vectors (norm ~1.0); Go returns raw
  Ollama vectors (norm ~20-23). Same direction (cos=1.0); harmless
  under cosine-distance HNSW (which is Go vectord's default), but
  worth fixing in internal/embed/ before extending to Euclidean
  indexes.
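
The two remaining drift items (wire format and L2 norm) could be handled together along these lines. This is a sketch under stated assumptions: the commit gives only the key names, so the value types (`[][]float64` and `int`), the struct names, the adapter direction (Go shape to Rust shape), and the helpers `l2Normalize` and `toRustWire` are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// rustEmbedResponse mirrors the key names the commit lists for Rust /ai/embed.
// The value types are assumed.
type rustEmbedResponse struct {
	Embeddings [][]float64 `json:"embeddings"`
	Dimensions int         `json:"dimensions"`
}

// goEmbedResponse mirrors the key names the commit lists for Go /v1/embed.
// The value types are assumed.
type goEmbedResponse struct {
	Vectors   [][]float64 `json:"vectors"`
	Dimension int         `json:"dimension"`
}

// l2Normalize returns a copy of v scaled to unit L2 norm.
// An all-zero vector is returned as zeros to avoid dividing by zero.
func l2Normalize(v []float64) []float64 {
	out := make([]float64, len(v))
	var sum float64
	for _, x := range v {
		sum += x * x
	}
	if sum == 0 {
		return out
	}
	norm := math.Sqrt(sum)
	for i, x := range v {
		out[i] = x / norm
	}
	return out
}

// toRustWire renames the fields and normalizes each vector so that
// callers expecting the Rust shape see unit vectors.
func toRustWire(g goEmbedResponse) rustEmbedResponse {
	r := rustEmbedResponse{
		Embeddings: make([][]float64, len(g.Vectors)),
		Dimensions: g.Dimension,
	}
	for i, v := range g.Vectors {
		r.Embeddings[i] = l2Normalize(v)
	}
	return r
}

func main() {
	g := goEmbedResponse{Vectors: [][]float64{{12, 16}}, Dimension: 2}
	out, err := json.Marshal(toRustWire(g))
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Normalizing at the adapter keeps cosine results unchanged while making Euclidean distances comparable across the two runtimes; the commit suggests doing the normalization in internal/embed/ instead, which would make the adapter a pure field rename.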

reports/cutover/ now tracked (joined the scrum/ + reality-tests/
exemptions in .gitignore).

Next probe: /v1/matrix/retrieve ↔ Rust /vectors/hybrid for the
real user-facing retrieve path. Embed parity gives that probe a
clean foundation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-30 20:07:04 -05:00

2 Commits