17 Commits

Author SHA1 Message Date
root
7d6636b33e validator: align ValidationError JSON to Rust serde shape (6/6 parity)
Closes the 2026-05-02 parity finding: validator_parity probe found
5/6 body shapes diverging because Go emitted {"Kind":"...","Field":"...","Reason":"..."}
while Rust emits the externally-tagged-enum {"Schema":{"field":"...","reason":"..."}}.
A caller parsing the error envelope would break silently in cutover.

## Changes

internal/validator/types.go:
- Custom MarshalJSON emits the Rust shape:
    Schema:       {"Schema":      {"field":"x","reason":"y"}}
    Completeness: {"Completeness":{"reason":"y"}}
    Consistency:  {"Consistency": {"reason":"y"}}
    Policy:       {"Policy":      {"reason":"y"}}
- Custom UnmarshalJSON accepts BOTH the new Rust shape AND the legacy
  flat shape (migration safety for any persisted error rows).
- Unknown variants (e.g. a future Rust addition Go hasn't learned)
  surface as an Unmarshal error, not a silent default.

internal/validator/types_test.go:
- 4 pinning tests anchor the wire format. Failing them = wire-format
  drift; the parity probe is the secondary line of defense.

scripts/validatord_smoke.sh:
- Updated probes to read the new variant-name shape (jq keys[0],
  .Schema.field) instead of legacy .Kind/.Field.

## Verification

- internal/validator unit tests: PASS (4 new + all existing).
- cmd/validatord HTTP tests: PASS (UnmarshalJSON falls through to flat
  shape so existing tests reading ValidationError still work).
- validatord_smoke.sh: 5/5 PASS through gateway :3110.
- validator parity probe re-run: **6/6 match** (was 1/6).

## Pattern

Per architecture_comparison's "use the dual-implementation as a
measurement instrument" thesis: a parity probe surfaced this gap;
50 LOC of MarshalJSON closed it; 4 pinning tests prevent regression;
the probe is the longitudinal gate. Cutover-friendly direction (Go
matches Rust) chosen because Rust is the existing production
contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:49:28 -05:00
root
b0c8a3f227 parity probes: materializer + extract_json (caught + fixed real bug)
Two new cross-runtime parity probes joining the validator probe from
the gauntlet wave. Pattern: feed identical input through Rust and Go;
diff outputs. Each probe surfaced a different signal.

## Materializer parity probe
scripts/cutover/parity/materializer_parity.sh runs Bun + Go
materializer against an identical synthetic data/_kb/ root, diffs the
resulting evidence/ JSONL byte-equivalent (modulo provenance.recorded_at).

**First run: 0/2 match.** Real finding: Go's Provenance.LineOffset
had `json:"line_offset,omitempty"` which strips the field when value
is 0. Line offset 0 is the FIRST ROW of every source file — a real
semantic value, not absent. Bun side always emits it.

Fix: drop `omitempty` on Provenance.LineOffset. Updated comment
explaining why.
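
A small illustration of the `omitempty` behavior behind this finding. The two
struct types and the `int64` field type are assumptions for the demo; only the
`line_offset` tag comes from the commit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withOmit reproduces the original tag: a zero offset disappears from the output.
type withOmit struct {
	LineOffset int64 `json:"line_offset,omitempty"`
}

// withoutOmit reflects the fix: offset 0 is serialized like any other value.
type withoutOmit struct {
	LineOffset int64 `json:"line_offset"`
}

// encode marshals v and returns the JSON text.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(withOmit{LineOffset: 0}))    // {}
	fmt.Println(encode(withoutOmit{LineOffset: 0})) // {"line_offset":0}
}
```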

**Re-run: 2/2 match.** On-wire JSON parity holds.

## extract_json parity probe
scripts/cutover/parity/extract_json_parity.sh feeds 12 fixture
strings through both runtimes' extract_json:
  - fenced ```json``` blocks
  - unfenced ``` blocks
  - bare braces with prose around
  - first-balanced-of-many
  - nested objects
  - unicode in string values
  - escaped quotes
  - empty object
  - top-level array (both return first inner object)
  - no JSON
  - depth-balanced but invalid syntax
  - trailing garbage

Substrate gate: cargo test -p gateway extract_json PASS before probe.

**Result: 12/12 match.** Algorithms genuinely equivalent.
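
For readers unfamiliar with the technique, a rough sketch of a
first-balanced-object scan that covers most of the fixture categories above.
This is NOT validator.ExtractJSON; `extractFirstObject` is a hypothetical name,
and giving up on the first invalid candidate is an assumption of this sketch
that neither runtime is known to share.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractFirstObject returns the first brace-balanced {...} span in s that
// parses as JSON. Braces inside string literals are ignored.
func extractFirstObject(s string) (string, bool) {
	start, depth := -1, 0
	inString, escaped := false, false
	for i := 0; i < len(s); i++ {
		c := s[i]
		if start < 0 {
			if c == '{' {
				start, depth = i, 1
			}
			continue
		}
		if inString {
			switch {
			case escaped:
				escaped = false
			case c == '\\':
				escaped = true
			case c == '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			inString = true
		case '{':
			depth++
		case '}':
			depth--
			if depth == 0 {
				candidate := s[start : i+1]
				if json.Valid([]byte(candidate)) {
					return candidate, true
				}
				return "", false // assumption: stop at the first invalid candidate
			}
		}
	}
	return "", false
}

func main() {
	for _, in := range []string{
		"prose {\"a\": {\"b\": \"}\"}} trailing garbage",
		"[{\"a\":1},{\"b\":2}]",
		"no JSON here",
	} {
		got, ok := extractFirstObject(in)
		fmt.Printf("%v %q\n", ok, got)
	}
}
```

Scanning bytes rather than runes is safe here because multi-byte UTF-8
sequences never contain the ASCII bytes for braces, quotes, or backslashes.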

## scripts/cutover/parity/extract_json_helper/main.go
Tiny Go binary that reads stdin, calls validator.ExtractJSON, prints
{matched, value} JSON. Counterpart to the Rust parity_extract_json
binary in golangLAKEHOUSE's sibling lakehouse repo (separate commit).

## Pattern crystallized
Every cross-runtime port should land with a parity probe. Three
probes now exist:
  - validator (5/6 wire-format gap captured 2026-05-02)
  - materializer (caught + fixed real bug 2026-05-02)
  - extract_json (12/12 match 2026-05-02)

The instrument is reusable — each new shared HTTP/CLI surface gets
a probe row added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:43:54 -05:00
root
e8cf113af8 gauntlet 2026-05-02: smoke chain + per-component scrum + parity probe
Production-readiness gauntlet exploiting the dual Rust/Go
implementation as a measurement instrument.

## Phase 1 — Full smoke chain
21/21 PASS in ~60s. Substrate intact across the full service surface.

## Phase 2 — Per-component scrum (token-volume fix)
Prior wave (165KB diff): Kimi 62 tokens out, Qwen 297 → no useful
analysis. This wave splits today's commits into 4 focused bundles
(36-71KB each):
  c1 validatord (46KB) → 0 convergent / 11 distinct
  c2 vectord substrate (36KB) → 0 convergent / 10 distinct
  c3 materializer (71KB) → 0 convergent / 6 distinct (Opus emitted
                           a BLOCK then self-retracted in same response)
  c4 replay (45KB) → 0 convergent / 10 distinct

Reviewer engagement vs prior wave: Kimi went 62 → ~250 tokens out
once bundles dropped below 60KB.

scripts/scrum_review.sh hardening:
  * Diff-size guard (warn >60KB, hard-fail >100KB,
    SCRUM_FORCE_OVERSIZE=1 override)
  * Tightened prompt — file path must appear EXACTLY as in diff
    so post-processor can grep WHERE: lines reliably
  * Auto-tally step dedupes by (reviewer, location); convergence
    counts distinct lineages (closes the prior `opus+opus+opus`
    false-convergence bug)
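
The tally rule can be sketched as follows. scripts/scrum_review.sh is a shell
script, so this Go version only illustrates the logic; the `Finding` struct,
its fields, and `Tally` are hypothetical names.

```go
package main

import "fmt"

// Finding is one reviewer remark tied to a location in the diff.
type Finding struct {
	Reviewer string // one review run, e.g. "opus-1"
	Lineage  string // model family, e.g. "opus"
	Location string // file path exactly as it appears in the diff
}

// Tally dedupes findings by (reviewer, location) and reports the number of
// distinct findings plus, for each location flagged by at least two different
// lineages, how many lineages flagged it.
func Tally(findings []Finding) (distinct int, convergent map[string]int) {
	type key struct{ reviewer, location string }
	seen := map[key]bool{}
	lineages := map[string]map[string]bool{}
	for _, f := range findings {
		k := key{f.Reviewer, f.Location}
		if seen[k] {
			continue // same reviewer repeating itself on the same location
		}
		seen[k] = true
		distinct++
		if lineages[f.Location] == nil {
			lineages[f.Location] = map[string]bool{}
		}
		lineages[f.Location][f.Lineage] = true
	}
	convergent = map[string]int{}
	for loc, set := range lineages {
		if len(set) >= 2 {
			convergent[loc] = len(set)
		}
	}
	return distinct, convergent
}

func main() {
	distinct, convergent := Tally([]Finding{
		{"opus-1", "opus", "a.go"}, {"opus-2", "opus", "a.go"}, {"opus-3", "opus", "a.go"},
		{"kimi-1", "kimi", "b.go"}, {"opus-1", "opus", "b.go"},
	})
	fmt.Println(distinct, convergent)
}
```

Three opus runs agreeing on a.go count as one lineage, so a.go is not
convergent; b.go is, because two different lineages flagged it.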

## Phase 3 — Cross-runtime validator parity probe (the headline finding)
scripts/cutover/parity/validator_parity.sh sends 6 identical
/v1/validate cases to Rust :3100 AND Go :4110, compares status+body.

Result: **6/6 status codes match · 5/6 body shapes diverge.**

Rust returns serde-tagged enum:   {"Schema":{"field":"x","reason":"y"}}
Go returns flat exported-fields:  {"Kind":"schema","Field":"x","Reason":"y"}

Both round-trip inside their own runtime; a caller swapping one for
the other would break parsing silently. Captured as new _open_ row
in docs/ARCHITECTURE_COMPARISON.md decisions tracker.

This is the "use the dual-implementation as a measurement instrument"
return — single-repo scrums can't catch this class of cross-runtime
drift.

## Phase 4 — Production assessment
ship-with-known-gap. Validator wire-format gap is documented, not
regressed. ~50 LOC future fix on Go side (custom MarshalJSON on
ValidationError to match Rust's serde shape).

Persistent stack config (/tmp/lakehouse-persistent.toml) gains
validatord on :3221 + persistent-validatord binary so operators
bringing up the persistent stack get the new daemon automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:05:18 -05:00
root
89ca72d471 materializer + replay ports + vectord substrate fix verified at scale
Three threads landing together; the doc edits interleave, so they ship
in a single commit.

1. **vectord substrate fix verified at original scale** (closes the
   2026-05-01 thread). Re-ran multitier 5min @ conc=50: 132,211
   scenarios at 438/sec, 6/6 classes at 0% failure (was 4/6 pre-fix).
   Throughput dropped 1,115 → 438/sec because previously-broken
   scenarios now do real HNSW Add work — honest cost of correctness.
   The fix (i.vectors side-store + safeGraphAdd recover wrappers +
   smallIndexRebuildThreshold=32 + saveTask coalescing) holds at the
   footprint that originally surfaced the bug.

2. **Materializer port** — internal/materializer + cmd/materializer +
   scripts/materializer_smoke.sh. Ports scripts/distillation/transforms.ts
   (12 transforms) + build_evidence_index.ts (idempotency, day-partition,
   receipt). On-wire JSON shape matches TS so Bun and Go runs are
   interchangeable. 14 tests green.

3. **Replay port** — internal/replay + cmd/replay +
   scripts/replay_smoke.sh. Ports scripts/distillation/replay.ts
   (retrieve → bundle → /v1/chat → validate → log). Closes audit-FULL
   phase 7 live invocation on the Go side. Both runtimes append to the
   same data/_kb/replay_runs.jsonl (schema=replay_run.v1). 14 tests green.

Side effect on internal/distillation/types.go: EvidenceRecord gained
prompt_tokens, completion_tokens, and metadata fields to mirror the TS
shape the materializer transforms produce.

STATE_OF_PLAY refreshed to 2026-05-02; ARCHITECTURE_COMPARISON decisions
tracker moves the materializer + replay items from _open_ to DONE and
adds the substrate-fix scale verification row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 03:31:02 -05:00
root
277884b5eb multitier_100k: 335k scenarios @ 1,115/sec against 100k corpus, 4/6 at 0% fail
J asked for a much more sophisticated test using the 100k corpus from
the Rust legacy database. This commit ships:

scripts/cutover/multitier/main.go — 6-scenario harness with weighted
random selection per goroutine. Mixes search, email/SMS/fill
validators (in-process via internal/validator), profile swap with
ExcludeIDs, repeat-cache exercise, and playbook record/replay.

Scenarios + weights (share of all scenarios; weights sum to 100%):
  35% cold_search_email      — search + email outreach + EmailValidator
  15% surge_fill_validate    — search + fill proposal + FillValidator + record
  15% profile_swap           — original search + ExcludeIDs swap + no-overlap check
  15% repeat_cache           — same query × 5 (cache effectiveness)
  10% sms_validate           — SMS draft (≤160 chars, phone for SSN-FP guard)
  10% playbook_record_replay — cold → record → warm w/ use_playbook=true
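
A minimal sketch of weighted selection over these six scenarios, assuming a
roll in [0,100) walked against running totals. The `scenario` type and `pick`
function are hypothetical; the harness in scripts/cutover/multitier/main.go may
select differently.

```go
package main

import (
	"fmt"
	"math/rand"
)

type scenario struct {
	name   string
	weight int // percent of all scenarios
}

var scenarios = []scenario{
	{"cold_search_email", 35}, {"surge_fill_validate", 15}, {"profile_swap", 15},
	{"repeat_cache", 15}, {"sms_validate", 10}, {"playbook_record_replay", 10},
}

// pick maps a roll in [0,100) onto the running-total ranges of the weights.
func pick(roll int) string {
	acc := 0
	for _, s := range scenarios {
		acc += s.weight
		if roll < acc {
			return s.name
		}
	}
	return scenarios[len(scenarios)-1].name // only reached for out-of-range rolls
}

func main() {
	// One generator per goroutine avoids contention on a shared source.
	rng := rand.New(rand.NewSource(1))
	counts := map[string]int{}
	for i := 0; i < 100000; i++ {
		counts[pick(rng.Intn(100))]++
	}
	for _, s := range scenarios {
		fmt.Printf("%-24s %d\n", s.name, counts[s.name])
	}
}
```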

Test results (5-min sustained, conc=50, 100k workers indexed):
  TOTAL 335,257 scenarios @ 1,115/sec
  cold_search_email     117k @ 0.0% fail · p50 2.2ms · p99 8.6ms
  surge_fill_validate    50k @ 98.8% fail (substrate bug below)
  profile_swap           50k @ 0.0% fail · p50 4.5ms · ExcludeIDs verified
  repeat_cache           50k × 5 = 252k searches @ 0.0% fail · p50 11.7ms
  sms_validate           33k @ 0.0% fail · phone-pattern guard works
  playbook_record_replay 33k @ 96.8% fail (substrate bug below)
  Total successful workflows: ~250k+

Validator integration verified at load:
  150,930 EmailValidator passes across cold_search_email + sms_validate
  35 + 1,061 successful FillValidator + playbook_record (where the bug
    didn't fire)
  zero false positives on the SSN-pattern guard against phone numbers

Resource footprint at 100k:
  vectord 1.23GB RSS (linear with 100k vectors)
  matrixd 26MB, 75% CPU (1-core saturated at conc=50)
  Total across 11 daemons: 1.7GB
  Compare to Rust at 14.9GB — ~10× less even at 100k.

SUBSTRATE BUG SURFACED: coder/hnsw v0.6.1 nil-deref in
layerNode.search at graph.go:95. Triggers on /v1/matrix/playbooks/record
under sustained writes to the small playbook_memory index. Both Add
and Search paths can panic.

Workaround applied (this commit) in internal/vectord/index.go
BatchAdd: recover() guard converts panic to error; daemon stays up
instead of crashing the request handler.
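
The guard pattern in isolation, as a sketch: `safeCall` is a hypothetical
helper, and the nil-map write merely stands in for the coder/hnsw nil-deref.
The actual BatchAdd code is not reproduced here.

```go
package main

import "fmt"

// safeCall runs fn and converts any panic raised inside it into an error, so
// the caller can fail one request without taking the process down.
func safeCall(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered panic: %v", r)
		}
	}()
	return fn()
}

func main() {
	var index map[string]int // nil map: writing to it panics at runtime
	err := safeCall(func() error {
		index["pb-1"] = 1
		return nil
	})
	fmt.Println("error returned instead of crash:", err != nil)
}
```

The named return value is what lets the deferred function overwrite the result
after the panic has unwound `fn`.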

Operator recovery procedure (also documented in the report):
  curl -X DELETE http://localhost:4215/vectors/index/playbook_memory
Next record recreates the index fresh.

Real fix DEFERRED — open in docs/ARCHITECTURE_COMPARISON.md
Decisions tracker. Three options:
  a) upstream patch to coder/hnsw
  b) custom small-index Add path that always rebuilds when len < threshold
  c) alternate store for playbook_memory (Lance? in-memory map?)

Evidence: reports/cutover/multitier_100k.md (full methodology +
results + repro + bug analysis). docs/ARCHITECTURE_COMPARISON.md
Decisions tracker updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 06:28:50 -05:00
root
3a2823c02f g5 cutover: bigger load test — 5.87M req, 0 errors, 370MB RSS
Larger-scale follow-up to the original load test. Three axis
expansions: corpus 200→5K workers, body variety 6→200 distinct
queries, concurrency sweep 10/50/100/200, plus mixed
embed+search workload.

Concurrency sweep on /v1/matrix/search direct (3 min each):
  conc=10:  486,733 req  · 2,704 RPS · p50 2.19ms · p99 6.7ms
  conc=50:  1,148,543 req · 6,381 RPS · p50 7.08ms · p99 20ms
  conc=100: 1,253,389 req · 6,963 RPS · p50 13.34ms · p99 37ms
  conc=200: 1,460,676 req · 8,114 RPS · p50 23.45ms · p99 56ms

Mixed embed+search at 60 conc each, 90s:
  /v1/embed: 1,127,854 req · 12,531 RPS · p50 3.31ms · p99 14.6ms
  /v1/matrix/search: 392,229 req · 4,358 RPS · p50 12.68ms · p99 33.8ms

TOTAL: 5,869,424 requests across ~13.5 minutes. ZERO errors.

Resource footprint during peak load:
  matrixd  105% CPU, 33MB RSS (bottleneck — pegs 1 core)
  vectord   39% CPU, 82MB RSS
  gateway   44% CPU, 41MB RSS
  embedd    30% CPU, 67MB RSS
  Total RSS across 11 daemons: ~370MB

Compare to Rust gateway under similar load: 14.9GB RSS, 374% CPU.
Go uses ~40x less memory + spreads load across daemons rather
than packing into one mega-process.

Saturation analysis:
- conc 10→50: +136% RPS (linear-ish scaling)
- conc 50→100: +9% RPS (saturation begins)
- conc 100→200: +17% RPS (matrixd 1-core pegged)

Headroom paths if production exceeds current demand:
1. Run multiple matrixd instances behind a load balancer.
   Substrate is stateless (recordings via storaged), horizontal
   scale is straightforward.
2. Profile matrixd's per-request work (role-gate + judge-eligibility
   + result merge).
3. Skip Bun for hot endpoints (direct nginx → Go = 5.7x previously
   measured).

Evidence: reports/cutover/g5_load_test_big.md (full tables +
methodology + repro script).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 05:18:00 -05:00
root
2a974d6dea docs: ARCHITECTURE_COMPARISON.md as living source file
Per J's request: move the parallel-runtime comparison from
reports/cutover/ (where it lived as cutover-prep evidence) into
docs/ as the source-of-truth file. J will keep updating it as
fixes ship on either side.

Restructured for living-document use:
- Status header (last refresh date, owner, update triggers)
- 'How to update this doc' section with explicit dos and don'ts
- Decisions tracker at top — actioned items with commit refs
  + open backlog with LOC estimates
- Each comparison section now has 'Last verified' columns where
  numbers are time-sensitive
- Change log section at bottom for one-line entries on every
  meaningful refresh

The original at reports/cutover/architecture_comparison.md gains
a 'THIS IS A SNAPSHOT' header pointing at the docs/ source. Kept
as historical record but no longer the place to update.

Sister pointer file in /home/profit/lakehouse/docs/ARCHITECTURE_COMPARISON.md
so the doc is reachable from either repo side. That file explicitly
says the source lives in golangLAKEHOUSE and warns against
authoritative content in the pointer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 04:56:20 -05:00
root
b3ad14832d architecture_comparison: Rust vs Go lakehouse — weaknesses, strengths, abstracts to address
J asked for the comparison before locking in primary line. This
report documents what's actually structurally different vs
implementation-level different, and what to do about each.

Key findings:

1. Python sidecar is the single biggest architectural lever
   - Rust: gateway → HTTP → Python sidecar :3200 → HTTP → Ollama
   - Go:   gateway → HTTP → embedd → HTTP → Ollama (no Python)
   - Sidecar adds zero compute over Ollama (just pydantic + httpx)
   - 63× perf gap (8,119 vs 128 RPS) driven by sidecar + cache absence

2. Process model: Rust 1 mega-binary (14.9G RSS), Go 11 daemons
   - Rust: simpler ops at small scale, panic blast radius = whole system
   - Go: per-daemon scale + crash isolation, more config surface

3. Code volume: Go 15,128 lines vs Rust 35,447 + 1,237 sidecar
   - Go is 43% the size doing similar work
   - Gap concentrated in vectord (Rust 11k lines, Go 804 — Lance + benchmarking)

4. Distillation pipeline asymmetry
   - Audit/observation: BOTH sides parallel-mature
   - Production: Rust-only (materializer + replay + RAG/pref export)
   - Go can READ everything but can't PRODUCE evidence

5. Production validators (FillValidator/EmailValidator/'/v1/validate')
   - Rust has them (1,286 lines, 12 tests each)
   - Go doesn't — matrix gate covers role bleed but not structural validation

Cross-cutting abstracts to address regardless of which wins:
- Drop Python sidecar from Rust (call Ollama directly)
- Add LRU embed cache to Rust aibridge
- Port materializer + replay + validators to Go
- Pin shared JSONL schemas as canonical (both runtimes consume same spec)
- Decide on Lance backend (defer until corpus > 5M rows)

If keeping Go primary: port materializer first, validators second,
skip Lance. If keeping Rust primary: drop Python + add cache,
port chatd 5-provider dispatcher + cross-role gate from Go.

Bottom line: substrate is parallel-mature on observation; producer
side is Rust-only; performance structurally favors Go ~60× on warm
workloads; operations favors Go on isolation; production deployment
favors Rust today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 04:34:24 -05:00
root
c164a3da96 g5 cutover: production load test — 0 errors / 101k req · Go direct = 2,772 RPS
Sustained-traffic load test against the cutover slice. Three runs,
zero correctness errors across 101,770 total requests. Substrate
holds up under concurrent load — matrix gate, vectord HNSW,
embedd cache, gateway proxy all hold. This was the load test's
primary question; latency numbers are secondary.

scripts/cutover/loadgen — focused Go load generator. 6-query
rotating body mix (Forklift/CNC/Warehouse/Picker/Loader/Shipping).
Configurable URL/concurrency/duration. Reports per-status-code
counts + p50/p95/p99 latencies + JSON summary on stderr.

Three runs:

  baseline (Bun → Go, conc=1, 10s):
    4,085 req · 408 RPS · p50 1.3ms · p99 32ms · max 215ms

  sustained (Bun → Go, conc=10, 30s):
    14,527 req · 484 RPS · p50 4.6ms · p99 92ms · max 372ms

  direct (→ Go, conc=10, 30s):
    83,158 req · 2,772 RPS · p50 2.5ms · p99 8.5ms · max 16ms

Critical findings:

1. ZERO correctness errors across 101k requests. No 5xx, no
   transport errors, no panics. Concurrency-safety verified across
   matrix gate / vectord / gateway / embedd cache.

2. Direct-to-Go is production-grade. 2,772 RPS at p99 8.5ms on a
   single host, no scaling cliff at concurrency=10.

3. Bun frontend is the bottleneck. -82% RPS, +982% p99 vs direct.
   Single-process JS event loop queueing under concurrent
   requests — known Bun proxy-mode characteristic. The substrate
   itself isn't the limiter.

4. For staffing-domain demand levels (<1 RPS typical per
   coordinator), Bun-fronted 484 RPS has 480× headroom. No
   urgency to optimize Bun out of the data path. If/when
   concurrent demand grows orders of magnitude, the path is
   nginx → Go direct for hot endpoints, skip Bun.

Substrate is now load-tested and verified production-ready.

What this load test does NOT cover (documented in
g5_load_test.md): cold-cache embed, larger corpus, mixed
read/write, multi-host, full 5-loop traffic with judge gate
calls. Each is its own probe shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 04:20:41 -05:00
root
6507dff26d g5 cutover: first 5-loop end-to-end through Bun frontend
Companion to c522ace (cutover slice live). That commit proved
infrastructure (Bun /_go/* → Go gateway). This commit proves the
SUBSTRATE'S CORE LEARNING BEHAVIOR through the same path.

Two tests against persistent Go stack on :4110 with the 200-worker
corpus, all traffic via Bun frontend on :3700:

TEST 1: same-role boost fires with exact math
  Q1: Need 3 Forklift Operators in Aurora IL for Parallel Machining
  query_role: "Forklift Operator"

  cold (use_playbook=false):
    rank=0 id=w-43  dist=0.4449  Brian Ramirez, Springfield IL

  POST /_go/v1/matrix/playbooks/record:
    query_text=Q1, role=Forklift Operator, answer_id=w-43, score=1.0
    → playbook_id=pb-1126c52bd106df6b

  warm (use_playbook=true):
    rank=0 id=w-43  dist=0.2224  ← halved
    boosted=1, injected=0

  Math check: BoostFactor = 1 - 0.5*score = 0.5 (for score=1.0).
  Expected warm_dist = 0.4449 * 0.5 = 0.22245.
  Observed: 0.2224. 4-decimal exact through 3 HTTP hops.

TEST 2: cross-role gate prevents bleed
  Q2: Need 1 CNC Operator in Detroit MI for Beacon Freight
  query_role: "CNC Operator"
  use_playbook: true (Forklift recording from Test 1 in playbook corpus)

  result:
    rank=0 id=w-175  Kevin Ruiz       (Machine Operator, Detroit MI)
    rank=2 id=w-102  Laura Long       (Forklift Operator, Cleveland OH)
    boosted=0, injected=0  ← role gate fired correctly

  w-102 (Forklift Operator) appears at rank 2 organically via
  cosine retrieval — but boosted=0 confirms the Forklift PLAYBOOK
  did NOT influence this query. Surgical: gate suppresses
  playbook-driven boosts from cross-role recordings, leaves
  organic retrieval untouched.

What this confirms about the substrate:
1. Learning works — single recording → measurable, math-exact boost
2. Bleed protection works — role gate (real_001 fix) holds through
   cutover slice
3. Math holds across HTTP hops — Bun → gateway → matrixd → vectord
   with no drift
4. Substrate works through real production-shape framing — CORS,
   content-type, body forwarding, all transparent

The substrate's reason-for-being (5-loop learning) is now
demonstrably executing on persistent daemons under
production-shape frontend traffic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 04:14:21 -05:00
root
c522acec8b g5 cutover slice live — first real Bun-frontend traffic to Go substrate
J said "let's go" → "next" (option 3): actual flip via Bun
mcp-server. Done. Real Bun-frontend traffic now reaches the Go
substrate via /_go/* on Bun :3700, routed to the persistent Go
gateway at :4110.

Companion change in /home/profit/lakehouse (Rust legacy):
  mcp-server/index.ts: new /_go/* pass-through, opt-in via
  GO_LAKEHOUSE_URL env var. Off-by-default (returns 503 on
  /_go/* with rationale). Existing /api/* (Rust gateway) path
  unchanged. Committed locally on the demo/post-pr11 branch.

System config:
  /etc/systemd/system/lakehouse-agent.service.d/go-cutover.conf
  adds Environment=GO_LAKEHOUSE_URL=http://127.0.0.1:4110 to
  the systemd-managed Bun service. Reversible via systemctl
  revert lakehouse-agent.

Live verification (operator curl through Bun frontend):
- /_go/health: gateway responds {"status":"ok","service":"gateway"}
- /_go/v1/embed: nomic-embed-text-v2-moe vectors, dim=768
- /_go/v1/matrix/search vs persistent 200-worker corpus:
    rank=0 id=w-43  Brian Ramirez   (Forklift Operator, Springfield IL)
    rank=1 id=w-102 Laura Long      (Forklift Operator, Cleveland   OH)
    rank=2 id=w-101 Terrence Gray   (Forklift Operator, Champaign   IL)
    3/3 role match, top-1 in IL exactly
- /api/health: lakehouse ok (Rust path unchanged — control verified)

What this is NOT:
- Not an nginx flip — devop.live/lakehouse/* still goes through
  /api/* → Rust :3100. /_go/* is parallel slice for opt-in.
- Not a tool-level cutover — each /_go/<path> is a manual choice;
  no automatic mapping of Rust paths to Go equivalents.
- Not a transformation layer — caller sends Go-shaped requests
  (e.g. /_go/v1/embed expects {texts, model}, not {text}).

Three cutover unit properties verified:
- ADDITIVE: zero modification to any existing Bun tool
- REVERSIBLE: unset GO_LAKEHOUSE_URL → /_go/* → 503
- ISOLATED: Rust gateway state unaffected (different port,
  different binary, different MinIO bucket)

This is the cutover slice operators can use to validate Go-side
handlers under realistic frontend conditions before any
production-traffic flip. Next step (deferred): pick a specific
mcp-server tool to optionally route through Go with response-
shape adapter — that's a product-visible flip rather than this
infrastructure-visible slice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 03:45:41 -05:00
root
77a3dcf266 cutover: first end-to-end coordinator query against persistent Go stack
Three real-shape demand queries against the long-running 11-daemon
stack with 500 workers ingested from workers_500k.parquet (real
production data). Substrate is producing useful answers:

Q1 (Forklift @ Aurora IL): 5/5 role match, top 3 in IL, dist 0.44-0.46
Q2 (CNC @ Detroit MI):     top-1 in Detroit MI exactly, role pulls
                           Machine Operator (semantic neighbor)
Q3 (Warehouse @ Indianapolis IN): top-1 in Indianapolis IN, 5/5 role
                                  match, dist 0.42-0.54

This is the FIRST end-to-end coordinator-shape query against the
persistent Go stack — every prior reality test (real_001..real_005)
ran through harness-transient stacks that died on exit. This one
ran against daemons that have been up for minutes and stayed up
through retrieval.

Geo is load-bearing: top-1 city/state matched in 3/3 queries.
Embedder treats geography as a primary feature.

Q2's CNC→Machine Operator gap exposes the playbook learning loop's
purpose: judge would rate this ~3/5; the first time a coordinator
approves a Machine Operator for a CNC Operator query, that
recording starts shifting substrate behavior. That's the loop
we've been building toward — the persistent stack is now the
substrate that loop will run on.

Evidence: reports/cutover/persistent_stack_first_query.md (full
top-K tables + read on each query).

What this does NOT prove:
- Production-volume load (3 queries, 500 workers)
- Concurrent latency
- Full 5-loop substrate (this exercised retrieval only; no
  playbook recordings exist on the persistent stack yet)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 03:10:09 -05:00
root
09904d5222 cutover: persistent Go stack milestone — first long-running deployment + first Go-emitted audit_baselines entry
J's "let's go" instruction: leave OPEN list behind, push the Go
substrate forward into actual deployment shape. This commit marks
the first time the Go side has run as long-running daemons rather
than per-harness transient processes, and the first time the
shared cross-runtime longitudinal log has carried a Go-emitted
entry alongside the Rust ones.

What landed:

scripts/cutover/start_go_stack.sh — the persistent-stack runbook.
Brings up all 11 daemons (storaged → catalogd → ingestd → queryd
→ embedd → vectord → pathwayd → observerd → matrixd → gateway,
plus chatd-if-not-already-up) in dependency order via nohup +
disown. Anchored pkill per feedback_pkill_scope (never bare
"bin/"). Logs land in /tmp/gostack-logs/<bin>.log, one per daemon.

Verified live state:
- All 11 services healthy on :3110 + :3211-:3220
- gateway → embedd proxy returns nomic-embed-text-v2-moe vectors
- chatd reports 5/5 providers loaded
- No port collision with Rust gateway on :3100
- Daemons stay up after exit of the start script (production shape,
  not harness-transient)

audit_baselines.jsonl crosses the runtime boundary:
- 7 Rust-emitted entries (last: ca7375ea 2026-04-27)
- 1 Go-emitted entry (ee2a40c 2026-05-01T07:53:54Z) appended via
  ./bin/audit_full -append-baseline
- Same envelope shape, same metric set, same drift comparator
  semantics — operators running either runtime grow the same log

What this DOES prove:
- Substrate parity at deployment shape (not just unit tests)
- Cross-runtime artifact write-side compatibility (was previously
  proven on read side via audit_baselines roundtrip)
- The deploy machinery works end-to-end for the persistent case

What this does NOT prove (still ahead):
- Real coordinator traffic against the Go stack (no nginx flip yet;
  devop.live/lakehouse/ still serves through Rust)
- Go-side production materializer (Phase 2 is observer-only)
- Replay tool parity (Phase 7 is observer-only)
- The 5-loop product gate against actual humans

reports/cutover/SUMMARY.md now logs three new rows:
- audit-FULL with 12/12 phases ported
- First Go-emitted audit_baselines entry
- Persistent Go stack live

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 02:55:29 -05:00
root
ee2a40c505 audit-FULL: port phases 1/2/5/7 — only acceptance.ts (TS-only) remains skipped
Closes 4 of the 5 phases the initial audit-FULL port left as
deferred. The pattern: most "deferred" phases didn't actually need
the un-ported Rust pieces — they were observer-mode by design and
just needed to read existing on-disk artifacts.

Phase 1 (schema validators) → ported via exec.Command:
  Invokes `go test ./internal/distillation/...` — the Go equivalent
  of Rust's `bun test auditor/schemas/distillation/`. New
  GoTestModule field on AuditFullOptions controls the package
  pattern; empty disables the invocation (test mode, prevents
  recursion when audit-full is invoked from inside `go test`).

Phase 2 (evidence materialization) → ported as observer:
  Reads data/evidence/ directly and tallies rows + tier-1 source
  hits. Doesn't re-run the materializer (which is Rust-side TS).
  Emits p2_evidence_rows + p2_evidence_skips metrics matching
  Rust shape — drop-in audit_baselines.jsonl entries possible.

Phase 5 (run summary) → ported as observer:
  Reads reports/distillation/{run_id}/summary.json + 5 stage
  receipts. Validates schema_version=1, run_hash sha256, git_commit
  40-char hex, all stage receipts decode as JSON. Full schema
  validation (StageReceipt schema) is intentionally NOT ported —
  it would require porting the TS schemas/distillation/ validators
  in full; basic shape checks catch the load-bearing invariants.
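
A sketch of what such basic shape checks can look like. The JSON keys follow
the names in this message, but the `runSummary` struct, `checkSummary`, and the
lowercase-hex assumption are illustrative, not the audit_full.go code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"regexp"
	"strings"
)

var (
	sha256Hex = regexp.MustCompile(`^[0-9a-f]{64}$`) // assumes lowercase hex
	gitSHA    = regexp.MustCompile(`^[0-9a-f]{40}$`)
)

// runSummary lists only the fields named in the commit message.
type runSummary struct {
	SchemaVersion int    `json:"schema_version"`
	RunHash       string `json:"run_hash"`
	GitCommit     string `json:"git_commit"`
}

// checkSummary applies the three load-bearing invariants to summary.json bytes.
func checkSummary(data []byte) error {
	var s runSummary
	if err := json.Unmarshal(data, &s); err != nil {
		return fmt.Errorf("summary does not decode: %w", err)
	}
	switch {
	case s.SchemaVersion != 1:
		return fmt.Errorf("schema_version = %d, want 1", s.SchemaVersion)
	case !sha256Hex.MatchString(s.RunHash):
		return errors.New("run_hash is not 64 hex chars")
	case !gitSHA.MatchString(s.GitCommit):
		return errors.New("git_commit is not 40 hex chars")
	}
	return nil
}

func main() {
	// Placeholder hashes, not real run data.
	good := fmt.Sprintf(`{"schema_version":1,"run_hash":%q,"git_commit":%q}`,
		strings.Repeat("a", 64), strings.Repeat("b", 40))
	fmt.Println(checkSummary([]byte(good)))
	short := fmt.Sprintf(`{"schema_version":1,"run_hash":"abc","git_commit":%q}`,
		strings.Repeat("b", 40))
	fmt.Println(checkSummary([]byte(short)))
}
```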

Phase 7 (replay log) → ported as observer:
  Reads data/_kb/replay_runs.jsonl, validates last 50 rows parse
  as JSON. Skips the live-replay invocation that Rust's phase 7
  also does — porting Rust replay.ts is substantial and not in
  scope. The "log shape sanity" check is what audit-full actually
  needs; the live invocation is a separate concern.

Phase 6 (acceptance gate) — STILL SKIPPED:
  Rust acceptance.ts is a TS-only fixture harness with bun-specific
  deps. Porting the fixtures (tests/fixtures/distillation/acceptance/)
  + the 22-invariant runner to Go is an ADR-worth undertaking.
  Documented in the header comment.

Live-data probe (against /home/profit/lakehouse):
  Skips count: 4 → 1 (only phase 6).
  Required checks: 6/6 → 12/12 PASS.
  New metric: p2_evidence_rows=1055, BYTE-EQUAL to the Rust
  pipeline's collect.records_out from the latest summary.json.
  Cross-runtime parity now extends across phases 0/1/2/3/4/5/7.

5 new tests + 1 updated:
- TestPhase2_EvidenceTallyFromOnDisk: row + tier-1-hit tallying
- TestPhase5_FullSummaryFlow: complete run-summary fixture passes
- TestPhase5_ShortRunHashCaught: bad run_hash fails required check
- TestPhase7_ReplayLogReadsFromDisk: row-count reporting
- TestPhase7_MalformedTailRowsCaught: structural parse failure
- TestRunAuditFull_FullFixtureFlow updated to seed evidence/ +
  reports/distillation/ for the phases now wired.

Cleanup: removed local sortStrings helper (replaced with sort.Strings
now that `sort` is imported for phase 5's mtime-sort).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 02:35:13 -05:00
root
55b8c76a8c distillation: audit-FULL pipeline port (phases 0/3/4) — cross-runtime metric parity verified
Ports the metric-collection passes from scripts/distillation/audit_full.ts.
The substrate that PRODUCES audit_baselines.jsonl entries — the
half OPEN #2 left as "deferred to next wave" after the read/write
substrate landed in ca142b9.

Phase coverage:
  Phase 0 (file presence)             ported
  Phase 1 (schema validators)         skipped (Go's `go test` covers it)
  Phase 2 (materializer dry-run)      deferred (Go materializer not yet ported)
  Phase 3 (scored-runs distribution)  ported
  Phase 4 (contamination firewall)    ported
  Phase 5 (receipts validation)       deferred (Go run-summary JSON not yet emitted)
  Phase 6 (replay sanity)             deferred (Go replay tool not ported)
  Phase 7 (run summary lineage)       deferred (same)

Cross-runtime parity verified end-to-end:
  Go-side audit-full against /home/profit/lakehouse produced
  metrics IDENTICAL to the last Rust-emitted audit_baselines.jsonl
  entry. All 8 ported metrics match byte-for-byte:
    p3_accepted=386, p3_partial=132, p3_rejected=57, p3_human=480,
    p4_sft_rows=353, p4_rag_rows=448, p4_pref_pairs=83, p4_total_quarantined=1325
  6/6 required checks pass on live data.

Components:
- internal/distillation/audit_full.go: PhaseCheck struct (mirrors
  Rust shape), PhaseCheckReport aggregation, RunAuditFull
  orchestrator, auditPhase0/3/4 implementations, FormatAuditFullReport
  Markdown writer.
- cmd/audit_full/main.go: CLI binary with -root, -out, -json,
  -append-baseline flags. Operators run "./bin/audit_full
  -append-baseline" to grow the longitudinal log alongside the
  Rust pipeline (entries are interchangeable — same envelope shape).
- 6 new tests: empty-root failure handling, full-fixture clean PASS
  (locks all 8 metrics + all 6 required checks), SFT firewall
  contamination detection, preference self-pair detection, sig_hash
  regex correctness (rejects wrong-length + uppercase), Markdown
  formatter smoke.

Live-data probe captured at reports/cutover/audit_full_go_vs_rust.md
(linked from reports/cutover/SUMMARY.md). Same shape as the
audit_baselines round-trip evidence — both Go-side ports of the
distillation surface are now validated against real Rust data, not
just fixtures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 01:30:23 -05:00
root
0d4f033b34 audit_baselines: round-trip validation against live Rust data
Same shape of proof as embed_parity.sh for the embed endpoint:
take the just-shipped Go port (ca142b9) and validate it against
the actual production data the Rust legacy emits, not just unit-
test fixtures. Locks the cross-runtime parity that operators
running mixed pipelines depend on.

scripts/cutover/audit_baselines_validate.go:
- Reads /home/profit/lakehouse/data/_kb/audit_baselines.jsonl
- Parses every entry via the Go AuditBaseline struct
- Round-trips the last entry: encode → decode → field-by-field
  equality check (catches any silently-dropped JSON keys)
- Calls LoadLastBaseline against the live file (proves the public
  API works on real shapes, not just inline parsing)
- Computes BuildAuditDriftTable(first → last) — full-window
  lineage drift over the captured baselines
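
A rough sketch of the encode → decode round-trip idea, under stated
assumptions: the baseline struct and its run_id / metrics fields are
hypothetical stand-ins for the real AuditBaseline, and roundTrip is an
illustrative name. Comparing generic JSON values before and after is one
way to surface keys the struct silently drops.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// baseline is a hypothetical stand-in for the real AuditBaseline struct.
type baseline struct {
	RunID   string           `json:"run_id"`
	Metrics map[string]int64 `json:"metrics"`
}

// roundTrip decodes raw into the struct, re-encodes it, and compares the
// original and re-encoded documents as generic JSON values. A key the
// struct does not model vanishes on re-encode and shows up as a mismatch.
func roundTrip(raw []byte) error {
	var typed baseline
	if err := json.Unmarshal(raw, &typed); err != nil {
		return fmt.Errorf("decode: %w", err)
	}
	encoded, err := json.Marshal(typed)
	if err != nil {
		return fmt.Errorf("encode: %w", err)
	}
	var before, after any
	if err := json.Unmarshal(raw, &before); err != nil {
		return fmt.Errorf("generic decode (original): %w", err)
	}
	if err := json.Unmarshal(encoded, &after); err != nil {
		return fmt.Errorf("generic decode (re-encoded): %w", err)
	}
	if !reflect.DeepEqual(before, after) {
		return fmt.Errorf("round-trip mismatch: %s != %s", raw, encoded)
	}
	return nil
}

func main() {
	entry := []byte(`{"run_id":"r1","metrics":{"p3_accepted":386}}`)
	if err := roundTrip(entry); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("round-trip ok")
}
```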

Live-data probe results (reports/cutover/audit_baselines_roundtrip.md):
- 7 entries parse without error
- Round-trip is byte-equal on every metric + every header field
- Drift table fires the expected verdicts:
  - p2_evidence_rows 12→82 (+583%) → warn (above 20% threshold)
  - p3_accepted/partial/rejected/human 0→non-zero → warn (the
    zero-baseline edge case TestBuildAuditDriftTable_ZeroBaseline
    was designed to lock — verified now firing on real history)
  - p4_* metrics +0% → ok (stable across the window)
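
The verdicts above are consistent with a rule along these lines. This is
a sketch only: driftVerdict is a hypothetical name, the 20% threshold is
taken from the message above, and treating negative drift symmetrically
is an assumption rather than something the report states.

```go
package main

import (
	"fmt"
	"math"
)

// warnThresholdPct is the 20% threshold quoted in the commit message.
const warnThresholdPct = 20.0

// driftVerdict (hypothetical) returns the percent change from first to
// last and a verdict. A zero baseline with a non-zero last value has no
// defined percentage, so it is reported as "warn" with pct = 0.
func driftVerdict(first, last int64) (pct float64, verdict string) {
	if first == 0 {
		if last == 0 {
			return 0, "ok"
		}
		return 0, "warn" // zero-baseline edge case
	}
	pct = float64(last-first) / float64(first) * 100
	// Assumption: drift in either direction beyond the threshold warns.
	if math.Abs(pct) > warnThresholdPct {
		return pct, "warn"
	}
	return pct, "ok"
}

func main() {
	cases := []struct {
		name        string
		first, last int64
	}{
		{"p2_evidence_rows", 12, 82},
		{"p3_accepted", 0, 386},
		{"p4_sft_rows", 353, 353},
	}
	for _, c := range cases {
		pct, v := driftVerdict(c.first, c.last)
		fmt.Printf("%s %d→%d (%+.0f%%) → %s\n", c.name, c.first, c.last, pct, v)
	}
}
```

The first case works out to (82 - 12) / 12 ≈ 5.83, which matches the +583%
figure in the report.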

What this does NOT cover (documented in the report): producing
baselines. The Go-side audit-FULL pipeline doesn't exist yet;
only the load/append/drift substrate is ported. Operators running
audit-full from Go would still need a metric-collection pass,
which is a separate port deliberately left out of this wave.

reports/cutover/SUMMARY.md gains a new row alongside the embed
parity entries; cutover-prep verification log keeps the
discipline of "verified against real data, not just fixtures."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:20:18 -05:00
root
5687ec65c2 G5 cutover prep: embed parity probe — Rust /ai/embed ↔ Go /v1/embed verified
First concrete cutover artifact: scripts/cutover/embed_parity.sh
brings up Go embedd + gateway alongside the live Rust gateway,
hits both /ai/embed and /v1/embed with the same forced model, and
emits a per-date verdict report under reports/cutover/.

Why embed first: the parity invariant is one math identity (cosine
sim of vectors against same input). Retrieve has thousands of edge
cases. If embed parity holds, all downstream vector consumers
inherit confidence; if it doesn't, we catch it in 30s instead of
after a flip.

Verdict 2026-04-30: 5/5 samples cosine=1.000000 with model forced
to nomic-embed-text (v1). Same with nomic-embed-text-v2-moe (both
Ollamas have it loaded). On these samples the two paths are
numerically equivalent across the gateway plumbing.

Drift catalog (reports/cutover/SUMMARY.md):
- URL: Rust /ai/embed vs Go /v1/embed
- Wire: Rust {embeddings, dimensions} (plural) vs Go {vectors,
  dimension} (singular). Wire-format adapter is the only real
  cutover work for this endpoint.
- L2 norm: Rust unit vectors (~1.0); Go raw Ollama (~20-23). Same
  direction (cos=1.0); harmless under cosine-distance HNSW (which
  is Go vectord's default), but worth fixing in internal/embed/
  before extending to euclidean indexes.

reports/cutover/ now tracked (joined the scrum/ + reality-tests/
exemptions in .gitignore).

Next probe: /v1/matrix/retrieve ↔ Rust /vectors/hybrid for the
real user-facing retrieve path. Embed parity gives that probe a
clean foundation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:07:04 -05:00