diff --git a/STATE_OF_PLAY.md b/STATE_OF_PLAY.md
index 579c4cb..7572777 100644
--- a/STATE_OF_PLAY.md
+++ b/STATE_OF_PLAY.md
@@ -272,6 +272,7 @@ a steady state. Future items will land here as production triggers fire.
 | (scrum) | 3-lineage scrum on `434f466..0d4f033` (post_role_gate_v1). Convergent finding (Opus + Kimi): `DecodeIndex` lost nil-meta items across persistence. **Fixed** by bumping envelope version 1→2 with explicit `IDs []string` field; v1 envelopes still load via meta-key fallback. Opus-only real bugs also actioned: `handleMerge` non-`ErrIndexNotFound` nil-deref, `mathLog` dead wrapper removed, bubble sort → `sort.Slice`. False positives rejected after verification (Kimi rollback misreading + Opus stale-comment claim). 2 new regression tests lock the v2 round-trip + v1 backward-compat. Disposition: `reports/scrum/_evidence/2026-05-01/verdicts/post_role_gate_v1_disposition.md`. |
 | (audit-full port) | **Audit-FULL pipeline** (phases 0/3/4) ported from `scripts/distillation/audit_full.ts`. `internal/distillation/audit_full.go` + `cmd/audit_full` CLI. 6 ported required-check classes; 4 phases (1, 2, 5, 6, 7) deferred — depend on broader Rust pieces (materializer / replay / run-summaries) not yet ported. **Cross-runtime byte-equal verdict on live data**: Go-side audit-full against `/home/profit/lakehouse` produced p3_*/p4_* metrics IDENTICAL to the last Rust-emitted `audit_baselines.jsonl` entry (all 8 metrics match: p3_accepted=386, p3_partial=132, p3_rejected=57, p3_human=480, p4_sft_rows=353, p4_rag_rows=448, p4_pref_pairs=83, p4_total_quarantined=1325). 6 new tests + the live-data probe captured in `reports/cutover/audit_full_go_vs_rust.md`. |
 | (audit-full skips fixed) | **Phases 1/2/5/7 unskipped** (2026-05-01) — port reduced from 4 deferred phases to 1. **Phase 1**: invokes `go test ./internal/distillation/...` via exec.Command (Go equivalent of Rust's `bun test`). **Phase 2**: reads `data/evidence/` and tallies rows + tier-1 source hits as an observer (doesn't re-run the materializer; emits `p2_evidence_rows`/`p2_evidence_skips` metrics). **Phase 5**: reads `reports/distillation/{run_id}/summary.json` + 5 stage receipts; validates schema_version + run_hash sha256 + git_commit hex. **Phase 7**: reads `data/_kb/replay_runs.jsonl`; tail-row JSON parse check. Only **Phase 6** remains skipped (Rust `acceptance.ts` is a TS-only fixture harness; porting fixtures + invariant runner is its own ADR). Live-data probe: 12/12 required checks PASS, `p2_evidence_rows=1055` byte-equal to Rust `summary.json` `collect.records_out`. 6 new tests. |
+| (lets-go) | **Persistent Go stack live** (2026-05-01). All 11 daemons (storaged/catalogd/ingestd/queryd/embedd/vectord/pathwayd/observerd/matrixd/gateway/chatd) up as long-running processes on :3110+:3211-:3220, alongside the live Rust gateway on :3100 (no port conflict). First time the Go side runs as production-shape daemons rather than per-harness transient processes. Brought up via `scripts/cutover/start_go_stack.sh`. Gateway proxies `/v1/embed` correctly to embedd; all 5 chatd providers loaded. **First Go-side entry written to `data/_kb/audit_baselines.jsonl`** (entry #8, git_commit=`ee2a40c`, golangLAKEHOUSE SHA distinguishable from Rust's `ca7375ea`); the longitudinal log now mixes runtimes. |
 | (close-3) | **OPEN #3: distribution drift via PSI** — `internal/drift/drift.go`: `ComputeDistributionDrift` returns Population Stability Index + verdict tier (stable < 0.10, minor 0.10–0.25, major ≥ 0.25). Equal-width bucketing over combined min/max range, epsilon-clamping for empty buckets, per-bucket breakdown for drilldown. 7 new tests including identical-is-stable, hard-shift-is-major, moderate-detected-not-stable, empty-inputs-safe, all-identical-safe, bucket-counts-conserved, num-buckets-clamping. |
 | (close-4) | **OPEN #4: ops nice-to-haves** — (a) Real-time wall-clock for stress harness: per-phase elapsed time logged to stdout as it runs (`[stress] phase NAME starting (T+12.3s)` + `[stress] phase NAME done — 8.5s (T+20.8s)`); `Output.PhaseTimings` + `Output.TotalElapsedMs` written to JSON; (b) chatd fixture-mode S3 mock + (c) liberal-paraphrase calibration: not actioned — no fired trigger yet, would be speculative. Documented as deferred-until-need rather than ignored. |
 
diff --git a/reports/cutover/SUMMARY.md b/reports/cutover/SUMMARY.md
index f100f03..ec1b505 100644
--- a/reports/cutover/SUMMARY.md
+++ b/reports/cutover/SUMMARY.md
@@ -10,6 +10,8 @@ what's safe to flip. Append a row when a new endpoint clears parity.
 | `audit_baselines.jsonl` | 2026-05-01 | `data/_kb/audit_baselines.jsonl` | `internal/distillation` `LoadLastBaseline` / `AppendBaseline` / `BuildAuditDriftTable` | ✅ PASS round-trip | Live Rust file (7 entries) parses + round-trips byte-equal; lineage drift table fires correctly on zero-baseline metrics. See `audit_baselines_roundtrip.md`. |
 | `audit-FULL` (phases 0/3/4) | 2026-05-01 | `scripts/distillation/audit_full.ts` | `cmd/audit_full` + `internal/distillation` `RunAuditFull` | ✅ PASS metric-equal | Go-side run against live Rust root: all 8 ported metrics (p3_*, p4_*) byte-equal to the last Rust-emitted `audit_baselines.jsonl` entry. 6/6 required checks pass. 4 phases (1, 2, 5, 6, 7) deferred — depend on broader Rust-side pieces (materializer / replay / run-summaries) not yet ported. See `audit_full_go_vs_rust.md`. |
 | `audit-FULL` (phases 0/1/2/3/4/5/7 — observer mode) | 2026-05-01 | `scripts/distillation/audit_full.ts` | `cmd/audit_full` + `internal/distillation` `RunAuditFull` | ✅ PASS 12/12 | Skips reduced from 4 → 1: phase 1 invokes `go test`, phases 2/5/7 read existing artifacts as observers (no live materializer/replay invocation). Only phase 6 (TS-only acceptance harness) remains skipped. `p2_evidence_rows=1055` matches Rust `summary.json` `collect.records_out=1055` byte-equal. Updated `audit_full_go_vs_rust.md`. |
+| `audit_baselines.jsonl` write side | 2026-05-01 | `data/_kb/audit_baselines.jsonl` (Rust-emitted, 7 entries) | Go-emitted entry #8 via `cmd/audit_full -append-baseline` | ✅ Mixed-runtime log | First Go-side entry written to the shared longitudinal log: `git_commit=ee2a40c5...` (golangLAKEHOUSE SHA, distinguishable from prior Rust SHAs like `ca7375ea`). All 10 metric fields match Rust shape exactly — drift comparator fires correctly across the runtime boundary. |
+| Full Go stack (persistent) | 2026-05-01 | per-binary on :31xx | 11 daemons (storaged/catalogd/ingestd/queryd/embedd/vectord/pathwayd/observerd/matrixd/gateway/chatd) | ✅ All 11 healthy | First time the Go stack runs as long-running daemons rather than per-harness transient processes. Brought up via `scripts/cutover/start_go_stack.sh`; gateway proxies `/v1/embed` correctly through to embedd; all 5 chatd providers loaded. Live alongside the Rust gateway on :3100 (no port conflict). |
 
 ## Wire-format drift catalog
 
diff --git a/scripts/cutover/start_go_stack.sh b/scripts/cutover/start_go_stack.sh
new file mode 100755
index 0000000..551daf0
--- /dev/null
+++ b/scripts/cutover/start_go_stack.sh
@@ -0,0 +1,81 @@
+#!/usr/bin/env bash
+# scripts/cutover/start_go_stack.sh
+#
+# Bring up the full Go stack persistently — alongside the live Rust
+# gateway on :3100. All Go daemons land on the parallel port range
+# :3110 + :3211-:3220 so there's no port collision.
+#
+# Unlike playbook_lift.sh's transient harness boot (which kills the
+# stack on exit), this script starts every daemon detached via nohup
+# + disown. Operators run it once at boot or after a restart; the
+# stack stays up until a `pkill -f "bin/(name)"` or reboot.
+#
+# Logs land in /tmp/gostack-logs/.log (one per daemon).
+#
+# Used to bring up the persistent stack 2026-05-01 — the first time
+# the Go side has run as long-running daemons rather than per-harness
+# transient processes.
+
+set -euo pipefail
+
+cd "$(dirname "$0")/../.."
+
+if [ ! -d bin ]; then
+  echo "[gostack] bin/ missing — run 'just build' first" >&2
+  exit 1
+fi
+
+# Ensure no leftover from a transient harness run. Anchored pattern
+# per feedback_pkill_scope; never bare `bin/`.
+echo "[gostack] killing any stale Go daemons (anchored pkill)"
+pkill -f "bin/(storaged|catalogd|ingestd|queryd|embedd|vectord|pathwayd|observerd|matrixd|gateway)$" 2>/dev/null || true
+sleep 0.5
+
+mkdir -p /tmp/gostack-logs
+
+start() {
+  local bin="$1"
+  local port="$2"
+  local log="/tmp/gostack-logs/$bin.log"
+  nohup ./bin/"$bin" -config lakehouse.toml > "$log" 2>&1 & disown
+  for _ in $(seq 1 50); do
+    if curl -sSf -m 1 "http://127.0.0.1:$port/health" >/dev/null 2>&1; then
+      echo " $bin :$port up (log: $log)"
+      return 0
+    fi
+    sleep 0.1
+  done
+  echo " $bin :$port FAILED — log tail:"
+  tail -20 "$log"
+  return 1
+}
+
+echo "[gostack] starting in dependency order"
+start storaged 3211
+start catalogd 3212
+start ingestd 3213
+start queryd 3214
+start embedd 3216
+start vectord 3215
+start pathwayd 3217
+start observerd 3219
+start matrixd 3218
+start gateway 3110
+
+# chatd is started independently — its provider key files come from
+# /etc/lakehouse/{ollama_cloud,openrouter,opencode,kimi}.env; if
+# chatd is already up (long-running from a prior session) we don't
+# touch it.
+if ! curl -sSf -m 1 http://127.0.0.1:3220/health >/dev/null 2>&1; then
+  echo "[gostack] chatd :3220 not up; starting"
+  start chatd 3220
+else
+  echo " chatd :3220 already up (skipping)"
+fi
+
+echo
+echo "[gostack] ready · sweep:"
+for p in 3110 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220; do
+  curl -sSf -m 1 "http://127.0.0.1:$p/health" 2>/dev/null | head -c 80
+  echo
+done
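
Review note on the anchored `pkill` in `start_go_stack.sh`: `pkill -f` matches its extended regular expression against the full command line, and `start()` launches each daemon as `./bin/<daemon> -config lakehouse.toml`. A pattern ending in `$` may therefore miss daemons that a previous run of this same script started. The sketch below checks that with `grep -E` standing in for `pkill -f` (an assumption made so nothing gets killed while testing). The `matches` helper and the `WITH_ARGS` variant are hypothetical and not part of the diff. How the transient harness launches its daemons is not shown here, so the anchored pattern may well be correct for that case.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Pattern copied from start_go_stack.sh.
ANCHORED='bin/(storaged|catalogd|ingestd|queryd|embedd|vectord|pathwayd|observerd|matrixd|gateway)$'
# Hypothetical variant (not in the script): also accept a space after the
# daemon name, so command lines that carry arguments still match.
WITH_ARGS='bin/(storaged|catalogd|ingestd|queryd|embedd|vectord|pathwayd|observerd|matrixd|gateway)( |$)'

# matches PATTERN CMDLINE: prints "match" or "no-match".
# grep -E is a stand-in for pkill -f; both take an extended regular expression.
matches() {
  if grep -Eq -- "$1" <<<"$2"; then
    echo "match"
  else
    echo "no-match"
  fi
}

matches "$ANCHORED"  "./bin/storaged"                         # match
matches "$ANCHORED"  "./bin/storaged -config lakehouse.toml"  # no-match
matches "$ANCHORED"  "./bin/chatd"                            # no-match
matches "$WITH_ARGS" "./bin/storaged -config lakehouse.toml"  # match
```

If the second case matters in practice, a variant like `WITH_ARGS` keeps the pattern scoped to the named binaries while tolerating arguments. `chatd` stays excluded either way, consistent with the script's intent of leaving a long-running chatd untouched.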