golangLAKEHOUSE/scripts/cutover/start_go_stack.sh
root 09904d5222 cutover: persistent Go stack milestone — first long-running deployment + first Go-emitted audit_baselines entry
J's "let's go" instruction: leave OPEN list behind, push the Go
substrate forward into actual deployment shape. This commit marks
the first time the Go side has run as long-running daemons rather
than per-harness transient processes, and the first time the
shared cross-runtime longitudinal log has carried a Go-emitted
entry alongside the Rust ones.

What landed:

scripts/cutover/start_go_stack.sh — the persistent-stack runbook.
Brings up all 11 daemons (storaged → catalogd → ingestd → queryd
→ embedd → vectord → pathwayd → observerd → matrixd → gateway,
plus chatd-if-not-already-up) in dependency order via nohup +
disown. Anchored pkill per feedback_pkill_scope (never bare
"bin/"). Logs land in /tmp/gostack-logs/<bin>.log, one per daemon.

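The anchored-pkill rule can be checked offline before it is pointed at live processes. The sketch below is illustrative only: `daemon_pattern` and `matches` are hypothetical helpers, not part of the repo, and `grep -E` stands in for the extended-regex matching that `pkill -f` applies to the full command line. It assumes the daemons run as `./bin/<name> -config lakehouse.toml`, so the name is followed by a space or end of line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join daemon names into one anchored ERE.
# The "( |$)" tail requires the name to end at a space or end of line,
# so "bin/gateway" cannot match "bin/gateway_probe".
daemon_pattern() {
  local names
  names=$(IFS='|'; echo "$*")
  printf 'bin/(%s)( |$)' "$names"
}

# Hypothetical helper: succeed when ERE $1 matches command line $2.
# grep -E is used here as a stand-in for pkill -f's matching.
matches() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

pattern=$(daemon_pattern storaged catalogd gateway)

# Sample command lines (made up for illustration).
for cmdline in \
  "./bin/storaged -config lakehouse.toml" \
  "./bin/gateway" \
  "./bin/gateway_probe -x" \
  "vim bin/notes.txt"; do
  if matches "$pattern" "$cmdline"; then
    echo "would kill: $cmdline"
  else
    echo "left alone: $cmdline"
  fi
done
```

A bare `bin/` pattern matches all four sample command lines, including the editor session, which is the failure mode the anchoring rule exists to prevent.
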
Verified live state:
- All 11 services healthy on :3110 + :3211-:3220
- gateway → embedd proxy returns nomic-embed-text-v2-moe vectors
- chatd reports 5/5 providers loaded
- No port collision with Rust gateway on :3100
- Daemons stay up after exit of the start script (production shape,
  not harness-transient)
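
The health checks listed above can be re-run at any time with a small sweep. This is a minimal sketch, not the script's own code: `sweep` and `probe_health` are hypothetical names, and the probe is passed in as a function name so the loop can be exercised without a live stack. The `/health` path and the port list are taken from the start script.

```shell
#!/usr/bin/env bash
# Hypothetical probe: the same curl check the start script uses.
probe_health() {
  curl -sSf -m 1 "http://127.0.0.1:$1/health" >/dev/null 2>&1
}

# sweep PROBE PORT...: print one status line per port and return
# non-zero if any port failed its probe.
sweep() {
  local probe_cmd="$1"
  shift
  local port failed=0
  for port in "$@"; do
    if "$probe_cmd" "$port"; then
      echo ":$port up"
    else
      echo ":$port DOWN"
      failed=1
    fi
  done
  return "$failed"
}

# Example invocation against the Go stack's port range (not run here):
#   sweep probe_health 3110 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220
```

Because `sweep` returns non-zero when any daemon is down, it can gate a later step such as a traffic flip without parsing its output.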

audit_baselines.jsonl crosses the runtime boundary:
- 7 Rust-emitted entries (last: ca7375ea 2026-04-27)
- 1 Go-emitted entry (ee2a40c 2026-05-01T07:53:54Z) appended via
  ./bin/audit_full -append-baseline
- Same envelope shape, same metric set, same drift comparator
  semantics — operators running either runtime grow the same log

What this DOES prove:
- Substrate parity at deployment shape (not just unit tests)
- Cross-runtime artifact write-side compatibility (was previously
  proven on read side via audit_baselines roundtrip)
- The deploy machinery works end-to-end for the persistent case

What this does NOT prove (still ahead):
- Real coordinator traffic against the Go stack (no nginx flip yet;
  devop.live/lakehouse/ still serves through Rust)
- Go-side production materializer (Phase 2 is observer-only)
- Replay tool parity (Phase 7 is observer-only)
- The 5-loop product gate against actual humans

reports/cutover/SUMMARY.md now logs three new rows:
- audit-FULL with 12/12 phases ported
- First Go-emitted audit_baselines entry
- Persistent Go stack live

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 02:55:29 -05:00

82 lines
2.4 KiB
Bash
Executable File

#!/usr/bin/env bash
# scripts/cutover/start_go_stack.sh
#
# Bring up the full Go stack persistently — alongside the live Rust
# gateway on :3100. All Go daemons land on the parallel port range
# :3110 + :3211-:3220 so there's no port collision.
#
# Unlike playbook_lift.sh's transient harness boot (which kills the
# stack on exit), this script starts every daemon detached via nohup
# + disown. Operators run it once at boot or after a restart; the
# stack stays up until a `pkill -f "bin/(name)"` or reboot.
#
# Logs land in /tmp/gostack-logs/<bin>.log (one per daemon).
#
# Used to bring up the persistent stack 2026-05-01 — the first time
# the Go side has run as long-running daemons rather than per-harness
# transient processes.
set -euo pipefail
cd "$(dirname "$0")/../.."
if [ ! -d bin ]; then
  echo "[gostack] bin/ missing; run 'just build' first" >&2
  exit 1
fi
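# Ensure no leftover from a transient harness run. Anchored pattern
# per feedback_pkill_scope; never bare `bin/`. pkill -f matches the
# full command line, and the daemons run with trailing arguments
# (`-config lakehouse.toml`), so the name must be followed by a space
# or end of line; a bare `$` anchor would never match them.
echo "[gostack] killing any stale Go daemons (anchored pkill)"
pkill -f 'bin/(storaged|catalogd|ingestd|queryd|embedd|vectord|pathwayd|observerd|matrixd|gateway)( |$)' 2>/dev/null || true
sleep 0.5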
mkdir -p /tmp/gostack-logs
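# start BIN PORT: launch ./bin/BIN detached, then poll its /health
# endpoint up to 50 times, 0.1 s apart, before giving up.
start() {
  local bin="$1"
  local port="$2"
  local log="/tmp/gostack-logs/$bin.log"
  # nohup + disown so the daemon survives this script's exit.
  nohup ./bin/"$bin" -config lakehouse.toml > "$log" 2>&1 & disown
  for _ in $(seq 1 50); do
    if curl -sSf -m 1 "http://127.0.0.1:$port/health" >/dev/null 2>&1; then
      echo " $bin :$port up (log: $log)"
      return 0
    fi
    sleep 0.1
  done
  # Failure diagnostics go to stderr; under `set -e` the non-zero
  # return aborts the rest of the bring-up.
  echo " $bin :$port FAILED; log tail:" >&2
  tail -20 "$log" >&2
  return 1
}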
echo "[gostack] starting in dependency order"
start storaged 3211
start catalogd 3212
start ingestd 3213
start queryd 3214
start embedd 3216
start vectord 3215
start pathwayd 3217
start observerd 3219
start matrixd 3218
start gateway 3110
# chatd is started independently — its provider key files come from
# /etc/lakehouse/{ollama_cloud,openrouter,opencode,kimi}.env; if
# chatd is already up (long-running from a prior session) we don't
# touch it.
if ! curl -sSf -m 1 http://127.0.0.1:3220/health >/dev/null 2>&1; then
  echo "[gostack] chatd :3220 not up; starting"
  start chatd 3220
else
  echo " chatd :3220 already up (skipping)"
fi
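echo
echo "[gostack] ready · sweep:"
# Under `set -e` with pipefail, a failed curl in a pipeline would abort
# the sweep at the first unhealthy port. Capture the body instead and
# report unreachable ports so every port gets a line.
for p in 3110 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220; do
  if body=$(curl -sSf -m 1 "http://127.0.0.1:$p/health" 2>/dev/null); then
    printf '%s\n' "${body:0:80}"
  else
    echo " :$p unreachable"
  fi
done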