root c48b58ff8d start_go_stack.sh: 2-layer isolation from smoke harness
The 2026-05-01 persistent-stack milestone exposed two collision
modes between the long-running Go stack and the pre-push smoke
harness:

1. PKILL COLLISION: smoke teardown uses anchored
   `pkill -f "bin/(storaged|...|gateway)$"`. Same-named persistent
   processes match → smokes kill 7 of 11 persistent daemons.
2. MINIO STATE COLLISION: persistent stack writes
   `_vectors/workers.lhv1` to the shared lakehouse-go-primary
   bucket. Smoke vectord rehydrates from same bucket → sees both
   smoke-owned and persistent-owned indexes → assertion failures.

Both fixed in this commit by adding two isolation layers:

LAYER 1 — distinct binary names via symlink:
  bin/persistent-<daemon> → bin/<daemon>
  Persistent stack runs as ./bin/persistent-gateway etc.
  Smoke pattern `bin/(name)$` matches `bin/gateway$` but NOT
  `bin/persistent-gateway$` (the regex requires bin/ to be followed
  immediately by a daemon name, and "persistent-gateway" is not one
  of the listed names).
  Cmdline lookup verified: 7 persistent procs, 0 match smoke pkill.
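
  A minimal sketch of the Layer 1 check, for illustration only. It
  assumes a three-name subset of the daemon list (the full alternation
  is elided above) and uses grep -E as a stand-in for pkill -f, which
  likewise applies an extended regex to the full command line. The
  helper name matches_smoke is hypothetical.

```shell
#!/usr/bin/env bash
# ASSUMPTION: three-name subset for illustration; the real pattern's
# alternation is elided in the commit message and is not reproduced here.
SMOKE_PATTERN='bin/(storaged|vectord|gateway)$'

# matches_smoke CMDLINE: exit 0 if the smoke teardown pattern would
# match this command line, non-zero otherwise.
matches_smoke() {
  printf '%s\n' "$1" | grep -Eq "$SMOKE_PATTERN"
}

for cmd in ./bin/gateway ./bin/persistent-gateway \
           ./bin/vectord ./bin/persistent-vectord; do
  if matches_smoke "$cmd"; then
    echo "$cmd: matched (smoke teardown would kill it)"
  else
    echo "$cmd: not matched (survives smoke teardown)"
  fi
done
```

  Note that the trailing `$` also means a command line with arguments
  after the binary name would not match this pattern.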

LAYER 2 — separate MinIO bucket via temp config:
  Persistent stack writes to lakehouse-go-persistent (configurable
  via $LH_PERSISTENT_BUCKET). Temp toml at /tmp/lakehouse-persistent.toml
  inherits everything from lakehouse.toml except [s3].bucket which
  is sed-replaced. Bucket auto-created via mc if missing.
  Verified: workers.lhv1 lands in persistent bucket; primary
  bucket _vectors/ stays empty.
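
  The Layer 2 override can be sketched as follows, assuming a
  stand-in lakehouse.toml written to a throwaway directory. The
  endpoint key and the exact layout of the [s3] table are assumptions
  for illustration, not the real config.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ASSUMPTION: a toy config; only the bucket name comes from the source.
workdir="$(mktemp -d)"
cat > "$workdir/lakehouse.toml" <<'EOF'
[s3]
endpoint = "http://127.0.0.1:9000"
bucket = "lakehouse-go-primary"
EOF

# Same default and override variable as described above.
bucket="${LH_PERSISTENT_BUCKET:-lakehouse-go-persistent}"

# Replace every literal occurrence of the primary bucket name.
sed "s/lakehouse-go-primary/$bucket/g" \
  "$workdir/lakehouse.toml" > "$workdir/lakehouse-persistent.toml"

# Show what changed; diff exits 1 when files differ, so tolerate that.
diff "$workdir/lakehouse.toml" "$workdir/lakehouse-persistent.toml" || true
```

  Only the bucket line should appear in the diff; every other setting
  carries over unchanged.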

Net effect: the persistent stack should survive `git push` (which
runs smokes that rehydrate vectord from primary bucket and pkill
their own bin/<name>$ daemons). This commit is the first push test
WITH the persistent stack live.

Caveat: bin/persistent-* symlinks are gitignored already (/bin/ is
in .gitignore wholesale), so the symlinks need to be created on
each fresh checkout — which start_go_stack.sh does idempotently.
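
A rough sketch of the idempotent symlink step, run against a
throwaway directory rather than the real bin/. The helper name
ensure_link and the empty stand-in binary are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo="$(mktemp -d)"
mkdir -p "$demo/bin"
touch "$demo/bin/gateway"   # stand-in for a built daemon binary

# ensure_link DIR NAME: create DIR/persistent-NAME -> NAME unless a
# symlink or file of that name already exists.
ensure_link() {
  local dir="$1" name="$2"
  local link="$dir/persistent-$name"
  if [ ! -L "$link" ] && [ ! -e "$link" ]; then
    ln -s "$name" "$link"   # relative target, resolved inside DIR
  fi
}

ensure_link "$demo/bin" gateway
ensure_link "$demo/bin" gateway   # second call is a no-op
readlink "$demo/bin/persistent-gateway"
```

Because the link target is relative, the symlink keeps resolving
correctly if the checkout directory is moved.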

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 03:20:00 -05:00

#!/usr/bin/env bash
# scripts/cutover/start_go_stack.sh
#
# Bring up the full Go stack persistently — alongside the live Rust
# gateway on :3100 + alongside the harness-transient stacks the
# smokes spin up. All Go daemons land on the parallel port range
# :3110 + :3211-:3220 (no collision with Rust on :3100). Persistent
# daemons run under DIFFERENT BINARY NAMES (bin/persistent-*) and
# write to a SEPARATE MinIO BUCKET (lakehouse-go-persistent) so the
# pre-push smoke chain — which uses anchored `pkill -f "bin/(name)$"`
# teardown + reads from `lakehouse-go-primary` — can run without
# tearing down or polluting our long-running state.
#
# Two isolation layers:
#   1. BINARY NAMES: the persistent stack runs via symlinks
#      bin/persistent-<name> → bin/<name>. The smoke pkill pattern
#      `bin/(storaged|...|gateway)$` requires `bin/` to be followed
#      immediately by a daemon name, so `bin/persistent-<name>` never
#      matches: "persistent-<name>" is not one of the listed names.
#   2. MINIO BUCKETS: the persistent stack uses lakehouse-go-persistent;
#      smoke harnesses use lakehouse-go-primary. Different buckets
#      mean rehydrate paths can't see each other's `_vectors/*`
#      persistence files. The temp toml at /tmp/lakehouse-persistent.toml
#      overrides only [s3].bucket; everything else inherits from
#      lakehouse.toml.
#
# Logs land in /tmp/gostack-logs/persistent-<bin>.log (one per daemon;
# chatd logs to /tmp/gostack-logs/chatd.log).
#
# Used to bring up the persistent stack 2026-05-01 — the first time
# the Go side has run as long-running daemons rather than per-harness
# transient processes. The 2-isolation-layer split was added the
# same day after the pre-push gate caught a smoke-vs-persistent
# collision (g1p_smoke saw count=2 when expecting count=1 because
# vectord's MinIO bucket had both the smoke's persist_demo AND the
# persistent stack's workers index).
set -euo pipefail
cd "$(dirname "$0")/../.."
if [ ! -d bin ]; then
  echo "[gostack] bin/ missing — run 'just build' first" >&2
  exit 1
fi
# ── Layer 1: symlink-based binary names ─────────────────────────────
# Create bin/persistent-* symlinks to bin/* so the persistent stack
# has distinct cmdline strings that smoke pkill won't match. Idempotent
# (existing symlinks are left alone).
DAEMONS=(storaged catalogd ingestd queryd embedd vectord pathwayd observerd matrixd gateway)
for d in "${DAEMONS[@]}"; do
  target="bin/persistent-$d"
  if [ ! -L "$target" ] && [ ! -e "$target" ]; then
    ln -s "$d" "$target"
  fi
done
# ── Layer 2: separate MinIO bucket via temp config ──────────────────
# Generate /tmp/lakehouse-persistent.toml from the canonical
# lakehouse.toml with [s3].bucket overridden. Caller can override the
# bucket name via LH_PERSISTENT_BUCKET env var.
PERSISTENT_BUCKET="${LH_PERSISTENT_BUCKET:-lakehouse-go-persistent}"
TEMP_TOML=/tmp/lakehouse-persistent.toml
# Create the bucket if missing. mc is idempotent with --ignore-existing.
if command -v mc >/dev/null 2>&1; then
  mc mb --ignore-existing "local/$PERSISTENT_BUCKET" >/dev/null 2>&1 || true
fi
# sed-replace the bucket name. The pattern is the full literal
# "lakehouse-go-primary", so other mentions of "primary" are left
# untouched; every occurrence of the full name is replaced.
sed "s/lakehouse-go-primary/$PERSISTENT_BUCKET/g" lakehouse.toml > "$TEMP_TOML"
echo "[gostack] config: $TEMP_TOML (bucket=$PERSISTENT_BUCKET)"
# ── Cleanup any prior persistent daemons ────────────────────────────
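# Match by the persistent- prefix so smoke processes are untouched.
# pkill -f matches against the full command line, and persistent
# daemons are launched with "-config <path>" arguments, so the daemon
# name may be followed by a space rather than end-of-line. A bare `$`
# anchor would never match those processes.
echo "[gostack] killing any stale persistent Go daemons (anchored on persistent-)"
pkill -f "bin/persistent-(storaged|catalogd|ingestd|queryd|embedd|vectord|pathwayd|observerd|matrixd|gateway)( |$)" 2>/dev/null || true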
sleep 0.5
mkdir -p /tmp/gostack-logs
start() {
  local bin="$1"
  local port="$2"
  local log="/tmp/gostack-logs/persistent-$bin.log"
  nohup ./bin/persistent-"$bin" -config "$TEMP_TOML" > "$log" 2>&1 & disown
  for _ in $(seq 1 50); do
    if curl -sSf -m 1 "http://127.0.0.1:$port/health" >/dev/null 2>&1; then
      echo " persistent-$bin :$port up (log: $log)"
      return 0
    fi
    sleep 0.1
  done
  echo " persistent-$bin :$port FAILED — log tail:"
  tail -20 "$log"
  return 1
}
echo "[gostack] starting in dependency order"
start storaged 3211
start catalogd 3212
start ingestd 3213
start queryd 3214
start embedd 3216
start vectord 3215
start pathwayd 3217
start observerd 3219
start matrixd 3218
start gateway 3110
# chatd is started independently — its provider key files come from
# /etc/lakehouse/{ollama_cloud,openrouter,opencode,kimi}.env; if
# chatd is already up (long-running from a prior session) we don't
# touch it. chatd uses no S3, so no temp-toml override needed.
if ! curl -sSf -m 1 http://127.0.0.1:3220/health >/dev/null 2>&1; then
  echo "[gostack] chatd :3220 not up; starting"
  nohup ./bin/chatd -config lakehouse.toml > /tmp/gostack-logs/chatd.log 2>&1 & disown
  for _ in $(seq 1 50); do
    if curl -sSf -m 1 "http://127.0.0.1:3220/health" >/dev/null 2>&1; then
      echo " chatd :3220 up"
      break
    fi
    sleep 0.1
  done
else
  echo " chatd :3220 already up (skipping)"
fi
echo
echo "[gostack] ready · sweep:"
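for p in 3110 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220; do
  # With `set -e` and `pipefail`, a failed curl would abort the sweep;
  # `|| true` lets the loop report the remaining ports.
  curl -sSf -m 1 "http://127.0.0.1:$p/health" 2>/dev/null | head -c 80 || true
  echo
done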
echo
echo "[gostack] persistent stack uses bucket: $PERSISTENT_BUCKET"
echo "[gostack] smoke harnesses use bucket: lakehouse-go-primary"
echo "[gostack] tear down via: pkill -f 'bin/persistent-'"