start_go_stack.sh: document smoke-vs-persistent-stack pkill conflict

Caught immediately after the prior commit was pushed: the pre-push
smokes killed 7 of 11 persistent Go daemons because their anchored
`pkill -f "bin/(name)$"` teardown matches ANY process whose command
line ends in `bin/<daemon>`, not just the smokes' own children.
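
A minimal sketch of why the anchored pattern over-matches, assuming
`pkill -f` applies an extended regular expression to each process's full
command line. `grep -E` stands in for pkill so nothing is signalled; the
helper name `teardown_matches` and the example paths are hypothetical and
not taken from the smoke scripts.

```shell
#!/bin/sh
# Hypothetical helper: would the smokes' teardown regex match this command line?
# grep -E is a stand-in for the matching that pkill -f performs.
teardown_matches() {
    name=$1
    cmdline=$2
    printf '%s\n' "$cmdline" | grep -Eq "bin/(${name})\$"
}

# Assumed example command lines (paths are illustrative only).
smoke_child="./bin/queryd"
persistent="/opt/stack/bin/queryd"
renamed="/opt/stack/bin/queryd-persistent"

for cmd in "$smoke_child" "$persistent" "$renamed"; do
    if teardown_matches queryd "$cmd"; then
        echo "MATCH    $cmd"
    else
        echo "no match $cmd"
    fi
done
```

The first two command lines match, so a smoke teardown cannot tell its own
child from a persistent daemon; only the renamed one falls outside the pattern.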

Documented in the script header as a KNOWN CONSTRAINT with a
workaround (re-run start_go_stack.sh after every push) and a
proper-fix sketch (give the persistent stack a different binary
name via build tag or symlink). The proper fix is deferred until an
operator actually gets bitten; anyone who lives through this once
will know to want it.
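
The symlink variant of the proper fix could look roughly like this. It is
a sketch under assumptions: the `-persistent` suffix, the `link_persistent`
helper, and the directory layout are all hypothetical, and it relies on the
daemon being launched through the symlink so that its command line ends in
the new name.

```shell
#!/bin/sh
# Hypothetical helper: create bin/<name>-persistent as a relative symlink to
# bin/<name>. The suffix is an assumption, not a convention from the repo.
link_persistent() {
    bin_dir=$1
    name=$2
    ln -sf "$name" "$bin_dir/${name}-persistent"
}

# Demo in a throwaway directory; an empty file stands in for the daemon binary.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/bin"
: > "$demo_dir/bin/queryd"
link_persistent "$demo_dir/bin" queryd
ls -l "$demo_dir/bin"
```

Starting the persistent stack via `bin/queryd-persistent` should then keep
it out of reach of the smokes' `bin/(queryd)$` pattern; this script's own
teardown pattern would need the same rename.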

Persistent stack restored (all 11 healthy as of this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
root 2026-05-01 02:56:52 -05:00
parent 09904d5222
commit 54b2e7db76


@@ -15,6 +15,18 @@
# Used to bring up the persistent stack 2026-05-01 — the first time
# the Go side has run as long-running daemons rather than per-harness
# transient processes.
#
# KNOWN CONSTRAINT: the pre-push smoke chain (`just verify` →
# scripts/{d,g}*_smoke.sh) tears down with the SAME anchored
# `pkill -f "bin/(name)$"` pattern this script uses, so it ALSO
# matches our persistent daemons by name. Pushing while the
# persistent stack is up will kill 7 of 11 daemons (gateway,
# storaged, catalogd, ingestd, queryd, embedd, vectord; the smokes
# don't reach for pathwayd/observerd/matrixd/chatd). Workaround:
# re-run this script after every push. A proper fix is to give the
# persistent stack a different binary name (e.g. via build tags or
# a wrapper symlink) so smoke-side pkill doesn't see it; deferred
# until the trigger fires (i.e. when an operator gets bitten).
set -euo pipefail