golangLAKEHOUSE/scripts/d1_smoke.sh
Claw ad2ec1aca9 G0 D1 hardened: 3-lineage scrum review on shipped code · 7 fixes applied
Code-review pass after D1 shipped, with all three model lineages run
in parallel against the actual Go source (not the docs):

Convergent findings (≥2 reviewers — high confidence):
- C1 BLOCK · Run() errCh/select race could silently drop fast bind
  errors. Fixed: net.Listen() now runs synchronously before the
  goroutine; bind errors surface as Run()'s return value.
- C2 BLOCK · scripts/d1_smoke.sh sleep 0.5 races bind on cold boxes.
  Fixed: replaced with poll_health() loop, 5s/svc budget, 50ms poll.
- C3 WARN · LoadConfig silent fallback when file missing. Fixed:
  emits slog.Warn with path + hint when path given but file absent.
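
A possible extension of the C2 poll loop, sketched under assumptions:
since C1 makes a bind failure surface as Run()'s return value, the
service process should exit quickly on a bad bind, and a poll loop can
notice that instead of waiting out the full budget. The helper name
wait_ready and its return codes are hypothetical and not part of the
repo; the shipped poll_health() only checks /health.

```shell
#!/usr/bin/env bash
# wait_ready PID BUDGET_SECONDS CMD...   (hypothetical helper)
# Returns 0 once CMD succeeds, 2 if PID is gone before that, 1 on timeout.
wait_ready() {
    local pid="$1" budget="$2"
    shift 2
    local deadline=$(( $(date +%s) + budget ))
    while [ "$(date +%s)" -lt "$deadline" ]; do
        # Readiness probe supplied by the caller, e.g. a curl against /health.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # kill -0 delivers no signal; it only reports whether PID still exists.
        if ! kill -0 "$pid" 2>/dev/null; then
            return 2
        fi
        sleep 0.05
    done
    return 1
}
```

With the smoke script's variables this could be called as
`wait_ready "$PID" 5 curl -fsS --max-time 1 "http://127.0.0.1:$PORT/health"`,
assuming the launcher records each service's PID alongside its port.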

Single-reviewer fixes:
- S1 WARN · slog.SetDefault inside Run() mutated global state from a
  library function. Fixed: Run() no longer calls SetDefault.
- S2 WARN · os.IsNotExist → errors.Is(err, fs.ErrNotExist) idiom.
- S6 WARN · smoke double-curl collapsed to single curl -i parse.

Second-pass Opus review on post-fix code caught one more:
- head -1 on curl -i fragile against 1xx interim lines. Fixed:
  awk picks the last HTTP/* status line (robust to 100 Continue).
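
The status-line fix can be tried offline against a canned response. The
sketch below assumes a helper named last_status (hypothetical, not in
the repo) and a hand-written sample response. It also strips carriage
returns first, which the shipped script does not do; that is an added
precaution for status lines that carry no reason phrase.

```shell
#!/usr/bin/env bash
# last_status: read a raw `curl -i` response on stdin and print the
# status code of the LAST HTTP/* status line (hypothetical helper).
last_status() {
    tr -d '\r' | awk '/^HTTP\//{code=$2} END{print code}'
}

# Sample response with a 100 Continue interim line before the final status.
# `head -1` would report the interim line here; the awk form reports 501.
printf 'HTTP/1.1 100 Continue\r\n\r\nHTTP/1.1 501 Not Implemented\r\n\r\n' | last_status
# prints 501
```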

Accepted with rationale (deferred or planned):
- S3 secrets-in-lakehouse.toml: D2.3 SecretsProvider already planned
- S4 5x cmd/*/main.go duplication: defer until D2 reveals real
  per-service config consumption
- S5 /health log volume: defer post-G0, not on k8s yet
- 2nd-pass theoreticals: clean-exit-no-Shutdown path doesn't trigger,
  defensive defer ln.Close() aspirational, etc.

Verification:
- go build ./cmd/...  exit 0
- go vet ./...         clean
- ./scripts/d1_smoke.sh  D1 acceptance gate: PASSED
- 3-lineage code review · 14 findings · 7 fixed · 0 deferred · 5
  accepted with rationale

Total D1 review coverage across the phase:
- 3 doc-review passes (Opus + Kimi + Qwen) — 13 findings, 10 fixed
- 1 runtime smoke — 1 finding (port 3100 collision), fixed
- 1 code-review parallel pass — 14 findings, 7 fixed
- 1 code-review second pass (Opus) — 1 actionable, fixed
- Cumulative: 29 findings · 19 fixed inline · 5 accepted · 5 deferred

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 07:07:50 -05:00


#!/usr/bin/env bash
# D1 smoke — proves the Day 1 acceptance gate end-to-end.
# Builds all 5 binaries, launches them, polls /health on each until
# ready, then runs the actual probes. Exits 0 on success.
#
# Per Opus + Qwen BLOCK #2 review: replaced the prior `sleep 0.5`
# liveness gate with a poll loop so cold-start CI boxes don't race
# the bind.
#
# Usage: ./scripts/d1_smoke.sh
set -euo pipefail
cd "$(dirname "$0")/.."
export PATH="$PATH:/usr/local/go/bin"
echo "[d1-smoke] building..."
go build -o bin/ ./cmd/...
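PIDS=()
cleanup() {
    echo "[d1-smoke] cleanup"
    kill "${PIDS[@]}" 2>/dev/null || true
    wait 2>/dev/null || true
}
# EXIT runs cleanup once; INT/TERM exit explicitly so the script stops
# instead of resuming after the handler returns.
trap cleanup EXIT
trap 'exit 130' INT
trap 'exit 143' TERM
echo "[d1-smoke] launching..."
for SVC in gateway storaged catalogd ingestd queryd; do
    "./bin/$SVC" > "/tmp/${SVC}.log" 2>&1 &
    PIDS+=($!)
done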
# Poll /health on each service until it answers successfully or we hit
# the 5s budget. Cheaper than a fixed sleep: fast boxes proceed as soon
# as the service is up, slow boxes get the full budget before failing.
poll_health() {
    local name="$1" port="$2" deadline=$(($(date +%s) + 5))
    while [ "$(date +%s)" -lt "$deadline" ]; do
        # -f makes curl exit non-zero on HTTP >= 400, so an error page
        # does not count as ready.
        if curl -fsS --max-time 1 "http://127.0.0.1:$port/health" >/dev/null 2>&1; then
            return 0
        fi
        sleep 0.05
    done
    echo " [d1-smoke] $name (:$port) failed to bind within 5s — log:"
    tail -5 "/tmp/${name}.log" | sed 's/^/ /'
    return 1
}
echo "[d1-smoke] waiting for /health (poll up to 5s/svc)..."
for SPEC in "gateway:3110" "storaged:3211" "catalogd:3212" "ingestd:3213" "queryd:3214"; do
    NAME="${SPEC%:*}"; PORT="${SPEC#*:}"
    if ! poll_health "$NAME" "$PORT"; then
        exit 1
    fi
done
echo "[d1-smoke] /health probes:"
FAILED=0
for SPEC in "gateway:3110" "storaged:3211" "catalogd:3212" "ingestd:3213" "queryd:3214"; do
    NAME="${SPEC%:*}"; PORT="${SPEC#*:}"
    RESP="$(curl -sS --max-time 2 "http://127.0.0.1:$PORT/health" || echo FAIL)"
    # Pass only if the body names the expected service.
    if echo "$RESP" | grep -q "\"service\":\"$NAME\""; then
        echo " ✓ $NAME (:$PORT) → $RESP"
    else
        echo " ✗ $NAME (:$PORT) → $RESP"
        FAILED=1
    fi
done
# Single curl with -i grabs both code + headers in one pass, per Opus
# WARN #6. The previous version made 2 calls per route, doubling load
# and leaving a window in which the two responses could disagree.
echo "[d1-smoke] gateway 501 stub probes:"
for ROUTE in /v1/ingest /v1/sql; do
    # "|| true" keeps set -e from aborting on a curl failure; an empty
    # RESP then falls through to the failure branch below.
    RESP="$(curl -sS -i --max-time 2 -X POST "http://127.0.0.1:3110$ROUTE" || true)"
    # Per Opus 2nd-pass WARN: head -1 fails on 1xx interim lines.
    # awk picks the LAST HTTP/* status line, robust to 100 Continue.
    CODE="$(echo "$RESP" | awk '/^HTTP\//{code=$2} END{print code}')"
    HDR="$(echo "$RESP" | grep -i 'X-Lakehouse-Stub' || true)"
    if [ "$CODE" = "501" ] && [ -n "$HDR" ]; then
        echo " ✓ POST $ROUTE → 501 + $HDR"
    else
        echo " ✗ POST $ROUTE → code=$CODE hdr=$HDR"
        FAILED=1
    fi
done
if [ "$FAILED" -ne 0 ]; then
    echo "[d1-smoke] FAILED"
    exit 1
fi
echo "[d1-smoke] D1 acceptance gate: PASSED"