proof harness: fix queryd refresh-tick race in 04_query_correctness

Caught by the audit rerun: with cache-warm binaries, 04 fires its first SELECT faster than queryd's 500ms refresh tick — Q1 returned 400 ("table not found") even though 03_ingest had registered the manifest. Subsequent queries (after the next tick) succeeded. This is an eventual-consistency wait, not a retry — queryd's contract is that views appear within one tick of catalogd having the manifest. Production code does not need changing. Added to lib/http.sh: proof_wait_for_sql <budget_sec> <sql> polls a SQL probe until it returns 200 or budget elapses; emits no evidence (test setup, not a claim). Used in 04_query_correctness: Wait up to 5s for queryd to have the view before running the 5 SQL assertions. Skip-with-loud-reason if the view never appears. Verified: integration mode back to 104 pass / 0 fail / 1 skip after fix. The skip is the unchanged GOLAKE-085 informational record. This is exactly the kind of finding the harness was designed to surface — the regression existed in the codebase the moment Phase D shipped, but only fired when the next compare run hit cache-warm timing. Without the harness, it would have surfaced on a CI run weeks from now and been hard to bisect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 05:36:28 -05:00 · 2026-04-29 05:36:28 -05:00 · 4840c10311
commit 4840c10311
parent 4bb6548cbc
2 changed files with 35 additions and 0 deletions
--- a/tests/proof/cases/04_query_correctness.sh
+++ b/tests/proof/cases/04_query_correctness.sh
@ -26,6 +26,15 @@ substitute_table() {
    sed "s/FROM workers/FROM ${DATASET}/g; s/from workers/from ${DATASET}/g"
 }

+# Wait for queryd to have the view from 03's ingest. queryd refreshes
+# every 500ms (proof override of the 30s prod default); on cache-warm
+# runs cases fire faster than the next tick. Up to 5s budget.
+if ! proof_wait_for_sql 5 "SELECT 1 FROM ${DATASET} LIMIT 0"; then
+    proof_skip "$CASE_ID" "queryd view ${DATASET} never appeared in 5s" \
+        "queryd refresh ticker may be stalled or 03_ingest registration failed"
+    return 0 2>/dev/null || exit 0
+fi
+
 # Iterate the 5 queries.
 n=$(jq '.queries | length' "$EXPECTED_FILE")
 for i in $(seq 0 $((n-1))); do
--- a/tests/proof/lib/http.sh
+++ b/tests/proof/lib/http.sh
@ -108,6 +108,32 @@ proof_call() {
    _proof_http_run "$case_id" "$probe" "$method" "$url" "$@"
 }

+# proof_wait_for_sql: wait for a SQL probe to return 200, up to budget
+# seconds. Use when a case follows an ingest and queryd's view-refresh
+# (default 500ms tick) may not have fired yet. NOT a retry — a wait
+# for a known eventual-consistency event. No evidence emitted (this
+# is test setup, not a claim).
+#
+#   proof_wait_for_sql <budget_sec> <sql>
+#
+# Returns 0 if the probe succeeded; 1 on timeout.
+proof_wait_for_sql() {
+    local budget="${1:-10}" sql="$2"
+    local deadline=$(($(date +%s) + budget))
+    local body
+    body=$(jq -nc --arg s "$sql" '{sql:$s}')
+    while [ "$(date +%s)" -lt "$deadline" ]; do
+        if curl -sf --max-time 1 -X POST \
+                -H 'Content-Type: application/json' \
+                -d "$body" \
+                "${PROOF_GATEWAY_URL}/v1/sql" >/dev/null 2>&1; then
+            return 0
+        fi
+        sleep 0.1
+    done
+    return 1
+}
+
 # Helper accessors — reads the per-probe JSON.
 proof_status_of() {
    local case_id="$1" probe="$2"