proof harness: fix queryd refresh-tick race in 04_query_correctness

Caught by the audit rerun: with cache-warm binaries, 04 fires its
first SELECT before queryd's next 500ms refresh tick. Q1 returned
400 ("table not found") even though 03_ingest had registered the
manifest. Subsequent queries (after the next tick) succeeded.

This is an eventual-consistency wait, not a retry — queryd's
contract is that views appear within one tick of catalogd having the
manifest. Production code does not need changing.

Added to lib/http.sh:
  proof_wait_for_sql <budget_sec> <sql>
    polls a SQL probe until it returns 200 or budget elapses; emits
    no evidence (test setup, not a claim).

Used in 04_query_correctness:
  Wait up to 5s for queryd to have the view before running the 5
  SQL assertions. Skip-with-loud-reason if the view never appears.
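
The wait-then-skip pattern described above can be sketched in isolation.
This is a minimal illustration, not the code in lib/http.sh: wait_until
and probe_after_tick are hypothetical names, and the stub probe stands in
for the real SQL POST so the sketch runs without queryd. It assumes bash
and a sleep that accepts fractional seconds.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of lib/http.sh): poll an arbitrary
# command until it succeeds or the budget (whole seconds) runs out.
wait_until() {
  local budget="$1"; shift
  local deadline=$(( $(date +%s) + budget ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # The probe runs in the current shell, so it may update variables.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep 0.1
  done
  return 1
}

# Stub probe standing in for the SQL POST: fails twice, then succeeds,
# mimicking a view that only appears after a refresh tick.
attempts=0
probe_after_tick() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]
}

if wait_until 5 probe_after_tick; then
  echo "view visible after ${attempts} probes"
else
  echo "view never appeared; skip the case with a loud reason"
fi
```

Because the deadline is computed from whole-second timestamps, the
effective wait can fall short of the budget by up to one second, which is
harmless when the budget is ten ticks long.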

Verified: integration mode back to 104 pass / 0 fail / 1 skip after
fix. The skip is the unchanged GOLAKE-085 informational record.

This is exactly the kind of finding the harness was designed to
surface — the regression existed in the codebase the moment Phase D
shipped, but only fired when the next compare run hit cache-warm
timing. Without the harness, it would have surfaced on a CI run
weeks from now and been hard to bisect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
root 2026-04-29 05:36:28 -05:00
parent 4bb6548cbc
commit 4840c10311
2 changed files with 35 additions and 0 deletions


@@ -26,6 +26,15 @@ substitute_table() {
sed "s/FROM workers/FROM ${DATASET}/g; s/from workers/from ${DATASET}/g"
}
# Wait for queryd to have the view from 03's ingest. queryd refreshes
# every 500ms (proof override of the 30s prod default); on cache-warm
# runs cases fire faster than the next tick. Up to 5s budget.
if ! proof_wait_for_sql 5 "SELECT 1 FROM ${DATASET} LIMIT 0"; then
    proof_skip "$CASE_ID" "queryd view ${DATASET} never appeared in 5s" \
        "queryd refresh ticker may be stalled or 03_ingest registration failed"
    return 0 2>/dev/null || exit 0
fi
# Iterate the 5 queries.
n=$(jq '.queries | length' "$EXPECTED_FILE")
for i in $(seq 0 $((n-1))); do


@@ -108,6 +108,32 @@ proof_call() {
_proof_http_run "$case_id" "$probe" "$method" "$url" "$@"
}
# proof_wait_for_sql: wait for a SQL probe to return 200, up to budget
# seconds. Use when a case follows an ingest and queryd's view-refresh
# (500ms tick under the proof override; 30s prod default) may not have
# fired yet. NOT a retry: a wait for a known eventual-consistency event.
# No evidence emitted (this is test setup, not a claim).
#
# proof_wait_for_sql <budget_sec> <sql>
#
# Returns 0 if the probe succeeded; 1 on timeout.
proof_wait_for_sql() {
    local budget="${1:-10}" sql="$2"
    local deadline=$(($(date +%s) + budget))
    local body
    body=$(jq -nc --arg s "$sql" '{sql:$s}')
    while [ "$(date +%s)" -lt "$deadline" ]; do
        if curl -sf --max-time 1 -X POST \
            -H 'Content-Type: application/json' \
            -d "$body" \
            "${PROOF_GATEWAY_URL}/v1/sql" >/dev/null 2>&1; then
            return 0
        fi
        sleep 0.1
    done
    return 1
}
# Helper accessors — reads the per-probe JSON.
proof_status_of() {
local case_id="$1" probe="$2"