lakehouse/scripts/lance_smoke.sh
root 7bb66f08c3
Some checks failed
lakehouse/auditor 9 blocking issues: cloud: claim not backed — "Verified live (post-restart): scale_test_10m doc-fetch 4-15ms across"
lance: scrum-driven sanitizer + smoke-gate fixes (opus 2026-05-02 BLOCK)
Cross-lineage scrum on the lance wave (4 bundles, 33 distinct findings)
surfaced 1 real BLOCK and 2 real WARNs from opus that the kimi/qwen
lineages missed. Per feedback_cross_lineage_review.md, opus is the
load-bearing reviewer; cross-lineage convergence is noise unless verified.

BLOCK fix — sanitize_lance_err path-stripping was unsound:
  err.split("/home/").next().unwrap_or(&err)
returns Some("") when err STARTS with "/home/", erasing the entire
message. Replaced truncation with redact_paths() — a hand-rolled scanner
that walks the input once, replacing path-shaped substrings with
[REDACTED] while preserving surrounding error context. Catches:
- absolute paths under /root/.cargo, /home, /var, /tmp, /etc, /usr, /opt
- relative variants (Lance occasionally strips leading slash —
  observed live "Dataset at path home/profit/lakehouse/data/lance/x
  was not found")
- multiple occurrences in one error
- preserves quote/comma/whitespace terminators

WARN fix #1 — is_not_found heuristic was too broad:
  lower.contains("not found")
caught real 500s like "column not found", "field not found in schema".
Narrowed to require dataset-shape phrasing AND exclude the
column/field/schema patterns explicitly.

WARN fix #2 — lance_smoke.sh `grep -qvE` was an unsound regression gate.
  bash -c "echo '$BODY' | grep -qvE 'pat'"
With -v -q, exits 0 if ANY line lacks the pattern — so a multi-line
body with one leak line + any clean line FALSE-PASSES. Replaced with
the correct "pattern absent" form: `! grep -qE 'pat'`. Also expanded
the pattern set (added /var/, /tmp/) since the scrum surfaced these
as additional leak vectors.

Also unblocks pre-existing pathway_memory test compile error (stale
PathwayTrace init missing 6 Mem0-versioning fields added in 6ac7f61).
Tests filled in with sensible defaults — needed to run sanitize_tests.

10/10 new sanitize tests pass. Smoke 9/9 PASS against rebuilt+restarted
gateway. Live missing-index probe now returns:
  "lance dataset not found: no-such-11205" + HTTP 404
(was: leaked absolute paths + HTTP 500 → leaked absolute and relative
paths post-first-fix → clean message + 404 now.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:34:54 -05:00

96 lines
4.6 KiB
Bash
Executable File

#!/usr/bin/env bash
# lance smoke — gates the 5 /vectors/lance/* HTTP routes (search, doc,
# index, append, migrate). Only the read paths are exercised here so a
# CI run doesn't mutate state. Migrate + index + append have shape
# probes (request bodies are well-formed) but ride the not-found path
# that the 2026-05-02 audit added.
#
# Targets the live gateway at $LH_GATEWAY (default :3100). Uses an
# existing on-disk Lance dataset — `workers_500k_v1` — so no
# migration setup is needed. If the dataset is missing the smoke
# fails loudly with a clear message.
#
# Surfaced 2026-05-02: the lance crates had zero tests + no smoke;
# substrate change to lance_backend.rs would silently break the live
# surface. This smoke is the regression gate.
#
# Usage:
# ./scripts/lance_smoke.sh
# LH_GATEWAY=http://127.0.0.1:3100 ./scripts/lance_smoke.sh
set -euo pipefail
GATEWAY="${LH_GATEWAY:-http://127.0.0.1:3100}"
DATASET="${LH_LANCE_DATASET:-workers_500k_v1}"
PREFIX="$GATEWAY/vectors/lance"
PASS=0; FAIL=0
PROBE() { local label="$1"; shift; "$@" && { echo "$label"; PASS=$((PASS+1)); } || { echo "$label"; FAIL=$((FAIL+1)); }; }
echo "[lance-smoke] gateway=$GATEWAY dataset=$DATASET"
# ── 0. Gateway alive ─────────────────────────────────────────────
PROBE "gateway /v1/health responds" \
bash -c "curl -sf -m 3 $GATEWAY/v1/health -o /dev/null"
# ── 1. Search returns IVF_PQ results on existing dataset ────────
RESP=$(curl -sS -m 30 -X POST "$PREFIX/search/$DATASET" \
-H 'Content-Type: application/json' \
-d '{"query":"forklift operator","top_k":3}' 2>/dev/null || echo '{}')
PROBE "search/$DATASET returns top-3 lance_ivf_pq results" \
bash -c "echo '$RESP' | jq -e '.method == \"lance_ivf_pq\" and (.results | length) == 3' >/dev/null"
# Capture one doc_id from those results so the next probe has something real to fetch.
DOC_ID=$(echo "$RESP" | jq -r '.results[0].doc_id // ""')
# ── 2. get_doc by id returns the row ────────────────────────────
PROBE "doc/$DATASET/<known-id> returns full row" \
bash -c "[ -n '$DOC_ID' ] && curl -sf -m 5 '$PREFIX/doc/$DATASET/$DOC_ID' | jq -e '.row.doc_id == \"$DOC_ID\"' >/dev/null"
# ── 3. get_doc with bogus id returns 404 (not 500) ──────────────
STATUS=$(curl -sS -m 5 -o /tmp/lance_smoke_404.json -w '%{http_code}' \
"$PREFIX/doc/$DATASET/W500K-NOT-A-REAL-ID-00000")
PROBE "doc/$DATASET/<missing-id> → 404" \
test "$STATUS" = "404"
# ── 4. search on missing dataset returns 404 + sanitized message ─
STATUS=$(curl -sS -m 5 -o /tmp/lance_smoke_500.json -w '%{http_code}' \
-X POST "$PREFIX/search/no-such-dataset-${RANDOM}" \
-H 'Content-Type: application/json' \
-d '{"query":"x","top_k":1}')
BODY=$(cat /tmp/lance_smoke_500.json)
PROBE "search/<missing> → 404 (was 500 pre-2026-05-02)" \
test "$STATUS" = "404"
# Assert "pattern absent" — `! grep -qE` (NOT `grep -qvE` which is unsound:
# -v -q exits 0 if ANY line lacks the pattern, so a multi-line body containing
# both a leak line AND any clean line would false-PASS. Caught 2026-05-02 by
# opus scrum on the lance backend wave.)
PROBE "search/<missing> body sanitized — no filesystem leak" \
bash -c "! echo '$BODY' | grep -qE '/home/|/root/\.cargo/|/var/|/tmp/'"
# ── 5. build_index on missing dataset also sanitized ────────────
STATUS=$(curl -sS -m 5 -o /tmp/lance_smoke_idx.json -w '%{http_code}' \
-X POST "$PREFIX/index/no-such-dataset-${RANDOM}" \
-H 'Content-Type: application/json' \
-d '{}')
BODY=$(cat /tmp/lance_smoke_idx.json)
PROBE "index/<missing> body sanitized" \
bash -c "! echo '$BODY' | grep -qE '/home/|/root/\.cargo/|/var/|/tmp/'"
# ── 6. append validates input shape (rejects empty rows array) ──
STATUS=$(curl -sS -m 5 -o /dev/null -w '%{http_code}' \
-X POST "$PREFIX/append/$DATASET" \
-H 'Content-Type: application/json' \
-d '{"rows":[]}')
PROBE "append with empty rows[] → 400" \
test "$STATUS" = "400"
# ── 7. migrate route is reachable (POST without body returns a real error, not 404) ──
STATUS=$(curl -sS -m 5 -o /dev/null -w '%{http_code}' \
-X POST "$PREFIX/migrate/probe-not-real-${RANDOM}?bucket=primary" 2>/dev/null)
# Should be 4xx (bad request shape), NOT 404 (route registered) and NOT 200.
PROBE "migrate route registered (non-404, non-200 on empty body)" \
bash -c "[ '$STATUS' != '404' ] && [ '$STATUS' != '200' ]"
echo "[lance-smoke] $PASS PASS / $FAIL FAIL"
[ "$FAIL" -eq 0 ]