Surfaced by the 2026-05-02 audit: vectord-lance, lance-bench, and the
glue code existed and worked, but had no tests and no smoke, leaked
server paths on missing-index search, and the ADR-019 10M re-bench
was still deferred.
## 1. Fix: missing-index search returned 500 + leaked filesystem path
Pre-fix:
    $ POST /vectors/lance/search/no-such-index
    HTTP 500
    Dataset at path home/profit/lakehouse/data/lance/no-such-index was
    not found: Not found: home/profit/lakehouse/data/lance/no-such-index/
    _versions, /root/.cargo/registry/src/index.crates.io-...-1949cf8c.../
    lance-table-4.0.0/src/io/commit.rs:364:26, ...
Post-fix:
    HTTP 404
    lance dataset not found: no-such-index
Added `sanitize_lance_err()` in crates/vectord/src/service.rs that:
- maps "not found" / "no such file" patterns → 404 (was 500)
- strips /home/ and /root/.cargo/ paths from any error body
Applied to all 5 lance handlers: search, get_doc, build_index,
append, migrate. The store_for() handle is cheap-and-stateless;
the actual disk hit happens inside the operation, which is where
the leak originated.
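The contract the sanitizer is meant to uphold can be checked from the outside. The sketch below is illustrative only: `leaks_path` is a hypothetical helper, not part of the repo, and the two sample bodies are the pre-fix and post-fix outputs quoted above (line-wrapped as quoted, with the elided parts left elided).

```bash
# Hypothetical helper (not in the repo): exits 0 when the body contains a
# filesystem path that the sanitizer is supposed to strip.
leaks_path() {
  printf '%s\n' "$1" | grep -qE '/home/|/root/\.cargo/'
}

# Sample bodies copied from the pre-fix / post-fix output in this message.
pre_fix='Dataset at path home/profit/lakehouse/data/lance/no-such-index was
not found: Not found: home/profit/lakehouse/data/lance/no-such-index/
_versions, /root/.cargo/registry/src/index.crates.io-...-1949cf8c.../
lance-table-4.0.0/src/io/commit.rs:364:26, ...'
post_fix='lance dataset not found: no-such-index'

if leaks_path "$pre_fix";  then echo "pre-fix: leak";  else echo "pre-fix: clean";  fi
if leaks_path "$post_fix"; then echo "post-fix: leak"; else echo "post-fix: clean"; fi
```

The helper matches positively and lets the caller negate the result. An inverted match (`grep -qv`) would succeed as soon as any single line lacks the pattern, so a multi-line body with one leaking line would still look clean.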
## 2. scripts/lance_smoke.sh — first regression gate
9-probe smoke against the live HTTP surface. Only read paths are
exercised for real; the mutating routes are probed solely through
requests the server should reject, so CI never changes state. It
specifically locks the sanitizer fix: a future regression that
reintroduces the path leak fails the smoke immediately. 9/9 PASS
against the live :3100 today.
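A probe runner of this kind only needs to run a command, count the outcome, and keep going. The following is a minimal sketch of that pattern, not the script's actual code: the lowercase `probe` name and the stand-in commands (`true`, `false`, `test`) are assumptions for illustration, and a real probe would wrap a curl call.

```bash
set -eu

PASS=0; FAIL=0

# Run a command, record pass/fail, and never abort the script on failure.
# The command runs as an `if` condition, so `set -e` does not trip on it.
probe() {
  label="$1"; shift
  if "$@"; then
    echo "  PASS $label"; PASS=$((PASS+1))
  else
    echo "  FAIL $label"; FAIL=$((FAIL+1))
  fi
}

# Stand-in checks (hypothetical); a real probe would issue an HTTP request.
probe "always passes"  true
probe "always fails"   false
probe "status compare" test "404" = "404"

echo "$PASS PASS / $FAIL FAIL"
```

Ending the real script with a check that the failure count is zero turns the summary into the process exit status, which is what a CI gate consumes.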
## 3. Unit tests on vectord-lance/src/lib.rs (was: zero tests)
7 tests covering the public LanceVectorStore API:
- fresh_store_reports_no_state — handle is lazy
- migrate_then_count_and_fetch — Parquet → Lance round-trip
- get_by_doc_id_missing_returns_none — Ok(None) vs Err contract
that lets the HTTP handler return 404 cleanly
- append_grows_count_and_new_rows_fetchable — ADR-019's
structural-difference claim verified at the unit level
- append_dim_mismatch_errors — guards against silently breaking
search by accepting inconsistent-dim rows
- search_returns_nearest — exact-vector match → top-1
- stats_reports_post_migrate_state — locks the field shape
7/7 PASS. cargo test -p vectord-lance --lib green.
## 4. 10M re-bench (deferred from ADR-019)
reports/lance_10m_rebench_2026-05-02.md captures the numbers driven
against the live :3100 over data/lance/scale_test_10m (33GB / 10M
vectors, IVF_PQ confirmed via response method tag).
Headline:
Search cold (10 diverse queries): median ~32ms, mean ~46ms
Search warm (5x same query): ~20ms p50
Doc fetch (5x same id): ~100ms p50
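A mean noticeably above the median, as in the cold-search numbers, is what a few slow outliers produce. The sketch below shows how both statistics can be computed from a list of latencies. The `median` and `mean` helpers are hypothetical and the sample values are made up; they are not the measurements from the report.

```bash
# Hypothetical helpers: summarize latency samples (ms) passed as arguments.
median() {
  printf '%s\n' "$@" | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR % 2) print v[(NR + 1) / 2]
      else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}
mean() {
  printf '%s\n' "$@" | awk '{ s += $1 } END { print s / NR }'
}

# Made-up samples with one slow outlier (120 ms), for illustration only.
set -- 28 30 31 35 120 29 33 32 40 27
echo "median=$(median "$@")ms mean=$(mean "$@")ms"
# prints: median=31.5ms mean=40.5ms
```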
Search latency at 10M is acceptable for batch / async workloads but
too slow for sub-10ms voice/recommendation paths. ADR-019's "Lance
pulls ahead at 10M" claim remains unverified but not refuted: at
this scale an HNSW index is not operationally viable (10M × 768d ×
4 bytes ≈ 30.7GB just for the raw vectors), so there is no baseline
to compare against.
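The footprint figure is plain arithmetic over float32 vectors and can be reproduced as follows. This sketch covers raw vector storage only; any graph or index overhead would come on top and is not estimated here.

```bash
# Raw float32 vector storage: count x dimensions x 4 bytes per component.
vectors=10000000
dims=768
bytes=$(( vectors * dims * 4 ))

echo "bytes=$bytes"
# Report both decimal gigabytes and binary gibibytes.
awk -v b="$bytes" 'BEGIN { printf "GB=%.2f GiB=%.2f\n", b / 1e9, b / (1024 ^ 3) }'
# prints: bytes=30720000000
#         GB=30.72 GiB=28.61
```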
Real finding: doc-fetch at 10M is roughly 300x slower than the 100K
number ADR-019 cited (311μs → ~100ms). Likely cause: the scalar btree
index on doc_id may not be built for this dataset. Follow-up:
investigate whether forcing build_scalar_index brings doc-fetch back
to the sub-millisecond range ADR-019 relies on. Captured in the report.
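The slowdown factor follows from putting both latencies in the same unit. A small sketch, using the two figures quoted above and treating the ~100ms value as exactly 100,000μs for the purpose of the estimate:

```bash
before_us=311      # 100K doc-fetch figure cited from ADR-019, in microseconds
after_us=100000    # ~100 ms at 10M, expressed in microseconds (approximate)

# Integer division is enough for an order-of-magnitude comparison.
ratio=$(( after_us / before_us ))
echo "doc-fetch slowdown: about ${ratio}x"
# prints: doc-fetch slowdown: about 321x
```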
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
93 lines
4.3 KiB
Bash
Executable File
#!/usr/bin/env bash
# lance smoke: gates the 5 /vectors/lance/* HTTP routes (search, doc,
# index, append, migrate). Only the read paths are exercised for real,
# so a CI run doesn't mutate state. Migrate, index, and append are only
# probed with requests the server should reject (missing dataset, empty
# rows, empty body), including the not-found path that the 2026-05-02
# audit added.
#
# Targets the live gateway at $LH_GATEWAY (default :3100). Uses an
# existing on-disk Lance dataset ($LH_LANCE_DATASET, default
# `workers_500k_v1`), so no migration setup is needed. If the dataset
# is missing, the search and known-id doc probes fail.
#
# Surfaced 2026-05-02: the lance crates had zero tests + no smoke; a
# substrate change to lance_backend.rs could silently break the live
# surface. This smoke is the regression gate.
#
# Usage:
#   ./scripts/lance_smoke.sh
#   LH_GATEWAY=http://127.0.0.1:3100 ./scripts/lance_smoke.sh

set -euo pipefail

GATEWAY="${LH_GATEWAY:-http://127.0.0.1:3100}"
DATASET="${LH_LANCE_DATASET:-workers_500k_v1}"
PREFIX="$GATEWAY/vectors/lance"
PASS=0; FAIL=0

# Run a command and record the outcome; a failing probe never aborts the run.
PROBE() {
    local label="$1"; shift
    if "$@"; then
        echo "  ✓ $label"; PASS=$((PASS+1))
    else
        echo "  ✗ $label"; FAIL=$((FAIL+1))
    fi
}

echo "[lance-smoke] gateway=$GATEWAY dataset=$DATASET"

# ── 0. Gateway alive ─────────────────────────────────────────────
PROBE "gateway /v1/health responds" \
    curl -sf -m 3 -o /dev/null "$GATEWAY/v1/health"

# ── 1. Search returns IVF_PQ results on existing dataset ────────
RESP=$(curl -sS -m 30 -X POST "$PREFIX/search/$DATASET" \
    -H 'Content-Type: application/json' \
    -d '{"query":"forklift operator","top_k":3}' 2>/dev/null || echo '{}')
# The response is passed as a positional argument so quotes in the body
# cannot break out of the check.
PROBE "search/$DATASET returns top-3 lance_ivf_pq results" \
    bash -c 'jq -e ".method == \"lance_ivf_pq\" and (.results | length) == 3" <<<"$1" >/dev/null 2>&1' _ "$RESP"

# Capture one doc_id from those results so the next probe has something real to fetch.
# `|| true` keeps `set -e` from aborting the run when the response is not valid JSON.
DOC_ID=$(jq -r '.results[0].doc_id // ""' <<<"$RESP" 2>/dev/null || true)

# ── 2. get_doc by id returns the row ────────────────────────────
# $1 = doc URL prefix, $2 = doc id; jq receives the id via --arg instead of string splicing.
PROBE "doc/$DATASET/<known-id> returns full row" \
    bash -c '[ -n "$2" ] && curl -sf -m 5 "$1/$2" | jq -e --arg id "$2" ".row.doc_id == \$id" >/dev/null' _ "$PREFIX/doc/$DATASET" "$DOC_ID"

# ── 3. get_doc with bogus id returns 404 (not 500) ──────────────
# `|| true`: a transport failure leaves STATUS=000 and fails the probe instead of aborting the script.
STATUS=$(curl -sS -m 5 -o /tmp/lance_smoke_404.json -w '%{http_code}' \
    "$PREFIX/doc/$DATASET/W500K-NOT-A-REAL-ID-00000" || true)
PROBE "doc/$DATASET/<missing-id> → 404" \
    test "$STATUS" = "404"

# ── 4. search on missing dataset returns 404 + sanitized message ─
STATUS=$(curl -sS -m 5 -o /tmp/lance_smoke_500.json -w '%{http_code}' \
    -X POST "$PREFIX/search/no-such-dataset-${RANDOM}" \
    -H 'Content-Type: application/json' \
    -d '{"query":"x","top_k":1}' || true)
BODY=$(cat /tmp/lance_smoke_500.json 2>/dev/null || true)
PROBE "search/<missing> → 404 (was 500 pre-2026-05-02)" \
    test "$STATUS" = "404"
# The sanitizer fix specifically: no /home/ or /root/.cargo/ in body.
# Negate a positive match: `grep -qv` would pass as soon as any one line is clean.
PROBE "search/<missing> body sanitized: no filesystem leak" \
    bash -c '! grep -qE "/home/|/root/\.cargo/" <<<"$1"' _ "$BODY"

# ── 5. build_index on missing dataset also sanitized ────────────
STATUS=$(curl -sS -m 5 -o /tmp/lance_smoke_idx.json -w '%{http_code}' \
    -X POST "$PREFIX/index/no-such-dataset-${RANDOM}" \
    -H 'Content-Type: application/json' \
    -d '{}' || true)
BODY=$(cat /tmp/lance_smoke_idx.json 2>/dev/null || true)
# Negate a positive match so a leak on any line of the body fails the probe.
PROBE "index/<missing> body sanitized" \
    bash -c '! grep -qE "/home/|/root/\.cargo/" <<<"$1"' _ "$BODY"

# ── 6. append validates input shape (rejects empty rows array) ──
STATUS=$(curl -sS -m 5 -o /dev/null -w '%{http_code}' \
    -X POST "$PREFIX/append/$DATASET" \
    -H 'Content-Type: application/json' \
    -d '{"rows":[]}' || true)
PROBE "append with empty rows[] → 400" \
    test "$STATUS" = "400"

# ── 7. migrate route is reachable (POST without body returns a real error, not 404) ──
STATUS=$(curl -sS -m 5 -o /dev/null -w '%{http_code}' \
    -X POST "$PREFIX/migrate/probe-not-real-${RANDOM}?bucket=primary" 2>/dev/null || true)
# Expect a 4xx for the bad request shape. A 404 would mean the route is not
# registered, and a 200 would mean the empty request was accepted.
PROBE "migrate route registered (non-404, non-200 on empty body)" \
    bash -c '[ "$1" != "404" ] && [ "$1" != "200" ]' _ "$STATUS"

echo "[lance-smoke] $PASS PASS / $FAIL FAIL"
[ "$FAIL" -eq 0 ]