lakehouse

History

root 352f99de0f Hybrid SQL+Vector search — the gap is closed

POST /vectors/hybrid takes a question + SQL WHERE clause. Pipeline:
1. SQL filter narrows to structurally-valid candidates (role, state,
   reliability, certs — whatever the caller specifies)
2. Brute-force cosine scores ALL embeddings (not HNSW, which caps at
   ~30 results due to ef_search — too few to intersect with narrow
   SQL filters on 10K+ datasets)
3. Filter vector results to only SQL-verified IDs
4. LLM generates answer from verified-correct records

Tested on the exact query that failed the staffing simulation:
"forklift operators in IL with reliability > 0.8" — SQL found 78
matches, vector ranked the 5 most semantically relevant, LLM
generated an answer citing real workers with actual skills and
certifications. Every source marked sql_verified=true.

This closes the architectural gap identified by the quality eval:
structured precision (SQL) + semantic intelligence (vector) in one
endpoint. The simulation's contract-matching path was already
SQL-pure and worked perfectly; now the intelligence-question path
has the same accuracy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-16 22:49:48 -05:00

aibridge

Phases 16.2 + L2 + 17 VRAM gate + MySQL + 18 Lance hybrid milestone

2026-04-16 20:24:46 -05:00

catalogd

Phases 16.2 + L2 + 17 VRAM gate + MySQL + 18 Lance hybrid milestone

2026-04-16 20:24:46 -05:00

gateway

Phase E: Scheduled ingest — the substrate runs itself

2026-04-16 20:36:04 -05:00

ingestd

PDF OCR via Tesseract — scanned documents now ingestible