diff --git a/docs/PRD.md b/docs/PRD.md index ad747a2..b97aeb3 100644 --- a/docs/PRD.md +++ b/docs/PRD.md @@ -1,8 +1,8 @@ # PRD: Lakehouse — Rust-First Substrate for Versioned Knowledge Stores -**Status:** Active — Phases 0-18 shipped; hybrid Parquet+HNSW ⊕ Lance operational; scheduled ingest live; PDF OCR live; entering horizon items +**Status:** Active — Phases 0-18 shipped; hybrid SQL+Vector search operational; IVF_PQ recall tuned to 0.97; 10K real staffing profiles ingested from Ethereal; autonomous agent + staffing simulation verified **Created:** 2026-03-27 -**Last reframed:** 2026-04-17 — hybrid architecture proven end-to-end on 100K vectors (see §Phase 18 + ADR-019) +**Last updated:** 2026-04-17 — hybrid search closes the SQL+vector gap; quality eval surfaced and fixed real bugs **Owner:** J --- @@ -110,8 +110,8 @@ User question → gateway **Semantic (RAG):** "Find candidates who could do data engineering work" → Embed question → vector search across resume embeddings → retrieve top chunks → LLM answers -**Hybrid:** "Which clients are we losing money on, and why?" -→ SQL for margin calculations + RAG over client notes/emails for context → LLM synthesizes +**Hybrid (shipped 2026-04-17):** "Find reliable forklift operators in Illinois with OSHA certs" +→ `POST /vectors/hybrid` with `sql_filter` + `question`: SQL narrows to structurally-valid candidates (role, state, reliability, certs), brute-force cosine ranks by semantic relevance within the filtered set, LLM generates answer from SQL-verified records only. Zero hallucinations on the staffing simulation (16/16 positions filled, all workers verified against golden data). ### Invariants