From 296bdaa7463e6362777a253ba8f903e2167631a9 Mon Sep 17 00:00:00 2001
From: root
Date: Thu, 16 Apr 2026 23:10:56 -0500
Subject: [PATCH] PRD: hybrid search is operational, Ethereal data integrated

Status updated to reflect hybrid SQL+vector search, IVF_PQ 0.97 recall,
10K Ethereal worker profiles, autonomous agent validation. Query Paths
section updated with the shipped hybrid endpoint and its verified
zero-hallucination results from the staffing simulation.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 docs/PRD.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/PRD.md b/docs/PRD.md
index ad747a2..b97aeb3 100644
--- a/docs/PRD.md
+++ b/docs/PRD.md
@@ -1,8 +1,8 @@
 # PRD: Lakehouse — Rust-First Substrate for Versioned Knowledge Stores
 
-**Status:** Active — Phases 0-18 shipped; hybrid Parquet+HNSW ⊕ Lance operational; scheduled ingest live; PDF OCR live; entering horizon items
+**Status:** Active — Phases 0-18 shipped; hybrid SQL+Vector search operational; IVF_PQ recall tuned to 0.97; 10K real staffing profiles ingested from Ethereal; autonomous agent + staffing simulation verified
 **Created:** 2026-03-27
-**Last reframed:** 2026-04-17 — hybrid architecture proven end-to-end on 100K vectors (see §Phase 18 + ADR-019)
+**Last updated:** 2026-04-17 — hybrid search closes the SQL+vector gap; quality eval surfaced and fixed real bugs
 **Owner:** J
 
 ---
@@ -110,8 +110,8 @@ User question → gateway
 
 **Semantic (RAG):** "Find candidates who could do data engineering work"
 → Embed question → vector search across resume embeddings → retrieve top chunks → LLM answers
 
-**Hybrid:** "Which clients are we losing money on, and why?"
-→ SQL for margin calculations + RAG over client notes/emails for context → LLM synthesizes
+**Hybrid (shipped 2026-04-17):** "Find reliable forklift operators in Illinois with OSHA certs"
+→ `POST /vectors/hybrid` with `sql_filter` + `question`: SQL narrows to structurally-valid candidates (role, state, reliability, certs), brute-force cosine ranks by semantic relevance within the filtered set, LLM generates answer from SQL-verified records only. Zero hallucinations on the staffing simulation (16/16 positions filled, all workers verified against golden data).
 
 ### Invariants