lakehouse/data/_catalog/manifests/75bb6855-488b-4300-89c2-970871bd99cc.json
root 26fc98c885 Phase 7: Vector index + RAG pipeline
- vectord crate: chunk → embed → store → search → RAG
- chunker: configurable chunk size + overlap, sentence-boundary aware splitting
- store: embeddings as Parquet (binary blob f32 vectors), portable format
- search: brute-force cosine similarity (works up to ~100K vectors)
- rag: full pipeline — embed question → search index → retrieve context → LLM answer
- Endpoints: POST /vectors/index, /vectors/search, /vectors/rag
- Gateway wired with vectord service
- Tested: 200 candidate resumes indexed in 5.4s, semantic search + RAG working
- 20 unit tests passing (chunker, search, ingestd, shared)
- AI gives honest "no match found" when context doesn't support an answer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 08:12:28 -05:00

15 lines
386 B
JSON

{
"id": "75bb6855-488b-4300-89c2-970871bd99cc",
"name": "email_log",
"schema_fingerprint": "auto",
"objects": [
{
"bucket": "data",
"key": "datasets/email_log.parquet",
"size_bytes": 1873775,
"created_at": "2026-03-27T13:11:42.757205427Z"
}
],
"created_at": "2026-03-27T13:11:42.757211105Z",
"updated_at": "2026-03-27T13:11:42.757211105Z"
}