lakehouse/data/_catalog/manifests/4be87c74-10b4-463c-b69d-f20c9cd18ed7.json
root 26fc98c885 Phase 7: Vector index + RAG pipeline
- vectord crate: chunk → embed → store → search → RAG
- chunker: configurable chunk size + overlap, sentence-boundary aware splitting
- store: embeddings as Parquet (binary blob f32 vectors), portable format
- search: brute-force cosine similarity (works up to ~100K vectors)
- rag: full pipeline — embed question → search index → retrieve context → LLM answer
- Endpoints: POST /vectors/index, /vectors/search, /vectors/rag
- Gateway wired with vectord service
- Tested: 200 candidate resumes indexed in 5.4s, semantic search + RAG working
- 20 unit tests passing (chunker, search, ingestd, shared)
- AI gives honest "no match found" when context doesn't support an answer
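
The brute-force search step named above can be sketched as follows. This is a minimal illustration, not the vectord crate's actual code: the function names `cosine_similarity` and `brute_force_search`, the in-memory `Vec<Vec<f32>>` layout, and the choice to score zero-norm vectors as 0.0 are all assumptions made for the example.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm (an assumption for this sketch).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Hypothetical brute-force search: score every stored vector against the
/// query and return the top `k` as (index, score) pairs, best first.
/// Cost is O(n * d) per query, which is why it only suits small indexes.
fn brute_force_search(query: &[f32], vectors: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(query, v)))
        .collect();
    // Sort descending by score; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Every query scans every vector, so latency grows linearly with the index size. That is consistent with the rough ~100K-vector ceiling noted in the list above.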

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 08:12:28 -05:00


{
  "id": "4be87c74-10b4-463c-b69d-f20c9cd18ed7",
  "name": "candidates",
  "schema_fingerprint": "auto",
  "objects": [
    {
      "bucket": "data",
      "key": "datasets/candidates.parquet",
      "size_bytes": 2003395,
      "created_at": "2026-03-27T13:11:41.341589905Z"
    }
  ],
  "created_at": "2026-03-27T13:11:41.341599187Z",
  "updated_at": "2026-03-27T13:11:41.341599187Z"
}
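
For readers who want to see the manifest's shape as a type, here is one way it could be modeled in Rust. The struct and method names (`Manifest`, `ObjectRef`, `total_size_bytes`, `object_paths`) are hypothetical and only mirror the JSON keys shown above; the repository's real types may differ. Timestamps are kept as plain strings to avoid assuming a date/time library.

```rust
/// One stored object referenced by a manifest (fields mirror the JSON keys).
#[derive(Debug, Clone, PartialEq)]
struct ObjectRef {
    bucket: String,
    key: String,
    size_bytes: u64,
    created_at: String, // timestamp kept as text in this sketch
}

/// Hypothetical in-memory form of a dataset manifest.
#[derive(Debug, Clone, PartialEq)]
struct Manifest {
    id: String,
    name: String,
    schema_fingerprint: String,
    objects: Vec<ObjectRef>,
    created_at: String,
    updated_at: String,
}

impl Manifest {
    /// Sum of `size_bytes` over all referenced objects.
    fn total_size_bytes(&self) -> u64 {
        self.objects.iter().map(|o| o.size_bytes).sum()
    }

    /// Bucket-qualified paths, formatted as "bucket/key" (an illustrative convention).
    fn object_paths(&self) -> Vec<String> {
        self.objects
            .iter()
            .map(|o| format!("{}/{}", o.bucket, o.key))
            .collect()
    }
}
```

A manifest with a single object, like the one above, would report a total size equal to that object's `size_bytes`.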