profit
|
5b1fcf6d27
|
Phase 28-36 body of work
Accumulated since a6f12e2 (Phase 21 Rust port + Phase 27 versioning):
- Phase 36: embed_semaphore on VectorState (permits=1) serializes
seed embed calls — prevents sidecar socket collisions under
concurrent /seed stress load
- Phase 31+: run_stress.ts 6-task diverse stress scaffolding;
run_e2e_rated.ts + orchestrator.ts tightening
- Catalog dedupe cleanup: 16 duplicate manifests removed; canonical
candidates.parquet (10.5MB -> 76KB) + placements.parquet (1.2MB ->
11KB) regenerated post-dedupe; fresh manifests for active datasets
- vectord: harness EvalSet refinements (+181), agent portfolio
rotation + ingest triggers (+158), autotune + rag adjustments
- catalogd/storaged/ingestd/mcp-server: misc tightening
- docs: Phase 28-36 PRD entries + DECISIONS ADR additions;
control-plane pivot banner added to top of docs/PRD.md (pointing
at docs/CONTROL_PLANE_PRD.md which lands in next commit)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-22 02:41:15 -05:00 |
|
root
|
26fc98c885
|
Phase 7: Vector index + RAG pipeline
- vectord crate: chunk → embed → store → search → RAG
- chunker: configurable chunk size + overlap, sentence-boundary aware splitting
- store: embeddings as Parquet (binary blob f32 vectors), portable format
- search: brute-force cosine similarity (works up to ~100K vectors)
- rag: full pipeline — embed question → search index → retrieve context → LLM answer
- Endpoints: POST /vectors/index, /vectors/search, /vectors/rag
- Gateway wired with vectord service
- Tested: 200 candidate resumes indexed in 5.4s, semantic search + RAG working
- 20 unit tests passing (chunker, search, ingestd, shared)
- AI gives honest "no match found" when context doesn't support an answer
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-27 08:12:28 -05:00 |
|