root 3b695cd592 Dual-pipeline supervisor for embedding ingestion
- 4 parallel pipelines (tuned for i9 + A4000)
- Range-based work splitting (2500 chunks per range)
- Round-robin retry on failure (3 attempts before dead-letter)
- Checkpointing to disk every 1000 chunks (crash recovery)
- On restart, loads checkpoint and skips completed ranges
- Dead-letter queue for permanently failed ranges
- Vectors assembled in order after all pipelines finish
- Batch size 64 for GPU throughput
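
Sketch of the range splitting and restart behaviour listed above. This is a minimal illustration, not the committed code: `split_ranges`, `save_checkpoint`, `load_pending`, and the JSON checkpoint layout are hypothetical names and choices. It records only fully completed ranges; the every-1000-chunks progress checkpoint within a range is not modelled.

```python
import json
import os

RANGE_SIZE = 2500  # chunks per range, per the list above


def split_ranges(total_chunks, range_size=RANGE_SIZE):
    """Return half-open (start, end) ranges covering [0, total_chunks)."""
    return [
        (start, min(start + range_size, total_chunks))
        for start in range(0, total_chunks, range_size)
    ]


def save_checkpoint(path, completed):
    """Write the completed ranges as JSON (assumed format).

    Writes to a temp file first, then renames, so a crash mid-write
    cannot leave a truncated checkpoint behind.
    """
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(sorted(completed), f)
    os.replace(tmp_path, path)


def load_pending(path, ranges):
    """Return the ranges that the checkpoint does not mark as completed."""
    if not os.path.exists(path):
        return list(ranges)
    with open(path) as f:
        done = {tuple(r) for r in json.load(f)}
    return [r for r in ranges if r not in done]
```

With 100K chunks, `split_ranges(100_000)` yields the 40 ranges shown in the architecture diagram; after a restart, `load_pending` returns only the ranges still to be embedded.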

Architecture:
  Supervisor → splits 100K chunks into 40 ranges
    ├── Pipeline 0: grabs range, embeds, reports progress
    ├── Pipeline 1: grabs range, embeds, reports progress
    ├── Pipeline 2: grabs range, embeds, reports progress
    └── Pipeline 3: grabs range, embeds, reports progress
    Failed range → back to queue → next available pipeline retries
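
Sketch of the flow in the diagram, under stated assumptions: worker threads stand in for the pipelines, a shared `queue.Queue` stands in for the work queue, and `embed_range` is a hypothetical callable supplied by the caller that returns the vectors for one range. `run_supervisor` is likewise an illustrative name. The real supervisor may use processes, GPU batching (batch size 64), and a different retry policy.

```python
import queue
import threading

MAX_ATTEMPTS = 3   # attempts before a range is dead-lettered
NUM_PIPELINES = 4  # parallel pipelines


def run_supervisor(ranges, embed_range, num_pipelines=NUM_PIPELINES):
    """Embed all ranges in parallel; return (ordered_vectors, dead_letter)."""
    work = queue.Queue()
    for rng in ranges:
        work.put((rng, 1))  # (range, attempt number)

    results = {}      # range -> vectors for that range
    dead_letter = []  # ranges that failed MAX_ATTEMPTS times
    lock = threading.Lock()

    def pipeline():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work
                work.task_done()
                return
            rng, attempt = item
            try:
                vectors = embed_range(*rng)
                with lock:
                    results[rng] = vectors
            except Exception:
                if attempt < MAX_ATTEMPTS:
                    # Back to the queue; whichever pipeline is free next retries.
                    work.put((rng, attempt + 1))
                else:
                    with lock:
                        dead_letter.append(rng)
            finally:
                work.task_done()

    threads = [threading.Thread(target=pipeline) for _ in range(num_pipelines)]
    for t in threads:
        t.start()
    work.join()  # returns once every range has succeeded or been dead-lettered
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()

    # Assemble vectors in range order only after all pipelines have finished.
    ordered = [v for rng in sorted(results) for v in results[rng]]
    return ordered, sorted(dead_letter)
```

Because a failed range is re-queued before its `task_done()` call, `work.join()` cannot return while a retry is still outstanding.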

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 09:06:28 -05:00