Dual-pipeline supervisor for embedding ingestion
- 4 parallel pipelines (tuned for i9 + A4000)
- Range-based work splitting (2500 chunks per range)
- Round-robin retry on failure (3 attempts before dead-letter)
- Checkpointing to disk every 1000 chunks (crash recovery)
- On restart, loads checkpoint and skips completed ranges
- Dead-letter queue for permanently failed ranges
- Vectors assembled in order after all pipelines finish
- Batch size 64 for GPU throughput
Architecture:
Supervisor → splits 100K chunks into 40 ranges
├── Pipeline 0: grabs range, embeds, reports progress
├── Pipeline 1: grabs range, embeds, reports progress
├── Pipeline 2: grabs range, embeds, reports progress
└── Pipeline 3: grabs range, embeds, reports progress
Failed range → back to queue → next available pipeline retries
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>