lakehouse

profit/lakehouse

Fork 0

Commit Graph

Author	SHA1	Message	Date
root	1565f536eb	Fix: job tracker field name mismatch — the overnight killer ROOT CAUSE: Python scripts polled status.get("processed", 0) but the Rust Job struct serialized as "embedded_chunks". Scripts always saw 0, looped forever printing "unknown: 0/50000" for 8+ hours. Fix (both sides): - Rust: added "processed" alias field + "total" field to Job struct, kept in sync on every update_progress() and complete() call - Python: fixed autonomous_agent.py and overnight_proof.sh to read "embedded_chunks" as primary key The actual embedding pipeline was working the whole time — 673K real chunks embedded overnight. Only the monitoring was blind. One-word bug, 8 hours of zombie output. This is why you test the monitoring, not just the pipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 10:41:32 -05:00
root	2e455919b7	Overnight proof — 5-step unattended test with real embeddings Runs autonomously via cron (every 3 min, state machine): 1. Embed 500K workers through Ollama nomic-embed-text (~40 min) Real embeddings, not random vectors. This is what matters. 2. Build HNSW + Lance IVF_PQ on real clustered data 3. Measure recall — HNSW vs Lance on real embeddings 4. 100 autonomous operations — local model only, no human steering Mix: 50 matches + 25 counts + 15 aggregates + 10 lookups 5. 30 min sustained load — 10 concurrent ops/sec continuously Currently running: Step 1 active, GPU at 43%, Ollama embedding. Monitor: tail -f /home/profit/lakehouse/logs/overnight_proof.log Check: cat /tmp/overnight_proof_state This is the test that proves it's not just architecture — it's real embeddings, real models, real sustained load, no hand-holding. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 01:22:07 -05:00

Author

SHA1

Message

Date

root

1565f536eb

Fix: job tracker field name mismatch — the overnight killer

ROOT CAUSE: Python scripts polled status.get("processed", 0) but the
Rust Job struct serialized as "embedded_chunks". Scripts always saw 0,
looped forever printing "unknown: 0/50000" for 8+ hours.

Fix (both sides):
- Rust: added "processed" alias field + "total" field to Job struct,
  kept in sync on every update_progress() and complete() call
- Python: fixed autonomous_agent.py and overnight_proof.sh to read
  "embedded_chunks" as primary key

The actual embedding pipeline was working the whole time — 673K real
chunks embedded overnight. Only the monitoring was blind.

One-word bug, 8 hours of zombie output. This is why you test the
monitoring, not just the pipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-17 10:41:32 -05:00

root

2e455919b7

Overnight proof — 5-step unattended test with real embeddings

Runs autonomously via cron (every 3 min, state machine):
  1. Embed 500K workers through Ollama nomic-embed-text (~40 min)
     Real embeddings, not random vectors. This is what matters.
  2. Build HNSW + Lance IVF_PQ on real clustered data
  3. Measure recall — HNSW vs Lance on real embeddings
  4. 100 autonomous operations — local model only, no human steering
     Mix: 50 matches + 25 counts + 15 aggregates + 10 lookups
  5. 30 min sustained load — 10 concurrent ops/sec continuously

Currently running: Step 1 active, GPU at 43%, Ollama embedding.
Monitor: tail -f /home/profit/lakehouse/logs/overnight_proof.log
Check: cat /tmp/overnight_proof_state

This is the test that proves it's not just architecture — it's
real embeddings, real models, real sustained load, no hand-holding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-17 01:22:07 -05:00

2 Commits