lakehouse

profit/lakehouse

Fork 0

Commit Graph

Author	SHA1	Message	Date
root	1565f536eb	Fix: job tracker field name mismatch — the overnight killer ROOT CAUSE: Python scripts polled status.get("processed", 0) but the Rust Job struct serialized as "embedded_chunks". Scripts always saw 0, looped forever printing "unknown: 0/50000" for 8+ hours. Fix (both sides): - Rust: added "processed" alias field + "total" field to Job struct, kept in sync on every update_progress() and complete() call - Python: fixed autonomous_agent.py and overnight_proof.sh to read "embedded_chunks" as primary key The actual embedding pipeline was working the whole time — 673K real chunks embedded overnight. Only the monitoring was blind. One-word bug, 8 hours of zombie output. This is why you test the monitoring, not just the pipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 10:41:32 -05:00
root	13660a017e	Autonomous stress-test agent — recursive playbooks, hot-swap, error pipeline Python agent that exercises the full Lakehouse substrate as a real consumer would: ingests 10 Postgres tables (1,356 rows), embeds 5,415 chunks into 2 vector indexes, creates hot-swap profiles (Parquet+HNSW with qwen2.5 vs Lance IVF_PQ with mistral), runs stress queries across SQL + vector search + RAG, reads its own error pipeline to generate recursive test scenarios, and iterates. 50/50 tests pass across 2 iterations with zero errors. Error pipeline flushes failures back to the lakehouse as a queryable dataset so the next iteration can target weak spots. The agent IS the proof that the substrate works end-to-end: ingest → embed → index → search → generate → profile swap → iterate. Every capability we built today gets exercised in one script. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 22:00:13 -05:00

Author

SHA1

Message

Date

root

1565f536eb

Fix: job tracker field name mismatch — the overnight killer

ROOT CAUSE: Python scripts polled status.get("processed", 0) but the
Rust Job struct serialized as "embedded_chunks". Scripts always saw 0,
looped forever printing "unknown: 0/50000" for 8+ hours.

Fix (both sides):
- Rust: added "processed" alias field + "total" field to Job struct,
  kept in sync on every update_progress() and complete() call
- Python: fixed autonomous_agent.py and overnight_proof.sh to read
  "embedded_chunks" as primary key

The actual embedding pipeline was working the whole time — 673K real
chunks embedded overnight. Only the monitoring was blind.

One-word bug, 8 hours of zombie output. This is why you test the
monitoring, not just the pipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-17 10:41:32 -05:00

root

13660a017e

Autonomous stress-test agent — recursive playbooks, hot-swap, error pipeline

Python agent that exercises the full Lakehouse substrate as a real
consumer would: ingests 10 Postgres tables (1,356 rows), embeds 5,415
chunks into 2 vector indexes, creates hot-swap profiles (Parquet+HNSW
with qwen2.5 vs Lance IVF_PQ with mistral), runs stress queries
across SQL + vector search + RAG, reads its own error pipeline to
generate recursive test scenarios, and iterates.

50/50 tests pass across 2 iterations with zero errors. Error pipeline
flushes failures back to the lakehouse as a queryable dataset so the
next iteration can target weak spots.

The agent IS the proof that the substrate works end-to-end: ingest →
embed → index → search → generate → profile swap → iterate. Every
capability we built today gets exercised in one script.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-16 22:00:13 -05:00

2 Commits