Real end-to-end test of the Lakehouse pipeline at scale. Runs the
PRD (63 KB, 901 lines → 93 chunks) through 6 iterations with cloud
inference, intentional failure injection, and a tight context budget
to force every Phase 21 primitive to fire.
What the test exercises:
- Sidecar /embed for 93 chunks (nomic-embed-text)
- In-memory cosine retrieval for top-K per iteration
- Tree-split (shard → summarize → scratchpad → merge) when context
chunks exceed the 4000-char budget
- Scratchpad truncation to keep compounding context bounded
- Cloud inference via /v1/chat provider=ollama_cloud (gpt-oss:120b)
- Injected primary-cloud failure on iter 3 (invalid model name) +
rescue with gpt-oss:20b — proves catch-and-retry isn't dead code
- Playbook seeding per iteration (real HTTP against gateway)
- Prior-iteration answer injection for compounding (not just IDs —
the first version passed IDs only and the model ignored them)
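The shard → summarize → scratchpad → merge flow exercised above has roughly the following shape. This is a minimal sketch, not the test's actual code: `summarize` stands in for the real /v1/chat call, and the function names are illustrative.

```typescript
// Sketch of tree-split under a character budget. The real test routes
// each shard through cloud inference; here `summarize` is injected.
const BUDGET = 4000; // chars, matching the budget used in this run

// Greedily pack chunks into shards that each stay under the budget.
function shard(chunks: string[], budget: number): string[][] {
  const shards: string[][] = [];
  let current: string[] = [];
  let size = 0;
  for (const c of chunks) {
    if (size + c.length > budget && current.length > 0) {
      shards.push(current);
      current = [];
      size = 0;
    }
    current.push(c);
    size += c.length;
  }
  if (current.length > 0) shards.push(current);
  return shards;
}

// Summarize each shard into a scratchpad, then merge with one final pass.
async function treeSplit(
  chunks: string[],
  summarize: (text: string) => Promise<string>,
): Promise<string> {
  const scratchpad: string[] = [];
  for (const s of shard(chunks, BUDGET)) {
    scratchpad.push(await summarize(s.join("\n")));
  }
  return summarize(scratchpad.join("\n"));
}
```

The scratchpad join in the merge step is what the truncation primitive bounds in the real pipeline.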
Live run results (tests/real-world/runs/moamj810/):
6/6 iterations complete, 42 cloud calls total, 245s end-to-end
tree-splits: 6/6 (every iter overflowed 4K budget)
continuations: 0 (no responses hit max_tokens)
rescues: 1 (iter 3 injected failure → gpt-oss:20b → valid answer)
iter 6's answer explicitly cites [pb:pb-seed-82e1], so compounding is real
scratchpad truncation fired on iter 6 as designed
What this PROVES:
- Tree-split primitives work under real context pressure, not just
in unit tests. The 4000-char budget forced every iteration to
shard 12 chunks → 6 shards → scratchpad → final answer.
- Rescue on primary failure is wired and produces answers from a
weaker model rather than erroring out.
- Compounding context injection works: iter 6's prompt had the 5
prior answers in its citation block, and the cloud model
acknowledged at least one via [pb:...] notation.
- The existence claims in Phase 21 (continuation + tree-split) are
backed by executable evidence, not just unit tests.
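The rescue path proved above has the catch-and-retry shape sketched below. This is illustrative only: `callChat` stands in for the gateway's /v1/chat client, not its actual API; the model names are the ones used in this run.

```typescript
// Illustrative catch-and-retry for the primary-cloud rescue path.
type ChatFn = (model: string, prompt: string) => Promise<string>;

async function chatWithRescue(
  prompt: string,
  callChat: ChatFn,
  primary = "gpt-oss:120b",
  fallback = "gpt-oss:20b",
): Promise<{ answer: string; rescued: boolean }> {
  try {
    return { answer: await callChat(primary, prompt), rescued: false };
  } catch {
    // Primary failed (e.g. the injected invalid model name on iter 3):
    // retry once against the weaker model instead of erroring out.
    return { answer: await callChat(fallback, prompt), rescued: true };
  }
}
```

Injecting a failure for the primary model (as iter 3 did) should yield a valid answer with the rescued flag set.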
What this DOESN'T prove (deliberate — scoped for follow-up):
- Continuation retries (no iter hit max_tokens in this run; forcing
  one would need a harder prompt or a lower max_tokens cap)
- Real integration with /vectors/hybrid endpoint (test does in-memory
cosine instead, bypassing gateway vector surface)
- Observer consumption of these runs (nothing posted to :3800 during
the test — adding that is Phase A integration, handled separately)
Files:
tests/real-world/enrich_prd_pipeline.ts (333 LOC)
tests/real-world/runs/moamj810/{iter_1..6.json, summary.json}
— artifacts from the stress run, committed for inspection
Follow-ups worth doing:
1. Lower max_tokens / harder prompt to force continuation path
2. Route retrieval through /vectors/hybrid for real Phase 19 boost
3. POST per-iteration summary to observer :3800 so runs accumulate
like scenario runs do
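Follow-up 3 would be a small addition; a hedged sketch follows. The :3800 port comes from this note, but the /runs path and payload shape are assumptions, and the transport is injected so the shape is testable without a live observer.

```typescript
// Sketch of follow-up 3: POST each iteration's summary to the observer.
type Poster = (url: string, body: string) => Promise<void>;

async function postIterationSummary(
  iter: number,
  summary: Record<string, unknown>,
  post: Poster = async (url, body) => {
    // Default transport: plain fetch against the observer (assumed path).
    await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body,
    });
  },
): Promise<void> {
  await post("http://localhost:3800/runs", JSON.stringify({ iter, ...summary }));
}
```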
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>