chore: add real content that was sitting untracked

Surfaced by today's untracked-files audit. None of these are accidents —
multiple are referenced by name in CLAUDE.md and memory files but were
never added.

Categories:
- docs/PHASE_AUDIT_GUIDE.md (106 LOC) — Claude Code phase audit guidance
- ops/systemd/lakehouse-langfuse-bridge.service — Langfuse bridge unit
- package.json — top-level npm manifest
- scripts/e2e_pipeline_check.sh + production_smoke.sh — real test scripts
- reports/kimi/audit-last-week*.md — the "Two reports live" CLAUDE.md cites
- tests/multi-agent/scenarios/ — 44 staffing scenarios (cutover decision A)
- tests/multi-agent/playbooks/ — 102 playbook records
- tests/battery/, tests/agent_test/PRD.md, tests/real-world/* — real tests
- sidecar/sidecar/{lab_ui,pipeline_lab}.py — 888 LOC dev-only UIs that
  remain in service post-sidecar-drop (commit ba928b1 explicitly kept them)

Sensitivity check: scenarios use synthetic company names ("Heritage Foods",
"Cornerstone Fabrication"); audit reports describe code findings only;
no PII or secrets surfaced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
commit 41b0a99ed2 (parent 6e34ef7baf), authored by root, 2026-05-02 22:22:10 -05:00
788 changed files with 107142 additions and 0 deletions

docs/PHASE_AUDIT_GUIDE.md (new file, 107 lines)
# Phase Audit Guidance for Claude Code
## Purpose
This document provides the proper workflow for auditing completed phases in the Lakehouse project.
## ⚠️ Important: Do NOT Skip Steps
Each phase requires BOTH:
1. PRD spec verification (check code exists)
2. Full SCRUM execution (6 commands)
## Proper Phase Audit Workflow
### Step 1: Read PRD Specification
For each phase, read the PRD to understand what's supposed to ship:
```bash
# Read from docs/PRD.md or docs/PHASES.md
grep -A20 "Phase N:" docs/PHASES.md
```
### Step 2: Verify Code Exists
Check that each deliverable from the PRD spec has corresponding code:
```bash
# Example - check for specific implementations
grep -r "function_name" crates/*/src/
ls crates/*/src/*.rs
```
### Step 3: Run Full SCRUM (6 Commands)
In order, execute ALL of these for the phase's crates:
```bash
# 1. Build
cargo build -p <crate-name>
# 2. Test
cargo test -p <crate-name>
# 3. Clippy (if installed)
cargo clippy -p <crate-name> -- -D warnings
# 4. Format check
cargo fmt -p <crate-name> -- --check
# 5. Cargo check
cargo check -p <crate-name>
# 6. Doc check
cargo doc -p <crate-name> --no-deps
```
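The six commands above can be wrapped in a small helper so a failure stops the run at the offending step. A sketch only, not part of the documented workflow; the `CARGO` variable is parameterized purely so the helper can be dry-run:

```shell
#!/usr/bin/env bash
# scrum: run all six SCRUM commands for one crate, stopping at the first failure.
# CARGO defaults to cargo; overriding it (e.g. CARGO=echo) dry-runs the sequence.
scrum() {
  local crate="$1" cargo="${CARGO:-cargo}"
  local -a cmds=(
    "$cargo build -p $crate"
    "$cargo test -p $crate"
    "$cargo clippy -p $crate -- -D warnings"
    "$cargo fmt -p $crate -- --check"
    "$cargo check -p $crate"
    "$cargo doc -p $crate --no-deps"
  )
  local i=1
  for c in "${cmds[@]}"; do
    echo "[$i/6] $c"
    $c || { echo "SCRUM FAIL at step $i: $c" >&2; return 1; }
    i=$((i + 1))
  done
  echo "SCRUM PASS: $crate"
}
```

Per Step 4, a failure inside the loop means fixing the code and re-running the whole sequence, not just the failed step.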
### Step 4: Fix Issues
If any SCRUM command fails:
- Fix the code
- Re-run the failing command
- Re-run ALL 6 commands to verify
### Step 5: Update Phase Documentation
Only mark as ✅ after ALL 6 SCRUM commands pass:
```markdown
## Phase N: [Name] ✅
- [x] spec item 1
- [x] spec item 2
- SCRUM: build ✅ test ✅ clippy ✅ fmt ✅ check ✅ doc ✅
```
## Current Phase Status
| Phase | Status | Notes |
|-------|--------|-------|
| 0 | ✅ | Bootstrap complete |
| 1 | ✅ | Storage + Catalog |
| 2 | ✅ | Query Engine |
| 3 | ✅ | AI Integration |
| 4 | ✅ | Frontend |
| 5 | ✅ | Hardening |
| 6-42 | ✅ | See docs/PHASES.md |
## Notes from Previous Session
- Clippy and rustfmt are NOT installed on this system
- Run `rustup component add clippy rustfmt` to install
- Some crates have 0 unit tests (expected for service crates)
- 28 warnings remain in unused code paths (ui/vectord)
## Key Files
- `docs/PHASES.md` - Phase tracker with checkboxes
- `docs/PRD.md` - Full product requirements
- `docs/CONTROL_PLANE_PRD.md` - Phases 38+ specifications
- `crates/*/` - All crate implementations
## Quick Reference
```bash
# Full workspace SCRUM
cargo build --workspace
cargo test --workspace
# (clippy if installed)
cargo fmt -- --check
cargo check --workspace
cargo doc --no-deps
# Per-crate
cargo build -p <crate>
cargo test -p <crate>
cargo check -p <crate>
```

ops/systemd/lakehouse-langfuse-bridge.service (new file, 28 lines)
[Unit]
Description=Lakehouse Langfuse → observer bridge — forwards LLM trace metadata to :3800 so KB learns from cost/latency/provider deltas
Documentation=file:///home/profit/lakehouse/mcp-server/langfuse_bridge.ts
After=network.target
# No hard dependency on either Langfuse or observer — if either is down,
# the bridge retries on the next tick without crashing. That's the
# whole point of the cursor state file.
[Service]
Type=simple
WorkingDirectory=/home/profit/lakehouse
ExecStart=/home/profit/.bun/bin/bun run /home/profit/lakehouse/mcp-server/langfuse_bridge.ts
Restart=on-failure
RestartSec=30
# Credentials resolved from env. Matches how
# crates/gateway/src/v1/langfuse_trace.rs reads them so both producer
# (gateway emitter) and consumer (this bridge) share the same config.
EnvironmentFile=-/etc/lakehouse/langfuse.env
Environment=LANGFUSE_URL=http://localhost:3001
Environment=OBSERVER_URL=http://localhost:3800
Environment=LANGFUSE_POLL_MS=30000
Environment=LANGFUSE_BATCH_LIMIT=50
Environment=LANGFUSE_STATE_FILE=/var/lib/lakehouse-guard/langfuse_last_seen.json
KillSignal=SIGTERM
TimeoutStopSec=5
[Install]
WantedBy=multi-user.target
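The unit's comments lean on the cursor state file for crash-free retries. A minimal sketch of that pattern, with a hypothetical `forward_batch` standing in for the bridge's real HTTP POST to :3800 (the actual logic lives in langfuse_bridge.ts):

```shell
#!/usr/bin/env bash
# Sketch of the cursor-file retry pattern described in the unit comments.
# forward_batch is a placeholder for the bridge's POST to the observer;
# the cursor advances only after a successful forward, so a failed tick
# simply retries the same window on the next poll.
STATE_FILE="${STATE_FILE:-/tmp/langfuse_last_seen}"

forward_batch() {  # args: <since> <until>; replace with a real HTTP call
  return 0
}

tick() {
  local last_seen=0 now
  [ -f "$STATE_FILE" ] && last_seen=$(cat "$STATE_FILE")
  now=$(date +%s)
  if forward_batch "$last_seen" "$now"; then
    echo "$now" > "$STATE_FILE"   # advance cursor only on success
  fi
}
```

Because the cursor never advances on failure, neither Langfuse nor the observer being down loses data; the window is simply replayed on a later tick.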

package.json (new file, 5 lines)
{
"dependencies": {
"langfuse": "^3.38.20"
}
}

reports/kimi/ audit report (new file, 45 lines; one of the two audit-last-week*.md reports named in the commit message)
# Kimi Forensic Audit (FULL FILES) — distillation v1.0.0
**Generated:** 2026-04-27 by `kimi-for-coding` via gateway /v1/chat
**Latency:** 270.6s | **finish:** stop | **usage:** {'prompt_tokens': 66338, 'completion_tokens': 10159, 'total_tokens': 76497}
**Input:** /tmp/kimi-audit-full.md (238KB · 12 commits · 15 files · line-numbered, no truncation)
---
## Verdict
**Hold**: the substrate's TypeScript pipeline is architecturally coherent and the SFT firewall is genuine, but committed Rust tests fail to compile, drift detection hardcodes an unverified integrity assertion, and deterministic guarantees leak wall-clock time in multiple places.
## What's solid
- **Three-layer SFT contamination firewall is real.** Schema enum restricts `quality_score` to `["accepted", "partially_accepted"]` (`sft_sample.ts:13,62`), exporter constant `SFT_NEVER` blocks rejected/needs_human_review before synthesis (`export_sft.ts:51,205`), and `receipts.ts` re-reads the output to fail loud if any forbidden score leaked (`receipts.ts:231-236`).
- **Core scorer is pure and deterministic.** `scoreRecord` takes an `EvidenceRecord`, performs no I/O, no LLM calls, and uses no mutable state (`scorer.ts:1-5,257-273`).
- **Quarantine is exhaustive and observable.** Every exporter routes skips to structured `exports/quarantine/<exporter>.jsonl` with typed reasons; silent drops are impossible by construction (`quarantine.ts:1-6,14-26`).
- **Evidence provenance is mandatory on every row.** Every `EvidenceRecord` carries `source_file`, `line_offset`, `sig_hash`, and `recorded_at` (`build_evidence_index.ts:27-34`).
- **Local-first replay reduces cloud calls.** `replay.ts` defaults to a local model, augments via RAG retrieval, and only escalates on validation failure, directly supporting the cloud-call reduction claim (`replay.ts:24,349-376`).
## What's risky
1. **receipts.ts:495** hardcodes `input_hash_match: true` in drift reports while comments on lines 467-469 admit input-hash comparison is unimplemented; this is false telemetry in a forensic system.
2. **score_runs.ts:159** deduplicates scored runs by `scored.provenance.sig_hash` (the *evidence* hash), not by a composite of evidence + scorer version, so scorer logic or `SCORER_VERSION` updates are silently ignored on re-runs against existing partition files.
3. **transforms.ts:181** `auto_apply` transform falls back to `new Date().toISOString()` when `row.ts` is missing, injecting wall-clock time into the supposedly deterministic materialization layer.
4. **mode.rs:1035,1042** Rust test code assigns `Some("...".into())` and `None` to a `Vec<String>` field (`matrix_corpus`), which would fail `cargo test` compilation; this contradicts the claim that the tag is fully tested.
5. **export_sft.ts:109-133** synthesizes fake instruction templates per source stem instead of using actual historical prompts; the SFT firewall prevents category contamination but not prompt-fidelity distortion.
## Specific findings
- **mode.rs:1035** — Compile error in test helper: `matrix_corpus: Some("distilled_procedural_v1".into())` mismatches the `Vec<String>` type declared at line 172. **Rationale:** Direct struct construction in the test module uses an `Option` where a `Vec` is required, so the Rust test suite cannot compile.
- **receipts.ts:495** — Drift detection hardcodes `input_hash_match: true`. **Rationale:** The adjacent comment admits input-hash comparison is simplified and unimplemented (lines 467-469); asserting a verified match is misleading telemetry that will hide real input-side regressions.
- **score_runs.ts:159** — Scored-run dedup ignores scorer version. **Rationale:** `loadSeenHashes` and the skip logic key only on the EvidenceRecord `sig_hash`, meaning an existing scored-run file from yesterday will block updated scores even if `SCORER_VERSION` or scorer logic changed today.
- **transforms.ts:181** — Non-deterministic timestamp fallback in `auto_apply` transform. **Rationale:** `row.ts ?? new Date().toISOString()` injects wall-clock time when the source row lacks a timestamp, violating the header claim that transforms are “deterministic by construction” and breaking bit-identical reproducibility for that stream.
- **export_sft.ts:126** — Unsafe property access via `as any`. **Rationale:** `(ev as any).contractor` bypasses the `EvidenceRecord` type contract; if the property is absent the template silently emits `"<contractor>"`, degrading SFT data quality without a type error.
- **scorer.ts:30** — Environmental dependency in deterministic scorer. **Rationale:** `process.env.LH_SCORER_VERSION` means identical evidence inputs produce different `scorer_version` stamps (and different downstream receipts) depending on the runtime environment, undermining bit-identical claims.
- **replay.ts:378** — Non-deterministic run identifier. **Rationale:** `` `replay:${task_hash.slice(0, 16)}:${Date.now()}` `` makes replay evidence rows non-reproducible and risks collision under rapid successive calls.
- **export_sft.ts:109-133** — Synthetic instruction generation replaces ground-truth prompts. **Rationale:** The exporter fabricates instruction strings from metadata (e.g., hardcoded scrum review phrasing) rather than retrieving the actual historical prompt, so the resulting SFT dataset trains on reconstructed, not authentic, user instructions.
## Direction recommendation
**Pause the staffing audit and harden the substrate first.** Before building the staffing inference mode (`staffing_inference_lakehouse` in `mode.rs:54`) on top of this substrate:
1. Fix the Rust test compile errors (`mode.rs:1035,1042`) and ensure `cargo test` runs in CI.
2. Replace the hardcoded `input_hash_match: true` in drift detection (`receipts.ts:495`) with a real hash comparison or remove the field until it is implemented.
3. Change scored-run dedup (`score_runs.ts:159`) to key on a composite hash of `evidence_sig_hash + scorer_version + SCORER_VERSION` so scorer updates force re-scoring.
4. Remove the `new Date().toISOString()` fallback in `transforms.ts:181` or fail the row so determinism is preserved.
5. Audit all `as any` casts in the export layer (`export_sft.ts:126`) for type-safe alternatives.
Once those fixes land and acceptance re-runs pass, proceed to the staffing audit wave; the architecture is sound enough to support it, but the forensic guarantees must be honest before downstream teams depend on them.
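Recommendation 3's composite dedup key can be sketched at the shell level (placeholder values throughout; the real fix belongs in score_runs.ts):

```shell
# Composite dedup key: the evidence hash alone misses scorer changes, so
# fold the scorer version into the key. Values below are placeholders.
composite_key() {
  # args: <evidence_sig_hash> <scorer_version>
  printf '%s:%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

k1=$(composite_key "3f1a9c" "v1.0.0")
k2=$(composite_key "3f1a9c" "v1.0.1")
echo "$k1"
echo "$k2"
```

Because the two keys differ whenever the scorer version changes, a `SCORER_VERSION` bump forces re-scoring instead of being silently skipped by yesterday's partition file.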

reports/kimi/ audit report (new file, 36 lines; the second of the audit-last-week*.md reports)
# Kimi Forensic Audit — distillation v1.0.0 (last week)
**Generated:** 2026-04-27 by `kimi-for-coding` via gateway /v1/chat
**Latency:** 157.6s | **finish:** stop | **usage:** {'prompt_tokens': 14014, 'completion_tokens': 6356, 'total_tokens': 20370}
**Input:** /tmp/kimi-audit-input.md (56k chars · 12 commits · 6 files)
---
## Verdict
**hold** — Runtime lock-in, integration mismatches, and truncated source files in the v1.0.0 payload make the tag unshippable without rework.
## What's solid
- `scorer.ts` is a pure, deterministic function with no I/O, no LLM calls, and an explicit version stamp (`scorer.ts:22`).
- SFT export enforces defense-in-depth contamination firewalls via `SFT_NEVER` and schema validators (`export_sft.ts:49-50`; `sft_sample.ts:43-48`).
- Evidence materialization is idempotent across reruns using `sig_hash` deduplication (`build_evidence_index.ts:114-126`).
- Mode router falls back to a safe built-in default if config parsing fails (`mode.rs:208-228`).
- Quarantine writer abstraction isolates bad records instead of failing the export (`export_sft.ts`).
## What's risky
- **Integration mismatch**: `replay.ts` posts to `/v1/chat`, but the provided gateway only declares `/v1/mode` and `/v1/mode/execute` (`replay.ts:186` vs `mode.rs:13-18`), suggesting an undocumented or broken proxy contract.
- **Bun runtime lock-in**: Multiple files depend on `Bun.CryptoHasher`, which throws in Node.js (`export_sft.ts:235`; `build_evidence_index.ts:89`).
- **Unauditable files in scope**: Critical files listed in the diff—`transforms.ts`, `receipts.ts`, `quarantine.ts`, `score_runs.ts`—were not provided, so their logic is unseen.
- **Every shown implementation file is truncated**: `scorer.ts`, `export_sft.ts`, `build_evidence_index.ts`, `replay.ts`, and `mode.rs` all end mid-block, hiding error handling, receipt finalization, and gateway dispatch code.
- **Type safety escape**: `(ev as any).contractor` in SFT synthesis bypasses the schema layer (`export_sft.ts:138`).
## Specific findings
1. `scripts/distillation/scorer.ts:22` — `SCORER_VERSION` reads from `process.env`, introducing environment-dependent output drift that contradicts the file's “identical input → identical output forever” contract.
2. `scripts/distillation/export_sft.ts:138` — `(ev as any).contractor` is an unguarded `any` cast; a malformed `EvidenceRecord` will inject the string `"undefined"` or crash at runtime inside the SFT instruction template.
3. `scripts/distillation/export_sft.ts:235` — `new Bun.CryptoHasher("sha256")` is a Bun-only API; this path will fail under Node.js/Deno and makes the substrate non-portable.
4. `scripts/distillation/build_evidence_index.ts:89` — Same Bun crypto lock-in in `sha256OfFile`, fragmenting the hashing implementation (here `Bun.CryptoHasher`, elsewhere `canonicalSha256`).
5. `scripts/distillation/replay.ts:178` — Provider routing relies on fragile string heuristics (`model.includes("/")`, prefix lists); models with unexpected names will route to the wrong backend or hit the `ollama` default incorrectly.
6. `scripts/distillation/replay.ts:186` — ``fetch(`${gatewayUrl()}/v1/chat`)`` targets an endpoint absent from the provided `mode.rs` router; without the missing gateway dispatch code, this call will 404.
7. `crates/gateway/src/v1/mode.rs:141` — `deserialize_string_or_vec` uses `serde_json::Value::deserialize` against a TOML source, which is non-idiomatic and risks mis-handling TOML-specific types (datetime, inline tables) compared to a native `toml::Value`.
8. `scripts/distillation/build_evidence_index.ts:185` — `await canonicalSha256(row)` is async, yet `sha256OfFile` is sync; the mixing of sync/async crypto calls in the same module hints at inconsistent I/O boundaries.
## Direction recommendation
Keep the substrate architecture, but **do not expand staffing audit work on top of v1.0.0 until three blockers are fixed**: (1) replace `Bun.CryptoHasher` with portable WebCrypto or Node `crypto` so the build is runtime-agnostic; (2) align `replay.ts` to the actual gateway contract (`/v1/mode/execute`) or document the `/v1/chat` proxy route; and (3) eliminate `any` casts in the export path. The schema firewalls, deterministic scorer, and receipt provenance are the right foundation—rework the runtime/contract gaps rather than rebuilding from scratch.

scripts/e2e_pipeline_check.sh (new executable file, 536 lines)
#!/usr/bin/env bash
# ------------------------------------------------------------
# End-to-end pipeline verification for Lakehouse.
#
# Generates realistic staffing-style data, runs it through every
# shipped pipeline stage, asserts correctness at each step, and
# cleans up after itself.
#
# Stages exercised:
# 0. Preflight — gateway + sidecar reachability
# 1. Data generation — 1000 candidates, 200 placements, 10 resumes
# 2. CSV ingest — Phase 6.1 (via ?name= query param)
# 3. NDJSON ingest — Phase 6.2
# 4. SQL queries + joins — Phase 2, Phase 8 hot cache
# 5. Content-hash re-ingest dedup — Phase 6.4
# 6. Idempotent register — ADR-020 (same-fingerprint path)
# 7. Schema-drift rejection — ADR-020 (409 Conflict path)
# 8. Catalog dedupe no-op — ADR-020 (clean state)
# 9. Metadata enrichment — Phase 10 POST
# 10. PII auto-detection audit — Phase 10
# 11. Vector index + search — Phase 7 (documents pulled via SQL)
# 12. Cleanup + baseline verify — no-orphan guarantee
#
# Usage:
# ./scripts/e2e_pipeline_check.sh # run all stages
# SKIP_VECTOR=1 ./scripts/e2e_pipeline_check.sh # skip Ollama-bound steps
# KEEP_DATA=1 ./scripts/e2e_pipeline_check.sh # leave /tmp artifacts
#
# Exit codes:
# 0 all assertions passed
# 1 one or more assertions failed
# 2 preflight failed (service unreachable)
# ------------------------------------------------------------
set -u
set -o pipefail
GATEWAY="${GATEWAY:-http://localhost:3100}"
SIDECAR="${SIDECAR:-http://localhost:3200}"
WORKDIR="${WORKDIR:-/tmp/lakehouse_e2e}"
DATA_ROOT="${DATA_ROOT:-/home/profit/lakehouse/data}"
SKIP_VECTOR="${SKIP_VECTOR:-0}"
KEEP_DATA="${KEEP_DATA:-0}"
RUN_ID="e2e_$(date +%s)"
CAND_DS="${RUN_ID}_candidates"
PLACE_DS="${RUN_ID}_placements"
RESUME_DS="${RUN_ID}_resumes"
VEC_IDX="${RESUME_DS}_v1"
# Color names use a CC_ prefix so they can't be shadowed by single-letter
# local variables like `R` that hold curl responses elsewhere in the script.
if [[ -t 1 ]]; then
CC_GRN=$'\033[0;32m'; CC_RED=$'\033[0;31m'; CC_YLW=$'\033[1;33m'
CC_BLU=$'\033[1;34m'; CC_DIM=$'\033[2m'; CC_RST=$'\033[0m'
else
CC_GRN=''; CC_RED=''; CC_YLW=''; CC_BLU=''; CC_DIM=''; CC_RST=''
fi
PASS=0; FAIL=0; WARN=0; STARTED_AT=$(date +%s)
FAILURES=()
pass() { printf ' %s✓%s %s\n' "$CC_GRN" "$CC_RST" "$1"; PASS=$((PASS+1)); }
fail() { printf ' %s✗%s %s\n' "$CC_RED" "$CC_RST" "$1"; FAIL=$((FAIL+1)); FAILURES+=("$1"); }
warn() { printf ' %s!%s %s\n' "$CC_YLW" "$CC_RST" "$1"; WARN=$((WARN+1)); }
step() { printf '\n%s== %s ==%s\n' "$CC_BLU" "$1" "$CC_RST"; }
info() { printf ' %s%s%s\n' "$CC_DIM" "$1" "$CC_RST"; }
die() { printf '%sFATAL: %s%s\n' "$CC_RED" "$1" "$CC_RST" >&2; cleanup; exit 2; }
assert_eq() {
if [[ "$1" == "$2" ]]; then pass "$3 ($1)"; else fail "$3: got '$1', expected '$2'"; fi
}
http_code() {
local method="$1" path="$2" data="${3:-}"
if [[ -n "$data" ]]; then
curl -s -o /dev/null -w '%{http_code}' -X "$method" "$GATEWAY$path" \
-H 'Content-Type: application/json' -d "$data"
else
curl -s -o /dev/null -w '%{http_code}' -X "$method" "$GATEWAY$path"
fi
}
# query_scalar <sql> -> first column of first row as string, sentinel on empty/error
query_scalar() {
local sql="$1"
local payload
payload=$(python3 -c 'import json,sys; print(json.dumps({"sql": sys.argv[1]}))' "$sql")
curl -s -X POST "$GATEWAY/query/sql" \
-H 'Content-Type: application/json' \
-d "$payload" \
| python3 -c '
import sys, json
try:
r = json.load(sys.stdin)
except Exception:
print("__PARSE_ERROR__"); sys.exit(0)
if isinstance(r, dict) and "error" in r:
sys.stderr.write("query error: " + str(r["error"]) + "\n")
print("__ERROR__"); sys.exit(0)
rows = r.get("rows") if isinstance(r, dict) else None
if not rows:
print("__NO_ROWS__"); sys.exit(0)
row = rows[0]
print(next(iter(row.values())))
'
}
cleanup() {
[[ "$KEEP_DATA" == "1" ]] && { info "KEEP_DATA=1 — leaving $WORKDIR"; return; }
info "cleaning up test datasets for $RUN_ID"
# Catch any previous-run zombies too: any catalog entry whose name
# starts with "e2e_" is definitionally ours. Using DELETE (added for
# this script's needs) purges both the live registry and the manifest
# file atomically, so the next run doesn't trip on zombie entries
# pointing at parquets we've already rm'd.
local names
names=$(curl -s "$GATEWAY/catalog/datasets" 2>/dev/null \
| python3 -c "
import sys, json
try: ds = json.load(sys.stdin)
except Exception: sys.exit(0)
for d in ds:
if d['name'].startswith('e2e_'):
print(d['name'])
" 2>/dev/null || true)
local removed=0
for n in $names; do
curl -s -o /dev/null -X DELETE "$GATEWAY/catalog/datasets/by-name/$n" && removed=$((removed+1))
done
# Delete any stray parquet + vector artifacts we can positively
# attribute to an e2e_ prefix.
rm -f "$DATA_ROOT/datasets/"e2e_*.parquet 2>/dev/null || true
rm -f "$DATA_ROOT/vectors/"e2e_*.parquet 2>/dev/null || true
rm -rf "$WORKDIR" 2>/dev/null || true
info "deleted $removed e2e datasets (covers this run + any prior zombies)"
}
trap cleanup EXIT
# ============================================================
# 0. Preflight
# ============================================================
step "0. Preflight"
curl -sf -m 3 "$GATEWAY/health" >/dev/null 2>&1 || die "gateway not reachable at $GATEWAY"
pass "gateway /health (200)"
SIDECAR_UP=0
if curl -sf -m 3 "$SIDECAR/health" >/dev/null 2>&1; then
SIDECAR_UP=1; pass "sidecar /health (200)"
else
warn "sidecar unreachable — vector stage will be skipped"
SKIP_VECTOR=1
fi
# Purge any e2e_* zombies from prior runs (stale registry entries that
# would otherwise break DataFusion schema inference for every query).
ZOMBIES=$(curl -s "$GATEWAY/catalog/datasets" 2>/dev/null \
| python3 -c "
import sys, json
try: ds = json.load(sys.stdin)
except Exception: sys.exit(0)
for d in ds:
if d['name'].startswith('e2e_'):
print(d['name'])
" 2>/dev/null || true)
if [[ -n "$ZOMBIES" ]]; then
ZCOUNT=$(echo "$ZOMBIES" | wc -l | tr -d ' ')
for n in $ZOMBIES; do
curl -s -o /dev/null -X DELETE "$GATEWAY/catalog/datasets/by-name/$n"
done
info "pre-cleaned $ZCOUNT e2e_ zombies from prior runs"
fi
BASELINE=$(curl -s "$GATEWAY/catalog/datasets" | python3 -c 'import sys,json; print(len(json.load(sys.stdin)))')
info "baseline dataset count: $BASELINE"
# ============================================================
# 1. Generate realistic data
# ============================================================
step "1. Generate realistic staffing data"
mkdir -p "$WORKDIR"
# Seed with RUN_ID (which embeds the wall-clock timestamp) so each run
# produces different content. Otherwise the content-hash dedup from
# Phase 6.4 keys off a stale hash that lingers in the live registry
# until the next gateway restart, and subsequent runs silently dedupe.
python3 - "$WORKDIR" "$RUN_ID" <<'PYEOF'
import csv, hashlib, json, random, sys, os
workdir, run_id = sys.argv[1], sys.argv[2]
# Mix RUN_ID into the seed so content differs per run, but keep it
# deterministic within a single run. Use a stable digest rather than
# hash(), which is salted per-process in Python 3.
random.seed(int.from_bytes(hashlib.sha256(run_id.encode()).digest()[:4], 'big'))
FIRST = ['Aisha','Brandon','Carlos','Daria','Eli','Fiona','Gabriel','Hana','Ian','Julia',
'Kofi','Lena','Mateo','Nadia','Oscar','Priya','Quinn','Raj','Sofia','Tomas',
'Uma','Victor','Wendy','Xander','Yuki','Zara']
LAST = ['Adams','Brown','Chen','Davis','Evans','Fisher','Garcia','Hughes','Ibrahim','Johnson',
'Kim','Lopez','Martinez','Nguyen','Ortiz','Patel','Rossi','Singh','Thomas','Umar',
'Vargas','Williams','Xu','Young','Zhang','OConnor']
PLACES = [('Chicago','IL'),('New York','NY'),('San Francisco','CA'),('Austin','TX'),
('Seattle','WA'),('Denver','CO'),('Boston','MA'),('Atlanta','GA'),
('Miami','FL'),('Phoenix','AZ')]
SKILL_GROUPS = [
['Python','AWS','Docker'],['Java','Spring','Kubernetes'],
['React','TypeScript','Node'],['Go','PostgreSQL','gRPC'],
['Rust','DataFusion','Parquet'],['C#','.NET','Azure'],
['Ruby','Rails','Redis'],['Scala','Spark','Kafka'],
['Swift','iOS','CoreData'],['Kotlin','Android','Jetpack'],
]
STATUSES = ['active','placed','inactive','blocked']
STATUS_WEIGHTS = [60, 25, 10, 5]
with open(os.path.join(workdir, 'candidates.csv'), 'w', newline='') as f:
w = csv.DictWriter(f, fieldnames=[
'candidate_id','first_name','last_name','email','phone',
'city','state','skills','years_experience','hourly_rate_usd','status'])
w.writeheader()
for i in range(1, 1001):
fn, ln = random.choice(FIRST), random.choice(LAST)
city, state = random.choice(PLACES)
w.writerow({
'candidate_id': f'CAND-{i:05d}',
'first_name': fn, 'last_name': ln,
'email': f'{fn.lower()}.{ln.lower()}{i}@example.com',
'phone': f'({random.randint(200,999)}) {random.randint(200,999)}-{random.randint(1000,9999)}',
'city': city, 'state': state,
'skills': ','.join(random.choice(SKILL_GROUPS)),
'years_experience': random.randint(0, 20),
'hourly_rate_usd': random.randint(35, 185),
'status': random.choices(STATUSES, weights=STATUS_WEIGHTS)[0],
})
CLIENTS = ['Acme Corp','Globex','Initech','Umbrella','Wayne Enterprises',
'Stark Industries','Tyrell','Cyberdyne','Massive Dynamic','Oscorp']
with open(os.path.join(workdir, 'placements.ndjson'), 'w') as f:
for i in range(1, 201):
f.write(json.dumps({
'placement_id': f'PLACE-{i:04d}',
'candidate_id': f'CAND-{random.randint(1,1000):05d}',
'client': random.choice(CLIENTS),
'start_date': f'2026-{random.randint(1,4):02d}-{random.randint(1,28):02d}',
'weekly_hours': random.choice([20,25,30,35,40]),
'bill_rate': random.randint(80, 250),
'placement_status': random.choice(['active','completed','terminated']),
}) + '\n')
RESUMES = [
'Senior Python engineer with 8 years of cloud infrastructure experience. Expert in AWS, Docker, and distributed systems design. Led migration of monolithic legacy system to microservices.',
'Full-stack React and TypeScript developer specializing in real-time dashboards. Built financial trading interfaces. GraphQL, WebSocket, performance optimization.',
'Data engineer with deep Apache Spark and Kafka expertise. Seven years on streaming analytics pipelines processing billions of events per day. Scala and Python.',
'Embedded systems engineer with C++ and Rust experience. Worked on automotive ADAS systems and industrial IoT devices. Low-level firmware, RTOS.',
'DevOps engineer with Kubernetes and Terraform expertise. Six years at hypergrowth startups. Prometheus, Grafana, and observability tooling.',
'Machine learning engineer specializing in NLP. Built production transformer-based systems. PyTorch, Hugging Face, fine-tuning large language models.',
'iOS developer with Swift and SwiftUI. Four years building consumer apps at mid-size tech companies. Offline-first architectures and CoreData.',
'Backend Go developer focused on high-throughput APIs. Built payment processing systems handling millions of transactions. PostgreSQL, gRPC, Redis.',
'Security engineer with penetration testing and threat modeling experience. OSCP certified. Web application security, AppSec code review, SAST and DAST tooling.',
'Site reliability engineer with Linux internals and performance tuning expertise. Ten years at large-scale infrastructure. Tracing, profiling, kernel-level debugging.',
]
with open(os.path.join(workdir, 'resumes.ndjson'), 'w') as f:
for i, r in enumerate(RESUMES, 1):
f.write(json.dumps({'doc_id': f'RES-{i:03d}', 'resume_text': r}) + '\n')
PYEOF
pass "candidates.csv (1000 rows, 11 cols)"
pass "placements.ndjson (200 rows, 7 cols)"
pass "resumes.ndjson (10 rows, 2 cols)"
# ============================================================
# 2. CSV ingest
# ============================================================
step "2. CSV ingest (Phase 6.1)"
R=$(curl -s -X POST "$GATEWAY/ingest/file?name=$CAND_DS" -F "file=@$WORKDIR/candidates.csv")
echo "$R" | python3 -c 'import sys,json; json.load(sys.stdin)' 2>/dev/null \
|| { fail "ingest response was not JSON: $(echo "$R" | head -c 200)"; R='{}'; }
ROWS=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("rows",-1))' 2>/dev/null)
DEDUP=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("deduplicated","?"))' 2>/dev/null)
DS_NAME=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("dataset_name","?"))' 2>/dev/null)
assert_eq "$DS_NAME" "$CAND_DS" "ingest respected ?name= query param"
assert_eq "$ROWS" "1000" "ingest rows"
assert_eq "$DEDUP" "False" "first upload not deduplicated"
REG_ROWS=$(curl -s "$GATEWAY/catalog/datasets/by-name/$CAND_DS" \
| python3 -c 'import sys,json; print(json.load(sys.stdin).get("row_count","null"))')
assert_eq "$REG_ROWS" "1000" "manifest row_count reflects ingest"
# ============================================================
# 3. NDJSON ingest
# ============================================================
step "3. NDJSON ingest (Phase 6.2)"
R=$(curl -s -X POST "$GATEWAY/ingest/file?name=$PLACE_DS" -F "file=@$WORKDIR/placements.ndjson")
ROWS=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("rows",-1))' 2>/dev/null)
assert_eq "$ROWS" "200" "placements NDJSON ingest rows"
R=$(curl -s -X POST "$GATEWAY/ingest/file?name=$RESUME_DS" -F "file=@$WORKDIR/resumes.ndjson")
ROWS=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("rows",-1))' 2>/dev/null)
assert_eq "$ROWS" "10" "resumes NDJSON ingest rows"
# ============================================================
# 4. SQL queries + JOIN + cache
# ============================================================
step "4. SQL queries (Phase 2, Phase 8)"
N=$(query_scalar "SELECT COUNT(*) FROM $CAND_DS")
assert_eq "$N" "1000" "candidates COUNT(*)"
N=$(query_scalar "SELECT COUNT(*) FROM $CAND_DS WHERE status = 'active'")
if [[ "$N" =~ ^[0-9]+$ ]] && (( N > 400 && N < 700 )); then
pass "active candidates in plausible range ($N, expect ~600)"
else
fail "active candidates count out of range: $N"
fi
N=$(query_scalar "
SELECT COUNT(DISTINCT c.candidate_id)
FROM $CAND_DS c
JOIN $PLACE_DS p ON c.candidate_id = p.candidate_id
WHERE p.placement_status = 'active'
")
if [[ "$N" =~ ^[0-9]+$ ]] && (( N > 0 && N <= 200 )); then
pass "cross-dataset JOIN with filter returns $N rows"
else
fail "JOIN returned unexpected count: $N"
fi
AVG=$(query_scalar "SELECT AVG(hourly_rate_usd) FROM $CAND_DS")
if python3 -c "import sys; v=float('$AVG'); sys.exit(0 if 100 < v < 130 else 1)" 2>/dev/null; then
pass "average hourly rate in plausible range ($AVG, expect ~110)"
else
fail "average hourly rate out of range: $AVG"
fi
CODE=$(http_code POST "/query/cache/pin" "{\"dataset\":\"$CAND_DS\"}")
assert_eq "$CODE" "200" "cache pin HTTP"
# ============================================================
# 5. Content-hash re-ingest dedup (Phase 6.4)
# ============================================================
step "5. Content-hash re-ingest dedup"
R=$(curl -s -X POST "$GATEWAY/ingest/file?name=$CAND_DS" -F "file=@$WORKDIR/candidates.csv")
DEDUP=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("deduplicated","?"))' 2>/dev/null)
assert_eq "$DEDUP" "True" "re-upload same file is deduplicated"
# ============================================================
# 6. Idempotent register — same fingerprint (ADR-020)
# ============================================================
step "6. Idempotent register (ADR-020 same-fp path)"
DS=$(curl -s "$GATEWAY/catalog/datasets/by-name/$CAND_DS")
FP=$(echo "$DS" | python3 -c 'import sys,json; print(json.load(sys.stdin)["schema_fingerprint"])')
OBJS=$(echo "$DS" | python3 -c 'import sys,json; print(json.dumps(json.load(sys.stdin)["objects"]))')
ID_BEFORE=$(echo "$DS" | python3 -c 'import sys,json; print(json.load(sys.stdin)["id"])')
PAYLOAD=$(python3 -c "import json,sys; print(json.dumps({'name':sys.argv[1],'schema_fingerprint':sys.argv[2],'objects':json.loads(sys.argv[3])}))" "$CAND_DS" "$FP" "$OBJS")
CODE=$(http_code POST "/catalog/datasets" "$PAYLOAD")
assert_eq "$CODE" "201" "same-fp re-register returns 201"
ID_AFTER=$(curl -s "$GATEWAY/catalog/datasets/by-name/$CAND_DS" | python3 -c 'import sys,json; print(json.load(sys.stdin)["id"])')
assert_eq "$ID_AFTER" "$ID_BEFORE" "same DatasetId after re-register"
COUNT=$(curl -s "$GATEWAY/catalog/datasets" | python3 -c "import sys,json; print(sum(1 for d in json.load(sys.stdin) if d['name']=='$CAND_DS'))")
assert_eq "$COUNT" "1" "no duplicate manifest created"
# ============================================================
# 7. Schema-drift rejection (409)
# ============================================================
step "7. Schema-drift rejection (ADR-020 409 path)"
PAYLOAD=$(python3 -c "import json,sys; print(json.dumps({'name':sys.argv[1],'schema_fingerprint':'deadbeefnotmatching','objects':json.loads(sys.argv[2])}))" "$CAND_DS" "$OBJS")
CODE=$(http_code POST "/catalog/datasets" "$PAYLOAD")
assert_eq "$CODE" "409" "different-fp rejected with 409"
# ============================================================
# 8. Dedupe no-op on clean catalog
# ============================================================
step "8. Dedupe no-op on clean state"
R=$(curl -s -X POST "$GATEWAY/catalog/dedupe")
GROUPS=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin)["groups"])')
REMOVED=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin)["removed"])')
assert_eq "$GROUPS" "0" "dedupe groups (clean catalog)"
assert_eq "$REMOVED" "0" "dedupe removed count"
# ============================================================
# 9. Metadata enrichment (Phase 10)
# ============================================================
step "9. Metadata enrichment (Phase 10)"
CODE=$(http_code POST "/catalog/datasets/by-name/$CAND_DS/metadata" \
"{\"owner\":\"e2e-test\",\"description\":\"$RUN_ID synthetic candidates\",\"tags\":[\"test\",\"synthetic\"]}")
assert_eq "$CODE" "200" "POST metadata HTTP"
META=$(curl -s "$GATEWAY/catalog/datasets/by-name/$CAND_DS")
OWNER=$(echo "$META" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("owner",""))')
assert_eq "$OWNER" "e2e-test" "owner persisted"
# ============================================================
# 10. PII auto-detection (Phase 10)
# ============================================================
step "10. PII auto-detection (Phase 10)"
PII_COLS=$(echo "$META" | python3 -c '
import sys, json
m = json.load(sys.stdin)
pii = [c["name"] for c in m.get("columns",[]) if c.get("is_pii") or (isinstance(c.get("sensitivity"),str) and c["sensitivity"].lower()=="pii")]
print(" ".join(pii) if pii else "__NONE__")')
if [[ "$PII_COLS" == *"email"* ]] && [[ "$PII_COLS" == *"phone"* ]]; then
pass "email and phone flagged as PII ($PII_COLS)"
elif [[ "$PII_COLS" == "__NONE__" ]]; then
warn "no PII flagged — auto-detection may not run on this path"
else
warn "partial PII detection: $PII_COLS"
fi
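Step 10 checks which columns come back flagged, but the detection itself happens inside the gateway. One plausible column-name heuristic, as a hedged sketch (the real detector may also inspect values, and these field names are illustrative):

```python
import re

# Hypothetical name-based heuristic; the gateway's actual PII detector is
# not part of this script and may differ.
PII_NAME = re.compile(r"(email|phone|ssn|dob|address)", re.IGNORECASE)

def flag_pii_columns(columns):
    return [c for c in columns if PII_NAME.search(c)]

print(flag_pii_columns(["doc_id", "email", "phone", "resume_text"]))  # ['email', 'phone']
```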
# ============================================================
# 11. Vector index + semantic search (Phase 7)
# ============================================================
step "11. Vector index + semantic search (Phase 7)"
if [[ "$SKIP_VECTOR" == "1" ]]; then
warn "SKIP_VECTOR=1 — skipping vector pipeline"
else
# Pull documents out of the ingested resumes dataset via SQL,
# then feed to the inline /vectors/index body. This exercises
# the query→embed integration rather than pre-canned input.
DOCS=$(curl -s -X POST "$GATEWAY/query/sql" \
-H 'Content-Type: application/json' \
-d "$(python3 -c "import json; print(json.dumps({'sql': 'SELECT doc_id, resume_text FROM $RESUME_DS'}))")" \
| python3 -c '
import sys, json
r = json.load(sys.stdin)
docs = [{"id": row["doc_id"], "text": row["resume_text"]} for row in r.get("rows", [])]
print(json.dumps(docs))')
DOC_COUNT=$(echo "$DOCS" | python3 -c 'import sys,json; print(len(json.load(sys.stdin)))')
assert_eq "$DOC_COUNT" "10" "pulled docs via SQL for embedding"
PAYLOAD=$(python3 -c "
import json, sys
print(json.dumps({
'index_name': sys.argv[1],
'source': sys.argv[2],
'documents': json.loads(sys.argv[3]),
'chunk_size': 500,
'overlap': 50,
}))" "$VEC_IDX" "$RESUME_DS" "$DOCS")
R=$(curl -s -X POST "$GATEWAY/vectors/index" -H 'Content-Type: application/json' -d "$PAYLOAD")
JOB_ID=$(echo "$R" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(d.get("job_id","__NONE__"))' 2>/dev/null)
if [[ "$JOB_ID" == "__NONE__" || -z "$JOB_ID" ]]; then
fail "vector index job rejected: $(echo "$R" | head -c 200)"
else
pass "embedding job accepted (job=$JOB_ID)"
# Poll up to 90s for 10 short resumes; Ollama cold-start can be slow.
JOB_STATUS="unknown"
for _ in $(seq 1 45); do
JOB_STATUS=$(curl -s "$GATEWAY/vectors/jobs/$JOB_ID" 2>/dev/null \
| python3 -c '
import sys, json
try: print(json.load(sys.stdin).get("status","?"))
except Exception: print("?")' 2>/dev/null)
[[ "$JOB_STATUS" == "completed" || "$JOB_STATUS" == "Completed" ]] && break
[[ "$JOB_STATUS" == "failed" || "$JOB_STATUS" == "Failed" ]] && break
sleep 2
done
case "$JOB_STATUS" in
completed|Completed)
pass "embedding job completed"
R=$(curl -s -X POST "$GATEWAY/vectors/search" \
-H 'Content-Type: application/json' \
-d "{\"index_name\":\"$VEC_IDX\",\"query\":\"fine-tuning large language models\",\"k\":3}")
TOP_DOC=$(echo "$R" | python3 -c '
import sys, json
r = json.load(sys.stdin)
if r.get("results"): print(r["results"][0].get("doc_id","?"))
else: print("__NONE__")' 2>/dev/null)
if [[ "$TOP_DOC" == "RES-006" ]]; then
pass "top match is ML/NLP resume (semantically correct)"
elif [[ "$TOP_DOC" == "__NONE__" ]]; then
fail "search returned no results"
else
warn "top match is $TOP_DOC (expected RES-006 — ranking may vary)"
fi ;;
*)
fail "embedding job did not complete (status=$JOB_STATUS)" ;;
esac
fi
fi
# ============================================================
# 12. Cleanup + baseline verify
# ============================================================
step "12. Cleanup + baseline verify"
cleanup
trap - EXIT
ON_DISK=$(ls "$DATA_ROOT/_catalog/manifests"/*.json 2>/dev/null | wc -l | tr -d ' ')
info "manifest files on disk now: $ON_DISK"
DISK_ORPHANS=0
if compgen -G "$DATA_ROOT/_catalog/manifests/*.json" > /dev/null; then
DISK_ORPHANS=$(grep -l "\"$RUN_ID" "$DATA_ROOT/_catalog/manifests"/*.json 2>/dev/null | wc -l | tr -d ' ')
fi
assert_eq "$DISK_ORPHANS" "0" "no orphan manifest files on disk for $RUN_ID"
LIVE_ORPHANS=$(curl -s "$GATEWAY/catalog/datasets" \
| python3 -c "import sys,json; print(sum(1 for d in json.load(sys.stdin) if d['name'].startswith('$RUN_ID')))")
if [[ "$LIVE_ORPHANS" != "0" ]]; then
warn "$LIVE_ORPHANS entries linger in live registry (clears on gateway restart; on-disk is ground truth)"
fi
# ============================================================
# Summary
# ============================================================
ELAPSED=$(( $(date +%s) - STARTED_AT ))
printf '\n%s─── Summary ───%s\n' "$CC_BLU" "$CC_RST"
printf ' run_id: %s\n' "$RUN_ID"
printf ' elapsed: %ss\n' "$ELAPSED"
printf ' passed: %s%d%s\n' "$CC_GRN" "$PASS" "$CC_RST"
printf ' failed: %s%d%s\n' "$CC_RED" "$FAIL" "$CC_RST"
printf ' warnings: %s%d%s\n' "$CC_YLW" "$WARN" "$CC_RST"
if (( FAIL > 0 )); then
printf '\n%sfailures:%s\n' "$CC_RED" "$CC_RST"
for f in "${FAILURES[@]}"; do printf ' - %s\n' "$f"; done
exit 1
fi
exit 0

scripts/production_smoke.sh Executable file

@@ -0,0 +1,157 @@
#!/usr/bin/env bash
# Production substrate smoke — single command that verifies every
# production-critical surface end-to-end. Exits non-zero on the first
# failure so an operator can run this before:
# - Swapping workers_500k.parquet → real Chicago contractor data
# - Spinning up the Asterisk voice agent against /v1/chat
# - Running staffing inference loops via /v1/iterate
# - Wiring the assistant against the gateway
#
# Usage:
# ./scripts/production_smoke.sh
#
# Tunable via env:
# GATEWAY=http://localhost:3100 # gateway base URL
# FAIL_FAST=1 # exit on first failure (default 1)
# VERBOSE=1 # print full responses on success too
set -e
GATEWAY="${GATEWAY:-http://localhost:3100}"
FAIL_FAST="${FAIL_FAST:-1}"
VERBOSE="${VERBOSE:-0}"
PASS=0
FAIL=0
FAILURES=()
check() {
local name="$1"
local expected_status="$2"
local cmd="$3"
echo -n " [$(($PASS + $FAIL + 1))] $name ... "
local resp
resp=$(eval "$cmd" 2>&1) || true
local status="${resp%%|||*}"
local body="${resp#*|||}"
if [ "$status" = "$expected_status" ]; then
PASS=$((PASS + 1))
echo "✓ ($status)"
if [ "$VERBOSE" = "1" ]; then echo " $body" | head -3 | sed 's/^/ /'; fi
else
FAIL=$((FAIL + 1))
FAILURES+=("$name: expected $expected_status, got $status")
echo "✗ (got $status, expected $expected_status)"
echo " $body" | head -3 | sed 's/^/ /'
    # plain "if" rather than "&&" so a false test does not trip set -e
    if [ "$FAIL_FAST" = "1" ]; then print_summary; exit 1; fi
fi
}
curl_with_status() {
# Run curl, capture HTTP status + body, format as "status|||body"
local args=("$@")
curl -sS -w "\n%{http_code}" "${args[@]}" 2>&1 | awk '
{ lines[NR]=$0 }
END {
status=lines[NR]
body=""
for (i=1; i<NR; i++) body=body lines[i] (i<NR-1?"\n":"")
print status "|||" body
}
'
}
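check() later splits the `status|||body` string with `${resp%%|||*}` and `${resp#*|||}`, i.e. everything up to and after the first `|||`. The same convention expressed as a single partition, with illustrative values:

```python
# Wire format is "<http status>|||<body>"; the body may span multiple
# lines, and only the first "|||" is significant.
resp = '404|||{"detail":"not found"}\nsecond body line'
status, _, body = resp.partition("|||")
print(status)  # 404
print(body)
```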
print_summary() {
echo ""
echo "═══════════════════════════════════════════════════════════════"
echo " $PASS passed · $FAIL failed"
if [ ${#FAILURES[@]} -gt 0 ]; then
echo " failures:"
for f in "${FAILURES[@]}"; do echo " - $f"; done
fi
echo "═══════════════════════════════════════════════════════════════"
}
echo "Production substrate smoke test against $GATEWAY"
echo ""
# ─── 1. Liveness ─────────────────────────────────────────────────────
echo "▶ Liveness"
check "gateway /health" "200" \
'curl_with_status -m 5 "$GATEWAY/health"'
# ─── 2. Operational health ──────────────────────────────────────────
echo "▶ Operational state"
HEALTH_RESP=$(curl -sS -m 10 "$GATEWAY/v1/health" 2>&1) || HEALTH_RESP="{}"
WORKERS_COUNT=$(echo "$HEALTH_RESP" | python3 -c "import sys,json; print(json.load(sys.stdin).get('workers_count',0))" 2>/dev/null || echo 0)
PROVIDERS_OK=$(echo "$HEALTH_RESP" | python3 -c "import sys,json; d=json.load(sys.stdin).get('providers_configured',{}); print(sum(1 for v in d.values() if v))" 2>/dev/null || echo 0)
echo " workers_count: $WORKERS_COUNT"
echo " providers_configured (count): $PROVIDERS_OK"
if [ "$WORKERS_COUNT" -lt 1 ]; then
FAIL=$((FAIL + 1))
FAILURES+=("workers_count=0 — parquet load failed or empty")
echo " ✗ workers not loaded"
  # plain "if" rather than "&&" so a false test does not trip set -e
  if [ "$FAIL_FAST" = "1" ]; then print_summary; exit 1; fi
else
PASS=$((PASS + 1))
echo " ✓ workers loaded"
fi
# ─── 3. Truth Layer ──────────────────────────────────────────────────
echo "▶ Truth Layer"
check "/v1/context returns rules" "200" \
'curl_with_status -m 10 "$GATEWAY/v1/context"'
# ─── 4. /v1/chat (provider=ollama) ──────────────────────────────────
echo "▶ /v1/chat (provider=ollama, fast model)"
check "/v1/chat ping" "200" \
'curl_with_status -m 60 -X POST "$GATEWAY/v1/chat" \
-H "content-type: application/json" \
-d "{\"provider\":\"ollama\",\"model\":\"qwen3.5:latest\",\"messages\":[{\"role\":\"user\",\"content\":\"reply: PONG\"}],\"max_tokens\":30,\"temperature\":0,\"think\":false}"'
# ─── 5. /v1/validate (negative + positive) ──────────────────────────
echo "▶ /v1/validate"
check "phantom candidate_id → 422 Consistency" "422" \
'curl_with_status -m 10 -X POST "$GATEWAY/v1/validate" \
-H "content-type: application/json" \
-d "{\"kind\":\"fill\",\"artifact\":{\"fills\":[{\"candidate_id\":\"W-FAKE-0\",\"name\":\"Fake\"}]},\"context\":{\"target_count\":1}}"'
check "real worker (W-1) → 200 OK" "200" \
'curl_with_status -m 10 -X POST "$GATEWAY/v1/validate" \
-H "content-type: application/json" \
-d "{\"kind\":\"fill\",\"artifact\":{\"fills\":[{\"candidate_id\":\"W-1\",\"name\":\"Anyone\"}]},\"context\":{\"target_count\":1}}"'
check "SSN in body → 422 Policy" "422" \
'curl_with_status -m 10 -X POST "$GATEWAY/v1/validate" \
-H "content-type: application/json" \
-d "{\"kind\":\"email\",\"artifact\":{\"to\":\"a@b.com\",\"body\":\"Your SSN 123-45-6789 is on file.\"}}"'
# ─── 6. /v1/iterate (bounded retry loop) ───────────────────────────
# Phantom worker → expect 422 IterateFailure with history (not 200)
echo "▶ /v1/iterate (bounded retry)"
check "/v1/iterate phantom → bounded fail" "422" \
'curl_with_status -m 240 -X POST "$GATEWAY/v1/iterate" \
-H "content-type: application/json" \
-d "{\"kind\":\"fill\",\"provider\":\"ollama\",\"model\":\"qwen3.5:latest\",\"system\":\"Reply with ONLY: {\\\"fills\\\":[{\\\"candidate_id\\\":\\\"W-99999999\\\",\\\"name\\\":\\\"X\\\"}]}\",\"prompt\":\"emit it\",\"context\":{\"target_count\":1},\"max_iterations\":1,\"max_tokens\":200,\"temperature\":0}"'
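The /v1/iterate call above nests JSON inside a JSON string inside shell quotes, which is easy to get wrong by hand. An equivalent payload can be built with `json.dumps` handling both layers (field names copied from the curl call; the endpoint itself is not exercised here):

```python
import json

# Inner artifact the prompt asks the model to emit verbatim.
inner = {"fills": [{"candidate_id": "W-99999999", "name": "X"}]}
payload = {
    "kind": "fill",
    "provider": "ollama",
    "model": "qwen3.5:latest",
    "system": "Reply with ONLY: " + json.dumps(inner),
    "prompt": "emit it",
    "context": {"target_count": 1},
    "max_iterations": 1,
    "max_tokens": 200,
    "temperature": 0,
}
body = json.dumps(payload)
# Round-trip check: both JSON layers survive serialization.
decoded = json.loads(body)
assert json.loads(decoded["system"].split("Reply with ONLY: ", 1)[1]) == inner
print(decoded["kind"])  # fill
```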
# ─── 7. Doc-drift batch ─────────────────────────────────────────────
echo "▶ Doc-drift scan"
check "/vectors/playbook_memory/doc_drift/scan" "200" \
'curl_with_status -m 60 -X POST "$GATEWAY/vectors/playbook_memory/doc_drift/scan"'
# ─── 8. Usage tracking ──────────────────────────────────────────────
echo "▶ Usage tracking"
USAGE=$(curl -sS -m 10 "$GATEWAY/v1/usage" 2>&1)
USAGE_REQS=$(echo "$USAGE" | python3 -c "import sys,json; print(json.load(sys.stdin).get('requests',0))" 2>/dev/null || echo 0)
echo " usage.requests: $USAGE_REQS (should be > 0 if /v1/chat fired)"
if [ "$USAGE_REQS" -ge 1 ]; then
PASS=$((PASS + 1))
echo " ✓ /v1/usage tracking"
else
FAIL=$((FAIL + 1))
FAILURES+=("/v1/usage didn't increment after /v1/chat call")
echo " ✗ /v1/usage didn't increment"
fi
print_summary
[ $FAIL -eq 0 ] && exit 0 || exit 1

sidecar/sidecar/lab_ui.py Normal file

@@ -0,0 +1,385 @@
"""Pipeline Lab notebook UI — served as a single HTML page.
Note: innerHTML usage in this file is intentional for building the UI.
All user-supplied text is escaped through the esc() function before insertion.
The only values rendered via innerHTML are pre-formatted HTML strings with
escaped user content; no raw user input is ever injected unescaped.
"""
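For reference, the esc() helper described above (assign to `textContent`, read back `innerHTML`) encodes exactly `&`, `<`, and `>`. Python's `html.escape` with `quote=False` performs the same three substitutions; a sketch for comparison, not used by this module:

```python
from html import escape

def esc(text: str) -> str:
    # Mirrors the JS esc(): & < > are encoded, quotes are left alone,
    # matching a textContent -> innerHTML round-trip.
    return escape(str(text), quote=False)

print(esc('<script>alert("x") & more</script>'))
```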
from fastapi import APIRouter
from fastapi.responses import HTMLResponse
router = APIRouter()
def _get_lab_html() -> str:
"""Return the Pipeline Lab HTML. Separated into a function for clarity."""
# The HTML is a self-contained notebook UI.
# All user-facing text is escaped via the esc() JS function.
return r"""<!DOCTYPE html>
<html lang="en"><head>
<meta charset="UTF-8"><meta name="viewport" content="width=device-width,initial-scale=1.0">
<title>Pipeline Lab // Lakehouse</title>
<style>
:root{--bg:#08090c;--surface:rgba(14,16,22,0.9);--border:#2a2d35;--text:#e8e6e3;--text2:#7a7872;--accent:#4ade80;--gold:#e2b55a;--red:#e05252;--blue:#5b9cf5;--purple:#c084fc}
*{box-sizing:border-box;margin:0;padding:0}
body{font-family:'SF Mono','Menlo','Consolas',monospace;background:var(--bg);color:var(--text);min-height:100vh;padding:20px 28px;font-size:13px}
h1{font-size:18px;font-weight:700;margin-bottom:4px}h1 span{color:var(--accent)}
.subtitle{color:var(--text2);font-size:11px;margin-bottom:20px}
.cells{display:flex;flex-direction:column;gap:12px;max-width:1100px}
.cell{background:var(--surface);border:1px solid var(--border);border-radius:4px;overflow:hidden}
.cell.running{border-color:var(--gold)}
.cell-header{display:flex;align-items:center;gap:8px;padding:8px 12px;border-bottom:1px solid var(--border);font-size:10px;text-transform:uppercase;letter-spacing:1px;color:var(--text2)}
.cell-type{font-weight:700}
.cell-time{margin-left:auto;color:var(--text2)}
.cell-input{padding:12px;background:rgba(0,0,0,0.3)}
.cell-input textarea{width:100%;min-height:60px;background:transparent;border:none;color:var(--text);font-family:inherit;font-size:13px;resize:vertical;outline:none;line-height:1.6}
.cell-output{padding:12px;font-size:12px;line-height:1.6;white-space:pre-wrap;max-height:400px;overflow-y:auto;display:none}
.cell-output.has-data{display:block;border-top:1px solid var(--border)}
.toolbar{display:flex;gap:6px;padding:8px 12px;border-top:1px solid var(--border);flex-wrap:wrap}
.btn{font-family:inherit;font-size:10px;text-transform:uppercase;letter-spacing:0.5px;padding:5px 12px;border:1px solid var(--border);border-radius:3px;background:transparent;color:var(--text2);cursor:pointer}
.btn:hover{border-color:var(--accent);color:var(--accent)}
.btn.primary{border-color:var(--accent);color:var(--accent);background:rgba(74,222,128,0.06)}
.btn.gold{border-color:var(--gold);color:var(--gold)}
.btn.blue{border-color:var(--blue);color:var(--blue)}
.btn.purple{border-color:var(--purple);color:var(--purple)}
.btn.red{border-color:var(--red);color:var(--red)}
.top-bar{display:flex;gap:8px;margin-bottom:16px;align-items:center;flex-wrap:wrap}
.status-bar{display:flex;gap:12px;padding:8px 12px;background:var(--surface);border:1px solid var(--border);border-radius:4px;margin-bottom:16px;font-size:10px;color:var(--text2)}
.stat{display:flex;align-items:center;gap:4px}.stat b{color:var(--text)}
.result-row{display:flex;gap:8px;padding:6px 8px;border-bottom:1px solid rgba(42,45,53,0.3);align-items:center;font-size:11px}
.result-row:last-child{border-bottom:none}
.score-bar{width:60px;height:5px;background:rgba(0,0,0,0.2);border-radius:3px;overflow:hidden}
.score-fill{height:100%;border-radius:3px}
.benchmark-grid{display:grid;grid-template-columns:1fr 1fr;gap:12px;margin-top:8px}
.bench-col{background:rgba(0,0,0,0.2);border-radius:3px;padding:10px}
.bench-label{font-size:10px;text-transform:uppercase;letter-spacing:1px;margin-bottom:6px;font-weight:700}
.threshold-slider{display:flex;align-items:center;gap:8px;padding:0 12px;margin:4px 0}
.threshold-slider input[type=range]{flex:1;accent-color:var(--accent)}
.threshold-slider .val{font-weight:700;min-width:36px;text-align:right}
</style></head><body>
<h1><span>Pipeline Lab</span> // Lakehouse</h1>
<div class="subtitle">Embedding-based screening vs LLM classification &#x2014; iterative experimentation</div>
<div class="status-bar" id="status-bar">
<div class="stat"><span>Exemplars:</span> <b id="st-exemplars">0</b></div>
<div class="stat"><span>Categories:</span> <b id="st-categories">0</b></div>
<div class="stat"><span>Pipelines:</span> <b id="st-pipelines">0</b></div>
<div class="stat" style="margin-left:auto"><span>Sidecar:</span> <b id="st-health" style="color:var(--text2)">...</b></div>
</div>
<div class="top-bar">
<button class="btn primary" onclick="addCell('exemplars')">+ Exemplars</button>
<button class="btn gold" onclick="addCell('screen')">+ Screen</button>
<button class="btn blue" onclick="addCell('classify')">+ Classify</button>
<button class="btn purple" onclick="addCell('benchmark')">+ Benchmark</button>
<button class="btn" onclick="addCell('similarity')">+ Similarity</button>
<button class="btn" onclick="addCell('generate')">+ Generate</button>
<button class="btn" onclick="addCell('pipeline')">+ Pipeline</button>
<span style="flex:1"></span>
<button class="btn red" onclick="clearCells()">Clear All</button>
</div>
<div class="cells" id="cells"></div>
<script>
var BASE = '';
var cellCounter = 0;
function esc(t){var d=document.createElement('span');d.textContent=String(t);return d.innerHTML}
async function api(path, body) {
var opts = body ? {method:'POST', headers:{'Content-Type':'application/json'}, body:JSON.stringify(body)} : {};
var r = await fetch(BASE + '/lab' + path, opts);
return r.json();
}
async function refreshStatus() {
try {
var ex = await api('/exemplars');
var pl = await api('/pipelines');
var h = await fetch(BASE + '/health').then(function(r){return r.json()});
document.getElementById('st-exemplars').textContent = ex.total || 0;
document.getElementById('st-categories').textContent = Object.keys(ex.categories || {}).length;
document.getElementById('st-pipelines').textContent = (pl.pipelines || []).length;
document.getElementById('st-health').textContent = h.status || 'ok';
document.getElementById('st-health').style.color = 'var(--accent)';
} catch(e) {
document.getElementById('st-health').textContent = 'error';
document.getElementById('st-health').style.color = 'var(--red)';
}
}
function addCell(type) {
var id = 'cell-' + (++cellCounter);
var cells = document.getElementById('cells');
var cell = document.createElement('div'); cell.className = 'cell'; cell.id = id;
var colors = {exemplars:'var(--accent)',screen:'var(--gold)',classify:'var(--blue)',benchmark:'var(--purple)',similarity:'var(--text2)',generate:'var(--text2)',pipeline:'var(--accent)'};
var labels = {exemplars:'EXEMPLARS',screen:'SCREEN',classify:'CLASSIFY (LLM)',benchmark:'BENCHMARK A/B',similarity:'SIMILARITY',generate:'GENERATE',pipeline:'PIPELINE'};
var placeholders = {
exemplars:'Category: decision\n---\nWe decided to use Parquet for all storage\nThe team chose React over Vue\nArchitecture decision: microservices',
screen:'Enter texts to classify via embedding similarity (one per line):\n\nWe decided to migrate to PostgreSQL\nThe weather is nice today\nArchitecture: chose event sourcing over CRUD',
classify:'Enter texts to classify via LLM (one per line):\n\nWe decided to migrate to PostgreSQL\nThe weather is nice today',
benchmark:'Enter texts to benchmark (one per line):\n\nWe decided to use Kubernetes for orchestration\nThe new hire starts Monday\nTechnical debt: refactor the auth module\nLunch menu looks good today',
similarity:'Enter texts to compare pairwise (one per line):\n\nWe chose React for the frontend\nReact was selected as our UI framework\nThe database uses PostgreSQL',
generate:'Enter a prompt for the LLM...',
pipeline:'Pipeline name: my-extraction\n---\nscreen | threshold=0.6\nclassify\nextract | prompt=Extract the key decision and its rationale\nvalidate | dedup_threshold=0.9'
};
var color = colors[type] || 'var(--text2)';
var label = labels[type] || type.toUpperCase();
var ph = placeholders[type] || '';
// Build cell using DOM methods
var header = document.createElement('div'); header.className = 'cell-header';
var typeSpan = document.createElement('span'); typeSpan.className = 'cell-type'; typeSpan.style.color = color; typeSpan.textContent = label; header.appendChild(typeSpan);
var numSpan = document.createElement('span'); numSpan.textContent = 'Cell #' + cellCounter; header.appendChild(numSpan);
var timeSpan = document.createElement('span'); timeSpan.className = 'cell-time'; timeSpan.id = id + '-time'; header.appendChild(timeSpan);
cell.appendChild(header);
var inputDiv = document.createElement('div'); inputDiv.className = 'cell-input';
var textarea = document.createElement('textarea'); textarea.id = id + '-input'; textarea.placeholder = ph; textarea.value = ph;
inputDiv.appendChild(textarea); cell.appendChild(inputDiv);
if (type === 'screen' || type === 'benchmark') {
var slider = document.createElement('div'); slider.className = 'threshold-slider';
var slLabel = document.createElement('span'); slLabel.style.cssText = 'font-size:10px;color:var(--text2)'; slLabel.textContent = 'Threshold:'; slider.appendChild(slLabel);
var range = document.createElement('input'); range.type = 'range'; range.min = '0.3'; range.max = '0.95'; range.step = '0.05'; range.value = '0.65'; range.id = id + '-threshold';
var valSpan = document.createElement('span'); valSpan.className = 'val'; valSpan.textContent = '0.65';
range.oninput = function() { valSpan.textContent = this.value; };
slider.appendChild(range); slider.appendChild(valSpan); cell.appendChild(slider);
}
var outputDiv = document.createElement('div'); outputDiv.className = 'cell-output'; outputDiv.id = id + '-output';
cell.appendChild(outputDiv);
var tb = document.createElement('div'); tb.className = 'toolbar';
var runBtn = document.createElement('button'); runBtn.className = 'btn primary'; runBtn.textContent = 'Run';
runBtn.onclick = function() { runCell(id, type); }; tb.appendChild(runBtn);
var rmBtn = document.createElement('button'); rmBtn.className = 'btn red'; rmBtn.textContent = 'Remove';
rmBtn.onclick = function() { removeCell(id); }; tb.appendChild(rmBtn);
cell.appendChild(tb);
cells.appendChild(cell);
textarea.focus();
return id;
}
function removeCell(id) { var el = document.getElementById(id); if (el) el.remove(); }
function clearCells() { document.getElementById('cells').textContent = ''; cellCounter = 0; }
function parseLines(text) { return text.split('\n').map(function(l){return l.trim()}).filter(function(l){return l && l.charAt(0) !== '#'}); }
async function runCell(id, type) {
var cell = document.getElementById(id);
var input = document.getElementById(id+'-input').value;
var output = document.getElementById(id+'-output');
var timeEl = document.getElementById(id+'-time');
cell.classList.add('running');
output.className = 'cell-output has-data';
output.textContent = 'Running...';
try {
var t0 = performance.now();
var result;
if (type === 'exemplars') {
var parts = input.split('---');
var catLine = (parts[0] || '').trim();
var category = catLine.replace(/^category:\s*/i, '').trim().toLowerCase();
var texts = parseLines(parts.slice(1).join('\n'));
if (!category || !texts.length) { output.textContent = 'Format: Category: name\\n---\\ntext1\\ntext2'; return; }
result = await api('/exemplars', {category: category, texts: texts});
output.textContent = 'Added ' + result.added + ' exemplars to "' + result.category + '" (total: ' + result.total + ')';
output.style.color = 'var(--accent)';
refreshStatus();
}
else if (type === 'screen') {
var texts = parseLines(input);
var threshold = parseFloat((document.getElementById(id+'-threshold') || {}).value || '0.65');
result = await api('/screen', {texts: texts, threshold: threshold});
renderScreenResults(output, result, threshold);
}
else if (type === 'classify') {
var texts = parseLines(input);
result = await api('/classify', {texts: texts});
renderClassifyResults(output, result);
}
else if (type === 'benchmark') {
var texts = parseLines(input);
var threshold = parseFloat((document.getElementById(id+'-threshold') || {}).value || '0.65');
result = await api('/benchmark', {texts: texts, threshold: threshold});
renderBenchmark(output, result);
}
else if (type === 'similarity') {
var texts = parseLines(input);
result = await api('/cell', {action:'similarity', texts: texts});
renderSimilarityMatrix(output, result);
}
else if (type === 'generate') {
result = await api('/cell', {action:'generate', text: input});
output.textContent = result.text || '(empty)';
}
else if (type === 'pipeline') {
var parts = input.split('---');
var nameLine = (parts[0] || '').trim();
var pName = nameLine.replace(/^pipeline\s*name:\s*/i, '').trim();
var stageLines = parseLines(parts.slice(1).join('\n'));
var stages = stageLines.map(function(line) {
var ps = line.split('|').map(function(s){return s.trim()});
var mode = ps[0];
var config = {};
ps.slice(1).forEach(function(p) {
var kv = p.split('='); if (kv.length===2) {
var v = kv[1].trim();
config[kv[0].trim()] = isNaN(parseFloat(v)) ? v : parseFloat(v);
}
});
return {name: mode, mode: mode, config: config};
});
await api('/pipelines', {name: pName, stages: stages, description: 'Created in Pipeline Lab'});
output.textContent = 'Pipeline "' + pName + '" saved (' + stages.length + ' stages). Use the API to run it: POST /lab/pipelines/run';
output.style.color = 'var(--accent)';
refreshStatus();
}
var elapsed = Math.round(performance.now() - t0);
timeEl.textContent = elapsed + 'ms' + (result && result.time_ms ? ' (server: '+result.time_ms+'ms)' : '');
} catch(e) {
output.textContent = 'Error: ' + e.message;
output.style.color = 'var(--red)';
} finally {
cell.classList.remove('running');
}
}
function renderScreenResults(el, results, threshold) {
el.textContent = '';
results.forEach(function(r) {
var row = document.createElement('div'); row.className = 'result-row';
var cat = document.createElement('span');
cat.style.cssText = 'min-width:80px;font-weight:700;color:' + (r.above_threshold ? 'var(--accent)' : 'var(--text2)');
cat.textContent = r.best_category || 'none'; row.appendChild(cat);
var sim = document.createElement('span'); sim.style.cssText = 'min-width:50px;font-weight:700';
sim.textContent = (r.similarity * 100).toFixed(1) + '%';
sim.style.color = r.similarity >= 0.7 ? 'var(--accent)' : r.similarity >= threshold ? 'var(--gold)' : 'var(--text2)';
row.appendChild(sim);
var bar = document.createElement('div'); bar.className = 'score-bar';
var fill = document.createElement('div'); fill.className = 'score-fill';
fill.style.width = (r.similarity * 100) + '%';
fill.style.background = r.similarity >= 0.7 ? 'var(--accent)' : r.similarity >= threshold ? 'var(--gold)' : 'var(--red)';
bar.appendChild(fill); row.appendChild(bar);
var txt = document.createElement('span'); txt.style.cssText = 'flex:1;overflow:hidden;text-overflow:ellipsis;white-space:nowrap';
txt.textContent = r.text; row.appendChild(txt);
var badge = document.createElement('span');
badge.style.cssText = 'font-size:9px;padding:2px 6px;border-radius:2px;border:1px solid;' +
(r.above_threshold ? 'color:var(--accent);border-color:var(--accent)' : 'color:var(--text2);border-color:var(--border)');
badge.textContent = r.above_threshold ? 'PASS' : 'FILTERED'; row.appendChild(badge);
el.appendChild(row);
});
}
function renderClassifyResults(el, results) {
el.textContent = '';
results.forEach(function(r) {
var row = document.createElement('div'); row.className = 'result-row';
var cat = document.createElement('span'); cat.style.cssText = 'min-width:80px;font-weight:700;color:var(--blue)';
cat.textContent = r.category; row.appendChild(cat);
var conf = document.createElement('span');
conf.style.cssText = 'min-width:50px;font-size:10px;color:' + (r.confidence==='high'?'var(--accent)':r.confidence==='medium'?'var(--gold)':'var(--text2)');
conf.textContent = r.confidence; row.appendChild(conf);
var txt = document.createElement('span'); txt.style.flex = '1'; txt.textContent = r.text; row.appendChild(txt);
el.appendChild(row);
});
}
function renderBenchmark(el, result) {
el.textContent = '';
// Summary stats (using safe DOM construction)
var summary = document.createElement('div'); summary.style.cssText = 'display:flex;gap:16px;margin-bottom:12px;flex-wrap:wrap';
var stats = [
['Agreement', (result.agreement_rate*100).toFixed(1)+'%', result.agreement_rate>=0.8?'var(--accent)':'var(--gold)'],
['Speedup', result.speedup+'x', result.speedup>=2?'var(--accent)':'var(--text)'],
['Embed', result.embed_time_ms+'ms', 'var(--gold)'],
['LLM', result.llm_time_ms+'ms', 'var(--blue)'],
['Hybrid est.', result.hybrid_estimated_ms+'ms', 'var(--accent)'],
['Screened out', result.texts_screened_out+'/'+result.total_texts, 'var(--purple)']
];
stats.forEach(function(s) {
var box = document.createElement('div'); box.style.cssText = 'background:rgba(0,0,0,0.2);padding:6px 10px;border-radius:3px;text-align:center';
var lbl = document.createElement('div'); lbl.style.cssText = 'font-size:9px;color:var(--text2);text-transform:uppercase;letter-spacing:0.5px'; lbl.textContent = s[0]; box.appendChild(lbl);
var val = document.createElement('div'); val.style.cssText = 'font-size:16px;font-weight:700;color:'+s[2]; val.textContent = s[1]; box.appendChild(val);
summary.appendChild(box);
});
el.appendChild(summary);
// Side-by-side comparison
var grid = document.createElement('div'); grid.style.cssText = 'display:grid;grid-template-columns:1fr 1fr;gap:12px;margin-top:8px';
// Embed column
var leftCol = document.createElement('div'); leftCol.style.cssText = 'background:rgba(0,0,0,0.2);border-radius:3px;padding:10px';
var leftTitle = document.createElement('div'); leftTitle.style.cssText = 'font-size:10px;text-transform:uppercase;letter-spacing:1px;margin-bottom:6px;font-weight:700;color:var(--gold)';
leftTitle.textContent = 'EMBEDDING SCREENING (' + result.embed_time_ms + 'ms)'; leftCol.appendChild(leftTitle);
(result.embed_results||[]).forEach(function(r) {
var row = document.createElement('div'); row.style.cssText = 'font-size:11px;padding:3px 0;display:flex;gap:6px;align-items:center';
var c = document.createElement('span'); c.style.cssText = 'min-width:60px;font-weight:700;color:'+(r.above_threshold?'var(--accent)':'var(--text2)'); c.textContent = r.best_category||'none'; row.appendChild(c);
var s = document.createElement('span'); s.style.cssText = 'min-width:40px;color:var(--text2)'; s.textContent = (r.similarity*100).toFixed(0)+'%'; row.appendChild(s);
var t = document.createElement('span'); t.style.cssText = 'flex:1;overflow:hidden;text-overflow:ellipsis;white-space:nowrap'; t.textContent = r.text; row.appendChild(t);
leftCol.appendChild(row);
});
grid.appendChild(leftCol);
// LLM column
var rightCol = document.createElement('div'); rightCol.style.cssText = 'background:rgba(0,0,0,0.2);border-radius:3px;padding:10px';
var rightTitle = document.createElement('div'); rightTitle.style.cssText = 'font-size:10px;text-transform:uppercase;letter-spacing:1px;margin-bottom:6px;font-weight:700;color:var(--blue)';
rightTitle.textContent = 'LLM CLASSIFICATION (' + result.llm_time_ms + 'ms)'; rightCol.appendChild(rightTitle);
(result.llm_results||[]).forEach(function(r) {
var row = document.createElement('div'); row.style.cssText = 'font-size:11px;padding:3px 0;display:flex;gap:6px;align-items:center';
var c = document.createElement('span'); c.style.cssText = 'min-width:60px;font-weight:700;color:var(--blue)'; c.textContent = r.category; row.appendChild(c);
var s = document.createElement('span'); s.style.cssText = 'min-width:40px;color:'+(r.confidence==='high'?'var(--accent)':'var(--text2)'); s.textContent = r.confidence; row.appendChild(s);
var t = document.createElement('span'); t.style.cssText = 'flex:1;overflow:hidden;text-overflow:ellipsis;white-space:nowrap'; t.textContent = r.text; row.appendChild(t);
rightCol.appendChild(row);
});
grid.appendChild(rightCol);
el.appendChild(grid);
}
function renderSimilarityMatrix(el, result) {
el.textContent = '';
var matrix = result.matrix || [];
var texts = result.texts || [];
if (!matrix.length) { el.textContent = 'No results'; return; }
var tbl = document.createElement('table'); tbl.style.cssText = 'border-collapse:collapse;font-size:11px;width:100%';
var hdr = document.createElement('tr');
var corner = document.createElement('th'); hdr.appendChild(corner);
texts.forEach(function(t) {
var th = document.createElement('th'); th.style.cssText = 'padding:4px;color:var(--text2);font-size:9px;max-width:100px;overflow:hidden;text-overflow:ellipsis;white-space:nowrap';
th.textContent = t.substring(0, 20); th.title = t; hdr.appendChild(th);
});
tbl.appendChild(hdr);
matrix.forEach(function(row, i) {
var tr = document.createElement('tr');
var td0 = document.createElement('td'); td0.style.cssText = 'padding:4px;color:var(--text2);font-size:9px;max-width:100px;overflow:hidden;text-overflow:ellipsis;white-space:nowrap';
td0.textContent = texts[i].substring(0, 20); tr.appendChild(td0);
row.forEach(function(v, j) {
var td = document.createElement('td');
var bg = i===j ? 'rgba(74,222,128,0.1)' : v>=0.8 ? 'rgba(74,222,128,0.15)' : v>=0.6 ? 'rgba(226,181,90,0.1)' : 'transparent';
td.style.cssText = 'padding:4px;text-align:center;font-weight:'+(v>=0.7?'700':'400')+';color:'+(v>=0.8?'var(--accent)':v>=0.6?'var(--gold)':'var(--text2)')+';background:'+bg;
td.textContent = v.toFixed(2); tr.appendChild(td);
});
tbl.appendChild(tr);
});
el.appendChild(tbl);
}
refreshStatus();
</script>
</body></html>"""
@router.get("", response_class=HTMLResponse)
async def lab_page():
return _get_lab_html()
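The matrix cells in `renderSimilarityMatrix` above are colored by similarity band (the diagonal is always highlighted, `>=0.8` gets the accent color, `>=0.6` gold). The banding logic, extracted as a standalone sketch for reference — the band names here are illustrative, only the thresholds come from the JS above:

```python
def similarity_band(v: float, diagonal: bool = False) -> str:
    # Thresholds mirror renderSimilarityMatrix: the diagonal is always
    # highlighted, 0.8+ reads as a strong match, 0.6+ as a weak one.
    if diagonal:
        return "self"
    if v >= 0.8:
        return "high"
    if v >= 0.6:
        return "mid"
    return "low"

print(similarity_band(0.95), similarity_band(0.7), similarity_band(0.3))
```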


@@ -0,0 +1,503 @@
"""Pipeline Lab — iterative embedding/LLM pipeline experimentation.
Provides:
- Exemplar-based embedding classification (fast screening)
- LLM-based classification (accurate but slow)
- A/B benchmarking between the two
- Pipeline definition and execution
- Notebook-style API for interactive experimentation
"""
import json
import math
import os
import time
from pathlib import Path
from typing import Optional
from fastapi import APIRouter, HTTPException
from fastapi.responses import HTMLResponse
from pydantic import BaseModel
from .ollama import client
router = APIRouter()
EMBED_MODEL = os.environ.get("EMBED_MODEL", "nomic-embed-text")
GEN_MODEL = os.environ.get("GEN_MODEL", "qwen2.5")
LAB_DIR = Path(os.environ.get("LAB_DIR", "./data/_pipeline_lab"))
LAB_DIR.mkdir(parents=True, exist_ok=True)
# ─── Vector math ─────────────────────────────────────────────
def cosine_similarity(a: list[float], b: list[float]) -> float:
dot = sum(x * y for x, y in zip(a, b))
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(x * x for x in b))
if norm_a == 0 or norm_b == 0:
return 0.0
return dot / (norm_a * norm_b)
# ─── Exemplar store ──────────────────────────────────────────
# Exemplars are labeled text+embedding pairs used for classification.
# e.g. category="decision" texts=["We decided to use Parquet", "The team chose React"]
_exemplars: dict[str, list[dict]] = {} # category -> [{text, embedding}]
def _exemplar_file() -> Path:
return LAB_DIR / "exemplars.json"
def _load_exemplars():
global _exemplars
fp = _exemplar_file()
if fp.exists():
data = json.loads(fp.read_text())
_exemplars = data
return _exemplars
def _save_exemplars():
_exemplar_file().write_text(json.dumps(_exemplars, indent=2))
_load_exemplars()
# ─── Pipeline store ──────────────────────────────────────────
def _pipelines_dir() -> Path:
d = LAB_DIR / "pipelines"
d.mkdir(exist_ok=True)
return d
# ─── Embedding helper ────────────────────────────────────────
async def _embed_texts(texts: list[str], model: str = EMBED_MODEL) -> list[list[float]]:
embeddings = []
async with client() as c:
for text in texts:
resp = await c.post("/api/embed", json={"model": model, "input": text})
if resp.status_code != 200:
raise HTTPException(502, f"Ollama embed error: {resp.text}")
data = resp.json()
embeddings.extend(data.get("embeddings", []))
return embeddings
async def _generate(prompt: str, model: str = GEN_MODEL, temperature: float = 0.3) -> str:
async with client() as c:
resp = await c.post("/api/generate", json={
"model": model, "prompt": prompt, "stream": False,
"options": {"temperature": temperature, "num_predict": 1024}
})
if resp.status_code != 200:
raise HTTPException(502, f"Ollama generate error: {resp.text}")
return resp.json().get("response", "")
# ─── API: Exemplars ──────────────────────────────────────────
class ExemplarAdd(BaseModel):
category: str
texts: list[str]
class ExemplarList(BaseModel):
categories: dict[str, int] # category -> count
@router.post("/exemplars")
async def add_exemplars(req: ExemplarAdd):
"""Add labeled exemplar texts for a category. Embeddings generated automatically."""
category = req.category.strip().lower()
if not category or not req.texts:
raise HTTPException(400, "category and texts required")
embeddings = await _embed_texts(req.texts)
if category not in _exemplars:
_exemplars[category] = []
for text, emb in zip(req.texts, embeddings):
_exemplars[category].append({"text": text, "embedding": emb})
_save_exemplars()
return {"ok": True, "category": category, "added": len(req.texts),
"total": len(_exemplars[category])}
@router.get("/exemplars")
async def list_exemplars():
"""List all exemplar categories and counts."""
return {"categories": {k: len(v) for k, v in _exemplars.items()},
"total": sum(len(v) for v in _exemplars.values())}
@router.delete("/exemplars/{category}")
async def delete_exemplar_category(category: str):
if category in _exemplars:
del _exemplars[category]
_save_exemplars()
return {"ok": True}
# ─── API: Screen (embedding-based classification) ────────────
class ScreenRequest(BaseModel):
texts: list[str]
threshold: float = 0.65
top_k: int = 1
class ScreenResult(BaseModel):
text: str
best_category: str | None
similarity: float
above_threshold: bool
all_scores: dict[str, float]
@router.post("/screen", response_model=list[ScreenResult])
async def screen_texts(req: ScreenRequest):
"""Classify texts by cosine similarity to exemplar embeddings (fast path)."""
if not _exemplars:
raise HTTPException(400, "No exemplars defined. Add exemplars first.")
embeddings = await _embed_texts(req.texts)
results = []
for text, emb in zip(req.texts, embeddings):
category_scores = {}
for category, exemplar_list in _exemplars.items():
sims = [cosine_similarity(emb, ex["embedding"]) for ex in exemplar_list]
category_scores[category] = max(sims) if sims else 0.0
best_cat = max(category_scores, key=category_scores.get) if category_scores else None
best_sim = category_scores.get(best_cat, 0.0) if best_cat else 0.0
results.append(ScreenResult(
text=text[:200],
best_category=best_cat if best_sim >= req.threshold else None,
similarity=round(best_sim, 4),
above_threshold=best_sim >= req.threshold,
all_scores={k: round(v, 4) for k, v in sorted(category_scores.items(),
key=lambda x: x[1], reverse=True)},
))
return results
# ─── API: Classify (LLM-based classification) ────────────────
class ClassifyRequest(BaseModel):
texts: list[str]
categories: list[str] | None = None # if None, use exemplar category names
model: str | None = None
class ClassifyResult(BaseModel):
text: str
category: str
confidence: str
reasoning: str
@router.post("/classify", response_model=list[ClassifyResult])
async def classify_texts(req: ClassifyRequest):
"""Classify texts using LLM (slow but accurate path)."""
categories = req.categories or list(_exemplars.keys())
if not categories:
raise HTTPException(400, "No categories. Provide categories or add exemplars.")
model = req.model or GEN_MODEL
results = []
for text in req.texts:
prompt = (
f"Classify this text into exactly ONE of these categories: {', '.join(categories)}\n\n"
f"TEXT: {text[:500]}\n\n"
f"Respond with JSON: {{\"category\": \"...\", \"confidence\": \"high|medium|low\", "
f"\"reasoning\": \"one sentence\"}}"
)
raw = await _generate(prompt, model=model, temperature=0.1)
# Parse
try:
j_s, j_e = raw.find("{"), raw.rfind("}") + 1
parsed = json.loads(raw[j_s:j_e]) if j_s >= 0 and j_e > j_s else {}
except Exception:
parsed = {}
results.append(ClassifyResult(
text=text[:200],
category=parsed.get("category", "unknown"),
confidence=parsed.get("confidence", "low"),
reasoning=parsed.get("reasoning", raw[:200]),
))
return results
# ─── API: Benchmark (A/B comparison) ─────────────────────────
class BenchmarkRequest(BaseModel):
texts: list[str]
threshold: float = 0.65
model: str | None = None
class BenchmarkResult(BaseModel):
total_texts: int
# Embedding path
embed_time_ms: int
embed_results: list[dict]
# LLM path
llm_time_ms: int
llm_results: list[dict]
# Comparison
agreement_rate: float
speedup: float
texts_screened_out: int
texts_needing_llm: int
hybrid_estimated_ms: int
@router.post("/benchmark", response_model=BenchmarkResult)
async def benchmark(req: BenchmarkRequest):
"""Run same texts through embedding screening and LLM classification. Compare."""
if not _exemplars:
raise HTTPException(400, "No exemplars. Add exemplars first.")
categories = list(_exemplars.keys())
# Embedding path
t0 = time.monotonic()
embed_results = await screen_texts(ScreenRequest(
texts=req.texts, threshold=req.threshold
))
embed_ms = int((time.monotonic() - t0) * 1000)
# LLM path
t0 = time.monotonic()
llm_results = await classify_texts(ClassifyRequest(
texts=req.texts, categories=categories, model=req.model
))
llm_ms = int((time.monotonic() - t0) * 1000)
# Compare
agreements = 0
screened_out = 0
for er, lr in zip(embed_results, llm_results):
if not er.above_threshold:
screened_out += 1
if er.best_category == lr.category:
agreements += 1
needing_llm = len(req.texts) - screened_out
# Hybrid estimate: embed all + LLM only the uncertain ones
per_text_llm_ms = llm_ms / max(len(req.texts), 1)
hybrid_ms = int(embed_ms + needing_llm * per_text_llm_ms)
return BenchmarkResult(
total_texts=len(req.texts),
embed_time_ms=embed_ms,
embed_results=[r.model_dump() for r in embed_results],
llm_time_ms=llm_ms,
llm_results=[r.model_dump() for r in llm_results],
agreement_rate=round(agreements / max(len(req.texts), 1), 3),
speedup=round(llm_ms / max(hybrid_ms, 1), 2),
texts_screened_out=screened_out,
texts_needing_llm=needing_llm,
hybrid_estimated_ms=hybrid_ms,
)
# ─── API: Pipeline definition & execution ────────────────────
class PipelineStage(BaseModel):
name: str
mode: str # "screen", "classify", "extract", "validate", "custom"
config: dict = {} # stage-specific config (threshold, prompt, etc.)
class PipelineDef(BaseModel):
name: str
stages: list[PipelineStage]
description: str = ""
class PipelineRunRequest(BaseModel):
pipeline_name: str
texts: list[str]
@router.post("/pipelines")
async def save_pipeline(pipeline: PipelineDef):
"""Save a pipeline definition."""
fp = _pipelines_dir() / f"{pipeline.name}.json"
fp.write_text(pipeline.model_dump_json(indent=2))
return {"ok": True, "name": pipeline.name}
@router.get("/pipelines")
async def list_pipelines():
"""List saved pipeline definitions."""
pipelines = []
for fp in _pipelines_dir().glob("*.json"):
try:
data = json.loads(fp.read_text())
pipelines.append({"name": data["name"], "stages": len(data["stages"]),
"description": data.get("description", "")})
except Exception:
pass
return {"pipelines": pipelines}
@router.get("/pipelines/{name}")
async def get_pipeline(name: str):
fp = _pipelines_dir() / f"{name}.json"
if not fp.exists():
raise HTTPException(404, "Pipeline not found")
return json.loads(fp.read_text())
@router.post("/pipelines/run")
async def run_pipeline(req: PipelineRunRequest):
"""Execute a pipeline on a set of texts. Returns per-stage results and timing."""
fp = _pipelines_dir() / f"{req.pipeline_name}.json"
if not fp.exists():
raise HTTPException(404, f"Pipeline '{req.pipeline_name}' not found")
pipeline = json.loads(fp.read_text())
results = {"pipeline": req.pipeline_name, "stages": [], "total_ms": 0}
current_texts = req.texts[:]
for stage_def in pipeline["stages"]:
stage_name = stage_def["name"]
mode = stage_def["mode"]
config = stage_def.get("config", {})
t0 = time.monotonic()
stage_result = {"name": stage_name, "mode": mode, "input_count": len(current_texts)}
if mode == "screen":
threshold = config.get("threshold", 0.65)
screen_res = await screen_texts(ScreenRequest(
texts=current_texts, threshold=threshold
))
passed = [r for r in screen_res if r.above_threshold]
stage_result["output_count"] = len(passed)
stage_result["filtered_out"] = len(current_texts) - len(passed)
stage_result["results"] = [r.model_dump() for r in screen_res]
# Pass only above-threshold texts to the next stage. Use the original
# strings, not ScreenResult.text, which is truncated to 200 chars.
current_texts = [t for t, r in zip(current_texts, screen_res) if r.above_threshold]
elif mode == "classify":
cls_res = await classify_texts(ClassifyRequest(
texts=current_texts,
categories=config.get("categories"),
model=config.get("model"),
))
stage_result["output_count"] = len(cls_res)
stage_result["results"] = [r.model_dump() for r in cls_res]
elif mode == "extract":
extract_prompt = config.get("prompt", "Extract key information from this text:")
extractions = []
for text in current_texts:
raw = await _generate(f"{extract_prompt}\n\nTEXT: {text[:800]}")
extractions.append({"text": text[:200], "extracted": raw})
stage_result["output_count"] = len(extractions)
stage_result["results"] = extractions
elif mode == "validate":
# Embedding-based dedup: find near-duplicate results
if len(current_texts) > 1:
embs = await _embed_texts(current_texts)
dupes = []
threshold = config.get("dedup_threshold", 0.92)
for i in range(len(embs)):
for j in range(i + 1, len(embs)):
sim = cosine_similarity(embs[i], embs[j])
if sim >= threshold:
dupes.append({"i": i, "j": j, "similarity": round(sim, 4),
"text_a": current_texts[i][:100],
"text_b": current_texts[j][:100]})
stage_result["duplicates_found"] = len(dupes)
stage_result["results"] = dupes
else:
stage_result["duplicates_found"] = 0
stage_result["results"] = []
stage_result["output_count"] = len(current_texts)
else:
stage_result["error"] = f"Unknown mode: {mode}"
stage_result["output_count"] = len(current_texts)
stage_ms = int((time.monotonic() - t0) * 1000)
stage_result["time_ms"] = stage_ms
results["stages"].append(stage_result)
results["total_ms"] += stage_ms
return results
# ─── API: REPL cell (free-form eval) ─────────────────────────
class CellRequest(BaseModel):
action: str # "embed", "generate", "similarity", "screen", "classify"
text: str = ""
texts: list[str] = []
params: dict = {}
@router.post("/cell")
async def run_cell(req: CellRequest):
"""Execute a single notebook cell. Flexible entry point for ad-hoc operations."""
t0 = time.monotonic()
result = {}
if req.action == "embed":
texts = req.texts or ([req.text] if req.text else [])
embs = await _embed_texts(texts)
result = {"embeddings_count": len(embs), "dimensions": len(embs[0]) if embs else 0,
"texts": texts}
elif req.action == "generate":
text = await _generate(req.text, **{k: v for k, v in req.params.items()
if k in ("model", "temperature")})
result = {"text": text}
elif req.action == "similarity":
if len(req.texts) < 2:
raise HTTPException(400, "Need at least 2 texts for similarity")
embs = await _embed_texts(req.texts)
matrix = []
for i in range(len(embs)):
row = []
for j in range(len(embs)):
row.append(round(cosine_similarity(embs[i], embs[j]), 4))
matrix.append(row)
result = {"matrix": matrix, "texts": [t[:80] for t in req.texts]}
elif req.action == "screen":
texts = req.texts or ([req.text] if req.text else [])
threshold = req.params.get("threshold", 0.65)
res = await screen_texts(ScreenRequest(texts=texts, threshold=threshold))
result = {"results": [r.model_dump() for r in res]}
elif req.action == "classify":
texts = req.texts or ([req.text] if req.text else [])
res = await classify_texts(ClassifyRequest(texts=texts))
result = {"results": [r.model_dump() for r in res]}
else:
raise HTTPException(400, f"Unknown action: {req.action}")
result["time_ms"] = int((time.monotonic() - t0) * 1000)
return result
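The hybrid-time estimate in `/benchmark` is simple arithmetic: embed every text, then pay LLM cost only for texts the embedding screen could not settle. A standalone sketch (function name and numbers are illustrative, not part of the module):

```python
def hybrid_estimate_ms(embed_ms: int, llm_ms: int, n_texts: int, screened_out: int) -> int:
    # Mirrors the /benchmark estimate: full embedding pass for all texts,
    # plus per-text LLM cost only for texts still needing classification.
    needing_llm = n_texts - screened_out
    per_text_llm_ms = llm_ms / max(n_texts, 1)
    return int(embed_ms + needing_llm * per_text_llm_ms)

# 10 texts, 200ms embedding pass, 8000ms full LLM pass, 7 screened out:
# only 3 texts still need the LLM, so 200 + 3 * 800 = 2600ms.
print(hybrid_estimate_ms(200, 8000, 10, 7))
```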

90
tests/agent_test/PRD.md Normal file

@@ -0,0 +1,90 @@
# PRD: Chicago Permit Staffing Recommendation
## Mission
You are a staffing-intelligence assistant. Your job is to **analyze a Chicago building permit and produce a one-page staffing recommendation** for our staffing company.
The output is a markdown document that a human staffing coordinator will read in under 2 minutes to decide whether the contract is worth pursuing as a staffing fit.
## Critical rules
1. **DO NOT START WRITING THE FINAL ANALYSIS YET.**
- First, READ this PRD fully.
- Then, PLAN your approach in `note()` — what steps will you take, what tools will you call, what evidence will you need.
- Only after planning, begin executing.
2. **Never invent facts.** If you don't have evidence for a claim (from a tool call), do not make the claim. Say "no evidence available" instead.
3. **Cite your sources.** Every factual claim in the final output should reference either:
- The permit data you read (cite the permit ID)
- A matrix-retrieved chunk (cite as `[matrix:source:doc_id]`)
4. **Stay focused.** This is a one-page deliverable, not a research paper. Aim for 600-1000 words total.
## Tools available
- `list_permits(min_cost?: number, permit_type?: string)` — list permits matching filter; default returns top 5 by cost
- `read_permit(permit_id: string)` — get full details for one permit
- `query_matrix(query: string, top_k?: number)` — search the knowledge base for relevant context (contractor entities, prior permits, SEC tickers, LLM team patterns)
- `note(text: string)` — append to your working scratchpad (visible to you across iterations)
- `read_scratchpad()` — read your full scratchpad
- `done(summary: string)` — finish; pass your final markdown analysis as `summary`
## Required output structure
When you call `done(summary=...)`, the summary should contain:
```markdown
# Staffing Recommendation: Permit <ID>
## Permit Summary
[2-3 sentences: type, cost, address, scope of work]
## Contractor Profile
[What we know about the contractor(s) from matrix evidence. If no matrix hits, say so explicitly.]
## Staffing Implications
[What trades + headcount this permit implies. Ground in the work description.]
## Risk Signals
[Any matrix hits suggesting caution: debarment, prior incidents, low-quality history. If none, say so.]
## Recommendation
[Pursue / Pass / Investigate-Further, with one-sentence rationale.]
```
## Example workflow (do not copy verbatim)
1. Note your plan: "I will list 5 mid-range permits, pick one with a private contractor, read it fully, query the matrix for the contractor name, then write the recommendation."
2. Call `list_permits(min_cost=100000)` → see candidates
3. **PICK A PERMIT WITH A PRIVATE CONTRACTOR (a person's name or a private LLC), NOT a government agency** like CDOT, City of Chicago, etc. Government permits have no useful contractor profile to recommend on.
4. `read_permit(id)` → see all fields
5. Call `query_matrix("<contractor name> contractor Chicago renovation")` → see what the matrix has
6. Note any evidence found, gaps, surprises
7. Call `done(summary="<final markdown>")`
## Success criteria
- You called `done()` with a summary that follows the required structure
- Every factual claim has a source (permit ID or matrix citation)
- Total output is 600-1000 words
- You did not invent contractor names, prior incidents, or capabilities
- Plan was noted BEFORE execution started
## What "good" looks like
- Plan is concrete (which permit, which queries)
- Matrix queries are specific (contractor name + work type, not "find anything about this")
- When matrix returns nothing useful, you say so honestly
- Recommendation reflects the actual evidence, not boilerplate
## What "bad" looks like
- Skipping the plan and jumping to execution
- Making up contractor histories with no matrix evidence
- Generic recommendations that don't reference the actual permit
- Walls of text or structured padding to look thorough
## Begin
Start by acknowledging you've read this PRD and noting your plan via `note()`. Then proceed.
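The ordering rules above (plan via `note()` first, finish with `done()`) are mechanical enough to check automatically. A minimal validator sketch, assuming a transcript is a list of `(tool_name, args)` pairs — the harness's real transcript representation may differ:

```python
def check_transcript(calls: list[tuple[str, object]]) -> str:
    # Enforces the PRD's two hard ordering rules: the first tool call
    # must be note() (the plan), and the last must be done(summary=...).
    if not calls or calls[0][0] != "note":
        return "fail: no plan noted before execution"
    if calls[-1][0] != "done":
        return "fail: never called done()"
    return "ok"

print(check_transcript([("note", "plan: pick permit, query matrix"),
                        ("list_permits", {"min_cost": 100000}),
                        ("done", "# Staffing Recommendation...")]))
```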


@@ -0,0 +1,404 @@
// Compounding Stress Battery — the rigorous smoke test.
//
// Three iterations against /v1/respond, each running:
// α baseline (3 easy tasks) — should complete local-only with boost
// β drift (3 niche tasks) — forces executor miss → overseer fires
// γ impossible (2 zero-supply) — must fail honestly, no token explosion
// δ distill outcomes — writes distilled_*.jsonl + vector indexes
// ε overseer meta-review — gpt-oss:120b judges the iteration
// ζ scrum judgment — gpt-oss:120b reviews overseer proposals
//
// Iteration N+1 runs the same tasks as iteration N. We measure compounding:
// does turns_per_task drop? does overseer_called_rate drop? does
// correction_effective rise? If 3/5 metrics trend favorably, architecture
// validated; otherwise the scrum verdict points at what to fix.
//
// Fail-fast: every error bubbles. No silent catches — the run ABORTS with
// the underlying stack so we see exactly where the architecture broke.
//
// Runtime: ~60-90 min. Cloud cost: ~24-32 gpt-oss calls (well under daily cap).
import { writeFile, mkdir, readFile } from "node:fs/promises";
import { join } from "node:path";
const GATEWAY = process.env.GATEWAY_URL ?? "http://localhost:3100";
const LLM_TEAM = process.env.LLM_TEAM_URL ?? "http://localhost:5000";
const BATTERY_DIR = process.env.BATTERY_DIR
?? "/home/profit/lakehouse/data/_kb/battery";
// 10-minute timeout per /v1/respond call — cloud executor on a hard task
// can chew for a while, and we want to see real behavior, not premature aborts.
const RESPOND_TIMEOUT_MS = 10 * 60 * 1000;
const META_TIMEOUT_MS = 5 * 60 * 1000;
interface Task {
task_class: string;
operation: string;
spec: Record<string, any>;
}
interface Tasks {
phases: {
alpha_baseline: Task[];
beta_drift: Task[];
gamma_impossible: Task[];
};
models: {
executor_cloud: string;
reviewer_cloud: string;
overseer_cloud: string;
};
}
interface RunResult {
status: "ok" | "failed" | "blocked";
iterations: number;
artifact: any;
log: any[];
error?: string | null;
_elapsed_ms: number;
}
interface TaskRun {
task: Task;
phase: "alpha" | "beta" | "gamma";
result: RunResult;
}
// ─── HTTP helpers ───
async function runRespond(task: Task, models: Tasks["models"]): Promise<RunResult> {
const body = {
task_class: task.task_class,
operation: task.operation,
spec: task.spec,
executor_model: models.executor_cloud,
reviewer_model: models.reviewer_cloud,
};
const start = Date.now();
const resp = await fetch(`${GATEWAY}/v1/respond`, {
method: "POST",
headers: { "content-type": "application/json" },
body: JSON.stringify(body),
signal: AbortSignal.timeout(RESPOND_TIMEOUT_MS),
});
if (!resp.ok) {
const txt = await resp.text();
throw new Error(`/v1/respond HTTP ${resp.status}: ${txt.slice(0, 500)}`);
}
const j = (await resp.json()) as RunResult;
j._elapsed_ms = Date.now() - start;
return j;
}
async function runDistill(source: string): Promise<any[]> {
const body = { mode: "distill", prompt: "battery iteration distill", source };
const resp = await fetch(`${LLM_TEAM}/api/run?mode=distill`, {
method: "POST",
headers: { "content-type": "application/json" },
body: JSON.stringify(body),
signal: AbortSignal.timeout(META_TIMEOUT_MS),
});
if (!resp.ok) throw new Error(`distill HTTP ${resp.status}`);
const text = await resp.text();
// SSE stream — parse data: lines, return parsed event objects
const events: any[] = [];
for (const line of text.split("\n")) {
if (!line.startsWith("data: ")) continue;
try { events.push(JSON.parse(line.slice(6))); } catch { /* skip */ }
}
return events;
}
async function cloudChat(
model: string,
prompt: string,
temperature: number,
think: boolean,
): Promise<string> {
const resp = await fetch(`${GATEWAY}/v1/chat`, {
method: "POST",
headers: { "content-type": "application/json" },
body: JSON.stringify({
model,
messages: [{ role: "user", content: prompt }],
temperature,
think,
provider: "ollama_cloud",
}),
signal: AbortSignal.timeout(META_TIMEOUT_MS),
});
if (!resp.ok) {
const txt = await resp.text();
throw new Error(`/v1/chat ${model} HTTP ${resp.status}: ${txt.slice(0, 500)}`);
}
const j = await resp.json() as any;
return j.choices?.[0]?.message?.content ?? "";
}
// ─── Meta-review + scrum ───
async function overseerReview(
iterNum: number,
artifacts: any,
models: Tasks["models"],
): Promise<string> {
const prompt = `You are the OVERSEER reviewing iteration ${iterNum} of a stress battery run against Lakehouse /v1/respond.
For each task in the battery below, examine: status (ok/failed/blocked), iterations used, error signature, whether the in-loop overseer fired, total tokens.
Produce a PR-style meta-review in markdown with these sections:
## What worked
List specific tasks (by operation string) that completed correctly, and the evidence: turns_used, citations, tokens. Be concrete.
## What failed
List specific tasks that failed or needed overseer correction. Classify: was it a real failure (impossible task), a drift we should repair, or a false positive from the test?
## Proposed changes for iteration ${iterNum + 1}
At least 3 concrete architectural changes, each with:
- **Target file** (e.g. \`crates/gateway/src/execution_loop/mod.rs\`)
- **Rationale** (what the metrics show)
- **Expected impact** (which metric should move in iter ${iterNum + 1})
Be honest about weaknesses. Do NOT propose generic best practices; reference specific observations from the artifacts below.
ARTIFACTS (iteration ${iterNum}):
${JSON.stringify(artifacts, null, 2).slice(0, 30000)}`;
return cloudChat(models.overseer_cloud, prompt, 0.2, true);
}
async function scrumJudge(
iterNum: number,
review: string,
models: Tasks["models"],
): Promise<string> {
const prompt = `You are the SCRUM MASTER. The OVERSEER proposed these architectural changes for iteration ${iterNum + 1} based on iteration ${iterNum}'s results.
For each proposal, produce a verdict in markdown:
- **Proposal N**: <short name>
- **Verdict**: APPROVE | REVISE | REJECT
- **Reason**: why
- **If APPROVE**: is the expected impact realistic? what's the blast radius? is the target file correct?
- **If REVISE**: what should change about the proposal before applying?
- **If REJECT**: why is the proposal wrong or out of scope?
Final section:
## PR-ready changes
Bulleted list of only the APPROVE proposals, ready to apply.
Be rigorous. Don't rubber-stamp. If a proposal references a file that probably doesn't exist, REJECT and say so. If a proposal is a generic "improve X" without concrete plan, REVISE.
OVERSEER PROPOSED:
${review.slice(0, 15000)}`;
return cloudChat(models.overseer_cloud, prompt, 0.1, true);
}
// ─── Iteration driver ───
async function runIteration(iterNum: number, tasks: Tasks): Promise<any> {
console.log(`\n${"═".repeat(60)}`);
console.log(`▶ ITERATION ${iterNum}`);
console.log(`${"═".repeat(60)}\n`);
const iterDir = join(BATTERY_DIR, `iter_${iterNum}`);
await mkdir(iterDir, { recursive: true });
const runs: TaskRun[] = [];
for (const [phaseKey, phaseName] of [
["alpha_baseline", "alpha"],
["beta_drift", "beta"],
["gamma_impossible", "gamma"],
] as const) {
console.log(`\n── Phase ${phaseName} ──`);
for (const task of tasks.phases[phaseKey]) {
console.log(`${task.operation}`);
const result = await runRespond(task, tasks.models);
const overseerFired = (result.log ?? []).some(e => e.kind === "overseer_correction");
console.log(
` status=${result.status} turns=${result.iterations}` +
` tokens=${result.artifact?.usage?.total_tokens ?? 0}` +
` overseer=${overseerFired}` +
` elapsed=${Math.round(result._elapsed_ms / 1000)}s`
);
if (result.error) console.log(` error: ${result.error.slice(0, 200)}`);
runs.push({ task, phase: phaseName, result });
}
}
// Phase δ
console.log(`\n── Phase δ: distill outcomes_tail:20 ──`);
const distillEvents = await runDistill("outcomes_tail:20");
const distillFinal = [...distillEvents].reverse()
.find(e => e.role === "final") ?? distillEvents[distillEvents.length - 1];
const distillText = distillFinal?.text ?? JSON.stringify(distillFinal ?? {}).slice(0, 200);
console.log(` ${distillText.split("\n")[0]}`);
await writeFile(join(iterDir, "distill_output.txt"), distillText);
// Metrics
const collectPhase = (p: string) => runs.filter(r => r.phase === p);
const phaseMetrics = (p: string) => {
const ps = collectPhase(p);
if (ps.length === 0) return { count: 0 };
return {
count: ps.length,
ok: ps.filter(r => r.result.status === "ok").length,
failed: ps.filter(r => r.result.status === "failed").length,
avg_turns: ps.reduce((s, r) => s + (r.result.iterations || 0), 0) / ps.length,
total_tokens: ps.reduce((s, r) => s + (r.result.artifact?.usage?.total_tokens ?? 0), 0),
overseer_called: ps.filter(r => (r.result.log ?? []).some(e => e.kind === "overseer_correction")).length,
avg_elapsed_s: ps.reduce((s, r) => s + (r.result._elapsed_ms || 0), 0) / ps.length / 1000,
};
};
const metrics = {
iteration: iterNum,
total_tasks: runs.length,
ok_tasks: runs.filter(r => r.result.status === "ok").length,
failed_tasks: runs.filter(r => r.result.status === "failed").length,
blocked_tasks: runs.filter(r => r.result.status === "blocked").length,
total_tokens: runs.reduce((s, r) => s + (r.result.artifact?.usage?.total_tokens ?? 0), 0),
avg_turns_per_task: runs.reduce((s, r) => s + (r.result.iterations || 0), 0) / runs.length,
overseer_called_rate: runs.filter(r => (r.result.log ?? []).some(e => e.kind === "overseer_correction")).length / runs.length,
total_elapsed_s: runs.reduce((s, r) => s + (r.result._elapsed_ms || 0), 0) / 1000,
by_phase: {
alpha: phaseMetrics("alpha"),
beta: phaseMetrics("beta"),
gamma: phaseMetrics("gamma"),
},
};
console.log(`\n── Metrics ──`);
console.log(` total_tokens: ${metrics.total_tokens}`);
console.log(` avg_turns_per_task: ${metrics.avg_turns_per_task.toFixed(2)}`);
console.log(` overseer_called_rate: ${(metrics.overseer_called_rate * 100).toFixed(1)}%`);
console.log(` ok/total: ${metrics.ok_tasks}/${metrics.total_tasks}`);
await writeFile(join(iterDir, "runs.json"), JSON.stringify(runs, null, 2));
await writeFile(join(iterDir, "metrics.json"), JSON.stringify(metrics, null, 2));
// Phase ε: overseer review
console.log(`\n── Phase ε: overseer meta-review ──`);
const reviewInput = {
metrics,
task_summary: runs.map(r => ({
operation: r.task.operation,
phase: r.phase,
status: r.result.status,
iterations: r.result.iterations,
tokens: r.result.artifact?.usage?.total_tokens ?? 0,
overseer_called: (r.result.log ?? []).some(e => e.kind === "overseer_correction"),
error: r.result.error ?? null,
elapsed_s: Math.round((r.result._elapsed_ms || 0) / 1000),
})),
};
const review = await overseerReview(iterNum, reviewInput, tasks.models);
await writeFile(join(iterDir, "overseer_review.md"), review);
console.log(`${review.length} chars`);
// Phase ζ: scrum
console.log(`\n── Phase ζ: scrum judgment ──`);
const verdict = await scrumJudge(iterNum, review, tasks.models);
await writeFile(join(iterDir, "scrum_findings.md"), verdict);
console.log(`${verdict.length} chars`);
return metrics;
}
// ─── Main ───
async function main() {
const tasks = JSON.parse(
await readFile("/home/profit/lakehouse/tests/battery/tasks.json", "utf8"),
) as Tasks;
await mkdir(BATTERY_DIR, { recursive: true });
const iterations: any[] = [];
const batteryStart = Date.now();
for (let i = 1; i <= 3; i++) {
const m = await runIteration(i, tasks);
iterations.push(m);
}
const batteryElapsed = (Date.now() - batteryStart) / 1000;
// Summary
// Trend of one metric from iteration 1 to 3; "inverted" metrics improve when they decrease.
const delta = (k: string, inverted = false) => {
const vals = iterations.map((m: any) => m[k]);
if (vals.some(v => v === undefined)) return "—";
const diff = vals[2] - vals[0];
const pct = vals[0] !== 0 ? (diff / vals[0]) * 100 : 0;
const arrow = inverted ? (diff < 0 ? "↓ better" : "↑ worse") : (diff > 0 ? "↑ better" : "↓ worse");
return `${arrow} (${diff > 0 ? "+" : ""}${diff.toFixed(2)}, ${pct.toFixed(1)}%)`;
};
const rows = [
["total_tokens", "inverted", "want ↓ — fewer tokens for same work"],
["avg_turns_per_task", "inverted", "want ↓ — executor gets smarter"],
["overseer_called_rate", "inverted", "want ↓ — fewer cloud escalations"],
["ok_tasks", "normal", "want ↑ — more successes"],
["total_elapsed_s", "inverted", "want ↓ — faster iterations"],
];
let summary = `# Compounding Stress Battery — Summary\n\n`;
summary += `**Run:** ${new Date().toISOString()}\n`;
summary += `**Elapsed:** ${Math.round(batteryElapsed)}s (${(batteryElapsed/60).toFixed(1)} min)\n`;
summary += `**Models:** executor=${tasks.models.executor_cloud}, reviewer=${tasks.models.reviewer_cloud}, overseer=${tasks.models.overseer_cloud}\n\n`;
summary += `## Compounding Metrics\n\n`;
summary += `| Metric | iter 1 | iter 2 | iter 3 | Trend (1→3) | Goal |\n`;
summary += `|---|---|---|---|---|---|\n`;
for (const [key, inv, goal] of rows) {
const vals = iterations.map((m: any) => {
const v = m[key as string];
return typeof v === "number" ? v.toFixed(2) : String(v);
});
summary += `| ${key} | ${vals[0]} | ${vals[1]} | ${vals[2]} | ${delta(key, inv === "inverted")} | ${goal} |\n`;
}
summary += "\n";
// Count trending metrics
const trends = rows.map(([k, inv]) => {
const vs = iterations.map((m: any) => m[k as string]) as number[];
const improved = inv === "inverted" ? vs[2] < vs[0] : vs[2] > vs[0];
return { metric: k, improved };
});
const improvedCount = trends.filter(t => t.improved).length;
summary += `## Verdict\n\n`;
if (improvedCount >= 3) {
summary += `**✓ Architecture validated** — ${improvedCount}/${trends.length} compounding metrics improved from iteration 1 to 3.\n\n`;
} else {
summary += `**✗ Compounding NOT demonstrated** — only ${improvedCount}/${trends.length} metrics improved. See scrum_findings.md in each iter_N/ directory for the overseer's proposals and the scrum master's review of what to change.\n\n`;
}
summary += `Per-metric trend:\n`;
for (const t of trends) {
summary += `- ${t.metric}: ${t.improved ? "✓ improved" : "✗ flat or worse"}\n`;
}
summary += `\n## Artifacts\n\n`;
summary += `- \`iter_1/\`, \`iter_2/\`, \`iter_3/\` — per-iteration runs.json, metrics.json, overseer_review.md, scrum_findings.md, distill_output.txt\n`;
summary += `- \`summary.md\` — this file\n`;
await writeFile(join(BATTERY_DIR, "summary.md"), summary);
console.log(`\n${"═".repeat(60)}`);
console.log(`✓ BATTERY COMPLETE — ${Math.round(batteryElapsed)}s`);
console.log(` Summary: ${join(BATTERY_DIR, "summary.md")}`);
console.log(`${"═".repeat(60)}\n`);
console.log(summary);
}
main().catch(e => {
console.error(`\n${"═".repeat(60)}`);
console.error(`✗ BATTERY FAILED: ${e.message}`);
console.error(`${"═".repeat(60)}\n`);
if (e.stack) console.error(e.stack);
process.exit(1);
});
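The summary table's iteration-1 → iteration-3 arithmetic can be exercised on its own. This is a minimal sketch of the same computation the `delta` helper performs, not a drop-in replacement:

```typescript
// Sketch of the trend math used in the summary table above.
// "Inverted" metrics (tokens, turns, elapsed time) improve when they decrease.
function trend(vals: number[], inverted = false): string {
  const diff = vals[vals.length - 1] - vals[0];
  const pct = vals[0] !== 0 ? (diff / vals[0]) * 100 : 0;
  const arrow = inverted
    ? (diff < 0 ? "↓ better" : "↑ worse")
    : (diff > 0 ? "↑ better" : "↓ worse");
  return `${arrow} (${diff > 0 ? "+" : ""}${diff.toFixed(2)}, ${pct.toFixed(1)}%)`;
}
```

`trend([100, 90, 80], true)` reads as an improvement because the inverted metric fell from 100 to 80 across the three iterations.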

57
tests/battery/tasks.json Normal file
View File

@ -0,0 +1,57 @@
{
"description": "Compounding stress battery tasks. Each iteration runs α (baseline) + β (drift) + γ (impossible) phases. The SAME tasks repeat across iterations so we can measure compounding (turns_used, overseer_called_rate, correction_effective).",
"phases": {
"alpha_baseline": [
{
"task_class": "staffing.fill",
"operation": "fill: Warehouse Associate x3 in Columbus, OH",
"spec": { "target_role": "Warehouse Associate", "target_count": 3, "target_city": "Columbus", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1" }
},
{
"task_class": "staffing.fill",
"operation": "fill: Forklift Operator x2 in Toledo, OH",
"spec": { "target_role": "Forklift Operator", "target_count": 2, "target_city": "Toledo", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1" }
},
{
"task_class": "staffing.fill",
"operation": "fill: Packer x4 in Cleveland, OH",
"spec": { "target_role": "Packer", "target_count": 4, "target_city": "Cleveland", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1" }
}
],
"beta_drift": [
{
"task_class": "staffing.fill",
"operation": "fill: Machine Operator x2 in Youngstown, OH (requires OSHA 30 + bilingual Spanish)",
"spec": { "target_role": "Machine Operator", "target_count": 2, "target_city": "Youngstown", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1; prefer candidates with OSHA certification and Spanish" }
},
{
"task_class": "staffing.fill",
"operation": "fill: Welder x2 in Dayton, OH (AWS D1.1 certified, night shift)",
"spec": { "target_role": "Welder", "target_count": 2, "target_city": "Dayton", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1; filter by certification and shift flexibility" }
},
{
"task_class": "staffing.fill",
"operation": "fill: Assembler x5 in Akron, OH (SMT experience, cleanroom)",
"spec": { "target_role": "Assembler", "target_count": 5, "target_city": "Akron", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1" }
}
],
"gamma_impossible": [
{
"task_class": "staffing.fill",
"operation": "fill: Underwater Welder x2 in Toledo, OH",
"spec": { "target_role": "Underwater Welder", "target_count": 2, "target_city": "Toledo", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1 (expected to fail — no supply)" }
},
{
"task_class": "staffing.fill",
"operation": "fill: Astronaut x1 in Springfield, OH",
"spec": { "target_role": "Astronaut", "target_count": 1, "target_city": "Springfield", "target_state": "OH", "approach_hint": "(expected to fail — out-of-domain role)" }
}
]
},
"models": {
"executor_cloud": "gpt-oss:20b",
"reviewer_cloud": "gpt-oss:20b",
"overseer_cloud": "gpt-oss:120b",
"notes": "gpt-oss:20b for hot path (faster, cheaper per call), gpt-oss:120b for meta-reviews. All cloud per 2026-04-23 'cloud modes are on' directive."
}
}
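Read off the JSON above, the file's shape can be sketched as a TypeScript interface plus a guard. The battery script's real `Tasks` type is defined elsewhere in the file, so treat this as an inferred approximation:

```typescript
// Shape of tests/battery/tasks.json as read off the document above.
interface TaskSpec {
  target_role: string;
  target_count: number;
  target_city: string;
  target_state: string;
  approach_hint: string;
}
interface BatteryTask { task_class: string; operation: string; spec: TaskSpec; }
interface Models { executor_cloud: string; reviewer_cloud: string; overseer_cloud: string; notes: string; }
interface Tasks {
  description: string;
  phases: { alpha_baseline: BatteryTask[]; beta_drift: BatteryTask[]; gamma_impossible: BatteryTask[] };
  models: Models;
}

// Minimal runtime check that a parsed document has the expected phase keys.
function isTasks(x: any): x is Tasks {
  return !!x && typeof x.description === "string"
    && ["alpha_baseline", "beta_drift", "gamma_impossible"].every(k => Array.isArray(x?.phases?.[k]))
    && typeof x?.models?.executor_cloud === "string";
}
```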

View File

@ -0,0 +1,45 @@
{
"generated_at": "2026-04-21T00:44:59.486489Z",
"runs": [
{
"label": "A(no-T3)",
"path": "tests/multi-agent/playbooks/scenario-2026-04-21T00-30-54",
"ok_events": 0,
"total_events": 5,
"total_turns": 0,
"total_gaps": 5,
"total_citations": 0,
"prior_lessons_loaded": 0
},
{
"label": "B(T3-seed)",
"path": "tests/multi-agent/playbooks/scenario-2026-04-21T00-37-04",
"ok_events": 0,
"total_events": 5,
"total_turns": 0,
"total_gaps": 5,
"total_citations": 0,
"prior_lessons_loaded": 1
},
{
"label": "C(T3-read)",
"path": "tests/multi-agent/playbooks/scenario-2026-04-21T00-39-54",
"ok_events": 0,
"total_events": 5,
"total_turns": 0,
"total_gaps": 5,
"total_citations": 0,
"prior_lessons_loaded": 2
},
{
"label": "D(T3-cloud)",
"path": "tests/multi-agent/playbooks/scenario-2026-04-21T00-43-44",
"ok_events": 0,
"total_events": 5,
"total_turns": 0,
"total_gaps": 5,
"total_citations": 0,
"prior_lessons_loaded": 3
}
]
}

View File

@ -0,0 +1,25 @@
# KB Measurement Report
Generated from 26 runs across 24 distinct signatures.
## Recommender confidence
- high: 23
- medium: 1
- low: 3
## Overall fill + citation
- Fill rate: **60/86** (69.8%)
- Avg citations per run: **1.38**
- Avg turns per run: 6.6
## Citation coverage by (role, city, state)
- Combos with ≥1 citation: 9
- Combos with ok fills but 0 citations: 31
## Item 3 decision signal
Non-zero: there are **combos that succeeded but never triggered a playbook_memory boost**. Candidates for item 3 investigation:
- Machine Operator in Indianapolis, IN: 1/1 ok, 0 cites
- Assembler in Indianapolis, IN: 2/2 ok, 0 cites
- Warehouse Associate in Indianapolis, IN: 1/1 ok, 0 cites
- Forklift Operator in Cleveland, OH: 1/1 ok, 0 cites
- Assembler in Cleveland, OH: 2/2 ok, 0 cites
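The "ok fills but 0 citations" list above is a grouping over per-run records. A hypothetical sketch of that signal, with the record shape assumed rather than taken from the measurement script:

```typescript
// Hypothetical per-run record; field names are assumptions, not the
// measurement script's real schema.
type Run = { role: string; city: string; state: string; ok: boolean; citations: number };

// Return (role, city, state) combos that filled OK at least once but
// never produced a playbook citation.
function zeroCitationCombos(runs: Run[]): string[] {
  const byCombo = new Map<string, { ok: number; cites: number }>();
  for (const r of runs) {
    const key = `${r.role} in ${r.city}, ${r.state}`;
    const agg = byCombo.get(key) ?? { ok: 0, cites: 0 };
    if (r.ok) agg.ok += 1;
    agg.cites += r.citations;
    byCombo.set(key, agg);
  }
  return [...byCombo.entries()]
    .filter(([, a]) => a.ok > 0 && a.cites === 0)
    .map(([k]) => k);
}
```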

View File

@ -0,0 +1 @@
{"at":"10:30","kind":"recurring","operation":"fill: Machine Operator x2 in Toledo, OH","fills":[{"candidate_id":"W500K-29048","name":"Raymond G. Ward","reason":"Raymond has a high availability score (0.9033) and relevant skills for the Machine Operator position in Toledo, OH. He is also endorsed by past similar fills."},{"candidate_id":"W500K-20613","name":"Pamela V. Green","reason":"Pamela has a high availability score (0.8496) and relevant skills for the Machine Operator position in Toledo, OH. She is also endorsed by past similar fills."}],"turns":5,"duration_secs":12.051,"pool_size":997,"playbook_citations":[]}

View File

@ -0,0 +1,17 @@
# Client emails — Riverfront Steel, 2026-04-21
## 10:30 recurring — Machine Operator x2
Subject: 2 Filled
Dear Riverfront Steel Team,
We are pleased to confirm that we have filled both Machine Operator roles for your Tuesday/Thursday shifts starting at 11:00 AM.
- Raymond G. Ward
- Pamela V. Green
Both candidates have high availability scores and relevant experience. Please note this is a recurring slot, and prior workers may still be available.
Best regards,
Dispatch Team Lakehouse

View File

@ -0,0 +1,45 @@
# Scenario retrospective — Riverfront Steel, 2026-04-21
Executor: `mistral:latest` · Reviewer: `qwen2.5:latest` · Draft: `qwen2.5:latest`
## Events
| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 28.9 | 0 | 1 |
| 10:30 | recurring | Machine Operator × 2 | 997 | ✓ 2 | 5 | 12.1 | 0 | 1 |
| 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 20.3 | 0 | 1 |
| 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 35.7 | 0 | 1 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 11.5 | 0 | 1 |
## Final roster
| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
| undefined Raymond G. Ward | 10:30 | Machine Operator | Toledo, OH | confirmed |
| undefined Pamela V. Green | 10:30 | Machine Operator | Toledo, OH | confirmed |
## Gap signals
### drift_or_tool
- **08:00** — invalid JSON from executor: JSON Parse error: Unable to parse JSON string | raw: {"kind":"plan","steps":["Verify one candidate from the current list using sql tool for SQL verification.","Propose_done with top 3 candidates who are Warehouse Associates in Toledo, OH."]}
{"kind":"tool_call","tool":"sql","args":{"query":"SELECT worker_id, name, role, city, state, availability FROM
- **12:15** — invalid JSON from executor: JSON Parse error: Expected ']' | raw: {"kind":"plan","steps":["1. Use hybrid_search to find Forklift Operators in Toledo, OH with high engagement and communications scores who are available for work at Riverfront Steel starting at 01:00 PM. The search should prioritize workers with team/collaboration signals (engagement, communications
- **14:00** — no consensus after 14 turns
- **15:45** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search", "args":{"index_name":"workers_500k_v1","sql_filter":"LOWER(role) LIKE '%warehouse%' AND city = 'Toledo' AND state = 'OH' AND availability > 0.5 AND shift = '08:00' AND worker_id NOT IN [, ] AND worker_id NOT IN ["EXCLUDE_WORKERS_ID1", "EXCLUDE_WORKERS_ID2"
### double_book
- **10:30** — undefined Pamela V. Green already booked for 10:30
### fairness
- _cross-event_ — Raymond G. Ward (undefined) booked 2 times today
### write_through_audit
- _post-run_ — playbook_memory has 33 entries (ran 5 events, expected ≥ 1 new entries from this run)
## Narrative
- 1/5 events reached consensus.
- Final roster: 2 bookings across 1 distinct workers.
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
- Dropped events: 08:00 baseline_fill, 12:15 expansion, 14:00 emergency, 15:45 misplacement.
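The double_book and fairness rows in these retrospectives amount to simple roster scans. A hypothetical sketch under an assumed booking shape (not the harness's real implementation):

```typescript
// Assumed booking record; worker_id is optional because some captured
// rosters above lack it (hence the "undefined" prefixes in the tables).
type Booking = { worker_id?: string; name: string; booked_for: string };

// Emit double_book signals for repeat (name, slot) pairs and fairness
// signals for workers booked at or above `fairnessCap` times in a day.
function gapSignals(roster: Booking[], fairnessCap = 3): string[] {
  const signals: string[] = [];
  const seen = new Set<string>();
  const bookings = new Map<string, number>();
  for (const b of roster) {
    const key = `${b.name}@${b.booked_for}`;
    if (seen.has(key)) signals.push(`double_book: ${b.name} already booked for ${b.booked_for}`);
    seen.add(key);
    bookings.set(b.name, (bookings.get(b.name) ?? 0) + 1);
  }
  for (const [name, n] of bookings) {
    if (n >= fairnessCap) signals.push(`fairness: ${name} booked ${n} times today`);
  }
  return signals;
}
```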

View File

@ -0,0 +1,118 @@
[
{
"event": {
"kind": "baseline_fill",
"at": "08:00",
"role": "Warehouse Associate",
"count": 3,
"city": "Toledo",
"state": "OH",
"shift_start": "08:00 AM",
"scenario_note": "Regular Monday morning shift, 8-hour."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 28.888,
"error": "invalid JSON from executor: JSON Parse error: Unable to parse JSON string | raw: {\"kind\":\"plan\",\"steps\":[\"Verify one candidate from the current list using sql tool for SQL verification.\",\"Propose_done with top 3 candidates who are Warehouse Associates in Toledo, OH.\"]}\n{\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id, name, role, city, state, availability FROM ",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Unable to parse JSON string | raw: {\"kind\":\"plan\",\"steps\":[\"Verify one candidate from the current list using sql tool for SQL verification.\",\"Propose_done with top 3 candidates who are Warehouse Associates in Toledo, OH.\"]}\n{\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id, name, role, city, state, availability FROM "
]
},
{
"event": {
"kind": "recurring",
"at": "10:30",
"role": "Machine Operator",
"count": 2,
"city": "Toledo",
"state": "OH",
"shift_start": "11:00 AM",
"scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
},
"ok": true,
"fills": [
{
"candidate_id": "W500K-29048",
"name": "Raymond G. Ward",
"reason": "Raymond has a high availability score (0.9033) and relevant skills for the Machine Operator position in Toledo, OH. He is also endorsed by past similar fills."
},
{
"candidate_id": "W500K-20613",
"name": "Pamela V. Green",
"reason": "Pamela has a high availability score (0.8496) and relevant skills for the Machine Operator position in Toledo, OH. She is also endorsed by past similar fills."
}
],
"turns": 5,
"duration_secs": 12.051,
"gap_signals": [
"double_book: undefined Pamela V. Green already booked for 10:30"
],
"sources_first_score": 0.6692528,
"sources_last_score": 0.64494026,
"pool_size": 997,
"playbook_citations": []
},
{
"event": {
"kind": "expansion",
"at": "12:15",
"role": "Forklift Operator",
"count": 5,
"city": "Toledo",
"state": "OH",
"shift_start": "01:00 PM",
"scenario_note": "New warehouse location opening, five-worker team needed."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 20.342,
"error": "invalid JSON from executor: JSON Parse error: Expected ']' | raw: {\"kind\":\"plan\",\"steps\":[\"1. Use hybrid_search to find Forklift Operators in Toledo, OH with high engagement and communications scores who are available for work at Riverfront Steel starting at 01:00 PM. The search should prioritize workers with team/collaboration signals (engagement, communications ",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected ']' | raw: {\"kind\":\"plan\",\"steps\":[\"1. Use hybrid_search to find Forklift Operators in Toledo, OH with high engagement and communications scores who are available for work at Riverfront Steel starting at 01:00 PM. The search should prioritize workers with team/collaboration signals (engagement, communications "
]
},
{
"event": {
"kind": "emergency",
"at": "14:00",
"role": "Loader",
"count": 4,
"city": "Toledo",
"state": "OH",
"shift_start": "04:00 PM same day",
"deadline": "16:00",
"scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 35.727,
"error": "no consensus after 14 turns",
"gap_signals": [
"drift_or_tool: no consensus after 14 turns"
]
},
{
"event": {
"kind": "misplacement",
"at": "15:45",
"role": "Warehouse Associate",
"count": 1,
"city": "Toledo",
"state": "OH",
"shift_start": "remainder of 08:00 shift",
"scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
"replaces_event": "08:00"
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 11.518,
"error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\", \"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"LOWER(role) LIKE '%warehouse%' AND city = 'Toledo' AND state = 'OH' AND availability > 0.5 AND shift = '08:00' AND worker_id NOT IN [, ] AND worker_id NOT IN [\"EXCLUDE_WORKERS_ID1\", \"EXCLUDE_WORKERS_ID2\"",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\", \"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"LOWER(role) LIKE '%warehouse%' AND city = 'Toledo' AND state = 'OH' AND availability > 0.5 AND shift = '08:00' AND worker_id NOT IN [, ] AND worker_id NOT IN [\"EXCLUDE_WORKERS_ID1\", \"EXCLUDE_WORKERS_ID2\""
]
}
]

View File

@ -0,0 +1,18 @@
[
{
"name": "Raymond G. Ward",
"booked_for": "10:30",
"role": "Machine Operator",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
},
{
"name": "Pamela V. Green",
"booked_for": "10:30",
"role": "Machine Operator",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
}
]

View File

@ -0,0 +1,11 @@
# SMS drafts — Riverfront Steel, 2026-04-21
## 10:30 recurring — Machine Operator x2 in Toledo, OH
TO: Raymond G. Ward
Confirming your Machine Operator shift at Riverfront Steel in Toledo, OH starting at 11:00 AM on Tuesday/Thursday. Still available!
---
TO: Pamela V. Green
Your Machine Operator shift at Riverfront Steel in Toledo, OH starts at 11:00 AM on Tuesday/Thursday. Confirm your availability please.

View File

@ -0,0 +1 @@
# Client emails — Riverfront Steel, 2026-04-21

View File

@ -0,0 +1 @@
# SMS drafts — Riverfront Steel, 2026-04-21

View File

@ -0,0 +1,22 @@
# Client emails — Riverfront Steel, 2026-04-21
## 12:15 expansion — Forklift Operator x5
Subject: 5 Workers Confirmed
Dear Riverfront Steel Team,
I am pleased to confirm that we have filled all five positions for Forklift Operators at your new warehouse location opening today starting at 1:00 PM. The workers are:
- Laura F. Morales
- Kyle F. Brooks
- Maria K. Cruz
- Jeffrey D. Taylor
- Charles T. Walker
All meet the criteria of being Forklift Operators in Toledo, OH.
Looking forward to a successful shift!
Best regards,
Dispatch Team Lakehouse

View File

@ -0,0 +1,26 @@
# SMS drafts — Riverfront Steel, 2026-04-21
## 12:15 expansion — Forklift Operator x5 in Toledo, OH
TO: Laura F. Morales
Confirming your shift as a Forklift Operator at Riverfront Steel in Toledo, OH starting at 1:00 PM for the new warehouse location opening. W500K-22625
---
TO: Kyle F. Brooks
You're scheduled to start your Forklift Operator shift at Riverfront Steel in Toledo, OH at 1:00 PM for the new warehouse location opening. Confirm W500K-33961
---
TO: Maria K. Cruz
Your shift as a Forklift Operator at Riverfront Steel in Toledo, OH starts at 1:00 PM for the new warehouse location opening. Please confirm your attendance. W500K-19588
---
TO: Jeffrey D. Taylor
Confirming your shift as a Forklift Operator at Riverfront Steel in Toledo, OH starting at 1:00 PM for the new warehouse location opening. W500K-37729
---
TO: Charles T. Walker
You're scheduled to start your Forklift Operator shift at Riverfront Steel in Toledo, OH at 1:00 PM for the new warehouse location opening. Confirm your attendance. W500K-17543

View File

@ -0,0 +1,2 @@
{"at":"08:00","kind":"baseline_fill","operation":"fill: Warehouse Associate x3 in Toledo, OH","fills":[{"candidate_id":"W500K-49164","name":"Christopher Y. Phillips","reason":"Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."},{"candidate_id":"W500K-40928","name":"Janet E. Hill","reason":"Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."},{"candidate_id":"W500K-34704","name":"Fatima U. Rivera","reason":"Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."}],"turns":7,"duration_secs":20.128,"pool_size":770,"playbook_citations":[],"discovered_pattern":"Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.660.96)"}
{"at":"10:30","kind":"recurring","operation":"fill: Machine Operator x2 in Toledo, OH","fills":[{"candidate_id":"W500K-19759","name":"Carmen Z. Rodriguez","reason":"Recurring Machine Operator in Toledo, OH with a score of 0.75, verified via sql tool."},{"candidate_id":"W500K-29298","name":"Robert W. Gonzalez","reason":"Recurring Machine Operator in Toledo, OH with a score of 0.74, not yet SQL verified but highly likely to meet requirements."}],"turns":5,"duration_secs":17.426,"pool_size":997,"playbook_citations":[],"discovered_pattern":"Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.660.96)"}

View File

@ -0,0 +1,40 @@
# Client emails — Riverfront Steel, 2026-04-21
## 08:00 baseline_fill — Warehouse Associate x3
Subject: 3 Filled
Dear Riverfront Steel Staffing Team,
I am pleased to confirm that we have filled all three roles of Warehouse Associate for your Monday morning shift starting at 08:00 AM.
The workers assigned are:
- Christopher Y. Phillips
- Janet E. Hill
- Fatima U. Rivera
All three have confirmed their availability and are reliable team members.
Best regards,
Dispatch Team Lakehouse
## 10:30 recurring — Machine Operator x2
To: staffing@riverfrontsteel.example
From: dispatch@lakehouse.example
Subject: Confirmed
Dear Riverfront Steel Team,
We are pleased to confirm that we have filled both Machine Operator roles for your Tuesday/Thursday shifts starting at 11:00 AM. The workers assigned are:
- Carmen Z. Rodriguez
- Robert W. Gonzalez
Both are recurring Machine Operators in Toledo, OH with a score of 0.7.
Please note this is a recurring slot; prior workers may still be available.
Best regards,
Dispatch Team

View File

@ -0,0 +1,74 @@
# Scenario retrospective — Riverfront Steel, 2026-04-21
Executor: `mistral:latest` · Reviewer: `qwen2.5:latest` · Draft: `qwen2.5:latest`
## Events
| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | 770 | ✓ 3 | 7 | 20.1 | 0 | 2 |
| 10:30 | recurring | Machine Operator × 2 | 997 | ✓ 2 | 5 | 17.4 | 0 | 2 |
| 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 46.4 | 0 | 1 |
| 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 54.1 | 0 | 1 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 59.6 | 0 | 1 |
## Final roster
| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
| undefined Christopher Y. Phillips | 08:00 | Warehouse Associate | Toledo, OH | no_show |
| undefined Janet E. Hill | 08:00 | Warehouse Associate | Toledo, OH | confirmed |
| undefined Fatima U. Rivera | 08:00 | Warehouse Associate | Toledo, OH | confirmed |
| undefined Carmen Z. Rodriguez | 10:30 | Machine Operator | Toledo, OH | confirmed |
| undefined Robert W. Gonzalez | 10:30 | Machine Operator | Toledo, OH | confirmed |
## Gap signals
### double_book
- **08:00** — undefined Janet E. Hill already booked for 08:00
- **08:00** — undefined Fatima U. Rivera already booked for 08:00
- **10:30** — undefined Carmen Z. Rodriguez already booked for 08:00
- **10:30** — undefined Robert W. Gonzalez already booked for 08:00
### drift_or_tool
- **12:15** — invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {"kind":"plan", "steps":["TOOL_CALL hybrid_search({'index_name':'workers_500k_v1', 'sql_filter':'LOWER(role) LIKE '%forklift%' AND city = \'Toledo\' AND state = \'OH\' AND availability > CAST(0.5 AS DOUBLE) AND reliability > CAST(0.75 AS DOUBLE)', 'question':'reliable forklift operators in Toledo, O
- **14:00** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search",
"args":{"index_name":"workers_500k_v1","sql_filter":"LOWER(role) LIKE '%loader%' AND city = 'Toledo' AND state = 'OH' AND availability > CAST(0.7 AS DOUBLE) AND worker_id NOT IN ('W500K-4321', 'W500K-8963', 'W500K-2345', 'W500K-6789', 'W500K-9876') AND wor
- **15:45** — no consensus after 14 turns
### fairness
- _cross-event_ — Christopher Y. Phillips (undefined) booked 4 times today
### write_through_audit
- _post-run_ — playbook_memory has 165 entries (ran 5 events, expected ≥ 2 new entries from this run)
## Workers touched across the week
6 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
| Worker ID | Name | Events | Outcome |
|---|---|---|---|
| W500K-49164 | Christopher Y. Phillips | 08:00 baseline_fill | booked |
| W500K-40928 | Janet E. Hill | 08:00 baseline_fill | booked |
| W500K-34704 | Fatima U. Rivera | 08:00 baseline_fill | booked |
| W500K-19759 | Carmen Z. Rodriguez | 10:30 recurring | booked |
| W500K-29298 | Robert W. Gonzalez | 10:30 recurring | booked |
| undefined | Christopher Y. Phillips | 08:00 | no_show |
## Discovered patterns (meta-index)
What the system identified across semantically-similar past fills as each event ran:
- **08:00 baseline_fill** (Warehouse Associate): Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.66–0.96)
- **10:30 recurring** (Machine Operator): Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.66–0.96)
- **12:15 expansion** (Forklift Operator): —
- **14:00 emergency** (Loader): —
- **15:45 misplacement** (Warehouse Associate): —
## Narrative
- 2/5 events reached consensus.
- Final roster: 5 bookings across 1 distinct workers.
- Workers touched (booked, failed, or otherwise decided): 6.
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
- Dropped events: 12:15 expansion, 14:00 emergency, 15:45 misplacement.

View File

@ -0,0 +1,146 @@
[
{
"event": {
"kind": "baseline_fill",
"at": "08:00",
"role": "Warehouse Associate",
"count": 3,
"city": "Toledo",
"state": "OH",
"shift_start": "08:00 AM",
"scenario_note": "Regular Monday morning shift, 8-hour."
},
"ok": true,
"fills": [
{
"candidate_id": "W500K-49164",
"name": "Christopher Y. Phillips",
"reason": "Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."
},
{
"candidate_id": "W500K-40928",
"name": "Janet E. Hill",
"reason": "Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."
},
{
"candidate_id": "W500K-34704",
"name": "Fatima U. Rivera",
"reason": "Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."
}
],
"turns": 7,
"duration_secs": 20.128,
"gap_signals": [
"double_book: undefined Janet E. Hill already booked for 08:00",
"double_book: undefined Fatima U. Rivera already booked for 08:00"
],
"sources_first_score": 0.7124013,
"sources_last_score": 0.66623676,
"pool_size": 770,
"playbook_citations": [],
"discovered_pattern": "Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.660.96)"
},
{
"event": {
"kind": "recurring",
"at": "10:30",
"role": "Machine Operator",
"count": 2,
"city": "Toledo",
"state": "OH",
"shift_start": "11:00 AM",
"scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
},
"ok": true,
"fills": [
{
"candidate_id": "W500K-19759",
"name": "Carmen Z. Rodriguez",
"reason": "Recurring Machine Operator in Toledo, OH with a score of 0.75, verified via sql tool."
},
{
"candidate_id": "W500K-29298",
"name": "Robert W. Gonzalez",
"reason": "Recurring Machine Operator in Toledo, OH with a score of 0.74, not yet SQL verified but highly likely to meet requirements."
}
],
"turns": 5,
"duration_secs": 17.426,
"gap_signals": [
"double_book: undefined Carmen Z. Rodriguez already booked for 08:00",
"double_book: undefined Robert W. Gonzalez already booked for 08:00"
],
"sources_first_score": 0.72546995,
"sources_last_score": 0.6690281,
"pool_size": 997,
"playbook_citations": [],
"discovered_pattern": "Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.660.96)"
},
{
"event": {
"kind": "expansion",
"at": "12:15",
"role": "Forklift Operator",
"count": 5,
"city": "Toledo",
"state": "OH",
"shift_start": "01:00 PM",
"scenario_note": "New warehouse location opening, five-worker team needed."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 46.391,
"error": "invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {\"kind\":\"plan\", \"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1', 'sql_filter':'LOWER(role) LIKE '%forklift%' AND city = \\'Toledo\\' AND state = \\'OH\\' AND availability > CAST(0.5 AS DOUBLE) AND reliability > CAST(0.75 AS DOUBLE)', 'question':'reliable forklift operators in Toledo, O",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {\"kind\":\"plan\", \"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1', 'sql_filter':'LOWER(role) LIKE '%forklift%' AND city = \\'Toledo\\' AND state = \\'OH\\' AND availability > CAST(0.5 AS DOUBLE) AND reliability > CAST(0.75 AS DOUBLE)', 'question':'reliable forklift operators in Toledo, O"
]
},
{
"event": {
"kind": "emergency",
"at": "14:00",
"role": "Loader",
"count": 4,
"city": "Toledo",
"state": "OH",
"shift_start": "04:00 PM same day",
"deadline": "16:00",
"scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 54.123,
"error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"LOWER(role) LIKE '%loader%' AND city = 'Toledo' AND state = 'OH' AND availability > CAST(0.7 AS DOUBLE) AND worker_id NOT IN ('W500K-4321', 'W500K-8963', 'W500K-2345', 'W500K-6789', 'W500K-9876') AND wor",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"LOWER(role) LIKE '%loader%' AND city = 'Toledo' AND state = 'OH' AND availability > CAST(0.7 AS DOUBLE) AND worker_id NOT IN ('W500K-4321', 'W500K-8963', 'W500K-2345', 'W500K-6789', 'W500K-9876') AND wor"
]
},
{
"event": {
"kind": "misplacement",
"at": "15:45",
"role": "Warehouse Associate",
"count": 1,
"city": "Toledo",
"state": "OH",
"shift_start": "remainder of 08:00 shift",
"scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
"replaces_event": "08:00",
"exclude_worker_ids": [
null,
null,
null
]
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 59.593,
"error": "no consensus after 14 turns",
"gap_signals": [
"drift_or_tool: no consensus after 14 turns"
]
}
]

View File

@ -0,0 +1,42 @@
[
{
"name": "Christopher Y. Phillips",
"booked_for": "08:00",
"role": "Warehouse Associate",
"city": "Toledo",
"state": "OH",
"status": "no_show"
},
{
"name": "Janet E. Hill",
"booked_for": "08:00",
"role": "Warehouse Associate",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
},
{
"name": "Fatima U. Rivera",
"booked_for": "08:00",
"role": "Warehouse Associate",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
},
{
"name": "Carmen Z. Rodriguez",
"booked_for": "10:30",
"role": "Machine Operator",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
},
{
"name": "Robert W. Gonzalez",
"booked_for": "10:30",
"role": "Machine Operator",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
}
]

View File

@ -0,0 +1,26 @@
# SMS drafts — Riverfront Steel, 2026-04-21
## 08:00 baseline_fill — Warehouse Associate x3 in Toledo, OH
TO: Christopher Y. Phillips
Confirming your shift as a Warehouse Associate at Riverfront Steel in Toledo, OH starting 8 AM today.
---
TO: Janet E. Hill
Good morning! Confirming your shift as a Warehouse Associate from 8 AM onwards at our Toledo, OH location.
---
TO: Fatima U. Rivera
Morning Fatima! Just confirming your shift as a Warehouse Associate at Riverfront Steel in Toledo, OH starting at 8 AM.
## 10:30 recurring — Machine Operator x2 in Toledo, OH
TO: Carmen Z. Rodriguez
Confirming your shift as a Machine Operator at Riverfront Steel in Toledo, OH starting 11:00 AM on Tuesday/Thursday. Still available!
---
TO: Robert W. Gonzalez
Your recurring Tuesday/Thursday Machine Operator shift at Riverfront Steel in Toledo, OH starts at 11:00 AM. Confirm your availability please.


@ -0,0 +1 @@
# Client emails — Riverfront Steel, 2026-04-21


@ -0,0 +1,57 @@
# Scenario retrospective — Riverfront Steel, 2026-04-21
Executor: `mistral:latest` Reviewer: `qwen2.5:latest` Draft: `qwen2.5:latest`
## Events
| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 63.8 | 0 | 1 |
| 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 9.5 | 0 | 1 |
| 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 47.8 | 0 | 1 |
| 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 60.1 | 0 | 1 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 62.3 | 0 | 1 |
## Final roster
| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
## Gap signals
### drift_or_tool
- **08:00** — aborted — 3 consecutive drift flags
- **10:30** — invalid JSON from executor: JSON Parse error: Unterminated string | raw: {"kind":"plan","steps":["TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5'})",
"TOOL_CALL sql({'query':'SELECT worker_id, name, role, city, state, CAST(availability AS DOUBLE) A
- **12:15** — aborted — 3 consecutive drift flags
- **14:00** — aborted — 3 consecutive drift flags
- **15:45** — invalid JSON from executor: JSON Parse error: Unterminated string | raw: {"kind": "plan", "steps": ["1.1. TOOL_CALL hybrid_search({'index_name': 'workers_500k_v1', 'sql_filter': 'role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND worker_id NOT IN (49164, 1181, 7239, 299, 30930, 33212)'})",
"2.2. TOOL_CALL sql({'qu
### write_through_audit
- _post-run_ — playbook_memory has 165 entries (ran 5 events, expected ≥ 0 new entries from this run)
## Workers touched across the week
0 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
| Worker ID | Name | Events | Outcome |
|---|---|---|---|
## Discovered patterns (meta-index)
What the system identified across semantically-similar past fills as each event ran:
- **08:00 baseline_fill** (Warehouse Associate): —
- **10:30 recurring** (Machine Operator): —
- **12:15 expansion** (Forklift Operator): —
- **14:00 emergency** (Loader): —
- **15:45 misplacement** (Warehouse Associate): —
## Narrative
- 0/5 events reached consensus.
- Final roster: 0 bookings across 0 distinct workers.
- Workers touched (booked, failed, or otherwise decided): 0.
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
- Dropped events: 08:00 baseline_fill, 10:30 recurring, 12:15 expansion, 14:00 emergency, 15:45 misplacement.


@ -0,0 +1,104 @@
[
{
"event": {
"kind": "baseline_fill",
"at": "08:00",
"role": "Warehouse Associate",
"count": 3,
"city": "Toledo",
"state": "OH",
"shift_start": "08:00 AM",
"scenario_note": "Regular Monday morning shift, 8-hour."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 63.815,
"error": "aborted — 3 consecutive drift flags",
"gap_signals": [
"drift_or_tool: aborted — 3 consecutive drift flags"
]
},
{
"event": {
"kind": "recurring",
"at": "10:30",
"role": "Machine Operator",
"count": 2,
"city": "Toledo",
"state": "OH",
"shift_start": "11:00 AM",
"scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 9.538,
"error": "invalid JSON from executor: JSON Parse error: Unterminated string | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5'})\",\n\"TOOL_CALL sql({'query':'SELECT worker_id, name, role, city, state, CAST(availability AS DOUBLE) A",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Unterminated string | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5'})\",\n\"TOOL_CALL sql({'query':'SELECT worker_id, name, role, city, state, CAST(availability AS DOUBLE) A"
]
},
{
"event": {
"kind": "expansion",
"at": "12:15",
"role": "Forklift Operator",
"count": 5,
"city": "Toledo",
"state": "OH",
"shift_start": "01:00 PM",
"scenario_note": "New warehouse location opening, five-worker team needed."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 47.797,
"error": "aborted — 3 consecutive drift flags",
"gap_signals": [
"drift_or_tool: aborted — 3 consecutive drift flags"
]
},
{
"event": {
"kind": "emergency",
"at": "14:00",
"role": "Loader",
"count": 4,
"city": "Toledo",
"state": "OH",
"shift_start": "04:00 PM same day",
"deadline": "16:00",
"scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 60.115,
"error": "aborted — 3 consecutive drift flags",
"gap_signals": [
"drift_or_tool: aborted — 3 consecutive drift flags"
]
},
{
"event": {
"kind": "misplacement",
"at": "15:45",
"role": "Warehouse Associate",
"count": 1,
"city": "Toledo",
"state": "OH",
"shift_start": "remainder of 08:00 shift",
"scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
"replaces_event": "08:00"
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 62.283,
"error": "invalid JSON from executor: JSON Parse error: Unterminated string | raw: {\"kind\": \"plan\", \"steps\": [\"1.1. TOOL_CALL hybrid_search({'index_name': 'workers_500k_v1', 'sql_filter': 'role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND worker_id NOT IN (49164, 1181, 7239, 299, 30930, 33212)'})\",\n\"2.2. TOOL_CALL sql({'qu",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Unterminated string | raw: {\"kind\": \"plan\", \"steps\": [\"1.1. TOOL_CALL hybrid_search({'index_name': 'workers_500k_v1', 'sql_filter': 'role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND worker_id NOT IN (49164, 1181, 7239, 299, 30930, 33212)'})\",\n\"2.2. TOOL_CALL sql({'qu"
]
}
]


@ -0,0 +1 @@
# SMS drafts — Riverfront Steel, 2026-04-21


@ -0,0 +1 @@
# Client emails — Riverfront Steel, 2026-04-21


@ -0,0 +1,55 @@
# Scenario retrospective — Riverfront Steel, 2026-04-21
Executor: `qwen2.5:latest` Reviewer: `qwen2.5:latest` Draft: `qwen2.5:latest`
## Events
| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 6.4 | 0 | 1 |
| 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 16.8 | 0 | 1 |
| 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 7.2 | 0 | 1 |
| 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 54.0 | 0 | 1 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 49.3 | 0 | 1 |
## Final roster
| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
## Gap signals
### drift_or_tool
- **08:00** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"sql","args":{"query":"SELECT worker_id, name FROM workers_500k_v1 WHERE role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 LIMIT 3"},"rationale":"verify top candidates via SQL query")}
- **10:30** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search","args":{"index_name":"workers_500k_v1","sql_filter":"role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND reliability >= 0.7","question":"machine operator Toledo OH high reliability","k":2}
- **12:15** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"sql","args":{"query":"SELECT worker_id FROM workers_500k_v1 WHERE role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 LIMIT 5"},"rationale":"verify top candidates via SQL query to me
- **14:00** — no consensus after 14 turns
- **15:45** — no consensus after 14 turns
### write_through_audit
- _post-run_ — playbook_memory has 165 entries (ran 5 events, expected ≥ 0 new entries from this run)
## Workers touched across the week
0 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
| Worker ID | Name | Events | Outcome |
|---|---|---|---|
## Discovered patterns (meta-index)
What the system identified across semantically-similar past fills as each event ran:
- **08:00 baseline_fill** (Warehouse Associate): —
- **10:30 recurring** (Machine Operator): —
- **12:15 expansion** (Forklift Operator): —
- **14:00 emergency** (Loader): —
- **15:45 misplacement** (Warehouse Associate): —
## Narrative
- 0/5 events reached consensus.
- Final roster: 0 bookings across 0 distinct workers.
- Workers touched (booked, failed, or otherwise decided): 0.
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
- Dropped events: 08:00 baseline_fill, 10:30 recurring, 12:15 expansion, 14:00 emergency, 15:45 misplacement.


@ -0,0 +1,104 @@
[
{
"event": {
"kind": "baseline_fill",
"at": "08:00",
"role": "Warehouse Associate",
"count": 3,
"city": "Toledo",
"state": "OH",
"shift_start": "08:00 AM",
"scenario_note": "Regular Monday morning shift, 8-hour."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 6.434,
"error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id, name FROM workers_500k_v1 WHERE role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 LIMIT 3\"},\"rationale\":\"verify top candidates via SQL query\")}",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id, name FROM workers_500k_v1 WHERE role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 LIMIT 3\"},\"rationale\":\"verify top candidates via SQL query\")}"
]
},
{
"event": {
"kind": "recurring",
"at": "10:30",
"role": "Machine Operator",
"count": 2,
"city": "Toledo",
"state": "OH",
"shift_start": "11:00 AM",
"scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 16.752,
"error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND reliability >= 0.7\",\"question\":\"machine operator Toledo OH high reliability\",\"k\":2}",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND reliability >= 0.7\",\"question\":\"machine operator Toledo OH high reliability\",\"k\":2}"
]
},
{
"event": {
"kind": "expansion",
"at": "12:15",
"role": "Forklift Operator",
"count": 5,
"city": "Toledo",
"state": "OH",
"shift_start": "01:00 PM",
"scenario_note": "New warehouse location opening, five-worker team needed."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 7.181,
"error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id FROM workers_500k_v1 WHERE role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 LIMIT 5\"},\"rationale\":\"verify top candidates via SQL query to me",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id FROM workers_500k_v1 WHERE role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 LIMIT 5\"},\"rationale\":\"verify top candidates via SQL query to me"
]
},
{
"event": {
"kind": "emergency",
"at": "14:00",
"role": "Loader",
"count": 4,
"city": "Toledo",
"state": "OH",
"shift_start": "04:00 PM same day",
"deadline": "16:00",
"scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 54.028,
"error": "no consensus after 14 turns",
"gap_signals": [
"drift_or_tool: no consensus after 14 turns"
]
},
{
"event": {
"kind": "misplacement",
"at": "15:45",
"role": "Warehouse Associate",
"count": 1,
"city": "Toledo",
"state": "OH",
"shift_start": "remainder of 08:00 shift",
"scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
"replaces_event": "08:00"
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 49.298,
"error": "no consensus after 14 turns",
"gap_signals": [
"drift_or_tool: no consensus after 14 turns"
]
}
]


@ -0,0 +1 @@
# SMS drafts — Riverfront Steel, 2026-04-21


@ -0,0 +1 @@
# Client emails — Riverfront Steel, 2026-04-21


@ -0,0 +1,55 @@
# Scenario retrospective — Riverfront Steel, 2026-04-21
Executor: `mistral:latest` Reviewer: `qwen2.5:latest` Draft: `qwen2.5:latest`
## Events
| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 47.4 | 0 | 1 |
| 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 40.4 | 0 | 1 |
| 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 9.4 | 0 | 1 |
| 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 44.7 | 0 | 1 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 45.1 | 0 | 1 |
## Final roster
| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
## Gap signals
### drift_or_tool
- **08:00** — no consensus after 14 turns
- **10:30** — aborted — 3 consecutive drift flags
- **12:15** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"propose_done","args":{"fills":[{"candidate_id":"W500K-37736","name":"Jennifer K. Robinson","reason":"verified Toledo forklift op, reliability 0.9"}],"rationale":"one SQL-verified candidate from surfaced candidates"}
- **14:00** — aborted — 3 consecutive drift flags
- **15:45** — no consensus after 14 turns
### write_through_audit
- _post-run_ — playbook_memory has 165 entries (ran 5 events, expected ≥ 0 new entries from this run)
## Workers touched across the week
0 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
| Worker ID | Name | Events | Outcome |
|---|---|---|---|
## Discovered patterns (meta-index)
What the system identified across semantically-similar past fills as each event ran:
- **08:00 baseline_fill** (Warehouse Associate): —
- **10:30 recurring** (Machine Operator): —
- **12:15 expansion** (Forklift Operator): —
- **14:00 emergency** (Loader): —
- **15:45 misplacement** (Warehouse Associate): —
## Narrative
- 0/5 events reached consensus.
- Final roster: 0 bookings across 0 distinct workers.
- Workers touched (booked, failed, or otherwise decided): 0.
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
- Dropped events: 08:00 baseline_fill, 10:30 recurring, 12:15 expansion, 14:00 emergency, 15:45 misplacement.


@ -0,0 +1,104 @@
[
{
"event": {
"kind": "baseline_fill",
"at": "08:00",
"role": "Warehouse Associate",
"count": 3,
"city": "Toledo",
"state": "OH",
"shift_start": "08:00 AM",
"scenario_note": "Regular Monday morning shift, 8-hour."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 47.404,
"error": "no consensus after 14 turns",
"gap_signals": [
"drift_or_tool: no consensus after 14 turns"
]
},
{
"event": {
"kind": "recurring",
"at": "10:30",
"role": "Machine Operator",
"count": 2,
"city": "Toledo",
"state": "OH",
"shift_start": "11:00 AM",
"scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 40.374,
"error": "aborted — 3 consecutive drift flags",
"gap_signals": [
"drift_or_tool: aborted — 3 consecutive drift flags"
]
},
{
"event": {
"kind": "expansion",
"at": "12:15",
"role": "Forklift Operator",
"count": 5,
"city": "Toledo",
"state": "OH",
"shift_start": "01:00 PM",
"scenario_note": "New warehouse location opening, five-worker team needed."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 9.414,
"error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"propose_done\",\"args\":{\"fills\":[{\"candidate_id\":\"W500K-37736\",\"name\":\"Jennifer K. Robinson\",\"reason\":\"verified Toledo forklift op, reliability 0.9\"}],\"rationale\":\"one SQL-verified candidate from surfaced candidates\"}",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"propose_done\",\"args\":{\"fills\":[{\"candidate_id\":\"W500K-37736\",\"name\":\"Jennifer K. Robinson\",\"reason\":\"verified Toledo forklift op, reliability 0.9\"}],\"rationale\":\"one SQL-verified candidate from surfaced candidates\"}"
]
},
{
"event": {
"kind": "emergency",
"at": "14:00",
"role": "Loader",
"count": 4,
"city": "Toledo",
"state": "OH",
"shift_start": "04:00 PM same day",
"deadline": "16:00",
"scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 44.673,
"error": "aborted — 3 consecutive drift flags",
"gap_signals": [
"drift_or_tool: aborted — 3 consecutive drift flags"
]
},
{
"event": {
"kind": "misplacement",
"at": "15:45",
"role": "Warehouse Associate",
"count": 1,
"city": "Toledo",
"state": "OH",
"shift_start": "remainder of 08:00 shift",
"scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
"replaces_event": "08:00"
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 45.149,
"error": "no consensus after 14 turns",
"gap_signals": [
"drift_or_tool: no consensus after 14 turns"
]
}
]


@ -0,0 +1 @@
# SMS drafts — Riverfront Steel, 2026-04-21


@ -0,0 +1,2 @@
{"at":"12:15","kind":"expansion","operation":"fill: Forklift Operator x5 in Toledo, OH","fills":[{"candidate_id":"W500K-37736","name":"Jennifer K. Robinson","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."},{"candidate_id":"W500K-33961","name":"Kyle F. Brooks","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."},{"candidate_id":"W500K-31297","name":"Jacob T. Diaz","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."},{"candidate_id":"W500K-40884","name":"Jerry M. Jones","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."},{"candidate_id":"W500K-37729","name":"Jeffrey D. Taylor","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."}],"turns":7,"duration_secs":28.23,"pool_size":687,"playbook_citations":[],"discovered_pattern":"Across 25 similar past playbooks (10 workers examined) · recurring certifications: Forklift (40%), OSHA-10 (40%) · recurring skills: mill (40%) · archetype mostly: leader · reliability median 0.83 (range 0.66–0.96)"}
{"at":"14:00","kind":"emergency","operation":"fill: Loader x4 in Toledo, OH","fills":[{"candidate_id":"W500K-15305","name":"Mary R. Richardson","reason":"Verified availability score of 0.988 via SQL and ranked highest among the candidates with an availability score greater than 0.7."},{"candidate_id":"W500K-12325","name":"Raj Torres","reason":"Ranked second among the candidates with an availability score greater than 0.7."},{"candidate_id":"W500K-16975","name":"Brian X. Price","reason":"Ranked third among the candidates with an availability score greater than 0.7."},{"candidate_id":"W500K-22851","name":"Fatima X. Gutierrez","reason":"Ranked fourth among the candidates with an availability score greater than 0.7."}],"turns":6,"duration_secs":22.25,"pool_size":380,"playbook_citations":[],"discovered_pattern":"Across 25 similar past playbooks (9 workers examined) · recurring certifications: Forklift (44%) · recurring skills: mill (44%) · archetype mostly: leader · reliability median 0.80 (range 0.66–0.96)"}


@ -0,0 +1,40 @@
# Client emails — Riverfront Steel, 2026-04-21
## 12:15 expansion — Forklift Operator x5
Subject: 5 Workers Confirmed
Dear Riverfront Steel Team,
We are pleased to confirm that we have filled all five positions for Forklift Operators at your new warehouse location opening. The workers starting at 01:00 PM today are:
- Jennifer K. Robinson
- Kyle F. Brooks
- Jacob T. Diaz
- Jerry M. Jones
- Jeffrey D. Taylor
Each meets the criteria of being a Forklift Operator in Toledo, OH.
Best regards,
Dispatch Team Lakehouse
## 14:00 emergency — Loader x4
Subject: 4 Loader Workers Confirmed
Dear Riverfront Steel Team,
I am pleased to confirm that we have filled all four loader positions as requested:
- Mary R. Richardson
- Raj Torres
- Brian X. Price
- Fatima X. Gutierrez
All workers will start their shift at 04:00 PM today. Please note the walkoff incident requiring a replacement crew by 16:00 sharp.
Thank you for your trust in Lakehouse Dispatch.
Best regards,
Dispatch Team


@ -0,0 +1,85 @@
# Scenario retrospective — Riverfront Steel, 2026-04-21
Executor: `mistral:latest` Reviewer: `qwen2.5:latest` Draft: `qwen2.5:latest`
## Events
| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 20.2 | 0 | 1 |
| 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 47.4 | 0 | 1 |
| 12:15 | expansion | Forklift Operator × 5 | 687 | ✓ 5 | 7 | 28.2 | 0 | 4 |
| 14:00 | emergency | Loader × 4 | 380 | ✓ 4 | 6 | 22.3 | 0 | 4 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 52.5 | 0 | 1 |
## Final roster
| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
| undefined Jennifer K. Robinson | 12:15 | Forklift Operator | Toledo, OH | confirmed |
| undefined Kyle F. Brooks | 12:15 | Forklift Operator | Toledo, OH | confirmed |
| undefined Jacob T. Diaz | 12:15 | Forklift Operator | Toledo, OH | confirmed |
| undefined Jerry M. Jones | 12:15 | Forklift Operator | Toledo, OH | confirmed |
| undefined Jeffrey D. Taylor | 12:15 | Forklift Operator | Toledo, OH | confirmed |
| undefined Mary R. Richardson | 14:00 | Loader | Toledo, OH | confirmed |
| undefined Raj Torres | 14:00 | Loader | Toledo, OH | confirmed |
| undefined Brian X. Price | 14:00 | Loader | Toledo, OH | confirmed |
| undefined Fatima X. Gutierrez | 14:00 | Loader | Toledo, OH | confirmed |
## Gap signals
### drift_or_tool
- **08:00** — invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {"kind":"plan","steps":["TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = \'Warehouse Associate\' AND city = \'Toledo\' AND state = \'OH\' AND CAST(availability AS DOUBLE) > 0.5','question':'reliable warehouse associate Toledo'})",
"TOOL_CALL sql({'query':'SELECT worker_i
- **10:30** — no consensus after 14 turns
- **15:45** — no consensus after 14 turns
### double_book
- **12:15** — undefined Kyle F. Brooks already booked for 12:15
- **12:15** — undefined Jacob T. Diaz already booked for 12:15
- **12:15** — undefined Jerry M. Jones already booked for 12:15
- **12:15** — undefined Jeffrey D. Taylor already booked for 12:15
- **14:00** — undefined Mary R. Richardson already booked for 12:15
- **14:00** — undefined Raj Torres already booked for 12:15
- **14:00** — undefined Brian X. Price already booked for 12:15
- **14:00** — undefined Fatima X. Gutierrez already booked for 12:15
### fairness
- _cross-event_ — Jennifer K. Robinson (undefined) booked 9 times today
### write_through_audit
- _post-run_ — playbook_memory has 167 entries (ran 5 events, expected ≥ 2 new entries from this run)
## Workers touched across the week
9 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
| Worker ID | Name | Events | Outcome |
|---|---|---|---|
| W500K-37736 | Jennifer K. Robinson | 12:15 expansion | booked |
| W500K-33961 | Kyle F. Brooks | 12:15 expansion | booked |
| W500K-31297 | Jacob T. Diaz | 12:15 expansion | booked |
| W500K-40884 | Jerry M. Jones | 12:15 expansion | booked |
| W500K-37729 | Jeffrey D. Taylor | 12:15 expansion | booked |
| W500K-15305 | Mary R. Richardson | 14:00 emergency | booked |
| W500K-12325 | Raj Torres | 14:00 emergency | booked |
| W500K-16975 | Brian X. Price | 14:00 emergency | booked |
| W500K-22851 | Fatima X. Gutierrez | 14:00 emergency | booked |
## Discovered patterns (meta-index)
What the system identified across semantically-similar past fills as each event ran:
- **08:00 baseline_fill** (Warehouse Associate): —
- **10:30 recurring** (Machine Operator): —
- **12:15 expansion** (Forklift Operator): Across 25 similar past playbooks (10 workers examined) · recurring certifications: Forklift (40%), OSHA-10 (40%) · recurring skills: mill (40%) · archetype mostly: leader · reliability median 0.83 (range 0.66–0.96)
- **14:00 emergency** (Loader): Across 25 similar past playbooks (9 workers examined) · recurring certifications: Forklift (44%) · recurring skills: mill (44%) · archetype mostly: leader · reliability median 0.80 (range 0.66–0.96)
- **15:45 misplacement** (Warehouse Associate): —
## Narrative
- 2/5 events reached consensus.
- Final roster: 9 bookings across 1 distinct workers.
- Workers touched (booked, failed, or otherwise decided): 9.
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
- Dropped events: 08:00 baseline_fill, 10:30 recurring, 15:45 misplacement.


@ -0,0 +1,165 @@
[
{
"event": {
"kind": "baseline_fill",
"at": "08:00",
"role": "Warehouse Associate",
"count": 3,
"city": "Toledo",
"state": "OH",
"shift_start": "08:00 AM",
"scenario_note": "Regular Monday morning shift, 8-hour."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 20.215,
"error": "invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = \\'Warehouse Associate\\' AND city = \\'Toledo\\' AND state = \\'OH\\' AND CAST(availability AS DOUBLE) > 0.5','question':'reliable warehouse associate Toledo'})\",\n\"TOOL_CALL sql({'query':'SELECT worker_i",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = \\'Warehouse Associate\\' AND city = \\'Toledo\\' AND state = \\'OH\\' AND CAST(availability AS DOUBLE) > 0.5','question':'reliable warehouse associate Toledo'})\",\n\"TOOL_CALL sql({'query':'SELECT worker_i"
]
},
{
"event": {
"kind": "recurring",
"at": "10:30",
"role": "Machine Operator",
"count": 2,
"city": "Toledo",
"state": "OH",
"shift_start": "11:00 AM",
"scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 47.392,
"error": "no consensus after 14 turns",
"gap_signals": [
"drift_or_tool: no consensus after 14 turns"
]
},
{
"event": {
"kind": "expansion",
"at": "12:15",
"role": "Forklift Operator",
"count": 5,
"city": "Toledo",
"state": "OH",
"shift_start": "01:00 PM",
"scenario_note": "New warehouse location opening, five-worker team needed."
},
"ok": true,
"fills": [
{
"candidate_id": "W500K-37736",
"name": "Jennifer K. Robinson",
"reason": "Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."
},
{
"candidate_id": "W500K-33961",
"name": "Kyle F. Brooks",
"reason": "Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."
},
{
"candidate_id": "W500K-31297",
"name": "Jacob T. Diaz",
"reason": "Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."
},
{
"candidate_id": "W500K-40884",
"name": "Jerry M. Jones",
"reason": "Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."
},
{
"candidate_id": "W500K-37729",
"name": "Jeffrey D. Taylor",
"reason": "Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."
}
],
"turns": 7,
"duration_secs": 28.23,
"gap_signals": [
"double_book: undefined Kyle F. Brooks already booked for 12:15",
"double_book: undefined Jacob T. Diaz already booked for 12:15",
"double_book: undefined Jerry M. Jones already booked for 12:15",
"double_book: undefined Jeffrey D. Taylor already booked for 12:15"
],
"sources_first_score": 0.6336688,
"sources_last_score": 0.55183524,
"pool_size": 687,
"playbook_citations": [],
"discovered_pattern": "Across 25 similar past playbooks (10 workers examined) · recurring certifications: Forklift (40%), OSHA-10 (40%) · recurring skills: mill (40%) · archetype mostly: leader · reliability median 0.83 (range 0.66–0.96)"
},
{
"event": {
"kind": "emergency",
"at": "14:00",
"role": "Loader",
"count": 4,
"city": "Toledo",
"state": "OH",
"shift_start": "04:00 PM same day",
"deadline": "16:00",
"scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
},
"ok": true,
"fills": [
{
"candidate_id": "W500K-15305",
"name": "Mary R. Richardson",
"reason": "Verified availability score of 0.988 via SQL and ranked highest among the candidates with an availability score greater than 0.7."
},
{
"candidate_id": "W500K-12325",
"name": "Raj Torres",
"reason": "Ranked second among the candidates with an availability score greater than 0.7."
},
{
"candidate_id": "W500K-16975",
"name": "Brian X. Price",
"reason": "Ranked third among the candidates with an availability score greater than 0.7."
},
{
"candidate_id": "W500K-22851",
"name": "Fatima X. Gutierrez",
"reason": "Ranked fourth among the candidates with an availability score greater than 0.7."
}
],
"turns": 6,
"duration_secs": 22.25,
"gap_signals": [
"double_book: undefined Mary R. Richardson already booked for 12:15",
"double_book: undefined Raj Torres already booked for 12:15",
"double_book: undefined Brian X. Price already booked for 12:15",
"double_book: undefined Fatima X. Gutierrez already booked for 12:15"
],
"sources_first_score": 0.73792297,
"sources_last_score": 0.7001053,
"pool_size": 380,
"playbook_citations": [],
"discovered_pattern": "Across 25 similar past playbooks (9 workers examined) · recurring certifications: Forklift (44%) · recurring skills: mill (44%) · archetype mostly: leader · reliability median 0.80 (range 0.660.96)"
},
{
"event": {
"kind": "misplacement",
"at": "15:45",
"role": "Warehouse Associate",
"count": 1,
"city": "Toledo",
"state": "OH",
"shift_start": "remainder of 08:00 shift",
"scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
"replaces_event": "08:00"
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 52.523,
"error": "no consensus after 14 turns",
"gap_signals": [
"drift_or_tool: no consensus after 14 turns"
]
}
]


@ -0,0 +1,74 @@
[
{
"name": "Jennifer K. Robinson",
"booked_for": "12:15",
"role": "Forklift Operator",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
},
{
"name": "Kyle F. Brooks",
"booked_for": "12:15",
"role": "Forklift Operator",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
},
{
"name": "Jacob T. Diaz",
"booked_for": "12:15",
"role": "Forklift Operator",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
},
{
"name": "Jerry M. Jones",
"booked_for": "12:15",
"role": "Forklift Operator",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
},
{
"name": "Jeffrey D. Taylor",
"booked_for": "12:15",
"role": "Forklift Operator",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
},
{
"name": "Mary R. Richardson",
"booked_for": "14:00",
"role": "Loader",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
},
{
"name": "Raj Torres",
"booked_for": "14:00",
"role": "Loader",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
},
{
"name": "Brian X. Price",
"booked_for": "14:00",
"role": "Loader",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
},
{
"name": "Fatima X. Gutierrez",
"booked_for": "14:00",
"role": "Loader",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
}
]


@ -0,0 +1,46 @@
# SMS drafts — Riverfront Steel, 2026-04-21
## 12:15 expansion — Forklift Operator x5 in Toledo, OH
TO: Jennifer K. Robinson
Confirming your shift as a Forklift Operator at Riverfront Steel's new warehouse in Toledo, OH starting 1:00 PM.
---
TO: Kyle F. Brooks
Your shift as a Forklift Operator at the new Toledo, OH warehouse starts at 1:00 PM today.
---
TO: Jacob T. Diaz
Confirm your shift as a Forklift Operator at Riverfront Steel's new Toledo, OH location starting at 1:00 PM.
---
TO: Jerry M. Jones
Your shift as a Forklift Operator at the new Toledo, OH warehouse starts at 1:00 PM today.
---
TO: Jeffrey D. Taylor
Confirming your shift as a Forklift Operator at Riverfront Steel's new warehouse in Toledo, OH starting 1:00 PM.
## 14:00 emergency — Loader x4 in Toledo, OH
TO: Mary R. Richardson
Confirming your shift start at 4 PM today as a replacement. See you at Toledo, OH.
---
TO: Raj Torres
Replacing shift starting now at 4 PM. Toledo, OH.
---
TO: Brian X. Price
You're on at 4 PM replacing the crew. Toledo, OH.
---
TO: Fatima X. Gutierrez
Confirming your walkoff shift start at 4 PM today. Toledo, OH.


@ -0,0 +1,2 @@
{"after_event":"12:15","event_kind":"expansion","ok":true,"model":"gpt-oss:20b","duration_secs":10.228,"risk":"Forklift Operator JSON error","hint":"Ensure JSON is valid; test with a JSON validator; correct syntax before executing the tool call."}
{"after_event":"15:45","event_kind":"misplacement","ok":false,"model":"gpt-oss:20b","duration_secs":13.935,"hint":"(T3 unavailable)","risk":"generate returned empty text from gpt-oss:20b: {\"text\":\"\",\"model\":\"gpt-oss:20b\","}


@ -0,0 +1 @@
{"at":"08:00","kind":"baseline_fill","operation":"fill: Warehouse Associate x3 in Toledo, OH","fills":[{"candidate_id":"W500K-49164","name":"Christopher Y. Phillips","reason":"SQL verified for high availability and semantic score of 0.63, making him the top candidate."},{"candidate_id":"W500K-34704","name":"Fatima U. Rivera","reason":"Semantic score of 0.61 and skills in cold storage make her a strong candidate."},{"candidate_id":"W500K-40928","name":"Janet E. Hill","reason":"Semantic score of 0.61, RF scanner skill, and high reliability score make her a suitable candidate."}],"turns":5,"duration_secs":19.474,"pool_size":770,"playbook_citations":[],"discovered_pattern":"Across 25 similar past playbooks (6 workers examined) · recurring certifications: Forklift (67%), OSHA-10 (50%) · recurring skills: mill (50%), 6S (50%) · archetype mostly: communicator · reliability median 0.83 (range 0.750.96)"}


@ -0,0 +1,18 @@
# Client emails — Riverfront Steel, 2026-04-21
## 08:00 baseline_fill — Warehouse Associate x3
Subject: 3 Filled
Dear Riverfront Steel Team,
I am pleased to confirm that we have filled all three positions with the following Warehouse Associates:
- Christopher Y. Phillips
- Fatima U. Rivera
- Janet E. Hill
Shift starts at 08:00 AM on a regular Monday morning, 8-hour shift.
Best regards,
Dispatch Team Lakehouse


@ -0,0 +1,9 @@
# Cross-day lesson — Riverfront Steel, 2026-04-21
_Generated by `gpt-oss:20b` in 7.1s. Based on 5 events + 2 mid-day checkpoints._
Validate every JSON payload with a validator before invoking a tool; a malformed payload caused the Forklift Operator expansion to fail.
Confirm the GPT model is available and that the tool returns non-empty text; if it returns an empty string, retry or switch to a fallback model.
For recurring, expansion, and emergency events, prefetch the candidate pool and verify it meets the required count before attempting placement.
Log any tool failures immediately and update the risk mitigation plan for the next run.
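The first two lessons above can be sketched as small guards. This is a minimal illustration, not the project's actual implementation: `safe_tool_call` and `generate_with_fallback` are hypothetical helper names, and the model identifiers are taken from the run logs in this commit.

```python
import json

def safe_tool_call(raw_payload: str):
    """Parse and validate a tool-call payload before dispatching it.

    Returns the parsed dict, or None if the payload is malformed or
    not a usable tool call -- validate first, then invoke the tool.
    """
    try:
        payload = json.loads(raw_payload)
    except json.JSONDecodeError:
        return None  # malformed JSON: skip the call instead of crashing the event
    if payload.get("kind") != "tool_call" or "tool" not in payload:
        return None  # valid JSON but not the expected tool-call shape
    return payload

def generate_with_fallback(generate, prompt,
                           models=("gpt-oss:20b", "qwen2.5:latest")):
    """Try each model in turn; treat empty text as a failure and fall back."""
    for model in models:
        text = generate(model=model, prompt=prompt)
        if text and text.strip():
            return model, text
    raise RuntimeError("all models returned empty text")
```

A guard like this would have turned the "invalid JSON from executor" and "generate returned empty text" failures in these logs into recoverable retries rather than dropped events.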


@ -0,0 +1,71 @@
# Scenario retrospective — Riverfront Steel, 2026-04-21
Executor: `mistral:latest` Reviewer: `qwen2.5:latest` Draft: `qwen2.5:latest` Overview(T3): `gpt-oss:20b`
## Events
| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | 770 | ✓ 3 | 5 | 19.5 | 0 | 2 |
| 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 49.0 | 0 | 1 |
| 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 2.8 | 0 | 1 |
| 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 48.9 | 0 | 1 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 47.8 | 0 | 1 |
## Final roster
| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
| undefined Christopher Y. Phillips | 08:00 | Warehouse Associate | Toledo, OH | no_show |
| undefined Fatima U. Rivera | 08:00 | Warehouse Associate | Toledo, OH | confirmed |
| undefined Janet E. Hill | 08:00 | Warehouse Associate | Toledo, OH | confirmed |
## Gap signals
### double_book
- **08:00** — undefined Fatima U. Rivera already booked for 08:00
- **08:00** — undefined Janet E. Hill already booked for 08:00
### drift_or_tool
- **10:30** — no consensus after 14 turns
- **12:15** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search",
"args":{"index_name":"workers_500k_v1",
"sql_filter":"role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 AND worker_id NOT IN (42319, 68741, 34927)",
"rationale":"Se
- **14:00** — no consensus after 14 turns
- **15:45** — no consensus after 14 turns
### fairness
- _cross-event_ — Christopher Y. Phillips (undefined) booked 2 times today
### write_through_audit
- _post-run_ — playbook_memory has 1163 entries (ran 5 events, expected ≥ 1 new entries from this run)
## Workers touched across the week
4 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
| Worker ID | Name | Events | Outcome |
|---|---|---|---|
| W500K-49164 | Christopher Y. Phillips | 08:00 baseline_fill | booked |
| W500K-34704 | Fatima U. Rivera | 08:00 baseline_fill | booked |
| W500K-40928 | Janet E. Hill | 08:00 baseline_fill | booked |
| undefined | Christopher Y. Phillips | 08:00 | no_show |
## Discovered patterns (meta-index)
What the system identified across semantically-similar past fills as each event ran:
- **08:00 baseline_fill** (Warehouse Associate): Across 25 similar past playbooks (6 workers examined) · recurring certifications: Forklift (67%), OSHA-10 (50%) · recurring skills: mill (50%), 6S (50%) · archetype mostly: communicator · reliability median 0.83 (range 0.75–0.96)
- **10:30 recurring** (Machine Operator): —
- **12:15 expansion** (Forklift Operator): —
- **14:00 emergency** (Loader): —
- **15:45 misplacement** (Warehouse Associate): —
## Narrative
- 1/5 events reached consensus.
- Final roster: 3 bookings across 1 distinct workers.
- Workers touched (booked, failed, or otherwise decided): 4.
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
- Dropped events: 10:30 recurring, 12:15 expansion, 14:00 emergency, 15:45 misplacement.


@ -0,0 +1,130 @@
[
{
"event": {
"kind": "baseline_fill",
"at": "08:00",
"role": "Warehouse Associate",
"count": 3,
"city": "Toledo",
"state": "OH",
"shift_start": "08:00 AM",
"scenario_note": "Regular Monday morning shift, 8-hour."
},
"ok": true,
"fills": [
{
"candidate_id": "W500K-49164",
"name": "Christopher Y. Phillips",
"reason": "SQL verified for high availability and semantic score of 0.63, making him the top candidate."
},
{
"candidate_id": "W500K-34704",
"name": "Fatima U. Rivera",
"reason": "Semantic score of 0.61 and skills in cold storage make her a strong candidate."
},
{
"candidate_id": "W500K-40928",
"name": "Janet E. Hill",
"reason": "Semantic score of 0.61, RF scanner skill, and high reliability score make her a suitable candidate."
}
],
"turns": 5,
"duration_secs": 19.474,
"gap_signals": [
"double_book: undefined Fatima U. Rivera already booked for 08:00",
"double_book: undefined Janet E. Hill already booked for 08:00"
],
"sources_first_score": 0.6233225,
"sources_last_score": 0.55385745,
"pool_size": 770,
"playbook_citations": [],
"discovered_pattern": "Across 25 similar past playbooks (6 workers examined) · recurring certifications: Forklift (67%), OSHA-10 (50%) · recurring skills: mill (50%), 6S (50%) · archetype mostly: communicator · reliability median 0.83 (range 0.750.96)"
},
{
"event": {
"kind": "recurring",
"at": "10:30",
"role": "Machine Operator",
"count": 2,
"city": "Toledo",
"state": "OH",
"shift_start": "11:00 AM",
"scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 48.986,
"error": "no consensus after 14 turns",
"gap_signals": [
"drift_or_tool: no consensus after 14 turns"
]
},
{
"event": {
"kind": "expansion",
"at": "12:15",
"role": "Forklift Operator",
"count": 5,
"city": "Toledo",
"state": "OH",
"shift_start": "01:00 PM",
"scenario_note": "New warehouse location opening, five-worker team needed."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 2.845,
"error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\n\"sql_filter\":\"role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 AND worker_id NOT IN (42319, 68741, 34927)\",\n\"rationale\":\"Se",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\n\"sql_filter\":\"role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 AND worker_id NOT IN (42319, 68741, 34927)\",\n\"rationale\":\"Se"
]
},
{
"event": {
"kind": "emergency",
"at": "14:00",
"role": "Loader",
"count": 4,
"city": "Toledo",
"state": "OH",
"shift_start": "04:00 PM same day",
"deadline": "16:00",
"scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 48.905,
"error": "no consensus after 14 turns",
"gap_signals": [
"drift_or_tool: no consensus after 14 turns"
]
},
{
"event": {
"kind": "misplacement",
"at": "15:45",
"role": "Warehouse Associate",
"count": 1,
"city": "Toledo",
"state": "OH",
"shift_start": "remainder of 08:00 shift",
"scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
"replaces_event": "08:00",
"exclude_worker_ids": [
null,
null,
null
]
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 47.789,
"error": "no consensus after 14 turns",
"gap_signals": [
"drift_or_tool: no consensus after 14 turns"
]
}
]


@ -0,0 +1,26 @@
[
{
"name": "Christopher Y. Phillips",
"booked_for": "08:00",
"role": "Warehouse Associate",
"city": "Toledo",
"state": "OH",
"status": "no_show"
},
{
"name": "Fatima U. Rivera",
"booked_for": "08:00",
"role": "Warehouse Associate",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
},
{
"name": "Janet E. Hill",
"booked_for": "08:00",
"role": "Warehouse Associate",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
}
]


@ -0,0 +1,16 @@
# SMS drafts — Riverfront Steel, 2026-04-21
## 08:00 baseline_fill — Warehouse Associate x3 in Toledo, OH
TO: Christopher Y. Phillips
Confirming your shift as a Warehouse Associate at Riverfront Steel in Toledo, OH starting 08:00 AM today.
---
TO: Fatima U. Rivera
Your shift as a Warehouse Associate at Riverfront Steel is confirmed for 08:00 AM today.
---
TO: Janet E. Hill
Confirming your 08:00 AM shift as a Warehouse Associate at Riverfront Steel in Toledo, OH.


@ -0,0 +1,2 @@
{"after_event":"12:15","event_kind":"expansion","ok":true,"model":"gpt-oss:20b","duration_secs":10.901,"risk":"JSON parse error","hint":"Validate JSON structure, close braces, escape quotes, and test with a JSON linter before executing hybrid_search."}
{"after_event":"15:45","event_kind":"misplacement","ok":true,"model":"gpt-oss:20b","duration_secs":11.83,"risk":"JSON parsing failure in tool call","hint":"Ensure JSON syntax is correct before invoking hybrid_search for Warehouse Associate in Toledo, OH. Validate tool call structure."}


@ -0,0 +1 @@
# Client emails — Riverfront Steel, 2026-04-21


@ -0,0 +1,6 @@
# Cross-day lesson — Riverfront Steel, 2026-04-21
_Generated by `gpt-oss:20b` in 4.0s. Based on 5 events + 2 mid-day checkpoints._
Always validate the JSON payload before calling `hybrid_search`. Ensure all braces are closed, quotes are escaped, and the structure matches the expected schema—use a linter or schema validator in a sandbox first. Construct the JSON programmatically or via a template rather than embedding raw text in the tool call. This prevents parse errors that cause job failures.
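The "construct the JSON programmatically" advice above can be sketched as follows. This is a hedged illustration only: `build_hybrid_search_call` is a hypothetical helper, and the index name and filter columns are taken from the tool calls visible in these logs.

```python
import json

def build_hybrid_search_call(role, city, state, exclude_ids=(),
                             min_availability=0.5):
    """Build the hybrid_search payload as a dict and serialize it with
    json.dumps, so braces, quoting, and escaping are correct by construction
    instead of hand-assembled in raw model text."""
    sql_filter = (
        f"role = '{role}' AND city = '{city}' AND state = '{state}' "
        f"AND CAST(availability AS DOUBLE) > {min_availability}"
    )
    if exclude_ids:
        ids = ", ".join(f"'{i}'" for i in exclude_ids)
        sql_filter += f" AND worker_id NOT IN ({ids})"
    return json.dumps({
        "kind": "tool_call",
        "tool": "hybrid_search",
        "args": {"index_name": "workers_500k_v1", "sql_filter": sql_filter},
    })
```

A payload built this way always round-trips through `json.loads`, which is exactly the class of "Expected '}'" parse errors these retrospectives keep hitting. (String-interpolated SQL is kept here only to mirror the logged filters; real callers should bind parameters.)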


@ -0,0 +1,58 @@
# Scenario retrospective — Riverfront Steel, 2026-04-21
Executor: `mistral:latest` Reviewer: `qwen2.5:latest` Draft: `qwen2.5:latest` Overview(T3): `gpt-oss:20b`
## Events
| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 33.1 | 0 | 1 |
| 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 35.1 | 0 | 1 |
| 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 55.3 | 0 | 1 |
| 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 14.7 | 0 | 1 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 28.8 | 0 | 1 |
## Final roster
| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
## Gap signals
### drift_or_tool
- **08:00** — invalid JSON from executor: JSON Parse error: Expected ']' | raw: {"kind":"plan","steps":["TOOL_CALL",{"tool":"sql","args":{"query":"SELECT worker_id, name, role, city, state FROM workers_500k WHERE worker_id = '49164'"}},"TOOL_CALL",{"tool":"hybrid_search","args":{"index_name":"workers_500k_v1","sql_filter":"CAST(availability AS DOUBLE) > 0.5 AND role = 'Warehous
- **10:30** — invalid JSON from executor: JSON Parse error: Expected ']' | raw: {"kind":"plan","steps":["TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5'})","TOOL_CALL sql({'query':'SELECT worker_id, name, role, city, state FROM workers_500k WHERE worker_i
- **12:15** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search",
"args":{"index_name":"workers_500k_v1","sql_filter":"role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 AND NOT worker_id IN (W500K-22375, W500K-19588, W500K-28024,
- **14:00** — aborted — 3 consecutive drift flags
- **15:45** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search",
"args":{"index_name":"workers_500k_v1","sql_filter":"role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND worker_id NOT IN (CANDIDATES SURFACED SO FAR)",
"rationale":"Find a reliable Warehouse Associa
### write_through_audit
- _post-run_ — playbook_memory has 1163 entries (ran 5 events, expected ≥ 0 new entries from this run)
## Workers touched across the week
0 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
| Worker ID | Name | Events | Outcome |
|---|---|---|---|
## Discovered patterns (meta-index)
What the system identified across semantically-similar past fills as each event ran:
- **08:00 baseline_fill** (Warehouse Associate): —
- **10:30 recurring** (Machine Operator): —
- **12:15 expansion** (Forklift Operator): —
- **14:00 emergency** (Loader): —
- **15:45 misplacement** (Warehouse Associate): —
## Narrative
- 0/5 events reached consensus.
- Final roster: 0 bookings across 0 distinct workers.
- Workers touched (booked, failed, or otherwise decided): 0.
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
- Dropped events: 08:00 baseline_fill, 10:30 recurring, 12:15 expansion, 14:00 emergency, 15:45 misplacement.


@ -0,0 +1,104 @@
[
{
"event": {
"kind": "baseline_fill",
"at": "08:00",
"role": "Warehouse Associate",
"count": 3,
"city": "Toledo",
"state": "OH",
"shift_start": "08:00 AM",
"scenario_note": "Regular Monday morning shift, 8-hour."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 33.137,
"error": "invalid JSON from executor: JSON Parse error: Expected ']' | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL\",{\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id, name, role, city, state FROM workers_500k WHERE worker_id = '49164'\"}},\"TOOL_CALL\",{\"tool\":\"hybrid_search\",\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"CAST(availability AS DOUBLE) > 0.5 AND role = 'Warehous",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected ']' | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL\",{\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id, name, role, city, state FROM workers_500k WHERE worker_id = '49164'\"}},\"TOOL_CALL\",{\"tool\":\"hybrid_search\",\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"CAST(availability AS DOUBLE) > 0.5 AND role = 'Warehous"
]
},
{
"event": {
"kind": "recurring",
"at": "10:30",
"role": "Machine Operator",
"count": 2,
"city": "Toledo",
"state": "OH",
"shift_start": "11:00 AM",
"scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 35.123,
"error": "invalid JSON from executor: JSON Parse error: Expected ']' | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5'})\",\"TOOL_CALL sql({'query':'SELECT worker_id, name, role, city, state FROM workers_500k WHERE worker_i",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected ']' | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5'})\",\"TOOL_CALL sql({'query':'SELECT worker_id, name, role, city, state FROM workers_500k WHERE worker_i"
]
},
{
"event": {
"kind": "expansion",
"at": "12:15",
"role": "Forklift Operator",
"count": 5,
"city": "Toledo",
"state": "OH",
"shift_start": "01:00 PM",
"scenario_note": "New warehouse location opening, five-worker team needed."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 55.269,
"error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 AND NOT worker_id IN (W500K-22375, W500K-19588, W500K-28024, ",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 AND NOT worker_id IN (W500K-22375, W500K-19588, W500K-28024, "
]
},
{
"event": {
"kind": "emergency",
"at": "14:00",
"role": "Loader",
"count": 4,
"city": "Toledo",
"state": "OH",
"shift_start": "04:00 PM same day",
"deadline": "16:00",
"scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 14.719,
"error": "aborted — 3 consecutive drift flags",
"gap_signals": [
"drift_or_tool: aborted — 3 consecutive drift flags"
]
},
{
"event": {
"kind": "misplacement",
"at": "15:45",
"role": "Warehouse Associate",
"count": 1,
"city": "Toledo",
"state": "OH",
"shift_start": "remainder of 08:00 shift",
"scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
"replaces_event": "08:00"
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 28.761,
"error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND worker_id NOT IN (CANDIDATES SURFACED SO FAR)\",\n\"rationale\":\"Find a reliable Warehouse Associa",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND worker_id NOT IN (CANDIDATES SURFACED SO FAR)\",\n\"rationale\":\"Find a reliable Warehouse Associa"
]
}
]


@ -0,0 +1 @@
# SMS drafts — Riverfront Steel, 2026-04-21


@ -0,0 +1,2 @@
{"after_event":"12:15","event_kind":"expansion","ok":false,"model":"gpt-oss:20b","duration_secs":14.223,"hint":"(T3 unavailable)","risk":"generate returned empty text from gpt-oss:20b: {\"text\":\"\",\"model\":\"gpt-oss:20b\","}
{"after_event":"15:45","event_kind":"misplacement","ok":false,"model":"gpt-oss:20b","duration_secs":15.447,"hint":"(T3 unavailable)","risk":"generate returned empty text from gpt-oss:20b: {\"text\":\"\",\"model\":\"gpt-oss:20b\","}


@ -0,0 +1 @@
{"at":"12:15","kind":"expansion","operation":"fill: Forklift Operator x5 in Toledo, OH","fills":[{"candidate_id":"W500K-22375","name":"Matthew P. Garcia","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."},{"candidate_id":"W500K-19588","name":"Maria K. Cruz","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."},{"candidate_id":"W500K-37736","name":"Jennifer K. Robinson","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."},{"candidate_id":"W500K-3150","name":"Brenda Gutierrez","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."},{"candidate_id":"W500K-28024","name":"Nancy W. Ward","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."}],"turns":6,"duration_secs":21.998,"pool_size":687,"playbook_citations":[],"discovered_pattern":"Across 25 similar past playbooks (6 workers examined) · recurring certifications: Forklift (67%), OSHA-10 (50%) · recurring skills: mill (50%), 6S (50%) · archetype mostly: leader · reliability median 0.83 (range 0.750.96)"}


@ -0,0 +1,20 @@
# Client emails — Riverfront Steel, 2026-04-21
## 12:15 expansion — Forklift Operator x5
Subject: 5 Workers Confirmed
Dear Riverfront Steel Team,
We are pleased to confirm that we have filled all 5 positions for Forklift Operators as requested. The workers starting at 01:00 PM today are:
- Matthew P. Garcia
- Maria K. Cruz
- Jennifer K. Robinson
- Brenda Gutierrez
- Nancy W. Ward
This is in support of the new warehouse location opening, requiring a five-worker team.
Best regards,
Dispatch Team Lakehouse


@ -0,0 +1,6 @@
# Cross-day lesson — Riverfront Steel, 2026-04-21
_Generated by `gpt-oss:20b` in 14.2s. Based on 5 events + 2 mid-day checkpoints._
Before any baseline, recurring, or emergency fill, query the pool size and turn count; missing data causes the job to fail. Replicate the expansion logic that pulls pool and turns for all event types. If the GPT-OSS model is unavailable, switch to a local fallback or log a warning instead of returning empty risk text. Validate that gaps are accounted for before committing the fill to avoid single-gap failures.
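The "validate before committing the fill" step above can be sketched as a small preflight gate. This is an illustrative sketch, not the project's code: `preflight_fill` is a hypothetical name, and the inputs mirror the `pool_size` and `gap_signals` fields seen in the event records in this commit.

```python
def preflight_fill(pool_size, required_count, gap_signals):
    """Gate a fill before committing it: enough candidates in the pool
    and no unresolved gap signals. Returns (ok, reason)."""
    if pool_size is None or pool_size < required_count:
        # missing or undersized pool is exactly the failure mode the lesson names
        return False, f"pool too small: {pool_size} < {required_count}"
    if gap_signals:
        return False, f"{len(gap_signals)} unresolved gap signal(s)"
    return True, "ok"
```

Run against the 12:15 expansion in these logs (pool 687, count 5, four `double_book` signals), this gate would have blocked the commit and surfaced the double-bookings before the roster was written.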


@ -0,0 +1,76 @@
# Scenario retrospective — Riverfront Steel, 2026-04-21
Executor: `mistral:latest` Reviewer: `qwen2.5:latest` Draft: `qwen2.5:latest` Overview(T3): `gpt-oss:20b`
Prior lessons loaded into executor context: **0** (baseline — no prior T3 history)
## Events
| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 47.6 | 0 | 1 |
| 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 29.5 | 0 | 1 |
| 12:15 | expansion | Forklift Operator × 5 | 687 | ✓ 5 | 6 | 22.0 | 0 | 4 |
| 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 92.4 | 0 | 1 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 100.9 | 0 | 1 |
## Final roster
| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
| undefined Matthew P. Garcia | 12:15 | Forklift Operator | Toledo, OH | confirmed |
| undefined Maria K. Cruz | 12:15 | Forklift Operator | Toledo, OH | confirmed |
| undefined Jennifer K. Robinson | 12:15 | Forklift Operator | Toledo, OH | confirmed |
| undefined Brenda Gutierrez | 12:15 | Forklift Operator | Toledo, OH | confirmed |
| undefined Nancy W. Ward | 12:15 | Forklift Operator | Toledo, OH | confirmed |
## Gap signals
### drift_or_tool
- **08:00** — no consensus after 14 turns
- **10:30** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search",
"args":{"index_name":"workers_500k_v1","sql_filter":"role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND playbook_citations > 0",
"rationale":"Narrow down candidates to Machine Operators in Toledo, OH w
- **14:00** — no consensus after 14 turns
- **15:45** — no consensus after 14 turns
### double_book
- **12:15** — undefined Maria K. Cruz already booked for 12:15
- **12:15** — undefined Jennifer K. Robinson already booked for 12:15
- **12:15** — undefined Brenda Gutierrez already booked for 12:15
- **12:15** — undefined Nancy W. Ward already booked for 12:15
### fairness
- _cross-event_ — Matthew P. Garcia (undefined) booked 5 times today
### write_through_audit
- _post-run_ — playbook_memory has 1164 entries (ran 5 events, expected ≥ 1 new entries from this run)
## Workers touched across the week
5 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
| Worker ID | Name | Events | Outcome |
|---|---|---|---|
| W500K-22375 | Matthew P. Garcia | 12:15 expansion | booked |
| W500K-19588 | Maria K. Cruz | 12:15 expansion | booked |
| W500K-37736 | Jennifer K. Robinson | 12:15 expansion | booked |
| W500K-3150 | Brenda Gutierrez | 12:15 expansion | booked |
| W500K-28024 | Nancy W. Ward | 12:15 expansion | booked |
## Discovered patterns (meta-index)
What the system identified across semantically-similar past fills as each event ran:
- **08:00 baseline_fill** (Warehouse Associate): —
- **10:30 recurring** (Machine Operator): —
- **12:15 expansion** (Forklift Operator): Across 25 similar past playbooks (6 workers examined) · recurring certifications: Forklift (67%), OSHA-10 (50%) · recurring skills: mill (50%), 6S (50%) · archetype mostly: leader · reliability median 0.83 (range 0.75–0.96)
- **14:00 emergency** (Loader): —
- **15:45 misplacement** (Warehouse Associate): —
## Narrative
- 1/5 events reached consensus.
- Final roster: 5 bookings across 1 distinct workers.
- Workers touched (booked, failed, or otherwise decided): 5.
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
- Dropped events: 08:00 baseline_fill, 10:30 recurring, 14:00 emergency, 15:45 misplacement.


@ -0,0 +1,137 @@
[
{
"event": {
"kind": "baseline_fill",
"at": "08:00",
"role": "Warehouse Associate",
"count": 3,
"city": "Toledo",
"state": "OH",
"shift_start": "08:00 AM",
"scenario_note": "Regular Monday morning shift, 8-hour."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 47.571,
"error": "no consensus after 14 turns",
"gap_signals": [
"drift_or_tool: no consensus after 14 turns"
]
},
{
"event": {
"kind": "recurring",
"at": "10:30",
"role": "Machine Operator",
"count": 2,
"city": "Toledo",
"state": "OH",
"shift_start": "11:00 AM",
"scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 29.546,
"error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND playbook_citations > 0\",\n\"rationale\":\"Narrow down candidates to Machine Operators in Toledo, OH w",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND playbook_citations > 0\",\n\"rationale\":\"Narrow down candidates to Machine Operators in Toledo, OH w"
]
},
{
"event": {
"kind": "expansion",
"at": "12:15",
"role": "Forklift Operator",
"count": 5,
"city": "Toledo",
"state": "OH",
"shift_start": "01:00 PM",
"scenario_note": "New warehouse location opening, five-worker team needed."
},
"ok": true,
"fills": [
{
"candidate_id": "W500K-22375",
"name": "Matthew P. Garcia",
"reason": "Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."
},
{
"candidate_id": "W500K-19588",
"name": "Maria K. Cruz",
"reason": "Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."
},
{
"candidate_id": "W500K-37736",
"name": "Jennifer K. Robinson",
"reason": "Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."
},
{
"candidate_id": "W500K-3150",
"name": "Brenda Gutierrez",
"reason": "Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."
},
{
"candidate_id": "W500K-28024",
"name": "Nancy W. Ward",
"reason": "Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."
}
],
"turns": 6,
"duration_secs": 21.998,
"gap_signals": [
"double_book: undefined Maria K. Cruz already booked for 12:15",
"double_book: undefined Jennifer K. Robinson already booked for 12:15",
"double_book: undefined Brenda Gutierrez already booked for 12:15",
"double_book: undefined Nancy W. Ward already booked for 12:15"
],
"sources_first_score": 0.6336688,
"sources_last_score": 0.55183524,
"pool_size": 687,
"playbook_citations": [],
    "discovered_pattern": "Across 25 similar past playbooks (6 workers examined) · recurring certifications: Forklift (67%), OSHA-10 (50%) · recurring skills: mill (50%), 6S (50%) · archetype mostly: leader · reliability median 0.83 (range 0.75–0.96)"
},
{
"event": {
"kind": "emergency",
"at": "14:00",
"role": "Loader",
"count": 4,
"city": "Toledo",
"state": "OH",
"shift_start": "04:00 PM same day",
"deadline": "16:00",
"scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 92.425,
"error": "no consensus after 14 turns",
"gap_signals": [
"drift_or_tool: no consensus after 14 turns"
]
},
{
"event": {
"kind": "misplacement",
"at": "15:45",
"role": "Warehouse Associate",
"count": 1,
"city": "Toledo",
"state": "OH",
"shift_start": "remainder of 08:00 shift",
"scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
"replaces_event": "08:00"
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 100.945,
"error": "no consensus after 14 turns",
"gap_signals": [
"drift_or_tool: no consensus after 14 turns"
]
}
]

View File

@ -0,0 +1,42 @@
[
{
"name": "Matthew P. Garcia",
"booked_for": "12:15",
"role": "Forklift Operator",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
},
{
"name": "Maria K. Cruz",
"booked_for": "12:15",
"role": "Forklift Operator",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
},
{
"name": "Jennifer K. Robinson",
"booked_for": "12:15",
"role": "Forklift Operator",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
},
{
"name": "Brenda Gutierrez",
"booked_for": "12:15",
"role": "Forklift Operator",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
},
{
"name": "Nancy W. Ward",
"booked_for": "12:15",
"role": "Forklift Operator",
"city": "Toledo",
"state": "OH",
"status": "confirmed"
}
]

View File

@ -0,0 +1,26 @@
# SMS drafts — Riverfront Steel, 2026-04-21
## 12:15 expansion — Forklift Operator x5 in Toledo, OH
TO: Matthew P. Garcia
Confirming your shift as a Forklift Operator at Riverfront Steel's new warehouse in Toledo, OH starting 1:00 PM.
---
TO: Maria K. Cruz
You're scheduled to start your shift at 1:00 PM today at our new warehouse location in Toledo, OH.
---
TO: Jennifer K. Robinson
Confirming your shift as a Forklift Operator at Riverfront Steel's new warehouse opening in Toledo, OH starting 1:00 PM.
---
TO: Brenda Gutierrez
Your shift starts at 1:00 PM today at our new warehouse location for Riverfront Steel in Toledo, OH.
---
TO: Nancy W. Ward
Confirming your shift as a Forklift Operator at Riverfront Steel's new warehouse opening in Toledo, OH starting 1:00 PM.

View File

@ -0,0 +1 @@
# Client emails — Riverfront Steel, 2026-04-21

View File

@ -0,0 +1,59 @@
# Scenario retrospective — Riverfront Steel, 2026-04-21
Executor: `mistral:latest` · Reviewer: `qwen2.5:latest` · Draft: `qwen2.5:latest` · Overview(T3): `disabled`
Prior lessons loaded into executor context: **0** (baseline — no prior T3 history)
## Events
| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 99.1 | 0 | 1 |
| 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 114.5 | 0 | 1 |
| 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 14.5 | 0 | 1 |
| 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 49.7 | 0 | 1 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 26.6 | 0 | 1 |
## Final roster
| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
## Gap signals
### drift_or_tool
- **08:00** — aborted — 3 consecutive drift flags
- **10:30** — invalid JSON from executor: JSON Parse error: Expected ']' | raw: {"kind":"plan", "steps":["TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND playbook_citations > 0'})", "TOOL_CALL sql({'query':'SELECT worker_id, name, role, city, state FRO
- **12:15** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search",
"args":{"index_name":"workers_500k_v1","sql_filter":"role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.75 AND CAST(reliability AS DOUBLE) > 0.9",
"question":"top 5 reliable forklift operators Toledo with h
- **14:00** — no consensus after 14 turns
- **15:45** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search",
"args":{"index_name":"workers_500k_v1","sql_filter":"CAST(availability AS DOUBLE) > 0.5 AND role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND worker_id NOT IN (49164,40928,34704,5749,22587,4091,23160,5114,15482,11915,36011,17171,11061,4
### write_through_audit
- _post-run_ — playbook_memory has 1164 entries (ran 5 events, expected ≥ 0 new entries from this run)
## Workers touched across the week
0 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
| Worker ID | Name | Events | Outcome |
|---|---|---|---|
## Discovered patterns (meta-index)
What the system identified across semantically-similar past fills as each event ran:
- **08:00 baseline_fill** (Warehouse Associate): —
- **10:30 recurring** (Machine Operator): —
- **12:15 expansion** (Forklift Operator): —
- **14:00 emergency** (Loader): —
- **15:45 misplacement** (Warehouse Associate): —
## Narrative
- 0/5 events reached consensus.
- Final roster: 0 bookings across 0 distinct workers.
- Workers touched (booked, failed, or otherwise decided): 0.
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
- Dropped events: 08:00 baseline_fill, 10:30 recurring, 12:15 expansion, 14:00 emergency, 15:45 misplacement.

View File

@ -0,0 +1,104 @@
[
{
"event": {
"kind": "baseline_fill",
"at": "08:00",
"role": "Warehouse Associate",
"count": 3,
"city": "Toledo",
"state": "OH",
"shift_start": "08:00 AM",
"scenario_note": "Regular Monday morning shift, 8-hour."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 99.133,
"error": "aborted — 3 consecutive drift flags",
"gap_signals": [
"drift_or_tool: aborted — 3 consecutive drift flags"
]
},
{
"event": {
"kind": "recurring",
"at": "10:30",
"role": "Machine Operator",
"count": 2,
"city": "Toledo",
"state": "OH",
"shift_start": "11:00 AM",
"scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 114.512,
"error": "invalid JSON from executor: JSON Parse error: Expected ']' | raw: {\"kind\":\"plan\", \"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND playbook_citations > 0'})\", \"TOOL_CALL sql({'query':'SELECT worker_id, name, role, city, state FRO",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected ']' | raw: {\"kind\":\"plan\", \"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND playbook_citations > 0'})\", \"TOOL_CALL sql({'query':'SELECT worker_id, name, role, city, state FRO"
]
},
{
"event": {
"kind": "expansion",
"at": "12:15",
"role": "Forklift Operator",
"count": 5,
"city": "Toledo",
"state": "OH",
"shift_start": "01:00 PM",
"scenario_note": "New warehouse location opening, five-worker team needed."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 14.525,
"error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.75 AND CAST(reliability AS DOUBLE) > 0.9\",\n\"question\":\"top 5 reliable forklift operators Toledo with h",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.75 AND CAST(reliability AS DOUBLE) > 0.9\",\n\"question\":\"top 5 reliable forklift operators Toledo with h"
]
},
{
"event": {
"kind": "emergency",
"at": "14:00",
"role": "Loader",
"count": 4,
"city": "Toledo",
"state": "OH",
"shift_start": "04:00 PM same day",
"deadline": "16:00",
"scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 49.725,
"error": "no consensus after 14 turns",
"gap_signals": [
"drift_or_tool: no consensus after 14 turns"
]
},
{
"event": {
"kind": "misplacement",
"at": "15:45",
"role": "Warehouse Associate",
"count": 1,
"city": "Toledo",
"state": "OH",
"shift_start": "remainder of 08:00 shift",
"scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
"replaces_event": "08:00"
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 26.607,
"error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"CAST(availability AS DOUBLE) > 0.5 AND role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND worker_id NOT IN (49164,40928,34704,5749,22587,4091,23160,5114,15482,11915,36011,17171,11061,4",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"CAST(availability AS DOUBLE) > 0.5 AND role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND worker_id NOT IN (49164,40928,34704,5749,22587,4091,23160,5114,15482,11915,36011,17171,11061,4"
]
}
]

View File

@ -0,0 +1 @@
# SMS drafts — Riverfront Steel, 2026-04-21

View File

@ -0,0 +1,2 @@
{"after_event":"12:15","event_kind":"expansion","ok":false,"model":"gpt-oss:20b","duration_secs":14.287,"hint":"(T3 unavailable)","risk":"generate returned empty text from gpt-oss:20b: {\"text\":\"\",\"model\":\"gpt-oss:20b\","}
{"after_event":"15:45","event_kind":"misplacement","ok":true,"model":"gpt-oss:20b","duration_secs":14.587,"risk":"Forklift Operator skill gap","hint":"Verify forklift operator certification and tool compatibility for Toledo shift."}

View File

@ -0,0 +1 @@
# Client emails — Riverfront Steel, 2026-04-21

View File

@ -0,0 +1,6 @@
# Cross-day lesson — Riverfront Steel, 2026-04-21
_Generated by `gpt-oss:20b` in 6.0s. Based on 5 events + 2 mid-day checkpoints._
**
Before any event, prefetch the full pool roster and skill certification data for Toledo, OH; the missing pool data caused every shift to fail. Verify forklift operator certifications and tool compatibility ahead of time, as the misplacement risk highlighted a skill gap. Ensure the risk-generation model (gpt-oss:20b) is online or have a manual fallback; the empty response after the expansion shows a T3 unavailability that halted risk assessment. Apply these checks for baseline, recurring, expansion, emergency, and misplacement events to avoid the single-gap failure pattern.

View File

@ -0,0 +1,28 @@
[
{
"date": "2026-04-21",
"client": "Riverfront Steel",
"cities": "Toledo",
"states": "OH",
"events_total": 5,
"events_ok": 1,
"checkpoint_count": 2,
"model": "gpt-oss:20b",
"cloud": false,
    "lesson": "** \nBefore any baseline, recurring, or emergency fill, query the pool size and turn count; missing data causes the job to fail. Replicate the expansion logic that pulls pool and turns for all event types. If the GPT-OSS model is unavailable, switch to a local fallback or log a warning instead of returning empty risk text. Validate that gaps are accounted for before committing the fill to avoid single-gap failures.",
"checkpoints": [
{
"after": "12:15",
"risk": "generate returned empty text from gpt-oss:20b: {\"text\":\"\",\"model\":\"gpt-oss:20b\",",
"hint": "(T3 unavailable)"
},
{
"after": "15:45",
"risk": "generate returned empty text from gpt-oss:20b: {\"text\":\"\",\"model\":\"gpt-oss:20b\",",
"hint": "(T3 unavailable)"
}
],
"created_at": "2026-04-21T00:34:20.521Z",
"file": "2026-04-21_Riverfront_Steel_1776731660521.json"
}
]
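Both checkpoints in the summary above record `generate returned empty text from gpt-oss:20b` with the `(T3 unavailable)` hint, which matches the lesson's advice to log a warning instead of storing empty risk text. A minimal sketch of that degrade-gracefully pattern, assuming a hypothetical `generate` callable standing in for whatever client actually invokes the model:

```python
def risk_checkpoint(generate, prompt: str, model: str = "gpt-oss:20b") -> dict:
    """Run a mid-day risk checkpoint, degrading gracefully when the
    T3 model returns empty text (as both 2026-04-21 checkpoints did).

    `generate` is a stand-in for the real model client; it returns the
    generated text, which may be empty or None when T3 is unavailable.
    """
    text = (generate(prompt, model) or "").strip()
    if text:
        return {"ok": True, "model": model, "risk": text}
    # Record the failure explicitly instead of persisting an empty lesson.
    return {
        "ok": False,
        "model": model,
        "risk": f"generate returned empty text from {model}",
        "hint": "(T3 unavailable)",
    }
```

The checkpoint record stays well-formed either way, so downstream retrospectives can count failed checkpoints rather than silently skipping them.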

View File

@ -0,0 +1,60 @@
# Scenario retrospective — Riverfront Steel, 2026-04-21
Executor: `mistral:latest` · Reviewer: `qwen2.5:latest` · Draft: `qwen2.5:latest` · Overview(T3): `gpt-oss:20b`
Prior lessons loaded into executor context: **1** (from 2026-04-21)
## Events
| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 13.9 | 0 | 1 |
| 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 13.3 | 0 | 1 |
| 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 30.7 | 0 | 1 |
| 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 23.1 | 0 | 1 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 51.1 | 0 | 1 |
## Final roster
| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
## Gap signals
### drift_or_tool
- **08:00** — invalid JSON from executor: JSON Parse error: Expected ']' | raw: {"kind":"plan","steps":["TOOL_CALL","hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5','question':'reliable warehouse associate Toledo'})","TOOL_CALL","sql({'query':'SELECT worker_id,
- **10:30** — invalid JSON from executor: JSON Parse error: Expected ']' | raw: {"kind":"plan","steps":["TOOL_CALL","tool":"hybrid_search","args":{"index_name":"workers_500k_v1","sql_filter":"role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND playbook_citations > 0"},"rationale":"Narrow the search to recurring Machine Opera
- **12:15** — invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {
"kind": "plan",
"steps": [
"TOOL_CALL hybrid_search({'index_name': 'workers_500k_v1', 'sql_filter': 'role = \'Forklift Operator\' AND city = \'Toledo\' AND state = \'OH\' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75', 'k': 10})",
"TOOL_CALL sql({'query'
- **14:00** — aborted — 3 consecutive drift flags
- **15:45** — no consensus after 14 turns
### write_through_audit
- _post-run_ — playbook_memory has 1164 entries (ran 5 events, expected ≥ 0 new entries from this run)
## Workers touched across the week
0 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
| Worker ID | Name | Events | Outcome |
|---|---|---|---|
## Discovered patterns (meta-index)
What the system identified across semantically-similar past fills as each event ran:
- **08:00 baseline_fill** (Warehouse Associate): —
- **10:30 recurring** (Machine Operator): —
- **12:15 expansion** (Forklift Operator): —
- **14:00 emergency** (Loader): —
- **15:45 misplacement** (Warehouse Associate): —
## Narrative
- 0/5 events reached consensus.
- Final roster: 0 bookings across 0 distinct workers.
- Workers touched (booked, failed, or otherwise decided): 0.
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
- Dropped events: 08:00 baseline_fill, 10:30 recurring, 12:15 expansion, 14:00 emergency, 15:45 misplacement.

View File

@ -0,0 +1,104 @@
[
{
"event": {
"kind": "baseline_fill",
"at": "08:00",
"role": "Warehouse Associate",
"count": 3,
"city": "Toledo",
"state": "OH",
"shift_start": "08:00 AM",
"scenario_note": "Regular Monday morning shift, 8-hour."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 13.874,
"error": "invalid JSON from executor: JSON Parse error: Expected ']' | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL\",\"hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5','question':'reliable warehouse associate Toledo'})\",\"TOOL_CALL\",\"sql({'query':'SELECT worker_id, ",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected ']' | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL\",\"hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5','question':'reliable warehouse associate Toledo'})\",\"TOOL_CALL\",\"sql({'query':'SELECT worker_id, "
]
},
{
"event": {
"kind": "recurring",
"at": "10:30",
"role": "Machine Operator",
"count": 2,
"city": "Toledo",
"state": "OH",
"shift_start": "11:00 AM",
"scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 13.257,
"error": "invalid JSON from executor: JSON Parse error: Expected ']' | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL\",\"tool\":\"hybrid_search\",\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND playbook_citations > 0\"},\"rationale\":\"Narrow the search to recurring Machine Opera",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected ']' | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL\",\"tool\":\"hybrid_search\",\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND playbook_citations > 0\"},\"rationale\":\"Narrow the search to recurring Machine Opera"
]
},
{
"event": {
"kind": "expansion",
"at": "12:15",
"role": "Forklift Operator",
"count": 5,
"city": "Toledo",
"state": "OH",
"shift_start": "01:00 PM",
"scenario_note": "New warehouse location opening, five-worker team needed."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 30.707,
"error": "invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {\n \"kind\": \"plan\",\n \"steps\": [\n \"TOOL_CALL hybrid_search({'index_name': 'workers_500k_v1', 'sql_filter': 'role = \\'Forklift Operator\\' AND city = \\'Toledo\\' AND state = \\'OH\\' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75', 'k': 10})\",\n \"TOOL_CALL sql({'query'",
"gap_signals": [
"drift_or_tool: invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {\n \"kind\": \"plan\",\n \"steps\": [\n \"TOOL_CALL hybrid_search({'index_name': 'workers_500k_v1', 'sql_filter': 'role = \\'Forklift Operator\\' AND city = \\'Toledo\\' AND state = \\'OH\\' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75', 'k': 10})\",\n \"TOOL_CALL sql({'query'"
]
},
{
"event": {
"kind": "emergency",
"at": "14:00",
"role": "Loader",
"count": 4,
"city": "Toledo",
"state": "OH",
"shift_start": "04:00 PM same day",
"deadline": "16:00",
"scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 23.148,
"error": "aborted — 3 consecutive drift flags",
"gap_signals": [
"drift_or_tool: aborted — 3 consecutive drift flags"
]
},
{
"event": {
"kind": "misplacement",
"at": "15:45",
"role": "Warehouse Associate",
"count": 1,
"city": "Toledo",
"state": "OH",
"shift_start": "remainder of 08:00 shift",
"scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
"replaces_event": "08:00"
},
"ok": false,
"fills": [],
"turns": 0,
"duration_secs": 51.075,
"error": "no consensus after 14 turns",
"gap_signals": [
"drift_or_tool: no consensus after 14 turns"
]
}
]
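The 12:15 failure above ("Invalid escape character '") is reproducible outside the harness: JSON only defines the escapes `\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t`, and `\uXXXX`, so an executor reply containing `\'` inside a string can never parse. A minimal sketch of a lenient retry, with `parse_executor_reply` a hypothetical name (the real harness's parsing entry point is not shown in these files):

```python
import json

def parse_executor_reply(raw: str) -> dict:
    """Parse a tool-call reply emitted by the executor model.

    Models sometimes escape single quotes as \' inside JSON strings,
    which is not a legal JSON escape. Since \' can never occur in
    valid JSON, rewriting it to a bare quote is a safe repair.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return json.loads(raw.replace("\\'", "'"))

# The escape style from the failing 12:15 reply (backslash-quote
# inside the sql_filter string):
bad = '{"sql_filter": "role = \\\'Forklift Operator\\\'"}'
fixed = parse_executor_reply(bad)
```

This only covers the invalid-escape case; the `Expected '}'` and `Expected ']'` errors in the same log come from truncated replies, which no string rewrite can recover.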

View File

@ -0,0 +1 @@
# SMS drafts — Riverfront Steel, 2026-04-21

View File

@ -0,0 +1,2 @@
{"after_event":"12:15","event_kind":"expansion","ok":true,"model":"gpt-oss:20b","duration_secs":12.189,"risk":"JSON syntax error in tool calls","hint":"For the next Forklift Operator expansion, escape single quotes in SQL query or use a parameterized query; validate JSON with a linter before execution."}
{"after_event":"15:45","event_kind":"misplacement","ok":true,"model":"gpt-oss:20b","duration_secs":15.773,"risk":"Warehouse Associate JSON error","hint":"Escape quotes in SQL query; close JSON braces before sending to executor."}
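The two hints above (escape single quotes in the SQL, lint the JSON before execution) can be sketched as a small builder. Assumptions: `build_hybrid_search_call` and its quoting helper are illustrative names, and the tool-call shape is inferred from the `hybrid_search` calls logged in the failed events:

```python
import json

def build_hybrid_search_call(role: str, city: str, state: str,
                             min_avail: float = 0.5) -> str:
    """Assemble the hybrid_search tool call the executor is asked to emit.

    Single quotes inside SQL string literals are doubled ('') per the
    SQL standard, so the surrounding JSON never needs backslash escapes.
    """
    def q(value: str) -> str:
        # 'O'Neill' -> 'O''Neill': standard SQL literal escaping.
        return value.replace("'", "''")

    sql_filter = (
        f"role = '{q(role)}' AND city = '{q(city)}' AND state = '{q(state)}' "
        f"AND CAST(availability AS DOUBLE) > {min_avail}"
    )
    call = {
        "kind": "tool_call",
        "tool": "hybrid_search",
        "args": {"index_name": "workers_500k_v1", "sql_filter": sql_filter},
    }
    payload = json.dumps(call)
    json.loads(payload)  # lint before dispatch, as the checkpoint hint suggests
    return payload
```

Because `json.dumps` produces the payload and a round-trip `json.loads` checks it, the unbalanced-brace and invalid-escape failures seen across these runs cannot reach the executor.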

Some files were not shown because too many files have changed in this diff.