Compare commits: main...scrum/auto
24 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | dcf4c9a8e7 | | |
| | 3c6d2c5f74 | | |
| | 8e1855e779 | | |
| | 313eec3c6e | | |
| | 51cc0a69cf | | |
| | 528fded11b | | |
| | 64700ea6da | | |
| | 5225211e45 | | |
| | 4f0b6fb9b3 | | |
| | 8e781ac325 | | |
| | ee0450b7c3 | | |
| | 0f9b4aa2fe | | |
| | e4eb0fa168 | | |
| | 93081bed5c | | |
| | 885a1acf19 | | |
| | 97888e3775 | | |
| | 9b8befaa94 | | |
| | 2965b68a9d | | |
| | 08c8debfff | | |
| | b02cf5b9e1 | | |
| | 1ac8045924 | | |
| | 52d2da2f44 | | |
| | d44ad3af1e | | |
| | 89ac6a9b5b | | |
38
.gitignore
vendored
@@ -12,41 +12,3 @@ data/headshots/_thumbs/
# ComfyUI on-demand generated portraits (per-worker unique). Cached on first
# request; fully regeneratable via /headshots/generate/:key.
data/headshots_gen/

# Runtime data — all regeneratable from inputs or accumulated by daemons.
# Anything under data/_<name>/ is internal state (auditor outputs, KB caches,
# pathway memory snapshots, HNSW trial results, etc.). Anything under
# data/datasets/ or data/vectors/ is generated by ingest/index pipelines.
data/_*/
data/lance/
data/datasets/
data/vectors/
data/demo/
data/evidence/
data/face_test/
data/headshots_role_pool/
data/icons_pool/
data/scored-runs/
data/workspaces/
data/catalog/
data/**/*.bak-*
data/**/*.pre-*-bak

# Logs
logs/

# Build artifacts
node_modules/
exports/
mcp-server/data/

# Per-run distillation reports (timestamp-named); keep the parent dir tracked
# via .gitkeep if needed but don't carry every batch's report set.
reports/distillation/[0-9]*/
reports/distillation/*-*-*-*-*/

# Test scratch — scratchpads, traces, sessions are regenerated each run.
# PRD/scenario fixtures stay tracked (they ARE the test).
tests/agent_test/_*
tests/agent_test/sessions/
tests/real-world/runs/
1
Cargo.lock
generated
@@ -48,7 +48,6 @@ version = "0.1.0"
dependencies = [
 "async-trait",
 "axum",
 "lru",
 "reqwest",
 "serde",
 "serde_json",
@@ -1,38 +1,12 @@
# STATE OF PLAY — Lakehouse

**Last verified:** 2026-05-02 evening CDT
**Verified by:** live probe (smoke 9/9, parity 32/32, gateway restarted), not memory.
**Last verified:** 2026-04-27 ~20:35 CDT
**Verified by:** live probe, not memory.

> **Read this FIRST.** When the user says "we're working on lakehouse," they mean the working code captured below — NOT what `git log` framed as "the cutover" or what memory snapshots from 2 days ago suggest. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.

---

## WHAT LANDED 2026-05-01 → 2026-05-02 (10 commits this wave)

| Commit | What | Verified |
|---|---|---|
| `5d30b3d` | lance: auto-build doc_id btree in `lance_migrate` handler | doc-fetch ~5ms (was ~100ms full scan) on scale_test_10m |
| `044650a` | lance-bench: same scalar build post-IVF (matches gateway) | cargo check clean |
| `7594725` | lance: 4-pack — `sanitize_lance_err` + 7 unit tests + 9-probe smoke + 10M re-bench | smoke 9/9 PASS, tests 7/7 PASS |
| `98b6647` | gateway: `IterateResponse.trace_id` echoed; session_log_path enabled | parity probes see one unified JSONL |
| `57bde63` | gateway: trace-id propagation + coordinator session JSONL (Rust parity with Go wave) | session_log_parity 4/4 |
| `ba928b1` | aibridge: drop Python sidecar from hot path; AiClient → direct Ollama | aibridge tests 32/32 PASS, /ai/embed live 768d |
| `654797a` | gateway: pub `extract_json` + `parity_extract_json` bin | extract_json_parity 12/12 |
| `c5654d4` | docs: pointer to `golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md` | — |
| `150cc3b` | aibridge: LRU embed cache, 236× RPS warm (78ms → 129us p50) | load test |
| `9eed982` | mcp-server: /_go/* pass-through for G5 cutover slice | — |
| `6e34ef7` | gitignore: stop tracking 100+ runtime ephemera (data/_*, lance, logs, node_modules) | untracked dropped 100+ → 0 |
| `41b0a99` | chore: add 33 real items that were sitting untracked (scripts, scenarios, kimi reports, dev UIs) | clean working tree |

**Cross-runtime parity (post-this-wave):** 32/32 across 5 probes — `validator(6/6) + extract_json(12/12) + session_log(4/4) + materializer(2/2) + embed(8/8)`. Run `cd /home/profit/golangLAKEHOUSE && for p in scripts/cutover/parity/*.sh; do bash "$p"; done` to re-verify.

**Lance backend (was untested 5 days ago, now gauntlet-ready):**
- `cargo test -p vectord-lance --release` → 7/7 PASS
- `./scripts/lance_smoke.sh` → 9/9 PASS against live gateway
- `reports/lance_10m_rebench_2026-05-02.md` — search warm ~20ms / cold ~46ms median, doc-fetch ~5ms post-btree

---

## VERIFIED WORKING RIGHT NOW

### The client demo (Staffing Co-Pilot)
@@ -118,10 +92,6 @@ OpenCode catalog (live):
- **Decisions A/B/C/D from `synthetic-data-gap-report.md`** — all four scripts shipped today (`d56f08e`, `940737d`, `c3c9c21`). Do not "ask J for approval."
- **`workers_500k.phone` type fixup** — already string. The fixup script is idempotent; running it is a no-op.
- **`client_workerskjkk` typo dataset** — was breaking every SQL query (catalog had it registered, file didn't exist). Removed via `DELETE /catalog/datasets/by-name/client_workerskjkk` this session. Do not re-add. Adding a startup gate that errors on unrecognized parquet names is the long-term fix per now.md Step 2C.
- **Python sidecar dropped from hot path 2026-05-02 (`ba928b1`)** — AiClient calls Ollama directly. Do not "wire python embedding back in." `lab_ui.py` + `pipeline_lab.py` keep running as dev-only UIs (not on the runtime path).
- **Lance backend gauntlet (2026-05-02)** — sanitizer over all 5 routes, 7 unit tests, 9-probe smoke, 10M re-bench. The `doc_id` btree auto-builds inside `lance_migrate` AND `lance-bench`. Do not "discover" the missing scalar index again or the leaked filesystem paths in error bodies.
- **Cross-runtime parity = 32/32** across 5 probes in `golangLAKEHOUSE/scripts/cutover/parity/`. Do not "build a parity probe for X" without checking — validator, extract_json, session_log, materializer, and embed are all already covered.
- **Decisions tracker is `golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md`** — single living source of truth for cross-runtime decisions. As of 2026-05-02 it has 0 `_open_` code work items; only 2 strategic items left (Lance vs Parquet+HNSW-with-spilling, Go-vs-Rust primary cutover).

---

@@ -16,14 +16,12 @@ import type { Gap, Proposal } from "./types.ts";
// Phase 44 migration (2026-04-27): bot/propose.ts now flows through
// the gateway's /v1/chat instead of hitting the sidecar's /generate
// directly. /v1/usage tracks the call, Langfuse traces it, observer
// sees it. Gateway owns the routing.
//
// 2026-04-28: gpt-oss:120b → deepseek-v3.2 via Ollama Pro. Newer
// DeepSeek revision, faster, still on the same OLLAMA_CLOUD_KEY.
// sees it. Same upstream model (CLOUD_MODEL gpt-oss:120b on
// Ollama Cloud) — gateway just owns the routing.
const GATEWAY_URL = process.env.LH_GATEWAY_URL ?? "http://localhost:3100";
const REPO_ROOT = "/home/profit/lakehouse";
const PRD_PATH = `${REPO_ROOT}/docs/PRD.md`;
const CLOUD_MODEL = process.env.LH_BOT_MODEL ?? "deepseek-v3.2";
const CLOUD_MODEL = process.env.LH_BOT_MODEL ?? "gpt-oss:120b";
const MAX_TOKENS = 6000;

export async function findGaps(): Promise<Gap[]> {
@@ -44,10 +44,7 @@ name = "staffing_inference"
# pattern generalizes beyond code review.
preferred_mode = "staffing_inference_lakehouse"
fallback_modes = ["ladder", "consensus", "pipeline"]
# 2026-04-28: gpt-oss-120b:free → kimi-k2.6 via Ollama Pro. Coding-
# specialized, faster than gpt-oss, on the same OLLAMA_CLOUD_KEY so
# no extra provider hop.
default_model = "kimi-k2.6"
default_model = "openai/gpt-oss-120b:free"
matrix_corpus = "workers_500k_v8"

[[task_class]]
@@ -61,9 +58,7 @@ matrix_corpus = "kb_team_runs_v1"
name = "doc_drift_check"
preferred_mode = "drift"
fallback_modes = ["validator"]
# 2026-04-28: gpt-oss:120b → gemini-3-flash-preview via Ollama Pro.
# Speed leader on factual checking, same OLLAMA_CLOUD_KEY.
default_model = "gemini-3-flash-preview"
default_model = "gpt-oss:120b"
matrix_corpus = "distilled_factual_v20260423095819"

[[task_class]]
@@ -15,29 +15,22 @@

[[provider]]
name = "ollama"
base_url = "http://localhost:11434"
base_url = "http://localhost:3200"
auth = "none"
default_model = "qwen3.5:latest"
# Hot-path local inference. No bearer needed — direct to Ollama as of
# 2026-05-02 (Python sidecar's pass-through wrapper retired). Model
# names are bare (e.g. "qwen3.5:latest", not "ollama/qwen3.5:latest").
# Hot-path local inference. No bearer needed — Python sidecar on
# localhost handles the Ollama API. Model names are bare
# (e.g. "qwen3.5:latest", not "ollama/qwen3.5:latest").

[[provider]]
name = "ollama_cloud"
base_url = "https://ollama.com"
auth = "bearer"
auth_env = "OLLAMA_CLOUD_KEY"
default_model = "deepseek-v3.2"
# Cloud-tier Ollama (Pro plan as of 2026-04-28). Key resolved from
# OLLAMA_CLOUD_KEY at gateway boot; Pro tier upgraded the account so
# rate limits + model access widen without a key change. Model-prefix
# routing: "cloud/<model>" auto-routes here. 39-model fleet now
# includes deepseek-v3.2, deepseek-v4-{flash,pro}, gemini-3-flash-
# preview, glm-{5,5.1}, kimi-k2.6, qwen3-coder-next.
# 2026-04-28: default upgraded gpt-oss:120b → deepseek-v3.2 (newest
# DeepSeek revision). NOTE: kimi-k2:1t is upstream-broken (HTTP 500
# on Ollama Pro probe 2026-04-28) — do not route to it. Use kimi-k2.6
# instead, which is what staffing_inference points at.
default_model = "gpt-oss:120b"
# Cloud-tier Ollama. Key resolved from OLLAMA_CLOUD_KEY env at gateway
# boot. Model-prefix routing: "cloud/<model>" auto-routes here
# (see gateway::v1::resolve_provider).

[[provider]]
name = "openrouter"
@@ -45,7 +38,7 @@ base_url = "https://openrouter.ai/api/v1"
auth = "bearer"
auth_env = "OPENROUTER_API_KEY"
auth_fallback_files = ["/home/profit/.env", "/root/llm_team_config.json"]
default_model = "x-ai/grok-4.1-fast"
default_model = "openai/gpt-oss-120b:free"
# Multi-provider gateway. Covers Anthropic, Google, OpenAI, MiniMax,
# Qwen, Gemma, etc. Key resolved via crates/gateway/src/v1/openrouter.rs
# resolve_openrouter_key() — env first, then fallback files.
@@ -81,10 +74,8 @@ auth_env = "KIMI_API_KEY"
default_model = "kimi-for-coding"
# Direct Kimi For Coding provider. `api.kimi.com` is a SEPARATE account
# system from `api.moonshot.ai` and `api.moonshot.cn` — keys are NOT
# interchangeable. Used as a fallback when Ollama Cloud's kimi-k2.6 is
# unavailable and OpenRouter's `moonshotai/kimi-k2.6` is rate-limited.
# (Was `kimi-k2:1t` here pre-2026-05-03 — that model is upstream-broken
# and removed from operator guidance.)
# interchangeable. Used when Ollama Cloud's `kimi-k2:1t` is upstream-
# broken and OpenRouter's `moonshotai/kimi-k2.6` is rate-limited.
# Model id: `kimi-for-coding` (kimi-k2.6 underneath).
# Key file: /etc/lakehouse/kimi.env (loaded via systemd EnvironmentFile).
# Model-prefix routing: "kimi/<model>" auto-routes here, prefix stripped.
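The provider comments above describe model-prefix routing: a `cloud/<model>` id goes to the Ollama Cloud provider, and a `kimi/<model>` id goes to the Kimi provider with the prefix stripped. The sketch below illustrates that idea only. It is not the gateway's `resolve_provider`. The function name `route_model`, the provider label `kimi`, stripping the `cloud/` prefix, and the fallback to the local `ollama` provider for bare names are all assumptions made for the example.

```rust
/// Hypothetical prefix table: (model-id prefix, provider name).
/// Only the "cloud/" and "kimi/" prefixes come from the config comments;
/// the provider label "kimi" is assumed.
const PREFIX_ROUTES: [(&str, &str); 2] = [
    ("cloud/", "ollama_cloud"),
    ("kimi/", "kimi"),
];

/// Split a requested model id into (provider, bare model name).
/// Ids without a known prefix fall through to the local "ollama"
/// provider unchanged (an assumption, not confirmed by the source).
pub fn route_model(model: &str) -> (&'static str, &str) {
    for &(prefix, provider) in PREFIX_ROUTES.iter() {
        if let Some(bare) = model.strip_prefix(prefix) {
            return (provider, bare);
        }
    }
    ("ollama", model)
}
```

Under these assumptions, `route_model("cloud/deepseek-v3.2")` yields the `ollama_cloud` provider with the bare id `deepseek-v3.2`, while `route_model("qwen3.5:latest")` stays on the local provider.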
@@ -12,4 +12,3 @@ serde_json = { workspace = true }
tracing = { workspace = true }
reqwest = { version = "0.12", default-features = false, features = ["json", "rustls-tls"] }
async-trait = "0.1"
lru = "0.12"
@@ -1,74 +1,28 @@
use lru::LruCache;
use reqwest::Client;
use serde::{Deserialize, Serialize};
use std::num::NonZeroUsize;
use std::sync::Mutex;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Duration;

/// HTTP client for Ollama (post-2026-05-02 — sidecar dropped).
///
/// `base_url` was historically the Python sidecar at `:3200`, which
/// pass-through-proxied to Ollama at `:11434`. The sidecar added zero
/// logic on the hot path (embed.py + generate.py + rerank.py +
/// admin.py = ~120 LOC of pure Ollama wrappers), so this client now
/// talks to Ollama directly and the sidecar process can be retired.
///
/// What stayed Python: `lab_ui.py` + `pipeline_lab.py` (~888 LOC of
/// dev-mode Streamlit-shape UIs) — those aren't on the runtime hot
/// path and continue running for prompt experimentation.
/// HTTP client for the Python AI sidecar.
///
/// `generate()` has two transport modes:
/// - When `gateway_url` is None (default), posts directly to Ollama's
/// `${base_url}/api/generate`.
/// - When `gateway_url` is `Some(url)`, posts to `${url}/v1/chat`
/// with `provider="ollama"` so the call appears in `/v1/usage` and
/// Langfuse traces.
/// - When `gateway_url` is None (default), it posts to
/// `${base_url}/generate` (sidecar direct).
/// - When `gateway_url` is `Some(url)`, it posts to
/// `${url}/v1/chat` with `provider="ollama"` so the call appears
/// in `/v1/usage` and Langfuse traces.
///
/// `embed()`, `rerank()`, and admin methods always go direct to
/// Ollama — no `/v1` equivalent for those surfaces yet.
/// `embed()`, `rerank()`, and admin methods always go direct to the
/// sidecar — no `/v1` equivalent yet, no point round-tripping.
///
/// Phase 44 part 2 (2026-04-27): the gateway URL is wired in by
/// callers that want observability (vectord modules); it's left
/// unset by callers that ARE the gateway internals (avoids self-loops
/// + redundant hops).
/// Per-text embed cache key. We key on (model, text) so different
/// model selections produce distinct cache lines — a query embedded
/// under nomic-embed-text-v2-moe must NOT collide with the same
/// query under nomic-embed-text v1.
#[derive(Eq, PartialEq, Hash, Clone)]
struct EmbedCacheKey {
    model: String,
    text: String,
}

/// Default LRU cache size — 4096 entries × ~6KB per 768-d f64
/// vector ≈ 24MB. Sized for typical staffing-domain repetition
/// (coordinator workflows have query repetition rates around 70-90%
/// per session). Tunable via [aibridge].embed_cache_size in the
/// config; 0 disables the cache entirely.
const DEFAULT_EMBED_CACHE_SIZE: usize = 4096;

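The "~6KB per 768-d f64 vector" and "≈ 24MB" figures in the comment above follow from the vector payload alone: 768 components at 8 bytes each is 6144 bytes, and 4096 such entries come to 25,165,824 bytes (24 MiB). A minimal sketch of that arithmetic follows. The helper names are hypothetical, and the estimate deliberately ignores the key strings, `Vec` headers, and LRU bookkeeping, so real memory use would be somewhat higher.

```rust
use std::mem::size_of;

/// Payload bytes for one cached embedding of `dims` f64 components.
/// Ignores the Vec header, the (model, text) key, and LRU bookkeeping.
pub fn embedding_payload_bytes(dims: usize) -> usize {
    dims * size_of::<f64>()
}

/// Payload bytes for a cache holding `entries` embeddings of `dims` each.
pub fn cache_payload_bytes(entries: usize, dims: usize) -> usize {
    entries * embedding_payload_bytes(dims)
}
```

With the defaults from the comment, `cache_payload_bytes(4096, 768)` divided by `1024 * 1024` gives 24, which matches the stated size when "MB" is read as MiB.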
#[derive(Clone)]
pub struct AiClient {
    client: Client,
    base_url: String,
    gateway_url: Option<String>,
    /// Closes the 63× perf gap with Go side. Mirrors the shape of
    /// Go's internal/embed/cached.go::CachedProvider — same
    /// (model, text) → vector caching, same nil-disable semantics.
    /// None = caching disabled (cache_size=0); Some = bounded LRU.
    embed_cache: Option<Arc<Mutex<LruCache<EmbedCacheKey, Vec<f64>>>>>,
    /// Hit / miss counters for /admin observability + load-test
    /// validation. Atomic so Clone'd AiClients share the same counts.
    embed_cache_hits: Arc<AtomicU64>,
    embed_cache_misses: Arc<AtomicU64>,
    /// Pinned at construction time so the EmbedResponse can carry
    /// dimension consistently even when every text was a cache hit
    /// (no fresh upstream call to learn the dim from). Set on first
    /// successful Ollama embed; checked on every cache hit.
    cached_dim: Arc<AtomicU64>,
}

// -- Request/Response types --
@@ -141,49 +95,17 @@ pub struct RerankResponse {

impl AiClient {
    pub fn new(base_url: &str) -> Self {
        Self::with_embed_cache(base_url, DEFAULT_EMBED_CACHE_SIZE)
    }

    /// Constructs an AiClient with an explicit embed-cache size.
    /// Pass 0 to disable the cache entirely (matches Go-side
    /// CachedProvider's nil-cache semantics).
    pub fn with_embed_cache(base_url: &str, cache_size: usize) -> Self {
        let client = Client::builder()
            .timeout(Duration::from_secs(120))
            .build()
            .expect("failed to build HTTP client");
        let embed_cache = if cache_size > 0 {
            // SAFETY: cache_size > 0 just verified, NonZeroUsize::new
            // returns Some.
            let cap = NonZeroUsize::new(cache_size).expect("cache_size > 0");
            Some(Arc::new(Mutex::new(LruCache::new(cap))))
        } else {
            None
        };
        Self {
            client,
            base_url: base_url.trim_end_matches('/').to_string(),
            gateway_url: None,
            embed_cache,
            embed_cache_hits: Arc::new(AtomicU64::new(0)),
            embed_cache_misses: Arc::new(AtomicU64::new(0)),
            cached_dim: Arc::new(AtomicU64::new(0)),
        }
    }

    /// Cache hit/miss/size snapshot. Useful for /admin endpoints +
    /// load-test validation ("did the cache fire as expected?").
    pub fn embed_cache_stats(&self) -> (u64, u64, usize) {
        let hits = self.embed_cache_hits.load(Ordering::Relaxed);
        let misses = self.embed_cache_misses.load(Ordering::Relaxed);
        let len = self
            .embed_cache
            .as_ref()
            .map(|c| c.lock().map(|g| g.len()).unwrap_or(0))
            .unwrap_or(0);
        (hits, misses, len)
    }

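The struct comments above note that the hit/miss counters are atomics behind `Arc` so that cloned clients share the same counts. The following is a small standalone sketch of that pattern using only the standard library. The type `CacheCounters` and its methods are hypothetical names for illustration, and the `hit_rate` helper is an addition that the source does not define.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical counter pair. Cloning copies the Arc handles, so every
/// clone increments and reads the same underlying atomics.
#[derive(Clone, Default)]
pub struct CacheCounters {
    hits: Arc<AtomicU64>,
    misses: Arc<AtomicU64>,
}

impl CacheCounters {
    pub fn record_hit(&self) {
        self.hits.fetch_add(1, Ordering::Relaxed);
    }

    pub fn record_miss(&self) {
        self.misses.fetch_add(1, Ordering::Relaxed);
    }

    /// (hits, misses) as observed at call time.
    pub fn snapshot(&self) -> (u64, u64) {
        (
            self.hits.load(Ordering::Relaxed),
            self.misses.load(Ordering::Relaxed),
        )
    }

    /// Fraction of lookups that hit; None before any lookup is recorded.
    pub fn hit_rate(&self) -> Option<f64> {
        let (hits, misses) = self.snapshot();
        let total = hits + misses;
        if total == 0 {
            None
        } else {
            Some(hits as f64 / total as f64)
        }
    }
}
```

`Ordering::Relaxed` is enough here because the counters are independent statistics and nothing else is synchronized through them; a snapshot taken while other threads are counting may be slightly stale.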
    /// Same as `new`, but every `generate()` is routed through
    /// `${gateway_url}/v1/chat` (provider=ollama) for observability.
    /// Use this for callers OUTSIDE the gateway. Inside the gateway
@@ -196,222 +118,50 @@ impl AiClient {
        c
    }

    /// Reachability + version check. Hits Ollama's `/api/version`,
    /// returns a sidecar-shaped envelope so callers reading
    /// `.status` / `.ollama_url` don't break across the
    /// pre-/post-2026-05-02 cutover.
    pub async fn health(&self) -> Result<serde_json::Value, String> {
        let resp = self.client
            .get(format!("{}/api/version", self.base_url))
            .get(format!("{}/health", self.base_url))
            .send()
            .await
            .map_err(|e| format!("ollama unreachable: {e}"))?;
        let body: serde_json::Value = resp.json().await
            .map_err(|e| format!("invalid response: {e}"))?;
        Ok(serde_json::json!({
            "status": "ok",
            "ollama_url": &self.base_url,
            "ollama_version": body.get("version"),
        }))
            .map_err(|e| format!("sidecar unreachable: {e}"))?;
        resp.json().await.map_err(|e| format!("invalid response: {e}"))
    }

    /// Embed with per-text LRU caching. Mirrors Go-side
    /// CachedProvider behavior: cache key is (model, text);
    /// cache-hit texts skip the sidecar; cache-miss texts batch
    /// into a single sidecar call; results are interleaved in the
    /// caller's input order.
    ///
    /// Closes ~95% of the load-test perf gap vs Go side (loadgen
    /// 2026-05-01: Rust 128 RPS → with cache ≥ 7000 RPS expected
    /// for warm-cache workloads). Cold-cache behavior unchanged
    /// (every text is a miss → single sidecar call, identical to
    /// pre-cache).
    pub async fn embed(&self, req: EmbedRequest) -> Result<EmbedResponse, String> {
        let model_key = req.model.clone().unwrap_or_default();
        let resp = self.client
            .post(format!("{}/embed", self.base_url))
            .json(&req)
            .send()
            .await
            .map_err(|e| format!("embed request failed: {e}"))?;

        // Fast path: cache disabled → original behavior.
        let Some(cache) = self.embed_cache.as_ref() else {
            return self.embed_uncached(&req).await;
        };
        if req.texts.is_empty() {
            return self.embed_uncached(&req).await;
        if !resp.status().is_success() {
            let text = resp.text().await.unwrap_or_default();
            return Err(format!("embed error ({}): {text}", text.len()));
        }

        // First pass: check cache for each text. Track which positions
        // need a sidecar fetch.
        let mut embeddings: Vec<Option<Vec<f64>>> = vec![None; req.texts.len()];
        let mut miss_indices: Vec<usize> = Vec::new();
        let mut miss_texts: Vec<String> = Vec::new();
        {
            let mut guard = cache.lock().map_err(|e| format!("cache lock poisoned: {e}"))?;
            for (i, text) in req.texts.iter().enumerate() {
                let key = EmbedCacheKey { model: model_key.clone(), text: text.clone() };
                if let Some(vec) = guard.get(&key) {
                    embeddings[i] = Some(vec.clone());
                    self.embed_cache_hits.fetch_add(1, Ordering::Relaxed);
                } else {
                    miss_indices.push(i);
                    miss_texts.push(text.clone());
                    self.embed_cache_misses.fetch_add(1, Ordering::Relaxed);
                }
            }
        }

        // All hit? Return immediately. Use cached_dim to populate
        // the response dimension (no sidecar to ask).
        if miss_indices.is_empty() {
            let dim = self.cached_dim.load(Ordering::Relaxed) as usize;
            let dim = if dim == 0 { embeddings[0].as_ref().map(|v| v.len()).unwrap_or(0) } else { dim };
            return Ok(EmbedResponse {
                embeddings: embeddings.into_iter().map(|opt| opt.expect("filled")).collect(),
                model: req.model.unwrap_or_else(|| "nomic-embed-text".to_string()),
                dimensions: dim,
            });
        }

        // Second pass: fetch the misses in one sidecar call.
        let miss_req = EmbedRequest { texts: miss_texts.clone(), model: req.model.clone() };
        let resp = self.embed_uncached(&miss_req).await?;
        if resp.embeddings.len() != miss_texts.len() {
            return Err(format!(
                "embed cache: sidecar returned {} embeddings for {} texts",
                resp.embeddings.len(),
                miss_texts.len()
            ));
        }

        // Pin cached_dim on first successful response.
        if resp.dimensions > 0 {
            self.cached_dim.store(resp.dimensions as u64, Ordering::Relaxed);
        }

        // Insert misses into cache + fill response slots.
        {
            let mut guard = cache.lock().map_err(|e| format!("cache lock poisoned: {e}"))?;
            for (j, idx) in miss_indices.iter().enumerate() {
                let key = EmbedCacheKey {
                    model: model_key.clone(),
                    text: miss_texts[j].clone(),
                };
                let vec = resp.embeddings[j].clone();
                guard.put(key, vec.clone());
                embeddings[*idx] = Some(vec);
            }
        }

        Ok(EmbedResponse {
            embeddings: embeddings.into_iter().map(|opt| opt.expect("filled")).collect(),
            model: resp.model,
            dimensions: resp.dimensions,
        })
    }

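The cached `embed()` above works in two passes: look every text up under a (model, text) key, batch only the misses into one upstream call, then write the fetched vectors back into both the cache and the caller's original positions. The sketch below isolates that control flow under simplifying assumptions. It swaps the bounded `LruCache` for an unbounded `HashMap`, runs synchronously, and takes the upstream call as a closure; the names `Cache`, `embed_with_cache`, and `fetch` are hypothetical.

```rust
use std::collections::HashMap;

/// Stand-in for the real bounded LRU: an unbounded map keyed on (model, text).
pub type Cache = HashMap<(String, String), Vec<f64>>;

/// Two-pass cached embedding lookup. `fetch` plays the role of the single
/// upstream call and must return one vector per text, in order.
pub fn embed_with_cache<F>(
    cache: &mut Cache,
    model: &str,
    texts: &[String],
    mut fetch: F,
) -> Result<Vec<Vec<f64>>, String>
where
    F: FnMut(&[String]) -> Result<Vec<Vec<f64>>, String>,
{
    let mut slots: Vec<Option<Vec<f64>>> = vec![None; texts.len()];
    let mut miss_indices: Vec<usize> = Vec::new();
    let mut miss_texts: Vec<String> = Vec::new();

    // First pass: fill hits, remember which positions still need a fetch.
    for (i, text) in texts.iter().enumerate() {
        match cache.get(&(model.to_string(), text.clone())) {
            Some(vec) => slots[i] = Some(vec.clone()),
            None => {
                miss_indices.push(i);
                miss_texts.push(text.clone());
            }
        }
    }

    // Second pass: one upstream call for all misses, skipped on a full hit.
    if !miss_texts.is_empty() {
        let fetched = fetch(&miss_texts)?;
        if fetched.len() != miss_texts.len() {
            return Err(format!(
                "upstream returned {} embeddings for {} texts",
                fetched.len(),
                miss_texts.len()
            ));
        }
        // Store each miss and place it back at the caller's original index.
        for ((idx, text), vec) in miss_indices.into_iter().zip(miss_texts).zip(fetched) {
            cache.insert((model.to_string(), text), vec.clone());
            slots[idx] = Some(vec);
        }
    }

    slots
        .into_iter()
        .map(|slot| slot.ok_or_else(|| "unfilled embedding slot".to_string()))
        .collect()
}
```

One design point carries over from the original: a text repeated within a single batch is treated as a miss each time it appears, because the cache is only consulted before the upstream call.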
    /// Direct Ollama call — used internally by embed() for cache-miss
    /// batches and as the transparent fallback when the cache is
    /// disabled. Loops per-text against `${base_url}/api/embed`,
    /// matching the sidecar's pre-2026-05-02 behavior. Ollama 0.4+
    /// supports batch input but per-text keeps compatibility broader
    /// + lets cache-miss-only batches share the loop with cold runs.
    async fn embed_uncached(&self, req: &EmbedRequest) -> Result<EmbedResponse, String> {
        let model = req.model.clone().unwrap_or_else(|| "nomic-embed-text".to_string());
        let mut embeddings: Vec<Vec<f64>> = Vec::with_capacity(req.texts.len());

        for text in &req.texts {
            let resp = self.client
                .post(format!("{}/api/embed", self.base_url))
                .json(&serde_json::json!({
                    "model": &model,
                    "input": text,
                }))
                .send()
                .await
                .map_err(|e| format!("embed request failed: {e}"))?;

            if !resp.status().is_success() {
                let body = resp.text().await.unwrap_or_default();
                return Err(format!("ollama embed error: {body}"));
            }
            // Ollama returns {"embeddings": [[...]], "model": "...", ...}.
            // The outer `embeddings` is always a list; for a scalar input
            // we get a single inner vector.
            let parsed: serde_json::Value = resp.json().await
                .map_err(|e| format!("embed parse error: {e}"))?;
            let arr = parsed.get("embeddings")
                .and_then(|v| v.as_array())
                .ok_or_else(|| format!("ollama embed: missing 'embeddings' field in {parsed}"))?;
            if arr.is_empty() {
                return Err("ollama embed: empty embeddings array".to_string());
            }
            let first = arr[0].as_array()
                .ok_or_else(|| "ollama embed: embeddings[0] not an array".to_string())?;
            let vec: Vec<f64> = first.iter()
                .filter_map(|n| n.as_f64())
                .collect();
            if vec.is_empty() {
                return Err("ollama embed: numeric coercion produced empty vector".to_string());
            }
            embeddings.push(vec);
        }

        let dimensions = embeddings.first().map(|v| v.len()).unwrap_or(0);
        Ok(EmbedResponse {
            embeddings,
            model,
            dimensions,
        })
        resp.json().await.map_err(|e| format!("embed parse error: {e}"))
    }

    pub async fn generate(&self, req: GenerateRequest) -> Result<GenerateResponse, String> {
        if let Some(gw) = self.gateway_url.as_deref() {
            return self.generate_via_gateway(gw, req).await;
        }
        // Direct Ollama path. Used by gateway internals (so the ollama
        // provider can call upstream without a self-loop through
        // /v1/chat) and by any consumer that wants raw transport
        // without /v1/usage accounting.
        let model = req.model.clone().unwrap_or_else(|| "qwen3.5:latest".to_string());
        let mut body = serde_json::json!({
            "model": &model,
            "prompt": &req.prompt,
            "stream": false,
        });
        let mut options = serde_json::Map::new();
        if let Some(t) = req.temperature {
            options.insert("temperature".to_string(), serde_json::json!(t));
        }
        if let Some(mt) = req.max_tokens {
            options.insert("num_predict".to_string(), serde_json::json!(mt));
        }
        if !options.is_empty() {
            body["options"] = serde_json::Value::Object(options);
        }
        if let Some(sys) = &req.system {
            body["system"] = serde_json::json!(sys);
        }
        if let Some(th) = req.think {
            body["think"] = serde_json::json!(th);
        }

        // Direct-sidecar legacy path. Used by gateway internals (so
        // ollama_arm can call sidecar without a self-loop) and by
        // any consumer that wants raw transport without /v1/usage
        // accounting.
        let resp = self.client
            .post(format!("{}/api/generate", self.base_url))
            .json(&body)
            .post(format!("{}/generate", self.base_url))
            .json(&req)
            .send()
            .await
            .map_err(|e| format!("generate request failed: {e}"))?;

        if !resp.status().is_success() {
            let text = resp.text().await.unwrap_or_default();
            return Err(format!("ollama generate error: {text}"));
            return Err(format!("generate error: {text}"));
        }
        let parsed: serde_json::Value = resp.json().await
            .map_err(|e| format!("generate parse error: {e}"))?;

        Ok(GenerateResponse {
            text: parsed.get("response").and_then(|v| v.as_str()).unwrap_or("").to_string(),
            model,
            tokens_evaluated: parsed.get("prompt_eval_count").and_then(|v| v.as_u64()),
            tokens_generated: parsed.get("eval_count").and_then(|v| v.as_u64()),
        })
        resp.json().await.map_err(|e| format!("generate parse error: {e}"))
    }

    /// Phase 44 part 2: route generate() through the gateway's
@@ -467,60 +217,19 @@ impl AiClient {
        })
    }

    /// Cross-encoder reranking via Ollama generate. Asks the model to
    /// rate each document's relevance to the query 0-10, then sorts
    /// descending. Mirrors the sidecar's pre-2026-05-02 algorithm
    /// exactly so callers see the same scores.
    pub async fn rerank(&self, req: RerankRequest) -> Result<RerankResponse, String> {
        let model = req.model.clone().unwrap_or_else(|| "qwen3.5:latest".to_string());
        let mut scored: Vec<ScoredDocument> = Vec::with_capacity(req.documents.len());
        let resp = self.client
            .post(format!("{}/rerank", self.base_url))
            .json(&req)
            .send()
            .await
            .map_err(|e| format!("rerank request failed: {e}"))?;

        for (i, doc) in req.documents.iter().enumerate() {
            let prompt = format!(
                "Rate the relevance of the following document to the query on a scale of 0 to 10. \
                Respond with ONLY a number.\n\n\
                Query: {}\n\n\
                Document: {}\n\n\
                Score:",
                req.query, doc,
            );
            let resp = self.client
                .post(format!("{}/api/generate", self.base_url))
                .json(&serde_json::json!({
                    "model": &model,
                    "prompt": prompt,
                    "stream": false,
                    "options": {"temperature": 0.0, "num_predict": 8},
                }))
                .send()
                .await
                .map_err(|e| format!("rerank request failed: {e}"))?;

            if !resp.status().is_success() {
                let body = resp.text().await.unwrap_or_default();
                return Err(format!("ollama rerank error: {body}"));
            }
            let parsed: serde_json::Value = resp.json().await
                .map_err(|e| format!("rerank parse error: {e}"))?;
            let text = parsed.get("response").and_then(|v| v.as_str()).unwrap_or("").trim();
            // Parse the leading number; tolerate "7", "7.5", "7 — strong match".
            let score = text.split_whitespace().next()
                .and_then(|t| t.parse::<f64>().ok())
                .unwrap_or(0.0)
                .clamp(0.0, 10.0);

            scored.push(ScoredDocument {
                index: i,
                text: doc.clone(),
                score,
            });
        if !resp.status().is_success() {
            let text = resp.text().await.unwrap_or_default();
            return Err(format!("rerank error: {text}"));
        }

        scored.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal));
        if let Some(k) = req.top_k {
            scored.truncate(k);
        }
        Ok(RerankResponse { results: scored, model })
        resp.json().await.map_err(|e| format!("rerank parse error: {e}"))
    }

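The direct-Ollama `rerank()` above turns each free-text model reply into a score by taking the first whitespace-separated token, parsing it as a number, defaulting to 0.0 on failure, and clamping to the 0-10 range. It then sorts descending and truncates to `top_k`. The sketch below reproduces only that post-processing so it can be exercised without a model. The function names `parse_score` and `rank_replies` and the `(index, score)` tuple shape are illustrative stand-ins for the `ScoredDocument` handling in the source.

```rust
use std::cmp::Ordering;

/// Parse the leading number of a model reply into a 0..=10 score.
/// Anything whose first token is not a number scores 0.0.
pub fn parse_score(reply: &str) -> f64 {
    reply
        .trim()
        .split_whitespace()
        .next()
        .and_then(|token| token.parse::<f64>().ok())
        .unwrap_or(0.0)
        .clamp(0.0, 10.0)
}

/// Score each reply, sort by score descending (stable for ties),
/// and keep at most `top_k` entries when a limit is given.
pub fn rank_replies(replies: &[&str], top_k: Option<usize>) -> Vec<(usize, f64)> {
    let mut scored: Vec<(usize, f64)> = replies
        .iter()
        .enumerate()
        .map(|(i, reply)| (i, parse_score(reply)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(Ordering::Equal));
    if let Some(k) = top_k {
        scored.truncate(k);
    }
    scored
}
```

A consequence of first-token parsing worth keeping in mind: replies such as `7/10` or `Score: 7` do not parse and fall back to 0.0, so the prompt's "Respond with ONLY a number" instruction carries real weight.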
    /// Force Ollama to unload the named model from VRAM (keep_alive=0).
@@ -529,116 +238,40 @@ impl AiClient {
    /// profile's model can linger in VRAM next to the new one.
    pub async fn unload_model(&self, model: &str) -> Result<serde_json::Value, String> {
        let resp = self.client
            .post(format!("{}/api/generate", self.base_url))
            .json(&serde_json::json!({
                "model": model,
                "prompt": "",
                "keep_alive": 0,
                "stream": false,
            }))
            .post(format!("{}/admin/unload", self.base_url))
            .json(&serde_json::json!({ "model": model }))
            .send().await
            .map_err(|e| format!("unload request failed: {e}"))?;
        if !resp.status().is_success() {
            let text = resp.text().await.unwrap_or_default();
            return Err(format!("ollama unload error: {text}"));
            return Err(format!("unload error: {text}"));
        }
        // Ollama returns 200 with the empty-prompt response shape.
        // Fold into the legacy {"unloaded": "<model>"} envelope so
        // callers' parsing doesn't break.
        Ok(serde_json::json!({ "unloaded": model }))
        resp.json().await.map_err(|e| format!("unload parse error: {e}"))
    }

    /// Ask Ollama to load the named model into VRAM proactively. Makes
    /// the first real request after profile activation fast (no cold-load
    /// latency). Empty prompts confuse some models, so we send a single
    /// space + cap num_predict=1 (matches the sidecar's prior behavior).
    /// latency).
    pub async fn preload_model(&self, model: &str) -> Result<serde_json::Value, String> {
        let resp = self.client
            .post(format!("{}/api/generate", self.base_url))
            .json(&serde_json::json!({
                "model": model,
                "prompt": " ",
                "keep_alive": "5m",
                "stream": false,
                "options": {"num_predict": 1},
            }))
            .post(format!("{}/admin/preload", self.base_url))
            .json(&serde_json::json!({ "model": model }))
            .send().await
            .map_err(|e| format!("preload request failed: {e}"))?;
        if !resp.status().is_success() {
            let text = resp.text().await.unwrap_or_default();
            return Err(format!("ollama preload error: {text}"));
            return Err(format!("preload error: {text}"));
        }
        let parsed: serde_json::Value = resp.json().await
            .map_err(|e| format!("preload parse error: {e}"))?;
        Ok(serde_json::json!({
            "preloaded": model,
            "load_duration_ns": parsed.get("load_duration"),
            "total_duration_ns": parsed.get("total_duration"),
|
||||
}))
|
||||
resp.json().await.map_err(|e| format!("preload parse error: {e}"))
|
||||
}
|
||||
|
||||
/// GPU + loaded-model snapshot. Combines nvidia-smi output (when
|
||||
/// available) with Ollama's /api/ps. Same shape as the prior
|
||||
/// sidecar /admin/vram endpoint so callers don't need updating.
|
||||
/// GPU + loaded-model snapshot from the sidecar. Combines nvidia-smi
|
||||
/// output (if available) with Ollama's /api/ps.
|
||||
pub async fn vram_snapshot(&self) -> Result<serde_json::Value, String> {
|
||||
let resp = self.client
|
||||
.get(format!("{}/api/ps", self.base_url))
|
||||
.get(format!("{}/admin/vram", self.base_url))
|
||||
.send().await
|
||||
.map_err(|e| format!("ollama ps request failed: {e}"))?;
|
||||
let loaded: Vec<serde_json::Value> = if resp.status().is_success() {
|
||||
let parsed: serde_json::Value = resp.json().await.unwrap_or(serde_json::Value::Null);
|
||||
parsed.get("models")
|
||||
.and_then(|v| v.as_array())
|
||||
.map(|arr| arr.iter().map(|m| serde_json::json!({
|
||||
"name": m.get("name"),
|
||||
"size_vram_mib": m.get("size_vram").and_then(|v| v.as_u64()).map(|n| n / (1024 * 1024)),
|
||||
"expires_at": m.get("expires_at"),
|
||||
})).collect())
|
||||
.unwrap_or_default()
|
||||
} else {
|
||||
Vec::new()
|
||||
};
|
||||
|
||||
let gpu = nvidia_smi_snapshot();
|
||||
|
||||
Ok(serde_json::json!({
|
||||
"gpu": gpu,
|
||||
"ollama_loaded": loaded,
|
||||
}))
|
||||
.map_err(|e| format!("vram request failed: {e}"))?;
|
||||
resp.json().await.map_err(|e| format!("vram parse error: {e}"))
|
||||
}
|
||||
}
|
||||
|
||||
/// One-shot nvidia-smi poll. Returns Null if the tool isn't on PATH
|
||||
/// or the call fails. Mirrors the sidecar's `_nvidia_smi_snapshot`
|
||||
/// shape exactly so callers reading vram_snapshot don't break.
|
||||
fn nvidia_smi_snapshot() -> serde_json::Value {
|
||||
use std::process::Command;
|
||||
let out = Command::new("nvidia-smi")
|
||||
.args([
|
||||
"--query-gpu=memory.used,memory.total,utilization.gpu,name",
|
||||
"--format=csv,noheader,nounits",
|
||||
])
|
||||
.output();
|
||||
let stdout = match out {
|
||||
Ok(o) if o.status.success() => o.stdout,
|
||||
_ => return serde_json::Value::Null,
|
||||
};
|
||||
let line = String::from_utf8_lossy(&stdout);
|
||||
let line = line.trim();
|
||||
if line.is_empty() {
|
||||
return serde_json::Value::Null;
|
||||
}
|
||||
let parts: Vec<&str> = line.split(',').map(|s| s.trim()).collect();
|
||||
if parts.len() < 4 {
|
||||
return serde_json::Value::Null;
|
||||
}
|
||||
let used = parts[0].parse::<u64>().unwrap_or(0);
|
||||
let total = parts[1].parse::<u64>().unwrap_or(0);
|
||||
let util = parts[2].parse::<u64>().unwrap_or(0);
|
||||
serde_json::json!({
|
||||
"name": parts[3],
|
||||
"used_mib": used,
|
||||
"total_mib": total,
|
||||
"utilization_pct": util,
|
||||
})
|
||||
}
|
||||
|
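The CSV parsing inside `nvidia_smi_snapshot` can be exercised in isolation by separating it from the `Command` call. The sketch below is an illustration under assumptions: `GpuSnapshot` and `parse_gpu_csv` are hypothetical names, and two choices differ from the hunk above. It reads only the first non-empty line (on a multi-GPU host the tool prints one line per GPU), and it uses `splitn(4, ',')` so a comma inside the device name would stay in the name field.

```rust
/// Hypothetical typed version of the JSON object built in the hunk above.
#[derive(Debug, PartialEq)]
pub struct GpuSnapshot {
    pub name: String,
    pub used_mib: u64,
    pub total_mib: u64,
    pub utilization_pct: u64,
}

/// Parse the output of
/// `nvidia-smi --query-gpu=memory.used,memory.total,utilization.gpu,name
///  --format=csv,noheader,nounits`.
/// Returns None for empty output or fewer than four fields.
pub fn parse_gpu_csv(stdout: &str) -> Option<GpuSnapshot> {
    // Assumption: report the first GPU only.
    let line = stdout.lines().map(str::trim).find(|l| !l.is_empty())?;
    // Split into at most four fields so the name keeps any commas.
    let parts: Vec<&str> = line.splitn(4, ',').map(str::trim).collect();
    if parts.len() < 4 {
        return None;
    }
    Some(GpuSnapshot {
        // Non-numeric fields (for example "[N/A]") fall back to 0, as above.
        used_mib: parts[0].parse().unwrap_or(0),
        total_mib: parts[1].parse().unwrap_or(0),
        utilization_pct: parts[2].parse().unwrap_or(0),
        name: parts[3].to_string(),
    })
}
```

Keeping the parser pure makes it testable on machines without a GPU; the caller would still map `None` to `serde_json::Value::Null` to preserve the response shape.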
||||
@ -1,37 +0,0 @@
|
||||
//! Cross-runtime parity helper for `extract_json`.
|
||||
//!
|
||||
//! Reads a single model-output string from stdin, runs the Rust
|
||||
//! extract_json, prints `{"matched": bool, "value": <object|null>}`
|
||||
//! to stdout as JSON. Exit 0 on success, exit 1 on internal error.
|
||||
//!
|
||||
//! The Go counterpart lives at
|
||||
//! `golangLAKEHOUSE/internal/validator/iterate.go::ExtractJSON`. The
|
||||
//! parity probe at
|
||||
//! `golangLAKEHOUSE/scripts/cutover/parity/extract_json_parity.sh`
|
||||
//! feeds the same fixtures through both and diffs the outputs.
|
||||
//!
|
||||
//! Usage:
|
||||
//! echo '<raw model output>' | parity_extract_json
|
||||
//! parity_extract_json <<< '...'
|
||||
|
||||
use std::io::Read;
|
||||
|
||||
fn main() {
|
||||
let mut buf = String::new();
|
||||
if let Err(e) = std::io::stdin().read_to_string(&mut buf) {
|
||||
eprintln!("read stdin: {e}");
|
||||
std::process::exit(1);
|
||||
}
|
||||
let result = gateway::v1::iterate::extract_json(&buf);
|
||||
let body = serde_json::json!({
|
||||
"matched": result.is_some(),
|
||||
"value": result.unwrap_or(serde_json::Value::Null),
|
||||
});
|
||||
match serde_json::to_string(&body) {
|
||||
Ok(s) => println!("{s}"),
|
||||
Err(e) => {
|
||||
eprintln!("serialize result: {e}");
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
}
|
||||
@ -1,71 +0,0 @@
|
||||
//! Cross-runtime parity helper for `SessionRecord` JSON shape.
|
||||
//!
|
||||
//! Reads a fixture JSON on stdin, builds a `SessionRecord`, emits
|
||||
//! one JSONL row on stdout. Used by
|
||||
//! `golangLAKEHOUSE/scripts/cutover/parity/session_log_parity.sh`
|
||||
//! to verify the Rust gateway's session log shape stays byte-equal
|
||||
//! to the Go-side validatord's `validator.SessionRecord` (commit
|
||||
//! 1a3a82a in golangLAKEHOUSE).
|
||||
|
||||
use gateway::v1::session_log::{SessionAttemptRecord, SessionRecord, SESSION_RECORD_SCHEMA};
|
||||
use serde::Deserialize;
|
||||
use std::io::Read;
|
||||
|
||||
#[derive(Deserialize)]
|
||||
struct FixtureInput {
|
||||
session_id: String,
|
||||
kind: String,
|
||||
model: String,
|
||||
provider: String,
|
||||
prompt: String,
|
||||
iterations: u32,
|
||||
max_iterations: u32,
|
||||
final_verdict: String,
|
||||
attempts: Vec<SessionAttemptRecord>,
|
||||
#[serde(default)]
|
||||
artifact: Option<serde_json::Value>,
|
||||
#[serde(default)]
|
||||
grounded_in_roster: Option<bool>,
|
||||
duration_ms: u64,
|
||||
}
|
||||
|
||||
fn main() {
|
||||
let mut buf = String::new();
|
||||
if let Err(e) = std::io::stdin().read_to_string(&mut buf) {
|
||||
eprintln!("read stdin: {e}");
|
||||
std::process::exit(1);
|
||||
}
|
||||
let input: FixtureInput = match serde_json::from_str(&buf) {
|
||||
Ok(v) => v,
|
||||
Err(e) => {
|
||||
eprintln!("parse stdin: {e}");
|
||||
std::process::exit(1);
|
||||
}
|
||||
};
|
||||
let rec = SessionRecord {
|
||||
schema: SESSION_RECORD_SCHEMA.to_string(),
|
||||
session_id: input.session_id,
|
||||
// Pinned timestamp so both runtimes' rows compare byte-equal
|
||||
// when the test wrapper normalizes on `daemon` only.
|
||||
timestamp: "2026-01-01T00:00:00+00:00".to_string(),
|
||||
daemon: "gateway".to_string(),
|
||||
kind: input.kind,
|
||||
model: input.model,
|
||||
provider: input.provider,
|
||||
prompt: input.prompt,
|
||||
iterations: input.iterations,
|
||||
max_iterations: input.max_iterations,
|
||||
final_verdict: input.final_verdict,
|
||||
attempts: input.attempts,
|
||||
artifact: input.artifact,
|
||||
grounded_in_roster: input.grounded_in_roster,
|
||||
duration_ms: input.duration_ms,
|
||||
};
|
||||
match serde_json::to_string(&rec) {
|
||||
Ok(s) => println!("{s}"),
|
||||
Err(e) => {
|
||||
eprintln!("marshal: {e}");
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
}
|
||||
@ -438,10 +438,6 @@ impl ExecutionLoop {
|
||||
start_time: start_time.to_rfc3339(),
|
||||
end_time: end_time.to_rfc3339(),
|
||||
latency_ms: elapsed_ms,
|
||||
// Internal execution-loop traffic is its own top-level
|
||||
// trace per call. If a future caller threads a parent
|
||||
// trace into self.state, lift this to Some(parent_id).
|
||||
parent_trace_id: None,
|
||||
});
|
||||
}
|
||||
|
||||
@ -586,10 +582,10 @@ impl ExecutionLoop {
|
||||
/// Phase 20 step (8) — T3 overseer escalation.
|
||||
///
|
||||
/// When the local executor/reviewer loop can't self-correct, call
|
||||
/// the cloud overseer (`claude-opus-4-7` via OpenCode Zen) with
|
||||
/// (a) the KB context — recent outcomes + prior corrections for
|
||||
/// this sig_hash + task_class, across every profile that has run
|
||||
/// it — and (b) the recent log tail. Its output is appended as a
|
||||
/// the cloud overseer (`gpt-oss:120b` via Ollama Cloud) with (a)
|
||||
/// the KB context — recent outcomes + prior corrections for this
|
||||
/// sig_hash + task_class, across every profile that has run it —
|
||||
/// and (b) the recent log tail. Its output is appended as a
|
||||
/// `system` role turn so the next executor generation sees it,
|
||||
/// AND written to `data/_kb/overseer_corrections.jsonl` so every
|
||||
/// future profile activation reads from the same learning pool.
|
||||
@ -597,16 +593,9 @@ impl ExecutionLoop {
|
||||
/// This is the "pipe to the overviewer" piece from 2026-04-23 —
|
||||
/// the overseer is now a first-class KB consumer AND producer, not
|
||||
/// a one-shot correction oracle.
|
||||
///
|
||||
/// 2026-04-28: routed through OpenCode (Zen tier) for Claude Opus
|
||||
/// 4.7. Frontier reasoning matters here because the overseer fires
|
||||
/// only after local self-correction has failed twice — by that
|
||||
/// point we need the strongest reasoning available, not the
|
||||
/// cheapest token. Frequency is low so the Zen pay-per-token cost
|
||||
/// stays bounded.
|
||||
async fn escalate_to_overseer(&mut self, turn: u32, reason: &str) -> Result<(), String> {
|
||||
let Some(opencode_key) = self.state.opencode_key.clone() else {
|
||||
return Err("OPENCODE_API_KEY not configured — skipping escalation".into());
|
||||
let Some(cloud_key) = self.state.ollama_cloud_key.clone() else {
|
||||
return Err("OLLAMA_CLOUD_KEY not configured — skipping escalation".into());
|
||||
};
|
||||
|
||||
let kb = KbContext::load_for(&sig_hash(&self.req), &self.req.task_class).await;
|
||||
@ -615,18 +604,16 @@ impl ExecutionLoop {
|
||||
let started = std::time::Instant::now();
|
||||
let start_time = chrono::Utc::now();
|
||||
let chat_req = crate::v1::ChatRequest {
|
||||
model: "claude-opus-4-7".to_string(),
|
||||
model: "gpt-oss:120b".to_string(),
|
||||
messages: vec![crate::v1::Message::new_text("user", prompt.clone())],
|
||||
temperature: Some(0.1),
|
||||
max_tokens: None,
|
||||
stream: Some(false),
|
||||
// Anthropic models on opencode reject `think` (handled in
|
||||
// the adapter), but we keep the intent flag for parity.
|
||||
think: Some(true),
|
||||
provider: Some("opencode".into()),
|
||||
think: Some(true), // overseer KEEPS thinking (Phase 20 rule)
|
||||
provider: Some("ollama_cloud".into()),
|
||||
};
|
||||
let resp = crate::v1::opencode::chat(&opencode_key, &chat_req).await
|
||||
.map_err(|e| format!("opencode: {e}"))?;
|
||||
let resp = crate::v1::ollama_cloud::chat(&cloud_key, &chat_req).await
|
||||
.map_err(|e| format!("ollama_cloud: {e}"))?;
|
||||
let latency_ms = started.elapsed().as_millis() as u64;
|
||||
let end_time = chrono::Utc::now();
|
||||
let correction_text: String = resp.choices.into_iter().next()
|
||||
@ -646,8 +633,8 @@ impl ExecutionLoop {
|
||||
if let Some(lf) = &self.state.langfuse {
|
||||
use crate::v1::langfuse_trace::ChatTrace;
|
||||
lf.emit_chat(ChatTrace {
|
||||
provider: "opencode".into(),
|
||||
model: "claude-opus-4-7".into(),
|
||||
provider: "ollama_cloud".into(),
|
||||
model: "gpt-oss:120b".into(),
|
||||
input: vec![crate::v1::Message::new_text("user", prompt.clone())],
|
||||
output: correction_text.clone(),
|
||||
prompt_tokens: resp.usage.prompt_tokens,
|
||||
@ -658,13 +645,12 @@ impl ExecutionLoop {
|
||||
start_time: start_time.to_rfc3339(),
|
||||
end_time: end_time.to_rfc3339(),
|
||||
latency_ms,
|
||||
parent_trace_id: None,
|
||||
});
|
||||
}
|
||||
|
||||
// Append to the transcript so the next executor turn sees it.
|
||||
self.append(LogEntry::new(
|
||||
turn, "system", "claude-opus-4-7", "overseer_correction",
|
||||
turn, "system", "gpt-oss:120b", "overseer_correction",
|
||||
serde_json::json!({
|
||||
"reason": reason,
|
||||
"correction": correction_text,
|
||||
@ -686,7 +672,7 @@ impl ExecutionLoop {
|
||||
"task_class": self.req.task_class,
|
||||
"operation": self.req.operation,
|
||||
"reason": reason,
|
||||
"model": "claude-opus-4-7",
|
||||
"model": "gpt-oss:120b",
|
||||
"correction": correction_text,
|
||||
"applied_at_turn": turn,
|
||||
"kb_context_used": kb,
|
||||
|
||||
@ -1,19 +0,0 @@
|
||||
//! Library facade for the gateway crate so sub-binaries (e.g.
|
||||
//! `parity_extract_json`) can reuse the same modules the gateway
|
||||
//! binary uses.
|
||||
//!
|
||||
//! Added 2026-05-02 to support the cross-runtime parity probe at
|
||||
//! `golangLAKEHOUSE/scripts/cutover/parity/extract_json_parity.sh`.
|
||||
//! `extract_json` is the load-bearing public surface for that probe.
|
||||
//!
|
||||
//! main.rs still uses local `mod foo;` declarations independently —
|
||||
//! adding this file is purely additive (the binary's module tree is
|
||||
//! unchanged).
|
||||
|
||||
pub mod access;
|
||||
pub mod access_service;
|
||||
pub mod auth;
|
||||
pub mod execution_loop;
|
||||
pub mod observability;
|
||||
pub mod tools;
|
||||
pub mod v1;
|
||||
@ -362,22 +362,6 @@ async fn main() {
|
||||
}
|
||||
c
|
||||
},
|
||||
// Coordinator session JSONL — one row per /v1/iterate
|
||||
// session for offline DuckDB analysis. Cross-runtime
|
||||
// parity with Go-side validatord (commit 1a3a82a).
|
||||
session_log: {
|
||||
let path = &config.gateway.session_log_path;
|
||||
let s = v1::session_log::SessionLogger::from_path(path);
|
||||
if s.is_some() {
|
||||
tracing::info!(
|
||||
"v1: session log enabled — coordinator sessions written to {}",
|
||||
path
|
||||
);
|
||||
} else {
|
||||
tracing::info!("v1: session log disabled (set [gateway].session_log_path to enable)");
|
||||
}
|
||||
s
|
||||
},
|
||||
}));
|
||||
|
||||
// Auth middleware (if enabled) — P5-001 fix 2026-04-23:
|
||||
|
||||
@ -21,19 +21,12 @@
|
||||
//! re-implementation. Staffing executors, agent loops, and future
|
||||
//! validators all reach the same code path.
|
||||
|
||||
use axum::{extract::State, http::{HeaderMap, StatusCode}, response::IntoResponse, Json};
|
||||
use axum::{extract::State, http::StatusCode, response::IntoResponse, Json};
|
||||
use serde::{Deserialize, Serialize};
|
||||
|
||||
const DEFAULT_MAX_ITERATIONS: u32 = 3;
|
||||
const LOOPBACK_TIMEOUT_SECS: u64 = 240;
|
||||
|
||||
/// Header name used to propagate a Langfuse parent trace id across
|
||||
/// daemon boundaries. Matches Go's `shared.TraceIDHeader` constant
|
||||
/// byte-for-byte (commit d6d2fdf in golangLAKEHOUSE) — same wire
|
||||
/// format means a Go caller can hit Rust's /v1/iterate (or vice
|
||||
/// versa) and the resulting Langfuse trees nest correctly.
|
||||
pub const TRACE_ID_HEADER: &str = "x-lakehouse-trace-id";
|
||||
|
||||
#[derive(Deserialize)]
|
||||
pub struct IterateRequest {
|
||||
/// "fill" | "email" | "playbook" — picks which validator runs.
|
||||
@ -87,14 +80,6 @@ pub struct IterateResponse {
|
||||
pub validation: serde_json::Value,
|
||||
pub iterations: u32,
|
||||
pub history: Vec<IterateAttempt>,
|
||||
/// Echoes the resolved trace id (caller-forwarded header, body
|
||||
/// field, langfuse-middleware mint, or local fallback). Operators
|
||||
/// pivot from this id straight into Langfuse + the
|
||||
/// coordinator_sessions.jsonl join key. Cross-runtime parity with
|
||||
/// Go's `validator.IterateResponse` (commit 6847bbc in
|
||||
/// golangLAKEHOUSE).
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
pub trace_id: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(Serialize)]
|
||||
@ -102,52 +87,29 @@ pub struct IterateFailure {
|
||||
pub error: String,
|
||||
pub iterations: u32,
|
||||
pub history: Vec<IterateAttempt>,
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
pub trace_id: Option<String>,
|
||||
}
|
||||
|
||||
pub async fn iterate(
|
||||
State(state): State<super::V1State>,
|
||||
headers: HeaderMap,
|
||||
State(_state): State<super::V1State>,
|
||||
Json(req): Json<IterateRequest>,
|
||||
) -> impl IntoResponse {
|
||||
let max_iter = req.max_iterations.unwrap_or(DEFAULT_MAX_ITERATIONS).max(1);
|
||||
let temperature = req.temperature.unwrap_or(0.2);
|
||||
let max_tokens = req.max_tokens.unwrap_or(4096);
|
||||
let mut history: Vec<IterateAttempt> = Vec::with_capacity(max_iter as usize);
|
||||
let mut attempt_records: Vec<super::session_log::SessionAttemptRecord> = Vec::with_capacity(max_iter as usize);
|
||||
let mut current_prompt = req.prompt.clone();
|
||||
|
||||
// Resolve the parent Langfuse trace id. Caller-forwarded header
|
||||
// wins (cross-daemon tree linkage); otherwise mint a fresh id so
|
||||
// the iterate session is its own tree. Same shape as the Go-side
|
||||
// validatord trace propagation.
|
||||
let trace_id: String = headers
|
||||
.get(TRACE_ID_HEADER)
|
||||
.and_then(|v| v.to_str().ok())
|
||||
.filter(|s| !s.is_empty())
|
||||
.map(|s| s.to_string())
|
||||
.unwrap_or_else(new_trace_id);
|
||||
|
||||
let client = match reqwest::Client::builder()
|
||||
.timeout(std::time::Duration::from_secs(LOOPBACK_TIMEOUT_SECS))
|
||||
.build() {
|
||||
Ok(c) => c,
|
||||
Err(e) => {
|
||||
// Even infrastructure failures get a session row so a
|
||||
// missing /v1/iterate event never silently disappears
|
||||
// from the longitudinal log.
|
||||
write_infra_error(&state, &req, &trace_id, max_iter, 0, format!("client build: {e}")).await;
|
||||
return (StatusCode::INTERNAL_SERVER_ERROR, format!("client build: {e}")).into_response();
|
||||
}
|
||||
Err(e) => return (StatusCode::INTERNAL_SERVER_ERROR, format!("client build: {e}")).into_response(),
|
||||
};
|
||||
// Self-loopback to the gateway port. Carries gateway internal
|
||||
// calls through /v1/chat + /v1/validate so /v1/usage tracks them.
|
||||
let gateway = "http://127.0.0.1:3100";
|
||||
let t0 = std::time::Instant::now();
|
||||
|
||||
for iteration in 0..max_iter {
|
||||
let attempt_started = chrono::Utc::now();
|
||||
// ── Generate ──
|
||||
let mut messages = Vec::with_capacity(2);
|
||||
if let Some(sys) = &req.system {
|
||||
@ -161,33 +123,20 @@ pub async fn iterate(
|
||||
"temperature": temperature,
|
||||
"max_tokens": max_tokens,
|
||||
});
|
||||
let raw = match call_chat(&client, gateway, &chat_body, &trace_id).await {
|
||||
let raw = match call_chat(&client, gateway, &chat_body).await {
|
||||
Ok(r) => r,
|
||||
Err(e) => {
|
||||
write_infra_error(&state, &req, &trace_id, max_iter, t0.elapsed().as_millis() as u64, format!("/v1/chat hop failed at iter {iteration}: {e}")).await;
|
||||
return (StatusCode::BAD_GATEWAY, format!("/v1/chat hop failed at iter {iteration}: {e}")).into_response();
|
||||
}
|
||||
Err(e) => return (StatusCode::BAD_GATEWAY, format!("/v1/chat hop failed at iter {iteration}: {e}")).into_response(),
|
||||
};
|
||||
|
||||
// ── Extract JSON ──
|
||||
let artifact = match extract_json(&raw) {
|
||||
Some(a) => a,
|
||||
None => {
|
||||
let span_id = emit_attempt_span(
|
||||
&state, &trace_id, iteration, &req, &current_prompt, &raw, "no_json", None,
|
||||
attempt_started, chrono::Utc::now(),
|
||||
);
|
||||
history.push(IterateAttempt {
|
||||
iteration,
|
||||
raw: raw.chars().take(2000).collect(),
|
||||
status: AttemptStatus::NoJson,
|
||||
});
|
||||
attempt_records.push(super::session_log::SessionAttemptRecord {
|
||||
iteration,
|
||||
verdict_kind: "no_json".to_string(),
|
||||
error: None,
|
||||
span_id,
|
||||
});
|
||||
current_prompt = format!(
|
||||
"{}\n\nYour previous attempt did not contain a JSON object. Reply with ONLY a valid JSON object matching the requested artifact shape.",
|
||||
req.prompt,
|
||||
@ -202,41 +151,22 @@ pub async fn iterate(
|
||||
"artifact": artifact,
|
||||
"context": req.context.clone().unwrap_or(serde_json::Value::Null),
|
||||
});
|
||||
match call_validate(&client, gateway, &validate_body, &trace_id).await {
|
||||
match call_validate(&client, gateway, &validate_body).await {
|
||||
Ok(report) => {
|
||||
let span_id = emit_attempt_span(
|
||||
&state, &trace_id, iteration, &req, &current_prompt, &raw, "accepted", None,
|
||||
attempt_started, chrono::Utc::now(),
|
||||
);
|
||||
history.push(IterateAttempt {
|
||||
iteration,
|
||||
raw: raw.chars().take(2000).collect(),
|
||||
status: AttemptStatus::Accepted,
|
||||
});
|
||||
attempt_records.push(super::session_log::SessionAttemptRecord {
|
||||
iteration,
|
||||
verdict_kind: "accepted".to_string(),
|
||||
error: None,
|
||||
span_id,
|
||||
});
|
||||
let duration_ms = t0.elapsed().as_millis() as u64;
|
||||
let grounded = grounded_in_roster(&state, &req.kind, &artifact);
|
||||
write_session_accepted(&state, &req, &trace_id, iteration + 1, max_iter, attempt_records, &artifact, grounded, duration_ms).await;
|
||||
return (StatusCode::OK, Json(IterateResponse {
|
||||
artifact,
|
||||
validation: report,
|
||||
iterations: iteration + 1,
|
||||
history,
|
||||
trace_id: Some(trace_id.clone()),
|
||||
})).into_response();
|
||||
}
|
||||
Err(err) => {
|
||||
let err_summary = err.to_string();
|
||||
let span_id = emit_attempt_span(
|
||||
&state, &trace_id, iteration, &req, &current_prompt, &raw, "validation_failed",
|
||||
Some(err_summary.clone()),
|
||||
attempt_started, chrono::Utc::now(),
|
||||
);
|
||||
history.push(IterateAttempt {
|
||||
iteration,
|
||||
raw: raw.chars().take(2000).collect(),
|
||||
@ -244,12 +174,6 @@ pub async fn iterate(
|
||||
error: serde_json::to_value(&err_summary).unwrap_or(serde_json::Value::Null),
|
||||
},
|
||||
});
|
||||
attempt_records.push(super::session_log::SessionAttemptRecord {
|
||||
iteration,
|
||||
verdict_kind: "validation_failed".to_string(),
|
||||
error: Some(err_summary.clone()),
|
||||
span_id,
|
||||
});
|
||||
// Append validation feedback to prompt for next iter.
|
||||
// The model sees concrete failure mode + retries with
|
||||
// corrective context. This is the "observer correction"
|
||||
@ -264,167 +188,19 @@ pub async fn iterate(
|
||||
}
|
||||
}
|
||||
|
||||
let duration_ms = t0.elapsed().as_millis() as u64;
|
||||
write_session_failure(&state, &req, &trace_id, max_iter, max_iter, attempt_records, duration_ms).await;
|
||||
(StatusCode::UNPROCESSABLE_ENTITY, Json(IterateFailure {
|
||||
error: format!("max iterations reached ({max_iter}) without passing validation"),
|
||||
iterations: max_iter,
|
||||
history,
|
||||
trace_id: Some(trace_id.clone()),
|
||||
})).into_response()
|
||||
}
|
||||
|
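The control flow of `iterate` (generate, extract, validate, retry with feedback, give up after `max_iter`) can be sketched with closures standing in for the `/v1/chat` and `/v1/validate` hops. Everything here is illustrative: `Outcome` and `iterate_sketch` are hypothetical names, artifacts are plain strings rather than `serde_json::Value`, and the validation feedback wording is invented because that part of the handler is elided in the diff.

```rust
/// Hypothetical result type; the real handler returns HTTP responses.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Accepted { artifact: String, iterations: u32 },
    Exhausted { iterations: u32 },
}

/// Generate -> extract -> validate, retrying with corrective context.
pub fn iterate_sketch<G, E, V>(
    prompt: &str,
    max_iterations: Option<u32>,
    mut generate: G,
    extract: E,
    validate: V,
) -> Outcome
where
    G: FnMut(&str) -> String,
    E: Fn(&str) -> Option<String>,
    V: Fn(&str) -> Result<(), String>,
{
    // Same defaulting as the handler: 3 attempts, never fewer than 1.
    let max_iter = max_iterations.unwrap_or(3).max(1);
    let mut current_prompt = prompt.to_string();

    for iteration in 0..max_iter {
        let raw = generate(&current_prompt);
        let Some(artifact) = extract(&raw) else {
            // Rebuild from the original prompt so nudges do not pile up.
            current_prompt = format!(
                "{prompt}\n\nYour previous attempt did not contain a JSON object."
            );
            continue;
        };
        match validate(&artifact) {
            Ok(()) => return Outcome::Accepted { artifact, iterations: iteration + 1 },
            Err(why) => {
                // Hypothetical feedback text; the real wording is not shown in the diff.
                current_prompt = format!(
                    "{prompt}\n\nYour previous attempt failed validation: {why}"
                );
            }
        }
    }
    Outcome::Exhausted { iterations: max_iter }
}
```

Passing the hops as closures keeps the loop testable without a running gateway, which may be useful if the session-log and span bookkeeping removed in this diff is reintroduced later.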
||||
// ─── Helpers — Langfuse spans + session log + roster check ─────────
|
||||
|
||||
fn emit_attempt_span(
|
||||
state: &super::V1State,
|
||||
trace_id: &str,
|
||||
iteration: u32,
|
||||
req: &IterateRequest,
|
||||
prompt: &str,
|
||||
raw: &str,
|
||||
verdict: &str,
|
||||
error: Option<String>,
|
||||
started: chrono::DateTime<chrono::Utc>,
|
||||
ended: chrono::DateTime<chrono::Utc>,
|
||||
) -> Option<String> {
|
||||
let lf = state.langfuse.as_ref()?;
|
||||
Some(lf.emit_attempt_span(super::langfuse_trace::AttemptSpan {
|
||||
trace_id: trace_id.to_string(),
|
||||
iteration,
|
||||
model: req.model.clone(),
|
||||
provider: req.provider.clone(),
|
||||
prompt: prompt.to_string(),
|
||||
raw: raw.to_string(),
|
||||
verdict: verdict.to_string(),
|
||||
error,
|
||||
start_time: started.to_rfc3339(),
|
||||
end_time: ended.to_rfc3339(),
|
||||
}))
|
||||
}
|
||||
|
||||
/// Verify every fill artifact's candidate IDs exist in the roster.
|
||||
/// Returns Some(true)/Some(false) on the fill kind, None otherwise
|
||||
/// (other kinds don't have worker IDs to ground). Same semantics as
|
||||
/// Go's `handlers.rosterCheckFor("fill")`.
|
||||
fn grounded_in_roster(
|
||||
state: &super::V1State,
|
||||
kind: &str,
|
||||
artifact: &serde_json::Value,
|
||||
) -> Option<bool> {
|
||||
if kind != "fill" {
|
||||
return None;
|
||||
}
|
||||
let fills = artifact.get("fills").and_then(|v| v.as_array())?;
|
||||
for f in fills {
|
||||
let id = match f.get("candidate_id").and_then(|v| v.as_str()) {
|
||||
Some(s) if !s.is_empty() => s,
|
||||
_ => return Some(false),
|
||||
};
|
||||
if state.validate_workers.find(id).is_none() {
|
||||
return Some(false);
|
||||
}
|
||||
}
|
||||
Some(true)
|
||||
}
|
||||
|
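The three-valued result of `grounded_in_roster` is easy to misread, so here is a small sketch of the same decision table using only the standard library. It is an assumption-laden stand-in: the roster is a `HashSet<&str>` instead of `state.validate_workers`, and the artifact's `fills[].candidate_id` values are passed as an optional slice, where `None` models a missing or non-array `fills` field.

```rust
use std::collections::HashSet;

/// Sketch of the roster check:
/// - None        -> not a "fill" artifact, or no `fills` array to inspect
/// - Some(false) -> some candidate id is missing, empty, or unknown
/// - Some(true)  -> every candidate id is present in the roster
pub fn grounded_in_roster(
    kind: &str,
    candidate_ids: Option<&[Option<&str>]>,
    roster: &HashSet<&str>,
) -> Option<bool> {
    if kind != "fill" {
        return None;
    }
    // Mirrors the `?` on `artifact.get("fills")` in the hunk above.
    let ids = candidate_ids?;
    for id in ids {
        match id {
            Some(s) if !s.is_empty() && roster.contains(s) => {}
            _ => return Some(false),
        }
    }
    Some(true)
}
```

One consequence worth noting: an empty `fills` array yields `Some(true)`, since there is no id that could fail the check.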
||||
async fn write_session_accepted(
|
||||
state: &super::V1State,
|
||||
req: &IterateRequest,
|
||||
trace_id: &str,
|
||||
iterations: u32,
|
||||
max_iter: u32,
|
||||
attempts: Vec<super::session_log::SessionAttemptRecord>,
|
||||
artifact: &serde_json::Value,
|
||||
grounded: Option<bool>,
|
||||
duration_ms: u64,
|
||||
) {
|
||||
let Some(logger) = state.session_log.as_ref() else { return };
|
||||
let rec = build_session_record(req, trace_id, "accepted", iterations, max_iter, attempts, Some(artifact.clone()), grounded, duration_ms);
|
||||
logger.append(rec).await;
|
||||
}
|
||||
|
||||
async fn write_session_failure(
|
||||
state: &super::V1State,
|
||||
req: &IterateRequest,
|
||||
trace_id: &str,
|
||||
iterations: u32,
|
||||
max_iter: u32,
|
||||
attempts: Vec<super::session_log::SessionAttemptRecord>,
|
||||
duration_ms: u64,
|
||||
) {
|
||||
let Some(logger) = state.session_log.as_ref() else { return };
|
||||
let rec = build_session_record(req, trace_id, "max_iter_exhausted", iterations, max_iter, attempts, None, None, duration_ms);
|
||||
logger.append(rec).await;
|
||||
}
|
||||
|
||||
async fn write_infra_error(
|
||||
state: &super::V1State,
|
||||
req: &IterateRequest,
|
||||
trace_id: &str,
|
||||
max_iter: u32,
|
||||
duration_ms: u64,
|
||||
error: String,
|
||||
) {
|
||||
let Some(logger) = state.session_log.as_ref() else { return };
|
||||
let attempts = vec![super::session_log::SessionAttemptRecord {
|
||||
iteration: 0,
|
||||
verdict_kind: "infra_error".to_string(),
|
||||
error: Some(error),
|
||||
span_id: None,
|
||||
}];
|
||||
let rec = build_session_record(req, trace_id, "infra_error", 0, max_iter, attempts, None, None, duration_ms);
|
||||
logger.append(rec).await;
|
||||
}
|
||||
|
||||
fn build_session_record(
|
||||
req: &IterateRequest,
|
||||
trace_id: &str,
|
||||
final_verdict: &str,
|
||||
iterations: u32,
|
||||
max_iter: u32,
|
||||
attempts: Vec<super::session_log::SessionAttemptRecord>,
|
||||
artifact: Option<serde_json::Value>,
|
||||
grounded: Option<bool>,
|
||||
duration_ms: u64,
|
||||
) -> super::session_log::SessionRecord {
|
||||
super::session_log::SessionRecord {
|
||||
schema: super::session_log::SESSION_RECORD_SCHEMA.to_string(),
|
||||
session_id: trace_id.to_string(),
|
||||
timestamp: chrono::Utc::now().to_rfc3339(),
|
||||
daemon: "gateway".to_string(),
|
||||
kind: req.kind.clone(),
|
||||
model: req.model.clone(),
|
||||
provider: req.provider.clone(),
|
||||
prompt: super::session_log::truncate(&req.prompt, 4000),
|
||||
iterations,
|
||||
max_iterations: max_iter,
|
||||
final_verdict: final_verdict.to_string(),
|
||||
attempts,
|
||||
artifact,
|
||||
grounded_in_roster: grounded,
|
||||
duration_ms,
|
||||
}
|
||||
}
|
||||
|
||||
/// Generate a fresh trace id when no parent was forwarded. Same
|
||||
/// time-ordered hex shape Langfuse already accepts elsewhere in this
|
||||
/// crate (see `langfuse_trace::uuid_v7_like`).
|
||||
fn new_trace_id() -> String {
|
||||
let ts = chrono::Utc::now().timestamp_nanos_opt().unwrap_or(0);
|
||||
let rand = std::time::SystemTime::now()
|
||||
.duration_since(std::time::UNIX_EPOCH)
|
||||
.map(|d| d.subsec_nanos())
|
||||
.unwrap_or(0);
|
||||
format!("{:016x}-{:08x}", ts, rand)
|
||||
}
|
||||
|
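The trace id resolution removed in this diff (a non-empty caller-forwarded header wins, otherwise a fresh id is minted) can be sketched as below. This is a std-only approximation: it uses `SystemTime` where the original uses `chrono`, and `resolve_trace_id` is a hypothetical helper taking the header value directly rather than a `HeaderMap`. As in the original `new_trace_id`, the second component is the sub-second nanosecond count from the same clock, not a random value, so ids minted within the same nanosecond would collide.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Mint an id shaped like `{:016x}-{:08x}`: nanoseconds since the epoch,
/// then the sub-second nanosecond count (not actual randomness).
pub fn new_trace_id() -> String {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or_default();
    format!("{:016x}-{:08x}", now.as_nanos() as u64, now.subsec_nanos())
}

/// A forwarded header value wins when it is present and non-empty;
/// otherwise the session becomes its own trace tree.
pub fn resolve_trace_id(header: Option<&str>) -> String {
    header
        .filter(|s| !s.is_empty())
        .map(str::to_string)
        .unwrap_or_else(new_trace_id)
}
```

If collision resistance matters for the Langfuse join key, mixing in a counter or a real random source would be one option; that is a design suggestion, not something this diff does.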
||||
async fn call_chat(client: &reqwest::Client, gateway: &str, body: &serde_json::Value, trace_id: &str) -> Result<String, String> {
|
||||
let mut req = client.post(format!("{gateway}/v1/chat")).json(body);
|
||||
if !trace_id.is_empty() {
|
||||
req = req.header(TRACE_ID_HEADER, trace_id);
|
||||
}
|
||||
let resp = req.send().await.map_err(|e| format!("chat hop: {e}"))?;
|
||||
async fn call_chat(client: &reqwest::Client, gateway: &str, body: &serde_json::Value) -> Result<String, String> {
|
||||
let resp = client.post(format!("{gateway}/v1/chat"))
|
||||
.json(body)
|
||||
.send()
|
||||
.await
|
||||
.map_err(|e| format!("chat hop: {e}"))?;
|
||||
let status = resp.status();
|
||||
if !status.is_success() {
|
||||
let body = resp.text().await.unwrap_or_default();
|
||||
@ -437,12 +213,12 @@ async fn call_chat(client: &reqwest::Client, gateway: &str, body: &serde_json::V
|
||||
.to_string())
|
||||
}
|
||||
|
||||
async fn call_validate(client: &reqwest::Client, gateway: &str, body: &serde_json::Value, trace_id: &str) -> Result<serde_json::Value, String> {
|
||||
let mut req = client.post(format!("{gateway}/v1/validate")).json(body);
|
||||
if !trace_id.is_empty() {
|
||||
req = req.header(TRACE_ID_HEADER, trace_id);
|
||||
}
|
||||
let resp = req.send().await.map_err(|e| format!("validate hop: {e}"))?;
|
||||
async fn call_validate(client: &reqwest::Client, gateway: &str, body: &serde_json::Value) -> Result<serde_json::Value, String> {
|
||||
let resp = client.post(format!("{gateway}/v1/validate"))
|
||||
.json(body)
|
||||
.send()
|
||||
.await
|
||||
.map_err(|e| format!("validate hop: {e}"))?;
|
||||
let status = resp.status();
|
||||
let parsed: serde_json::Value = resp.json().await.map_err(|e| format!("validate parse: {e}"))?;
|
||||
if status.is_success() {
|
||||
@ -458,13 +234,7 @@ async fn call_validate(client: &reqwest::Client, gateway: &str, body: &serde_jso
|
||||
/// Extract the first JSON object from a model's output. Handles
|
||||
/// fenced code blocks (```json ... ```), bare braces, and stray
|
||||
/// prose around the JSON. Returns None on no extractable object.
|
||||
///
|
||||
/// Made `pub` 2026-05-02 to support the cross-runtime parity probe
|
||||
/// at `golangLAKEHOUSE/scripts/cutover/parity/extract_json_parity.sh`.
|
||||
/// The Go counterpart lives at `internal/validator/iterate.go::ExtractJSON`;
|
||||
/// when either runtime's algorithm changes the parity probe surfaces
|
||||
/// the divergence.
|
||||
pub fn extract_json(raw: &str) -> Option<serde_json::Value> {
|
||||
fn extract_json(raw: &str) -> Option<serde_json::Value> {
|
||||
// Try fenced first.
|
||||
let candidates: Vec<String> = {
|
||||
let mut out = vec![];
|
||||
|
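The body of `extract_json` is cut off by the hunk boundary, so the following is not a reconstruction of it. It is a minimal sketch of one way to find the first brace-balanced object in model output, assuming a byte scan that ignores braces inside string literals. `first_json_object` is a hypothetical name, and the result is only a candidate slice: the real function would still need to parse it (for example with `serde_json::from_str`) and would try fenced blocks first.

```rust
/// Return the first balanced `{ ... }` span in `raw`, skipping braces that
/// appear inside double-quoted strings. Does not validate the JSON itself.
pub fn first_json_object(raw: &str) -> Option<&str> {
    let bytes = raw.as_bytes();
    let mut search_from = 0;

    // Try each '{' in turn until one has a matching close.
    while let Some(offset) = raw[search_from..].find('{') {
        let start = search_from + offset;
        let mut depth = 0usize;
        let mut in_string = false;
        let mut escaped = false;

        for (i, &b) in bytes[start..].iter().enumerate() {
            if in_string {
                if escaped {
                    escaped = false;
                } else if b == b'\\' {
                    escaped = true;
                } else if b == b'"' {
                    in_string = false;
                }
                continue;
            }
            match b {
                b'"' => in_string = true,
                b'{' => depth += 1,
                b'}' => {
                    depth -= 1;
                    if depth == 0 {
                        // All delimiters are ASCII, so these are char boundaries.
                        return Some(&raw[start..=start + i]);
                    }
                }
                _ => {}
            }
        }
        search_from = start + 1;
    }
    None
}
```

Because the scan works on surrounding prose as well as fenced output, it covers both cases named in the doc comment above, at the cost of occasionally returning a span that is balanced but not valid JSON.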
||||
@ -76,54 +76,63 @@ impl LangfuseClient {
|
||||
});
|
||||
}
|
||||
|
||||
/// Fire-and-forget per-iteration span emit. Returns the generated
|
||||
/// span id synchronously so the caller can stamp it on
|
||||
/// `IterateAttempt.span_id` before the network round-trip resolves.
|
||||
/// Mirrors Go's `validator.Tracer` callback shape.
|
||||
pub fn emit_attempt_span(&self, sp: AttemptSpan) -> String {
|
||||
let span_id = uuid_v7_like();
|
||||
let span_id_for_caller = span_id.clone();
|
||||
let this = self.clone();
|
||||
tokio::spawn(async move {
|
||||
if let Err(e) = this.emit_attempt_span_inner(span_id, sp).await {
|
||||
tracing::warn!(target: "v1.langfuse", "iterate span drop: {e}");
|
||||
}
|
||||
});
|
||||
span_id_for_caller
|
||||
}
|
||||
async fn emit_chat_inner(&self, ev: ChatTrace) -> Result<(), String> {
|
||||
let trace_id = uuid_v7_like();
|
||||
let gen_id = uuid_v7_like();
|
||||
let trace_ts = ev.start_time.clone();
|
||||
|
||||
async fn emit_attempt_span_inner(&self, span_id: String, sp: AttemptSpan) -> Result<(), String> {
|
||||
let level = if sp.verdict == "accepted" { "DEFAULT" } else { "WARNING" };
|
||||
let batch = IngestionBatch {
|
||||
batch: vec![IngestionEvent {
|
||||
id: uuid_v7_like(),
|
||||
timestamp: sp.end_time.clone(),
|
||||
kind: "span-create",
|
||||
body: serde_json::json!({
|
||||
"id": span_id,
|
||||
"traceId": sp.trace_id,
|
||||
"name": format!("iterate.attempt[{}]", sp.iteration),
|
||||
"input": serde_json::json!({
|
||||
"iteration": sp.iteration,
|
||||
"model": sp.model,
|
||||
"provider": sp.provider,
|
||||
"prompt": truncate(&sp.prompt, 4000),
|
||||
batch: vec![
|
||||
IngestionEvent {
|
||||
id: uuid_v7_like(),
|
||||
timestamp: trace_ts.clone(),
|
||||
kind: "trace-create",
|
||||
body: serde_json::json!({
|
||||
"id": trace_id,
|
||||
"name": format!("v1.chat:{}", ev.provider),
|
||||
"input": serde_json::json!({
|
||||
"model": ev.model,
|
||||
"messages": ev.input,
|
||||
}),
|
||||
"metadata": serde_json::json!({
|
||||
"provider": ev.provider,
|
||||
"think": ev.think,
|
||||
}),
|
||||
}),
|
||||
"output": serde_json::json!({
|
||||
"verdict": sp.verdict,
|
||||
"error": sp.error,
|
||||
"raw": truncate(&sp.raw, 4000),
|
||||
},
|
||||
IngestionEvent {
|
||||
id: uuid_v7_like(),
|
||||
timestamp: ev.end_time.clone(),
|
||||
kind: "generation-create",
|
||||
body: serde_json::json!({
|
||||
"id": gen_id,
|
||||
"traceId": trace_id,
|
||||
"name": "chat",
|
||||
"model": ev.model,
|
||||
"modelParameters": serde_json::json!({
|
||||
"temperature": ev.temperature,
|
||||
"max_tokens": ev.max_tokens,
|
||||
"think": ev.think,
|
||||
}),
|
||||
"input": ev.input,
|
||||
"output": ev.output,
|
||||
"usage": serde_json::json!({
|
||||
"input": ev.prompt_tokens,
|
||||
"output": ev.completion_tokens,
|
||||
"total": ev.prompt_tokens + ev.completion_tokens,
|
||||
"unit": "TOKENS",
|
||||
}),
|
||||
"startTime": ev.start_time,
|
||||
"endTime": ev.end_time,
|
||||
"metadata": serde_json::json!({
|
||||
"provider": ev.provider,
|
||||
"latency_ms": ev.latency_ms,
|
||||
}),
|
||||
}),
|
||||
"level": level,
|
||||
"startTime": sp.start_time,
|
||||
"endTime": sp.end_time,
|
||||
}),
|
||||
}],
|
||||
},
|
||||
],
|
||||
};
|
||||
self.post_batch(batch).await
|
||||
}
|
||||
|
||||
async fn post_batch(&self, batch: IngestionBatch) -> Result<(), String> {
|
||||
let url = format!("{}{}", self.inner.base_url.trim_end_matches('/'), INGESTION_PATH);
|
||||
let resp = self.inner.http
|
||||
.post(url)
|
||||
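`emit_attempt_span` in this hunk mints the span id before spawning the network call, so the caller can record the id without waiting. A rough std-only sketch of that shape follows. It swaps `tokio::spawn` for `std::thread::spawn` and the Langfuse POST for a channel send, purely so the example has no dependencies; `next_span_id` and `emit_span` are hypothetical names, and the `JoinHandle` is returned only so the example can be observed (the real method returns just the id).

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::Sender;
use std::thread::{self, JoinHandle};

static NEXT_ID: AtomicU64 = AtomicU64::new(1);

/// Hypothetical stand-in for `uuid_v7_like`: a process-unique counter id.
fn next_span_id() -> String {
    format!("span-{}", NEXT_ID.fetch_add(1, Ordering::Relaxed))
}

/// Mint the id up front, hand a copy to a background worker, and return
/// the id to the caller immediately.
fn emit_span(sink: Sender<String>, name: &str) -> (String, JoinHandle<()>) {
    let span_id = next_span_id();
    let for_worker = span_id.clone();
    let name = name.to_string();
    let handle = thread::spawn(move || {
        // Stand-in for the network round-trip. A failure is logged and
        // dropped; it never reaches the caller.
        if sink.send(format!("{for_worker}:{name}")).is_err() {
            eprintln!("span drop: receiver gone");
        }
    });
    (span_id, handle)
}
```

The point of the ordering is that the id is a pure local computation, so stamping it on the attempt record cannot be delayed or failed by the exporter.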
@ -137,81 +146,6 @@ impl LangfuseClient {
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn emit_chat_inner(&self, ev: ChatTrace) -> Result<(), String> {
|
||||
// When the caller forwarded a parent trace id (via the
|
||||
// X-Lakehouse-Trace-Id header → V1State plumbing), attach the
|
||||
// generation as a child of that trace. Without a parent we
|
||||
// mint a new top-level trace per call (Phase 40 default).
|
||||
let trace_id = ev.parent_trace_id.clone().unwrap_or_else(uuid_v7_like);
|
||||
let nested = ev.parent_trace_id.is_some();
|
||||
let gen_id = uuid_v7_like();
|
||||
let trace_ts = ev.start_time.clone();
|
||||
|
||||
let mut events = Vec::with_capacity(2);
|
||||
if !nested {
|
||||
// Only mint a fresh trace-create when we don't have a parent.
|
||||
// Reusing a parent trace id without re-creating it is the
|
||||
// contract that lets validatord's iterate-session show up
|
||||
// as one tree in Langfuse.
|
||||
events.push(IngestionEvent {
|
||||
id: uuid_v7_like(),
|
||||
timestamp: trace_ts.clone(),
|
||||
kind: "trace-create",
|
||||
body: serde_json::json!({
|
||||
"id": trace_id,
|
||||
"name": format!("v1.chat:{}", ev.provider),
|
||||
"input": serde_json::json!({
|
||||
"model": ev.model,
|
||||
"messages": ev.input,
|
||||
}),
|
||||
"metadata": serde_json::json!({
|
||||
"provider": ev.provider,
|
||||
"think": ev.think,
|
||||
}),
|
||||
}),
|
||||
});
|
||||
}
|
||||
events.push(IngestionEvent {
|
||||
id: uuid_v7_like(),
|
||||
timestamp: ev.end_time.clone(),
|
||||
kind: "generation-create",
|
||||
body: serde_json::json!({
|
||||
"id": gen_id,
|
||||
"traceId": trace_id,
|
||||
"name": "chat",
|
||||
"model": ev.model,
|
||||
"modelParameters": serde_json::json!({
|
||||
"temperature": ev.temperature,
|
||||
"max_tokens": ev.max_tokens,
|
||||
"think": ev.think,
|
||||
}),
|
||||
"input": ev.input,
|
||||
"output": ev.output,
|
||||
"usage": serde_json::json!({
|
||||
"input": ev.prompt_tokens,
|
||||
"output": ev.completion_tokens,
|
||||
"total": ev.prompt_tokens + ev.completion_tokens,
|
||||
"unit": "TOKENS",
|
||||
}),
|
||||
"startTime": ev.start_time,
|
||||
"endTime": ev.end_time,
|
||||
"metadata": serde_json::json!({
|
||||
"provider": ev.provider,
|
||||
"latency_ms": ev.latency_ms,
|
||||
}),
|
||||
}),
|
||||
});
|
||||
|
||||
self.post_batch(IngestionBatch { batch: events }).await
|
||||
}
|
||||
}
|
||||
|
||||
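The parent-trace branch in `emit_chat_inner` comes down to one decision: emit `trace-create` only when no parent id was forwarded, and always emit `generation-create` against whichever trace id is in effect. A small sketch of just that decision, assuming a trimmed-down `Event` type and a caller-supplied id generator (both hypothetical, standing in for `IngestionEvent` and `uuid_v7_like`):

```rust
/// Hypothetical, trimmed-down event: only the event kind and trace id.
#[derive(Debug, PartialEq)]
struct Event {
    kind: &'static str,
    trace_id: String,
}

/// Decide which events one chat call contributes to the ingestion batch.
fn plan_events(parent_trace_id: Option<String>, mint: impl FnOnce() -> String) -> Vec<Event> {
    let nested = parent_trace_id.is_some();
    // Reuse the parent's id when present; otherwise mint a fresh one.
    let trace_id = parent_trace_id.unwrap_or_else(mint);
    let mut events = Vec::with_capacity(2);
    if !nested {
        // Only a top-level call creates the trace itself.
        events.push(Event { kind: "trace-create", trace_id: trace_id.clone() });
    }
    events.push(Event { kind: "generation-create", trace_id });
    events
}
```

With a parent id the batch holds a single generation attached to the existing trace, which is what lets an iterate session render as one tree.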
/// Truncate a string to at most `n` chars (NOT bytes). Matches the Go
/// `trim` helper used in session log + attempt-span emission so an
/// operator reading two cross-runtime traces sees the same boundary.
fn truncate(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}
|
||||
|
||||
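The "chars, NOT bytes" note on `truncate` matters as soon as a prompt contains non-ASCII text. A short sketch of the difference, with `truncate_chars` mirroring the helper above and `byte_prefix` as a hypothetical byte-indexed counterpart added only for contrast:

```rust
/// Same shape as the helper above: keep at most `n` Unicode scalar values.
fn truncate_chars(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

/// Hypothetical contrast: a byte-indexed prefix. Returns None when `n`
/// is out of range or falls inside a multi-byte character.
fn byte_prefix(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}
```

For `"h\u{e9}llo"` the second character occupies two bytes, so a cap of 2 keeps two characters under `truncate_chars` while `byte_prefix` has no valid cut at byte 2.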
/// Everything the v1.chat handler collects for one completed call.
|
||||
@ -228,32 +162,6 @@ pub struct ChatTrace {
|
||||
pub start_time: String,
|
||||
pub end_time: String,
|
||||
pub latency_ms: u64,
|
||||
/// When set, attach this chat trace as a child of the named
|
||||
/// Langfuse trace instead of starting a new top-level trace. Used
|
||||
/// by `/v1/iterate` to nest its inner /v1/chat hops under the
|
||||
/// iterate-session trace so a multi-call session shows in
|
||||
/// Langfuse as ONE trace tree, not N+1 disconnected traces.
|
||||
/// Matches the Go-side `X-Lakehouse-Trace-Id` propagation
|
||||
/// (commit d6d2fdf in golangLAKEHOUSE).
|
||||
pub parent_trace_id: Option<String>,
|
||||
}
|
||||
|
||||
/// One iteration attempt inside `/v1/iterate`'s loop. Becomes one
/// span on the parent trace when emitted via `emit_attempt_span`.
/// Matches Go's `validator.AttemptSpan` shape so the cross-runtime
/// observability surface is consistent.
pub struct AttemptSpan {
    pub trace_id: String,
    pub iteration: u32,
    pub model: String,
    pub provider: String,
    pub prompt: String,
    pub raw: String,
    /// Verdict kind: "no_json" | "validation_failed" | "accepted"
    pub verdict: String,
    pub error: Option<String>,
    pub start_time: String,
    pub end_time: String,
}
|
||||
|
||||
#[derive(Serialize)]
|
||||
|
||||
@ -21,7 +21,6 @@ pub mod opencode;
|
||||
pub mod validate;
|
||||
pub mod iterate;
|
||||
pub mod langfuse_trace;
|
||||
pub mod session_log;
|
||||
pub mod mode;
|
||||
pub mod respond;
|
||||
pub mod truth;
|
||||
@ -84,13 +83,6 @@ pub struct V1State {
|
||||
/// disabled (keys missing or container unreachable). Traces are
|
||||
/// fire-and-forget: never block the response path.
|
||||
pub langfuse: Option<langfuse_trace::LangfuseClient>,
|
||||
/// Coordinator session JSONL writer (path from
|
||||
/// `[gateway].session_log_path`). One row per `/v1/iterate`
|
||||
/// session for offline DuckDB analysis. None = disabled.
|
||||
/// Cross-runtime parity with the Go-side `validatord`
|
||||
/// `[validatord].session_log_path` (commit 1a3a82a in
|
||||
/// golangLAKEHOUSE).
|
||||
pub session_log: Option<session_log::SessionLogger>,
|
||||
}
|
||||
|
||||
#[derive(Default, Clone, Serialize)]
|
||||
@ -369,7 +361,6 @@ mod resolve_provider_tests {
|
||||
|
||||
async fn chat(
|
||||
State(state): State<V1State>,
|
||||
headers: axum::http::HeaderMap,
|
||||
Json(req): Json<ChatRequest>,
|
||||
) -> Result<Json<ChatResponse>, (StatusCode, String)> {
|
||||
if req.messages.is_empty() {
|
||||
@ -499,17 +490,6 @@ async fn chat(
|
||||
let output = resp.choices.first()
|
||||
.map(|c| c.message.text())
|
||||
.unwrap_or_default();
|
||||
// Cross-runtime trace linkage. When a caller (validatord on
|
||||
// Go side, /v1/iterate on Rust side) forwards a parent trace
|
||||
// id via X-Lakehouse-Trace-Id, attach this generation to that
|
||||
// trace so the iterate session and its inner chat hops show
|
||||
// up as ONE trace tree in Langfuse. Header name matches the
|
||||
// Go-side `shared.TraceIDHeader` constant byte-for-byte.
|
||||
let parent_trace_id = headers
|
||||
.get(crate::v1::iterate::TRACE_ID_HEADER)
|
||||
.and_then(|v| v.to_str().ok())
|
||||
.map(|s| s.to_string())
|
||||
.filter(|s| !s.is_empty());
|
||||
lf.emit_chat(langfuse_trace::ChatTrace {
|
||||
provider: used_provider.clone(),
|
||||
model: resp.model.clone(),
|
||||
@ -523,7 +503,6 @@ async fn chat(
|
||||
start_time: start_time.to_rfc3339(),
|
||||
end_time: end_time.to_rfc3339(),
|
||||
latency_ms,
|
||||
parent_trace_id,
|
||||
});
|
||||
}
|
||||
|
||||
|
||||
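The header read in the `chat` handler treats a missing, undecodable, or empty `X-Lakehouse-Trace-Id` identically: no parent. A dependency-free sketch of that normalisation is below. It takes raw bytes instead of an `axum::http::HeaderMap`, and uses `std::str::from_utf8` as a stand-in for `HeaderValue::to_str` (which is stricter and accepts only visible ASCII), so treat it as an approximation; the function name is hypothetical.

```rust
/// Hypothetical helper: normalise an optional raw header value into an
/// optional parent trace id. Missing, non-text, and empty all map to None.
fn parent_trace_id(raw: Option<&[u8]>) -> Option<String> {
    raw.and_then(|v| std::str::from_utf8(v).ok())
        .map(|s| s.to_string())
        .filter(|s| !s.is_empty())
}
```

Collapsing all three cases into `None` means the downstream code has a single "mint a new trace" path and never attaches a generation to an empty trace id.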
@ -1,235 +0,0 @@
|
||||
//! Coordinator session JSONL writer — Rust parity with the Go-side
|
||||
//! `internal/validator/session_log.go` (commit 1a3a82a in
|
||||
//! golangLAKEHOUSE). Same schema, same field names, same producer
|
||||
//! semantics, so a unified longitudinal log can pull from either
|
||||
//! runtime via DuckDB.
|
||||
//!
|
||||
//! Schema: `session.iterate.v1`. One row per `/v1/iterate` session.
|
||||
//! Append-only. Best-effort posture: errors warn and the iterate
|
||||
//! response always ships.
|
||||
//!
|
||||
//! See `golangLAKEHOUSE/docs/SESSION_LOG.md` for the full schema
|
||||
//! reference + DuckDB query examples. This module produces rows
|
||||
//! with `daemon: "gateway"`; the Go side produces `daemon:
|
||||
//! "validatord"`. Operators who want a unified stream can point both
|
||||
//! to the same path (the OS write-append is atomic for the row sizes
|
||||
//! we produce) or query both files together via duckdb's `read_json`
|
||||
//! glob support.
|
||||
|
||||
use serde::{Deserialize, Serialize};
|
||||
use std::sync::Arc;
|
||||
use tokio::sync::Mutex;
|
||||
|
||||
pub const SESSION_RECORD_SCHEMA: &str = "session.iterate.v1";
|
||||
|
||||
/// One row in coordinator_sessions.jsonl. Field names are the on-wire
|
||||
/// names — must stay byte-equal to the Go side's
|
||||
/// `validator.SessionRecord` (proven by the cross-runtime parity
|
||||
/// probe at golangLAKEHOUSE/scripts/cutover/parity/).
|
||||
// Deserialize is supported so the parity helper binary can round-trip
|
||||
// fixture inputs through serde without hand-rolling a parser. Production
|
||||
// emit path uses Serialize only; SessionRecord rows are written by the
|
||||
// gateway and consumed by DuckDB / external tooling, never re-read by us.
|
||||
#[derive(Serialize, Deserialize)]
|
||||
pub struct SessionRecord {
|
||||
pub schema: String,
|
||||
pub session_id: String,
|
||||
pub timestamp: String,
|
||||
pub daemon: String,
|
||||
pub kind: String,
|
||||
pub model: String,
|
||||
pub provider: String,
|
||||
pub prompt: String,
|
||||
pub iterations: u32,
|
||||
pub max_iterations: u32,
|
||||
pub final_verdict: String, // "accepted" | "max_iter_exhausted" | "infra_error"
|
||||
pub attempts: Vec<SessionAttemptRecord>,
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
pub artifact: Option<serde_json::Value>,
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
pub grounded_in_roster: Option<bool>,
|
||||
pub duration_ms: u64,
|
||||
}
|
||||
|
||||
#[derive(Serialize, Deserialize)]
pub struct SessionAttemptRecord {
    pub iteration: u32,
    pub verdict_kind: String, // "no_json" | "validation_failed" | "accepted" | "infra_error"
    #[serde(skip_serializing_if = "Option::is_none")]
    pub error: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub span_id: Option<String>,
}
|
||||
|
||||
/// Append-only writer. Cloneable handle: internal state is Arc'd so
/// V1State can keep its own clone and per-request clones are cheap.
#[derive(Clone)]
pub struct SessionLogger {
    inner: Arc<Inner>,
}

struct Inner {
    path: String,
    /// tokio::Mutex (not std) because we hold it across the async
    /// fs write. Contention is low (one row per /v1/iterate session).
    mu: Mutex<()>,
}
|
||||
|
||||
impl SessionLogger {
|
||||
/// Construct a logger writing to `path`. Empty path → None
|
||||
/// (skip the wiring in the iterate handler entirely).
|
||||
pub fn from_path(path: &str) -> Option<Self> {
|
||||
if path.is_empty() {
|
||||
return None;
|
||||
}
|
||||
Some(Self {
|
||||
inner: Arc::new(Inner {
|
||||
path: path.to_string(),
|
||||
mu: Mutex::new(()),
|
||||
}),
|
||||
})
|
||||
}
|
||||
|
||||
    /// Append one record. Best-effort: marshal and write failures land
    /// in `tracing::warn!` and are never surfaced to the caller (the
    /// function returns `()`), because observability is a witness,
    /// never a gate.
    pub async fn append(&self, rec: SessionRecord) {
        let body = match serde_json::to_string(&rec) {
            Ok(s) => s,
            Err(e) => {
                tracing::warn!(target: "v1.session_log", "marshal: {e}");
                return;
            }
        };
        let _guard = self.inner.mu.lock().await;
        if let Err(e) = self.write(&body).await {
            tracing::warn!(target: "v1.session_log", "write {}: {e}", self.inner.path);
        }
    }
|
||||
|
||||
    async fn write(&self, body: &str) -> std::io::Result<()> {
        use tokio::fs::OpenOptions;
        use tokio::io::AsyncWriteExt;

        // Lazy mkdir on first write so a not-yet-mounted volume at
        // startup doesn't kill the daemon.
        if let Some(parent) = std::path::Path::new(&self.inner.path).parent() {
            if !parent.as_os_str().is_empty() {
                tokio::fs::create_dir_all(parent).await?;
            }
        }
        let mut f = OpenOptions::new()
            .append(true)
            .create(true)
            .open(&self.inner.path)
            .await?;
        // Write the row and its newline as one buffer so a concurrent
        // appender cannot land between them, then flush so the write has
        // completed before the caller releases its lock.
        let mut line = String::with_capacity(body.len() + 1);
        line.push_str(body);
        line.push('\n');
        f.write_all(line.as_bytes()).await?;
        f.flush().await?;
        Ok(())
    }
|
||||
}
|
||||
|
||||
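The `SessionLogger` above combines three ideas: an empty path disables the logger, the parent directory is created lazily, and a lock serialises appends so rows never interleave. A synchronous, std-only sketch of the same structure follows. It uses `std::sync::Mutex` and `std::fs` in place of the tokio types, takes a pre-serialised row instead of a `SessionRecord`, and returns the I/O error instead of logging it; `JsonlAppender` is a hypothetical name.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::PathBuf;
use std::sync::{Arc, Mutex};

/// Cheaply cloneable handle; all clones share one path and one lock.
#[derive(Clone)]
struct JsonlAppender {
    inner: Arc<Inner>,
}

struct Inner {
    path: PathBuf,
    mu: Mutex<()>,
}

impl JsonlAppender {
    /// Empty path means "disabled", mirroring `SessionLogger::from_path`.
    fn from_path(path: &str) -> Option<Self> {
        if path.is_empty() {
            return None;
        }
        Some(Self {
            inner: Arc::new(Inner { path: PathBuf::from(path), mu: Mutex::new(()) }),
        })
    }

    /// Append one already-serialised row followed by a newline.
    fn append(&self, row: &str) -> io::Result<()> {
        // A poisoned lock still guards the file, so keep going.
        let _guard = self.inner.mu.lock().unwrap_or_else(|e| e.into_inner());
        // Create the parent directory on first use, not at startup.
        if let Some(parent) = self.inner.path.parent() {
            if !parent.as_os_str().is_empty() {
                fs::create_dir_all(parent)?;
            }
        }
        let mut f = OpenOptions::new().append(true).create(true).open(&self.inner.path)?;
        // One buffer per row so the newline stays attached to its record.
        let mut line = String::with_capacity(row.len() + 1);
        line.push_str(row);
        line.push('\n');
        f.write_all(line.as_bytes())
    }
}
```

The lock only protects writers inside one process; two daemons pointed at the same file would rely on the operating system's append behaviour instead.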
/// Best-effort UTF-8 char truncation. Matches Go's `trim` helper so
/// rows produced by either runtime cap fields at the same boundaries.
pub fn truncate(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use std::path::PathBuf;
|
||||
use tokio::fs;
|
||||
|
||||
fn fixture_record(session_id: &str) -> SessionRecord {
|
||||
SessionRecord {
|
||||
schema: SESSION_RECORD_SCHEMA.to_string(),
|
||||
session_id: session_id.to_string(),
|
||||
timestamp: "2026-05-02T08:00:00Z".to_string(),
|
||||
daemon: "gateway".to_string(),
|
||||
kind: "fill".to_string(),
|
||||
model: "qwen3.5:latest".to_string(),
|
||||
provider: "ollama".to_string(),
|
||||
prompt: "produce a fill artifact".to_string(),
|
||||
iterations: 1,
|
||||
max_iterations: 3,
|
||||
final_verdict: "accepted".to_string(),
|
||||
attempts: vec![SessionAttemptRecord {
|
||||
iteration: 0,
|
||||
verdict_kind: "accepted".to_string(),
|
||||
error: None,
|
||||
span_id: Some("span-0".to_string()),
|
||||
}],
|
||||
artifact: Some(serde_json::json!({"fills":[{"candidate_id":"W-1"}]})),
|
||||
grounded_in_roster: Some(true),
|
||||
duration_ms: 50,
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn from_path_empty_returns_none() {
|
||||
assert!(SessionLogger::from_path("").is_none());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn append_writes_jsonl_row_with_schema_field() {
|
||||
let dir = tempdir();
|
||||
let path = dir.join("sessions.jsonl");
|
||||
let path_str = path.to_string_lossy().to_string();
|
||||
let logger = SessionLogger::from_path(&path_str).unwrap();
|
||||
logger.append(fixture_record("trace-a")).await;
|
||||
|
||||
let body = fs::read_to_string(&path).await.unwrap();
|
||||
assert!(body.contains("\"schema\":\"session.iterate.v1\""));
|
||||
assert!(body.contains("\"session_id\":\"trace-a\""));
|
||||
assert!(body.contains("\"grounded_in_roster\":true"));
|
||||
assert!(body.ends_with('\n'));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn append_concurrent_safe() {
|
||||
let dir = tempdir();
|
||||
let path = dir.join("sessions.jsonl");
|
||||
let path_str = path.to_string_lossy().to_string();
|
||||
let logger = SessionLogger::from_path(&path_str).unwrap();
|
||||
|
||||
let n = 32;
|
||||
let mut handles = Vec::with_capacity(n);
|
||||
for i in 0..n {
|
||||
let l = logger.clone();
|
||||
handles.push(tokio::spawn(async move {
|
||||
l.append(fixture_record(&format!("trace-{i}"))).await;
|
||||
}));
|
||||
}
|
||||
for h in handles {
|
||||
h.await.unwrap();
|
||||
}
|
||||
|
||||
let body = fs::read_to_string(&path).await.unwrap();
|
||||
let lines: Vec<_> = body.lines().filter(|l| !l.is_empty()).collect();
|
||||
assert_eq!(lines.len(), n, "expected {n} rows, got {}", lines.len());
|
||||
// Every row must round-trip through serde — a torn write
|
||||
// would surface as a parse error.
|
||||
for line in lines {
|
||||
let _: serde_json::Value = serde_json::from_str(line).expect("valid json per row");
|
||||
}
|
||||
}
|
||||
|
||||
fn tempdir() -> PathBuf {
|
||||
// Per-test unique path so prior runs don't pollute the next.
|
||||
// The static counter increments across the whole test binary,
|
||||
// so back-to-back tests in the same module get distinct dirs.
|
||||
static COUNTER: std::sync::atomic::AtomicU64 = std::sync::atomic::AtomicU64::new(0);
|
||||
let n = COUNTER.fetch_add(1, std::sync::atomic::Ordering::Relaxed);
|
||||
let p = std::env::temp_dir().join(format!(
|
||||
"session_log_test_{}_{}_{}",
|
||||
std::process::id(),
|
||||
chrono::Utc::now().timestamp_nanos_opt().unwrap_or(0),
|
||||
n,
|
||||
));
|
||||
std::fs::create_dir_all(&p).unwrap();
|
||||
p
|
||||
}
|
||||
}
|
||||
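The `tempdir` helper in these tests gets its uniqueness from a static atomic counter combined with the process id. A stripped-down sketch of that idea is below, leaving out the timestamp component and the directory creation; `unique_temp_dir` is a hypothetical name.

```rust
use std::path::PathBuf;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical helper: build a temp path that is distinct for every
/// call in this process (counter) and across live processes (pid).
/// The directory itself is not created here.
fn unique_temp_dir(label: &str) -> PathBuf {
    static COUNTER: AtomicU64 = AtomicU64::new(0);
    // fetch_add returns the previous value, so each caller sees a
    // different number even when tests run on parallel threads.
    let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
    let pid = std::process::id();
    std::env::temp_dir().join(format!("{label}_{pid}_{seq}"))
}
```

`Ordering::Relaxed` is enough here because only the uniqueness of the returned number matters, not its ordering relative to other memory operations.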
@ -456,26 +456,6 @@ async fn build_lance_vector_index(path: &str, _dims: usize) -> Result<()> {
|
||||
.await
|
||||
.context("create_index")?;
|
||||
|
||||
// Also build the scalar btree on doc_id. This bench's
|
||||
// measure_random_access_lance uses take(row_position) which doesn't
|
||||
// need the btree, but the dataset this bench writes is also queried
|
||||
// downstream by /vectors/lance/doc/<name>/<doc_id> (the production
|
||||
// lookup path) — without this index that path falls back to a full
|
||||
// table scan. Cheap to build (~1.2s on 10M rows) and matches the
|
||||
// gateway's lance_migrate handler behavior so bench-produced datasets
|
||||
// are immediately production-shape.
|
||||
use lance_index::scalar::ScalarIndexParams;
|
||||
dataset
|
||||
.create_index(
|
||||
&["doc_id"],
|
||||
IndexType::Scalar,
|
||||
Some("doc_id_btree".into()),
|
||||
&ScalarIndexParams::default(),
|
||||
true,
|
||||
)
|
||||
.await
|
||||
.context("create_index doc_id btree")?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
|
||||
@ -62,15 +62,6 @@ pub struct GatewayConfig {
|
||||
pub host: String,
|
||||
#[serde(default = "default_gateway_port")]
|
||||
pub port: u16,
|
||||
/// Coordinator session JSONL output path. One row per
|
||||
/// `/v1/iterate` session, schema=`session.iterate.v1`. Empty =
|
||||
/// disabled. Cross-runtime parity with the Go side's
|
||||
/// `[validatord].session_log_path` (added 2026-05-02). Default
|
||||
/// empty so existing deployments aren't perturbed; production
|
||||
/// sets `/var/lib/lakehouse/gateway/sessions.jsonl`. See
|
||||
/// `golangLAKEHOUSE/docs/SESSION_LOG.md` for query examples.
|
||||
#[serde(default)]
|
||||
pub session_log_path: String,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Deserialize)]
|
||||
@ -158,13 +149,7 @@ fn default_gateway_port() -> u16 { 3100 }
|
||||
fn default_storage_root() -> String { "./data".to_string() }
|
||||
fn default_profile_root() -> String { "./data/_profiles".to_string() }
|
||||
fn default_manifest_prefix() -> String { "_catalog/manifests".to_string() }
|
||||
// Post-2026-05-02: AiClient talks directly to Ollama; the Python
|
||||
// sidecar's hot-path role was retired. The config field name
|
||||
// `[sidecar].url` is preserved for migration compatibility (operators
|
||||
// with existing TOMLs don't need to rename anything), but the value
|
||||
// now points at Ollama. Lab UI / pipeline_lab Python remains as a
|
||||
// dev-only tool; not on this URL.
|
||||
fn default_sidecar_url() -> String { "http://localhost:11434".to_string() }
|
||||
fn default_sidecar_url() -> String { "http://localhost:3200".to_string() }
|
||||
fn default_embed_model() -> String { "nomic-embed-text".to_string() }
|
||||
fn default_gen_model() -> String { "qwen2.5".to_string() }
|
||||
fn default_rerank_model() -> String { "qwen2.5".to_string() }
|
||||
@ -199,11 +184,7 @@ impl Config {
|
||||
impl Default for Config {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
gateway: GatewayConfig {
|
||||
host: default_host(),
|
||||
port: default_gateway_port(),
|
||||
session_log_path: String::new(),
|
||||
},
|
||||
gateway: GatewayConfig { host: default_host(), port: default_gateway_port() },
|
||||
storage: StorageConfig {
|
||||
root: default_storage_root(),
|
||||
profile_root: default_profile_root(),
|
||||
|
||||
@ -603,210 +603,3 @@ fn row_from_batch(batch: &RecordBatch, row: usize) -> Result<Row, String> {
|
||||
|
||||
Ok(Row { doc_id, chunk_text, vector: v, source, chunk_idx })
|
||||
}
|
||||
|
||||
// =================== Tests ===================
|
||||
//
|
||||
// All tests run against a temp directory, never the production
// data/lance/ tree. Lance reads/writes are async + filesystem-bound,
// so we use #[tokio::test]. Each test gets a unique temp dir keyed on
// pid + nanoseconds + a per-process counter (see `temp_path`), so
// concurrent runs don't collide and a re-run of a single test doesn't
// see prior state.
|
||||
//
|
||||
// Surfaced 2026-05-02 audit: vectord-lance had ZERO tests despite
|
||||
// being on the live HTTP path. These are the load-bearing locks for
|
||||
// the public API contract.
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
fn temp_path(label: &str) -> String {
|
||||
// Per-process atomic counter — guarantees uniqueness regardless
|
||||
// of clock resolution or test scheduling. Combined with pid, the
|
||||
// result is unique within and across processes for any practical
|
||||
// test workload. Nanosecond timestamps were not enough on their
|
||||
// own: opus WARN at lib.rs:622 from the 2026-05-02 scrum noted
|
||||
// that under tokio scheduling, multiple tests in the same cargo
|
||||
// process can hit the same nanos bucket.
|
||||
use std::sync::atomic::{AtomicU64, Ordering};
|
||||
static COUNTER: AtomicU64 = AtomicU64::new(0);
|
||||
let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
|
||||
let pid = std::process::id();
|
||||
let nanos = std::time::SystemTime::now()
|
||||
.duration_since(std::time::UNIX_EPOCH)
|
||||
.map(|d| d.subsec_nanos())
|
||||
.unwrap_or(0);
|
||||
std::env::temp_dir()
|
||||
.join(format!("vlance_test_{label}_{pid}_{nanos}_{seq}"))
|
||||
.to_string_lossy()
|
||||
.to_string()
|
||||
}
|
||||
|
||||
/// Build a minimal in-memory Parquet file matching vectord's
|
||||
/// binary-blob schema. Used as input to migrate_from_parquet_bytes.
|
||||
fn synth_parquet_bytes(n_rows: usize, dims: usize) -> Vec<u8> {
|
||||
use parquet::arrow::ArrowWriter;
|
||||
use std::io::Cursor;
|
||||
|
||||
let schema = Arc::new(Schema::new(vec![
|
||||
Field::new("source", DataType::Utf8, true),
|
||||
Field::new("doc_id", DataType::Utf8, false),
|
||||
Field::new("chunk_idx", DataType::Int32, true),
|
||||
Field::new("chunk_text", DataType::Utf8, true),
|
||||
Field::new("vector", DataType::Binary, false),
|
||||
]));
|
||||
|
||||
let sources: Vec<Option<&str>> = (0..n_rows).map(|_| Some("test")).collect();
|
||||
let doc_ids: Vec<String> = (0..n_rows).map(|i| format!("DOC-{i:04}")).collect();
|
||||
let chunk_idxs: Vec<Option<i32>> = (0..n_rows).map(|i| Some(i as i32)).collect();
|
||||
let chunk_texts: Vec<String> = (0..n_rows).map(|i| format!("synth chunk {i}")).collect();
|
||||
let vectors: Vec<Vec<u8>> = (0..n_rows).map(|i| {
|
||||
let v: Vec<f32> = (0..dims).map(|j| (i * dims + j) as f32 * 0.01).collect();
|
||||
let mut bytes = Vec::with_capacity(dims * 4);
|
||||
for f in v { bytes.extend_from_slice(&f.to_le_bytes()); }
|
||||
bytes
|
||||
}).collect();
|
||||
|
||||
let batch = RecordBatch::try_new(schema.clone(), vec![
|
||||
Arc::new(StringArray::from(sources)),
|
||||
Arc::new(StringArray::from(doc_ids)),
|
||||
Arc::new(Int32Array::from(chunk_idxs)),
|
||||
Arc::new(StringArray::from(chunk_texts)),
|
||||
Arc::new(BinaryArray::from(vectors.iter().map(|v| v.as_slice()).collect::<Vec<_>>())),
|
||||
]).expect("synth parquet batch");
|
||||
|
||||
let mut buf = Cursor::new(Vec::new());
|
||||
let mut writer = ArrowWriter::try_new(&mut buf, schema, None).expect("arrow writer");
|
||||
writer.write(&batch).expect("write batch");
|
||||
writer.close().expect("close writer");
|
||||
buf.into_inner()
|
||||
}
|
||||
|
||||
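`synth_parquet_bytes` stores each vector as a binary blob of little-endian `f32` values, four bytes per dimension. A small round-trip sketch of that encoding follows. The encoder matches the loop above; the decoder is an assumption about how a reader would reverse it, since the production decode path is not part of this hunk. Both function names are hypothetical.

```rust
/// Pack floats as consecutive little-endian 4-byte groups.
fn encode_f32_le(v: &[f32]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(v.len() * 4);
    for f in v {
        bytes.extend_from_slice(&f.to_le_bytes());
    }
    bytes
}

/// Assumed inverse: reject blobs whose length is not a multiple of 4,
/// otherwise rebuild one f32 per 4-byte group.
fn decode_f32_le(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

The blob length divided by 4 gives the dimension count, which is how a 4-dimensional test vector ends up as 16 bytes on disk.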
#[tokio::test]
|
||||
async fn fresh_store_reports_no_state() {
|
||||
let path = temp_path("fresh");
|
||||
let store = LanceVectorStore::new(path.clone());
|
||||
assert_eq!(store.path(), path);
|
||||
assert_eq!(store.count().await.unwrap_or(0), 0);
|
||||
assert!(!store.has_vector_index().await.unwrap_or(true));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn migrate_then_count_and_fetch() {
|
||||
let path = temp_path("migrate_fetch");
|
||||
let store = LanceVectorStore::new(path.clone());
|
||||
let bytes = synth_parquet_bytes(8, 4);
|
||||
|
||||
let stats = store.migrate_from_parquet_bytes(&bytes).await.expect("migrate");
|
||||
assert_eq!(stats.rows_written, 8);
|
||||
assert_eq!(stats.dimensions, 4);
|
||||
assert!(stats.disk_bytes > 0, "lance dataset should occupy disk");
|
||||
|
||||
assert_eq!(store.count().await.unwrap(), 8);
|
||||
|
||||
let row = store.get_by_doc_id("DOC-0003").await
|
||||
.expect("get_by_doc_id Ok").expect("DOC-0003 exists");
|
||||
assert_eq!(row.doc_id, "DOC-0003");
|
||||
assert_eq!(row.chunk_text, "synth chunk 3");
|
||||
assert_eq!(row.vector.len(), 4);
|
||||
|
||||
let _ = std::fs::remove_dir_all(&path);
|
||||
}
|
||||
|
||||
/// Load-bearing contract: get_by_doc_id distinguishes "dataset
|
||||
/// missing" (Err) from "id missing" (Ok(None)) so the HTTP
|
||||
/// handler can return 404 without inspecting error strings.
|
||||
#[tokio::test]
|
||||
async fn get_by_doc_id_missing_returns_none() {
|
||||
let path = temp_path("missing_id");
|
||||
let store = LanceVectorStore::new(path.clone());
|
||||
store.migrate_from_parquet_bytes(&synth_parquet_bytes(4, 4)).await.expect("migrate");
|
||||
|
||||
let row = store.get_by_doc_id("DOC-NEVER-EXISTS").await.expect("Ok");
|
||||
assert!(row.is_none(), "missing id must return Ok(None), not Err");
|
||||
|
||||
let _ = std::fs::remove_dir_all(&path);
|
||||
}
|
||||
|
||||
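The contract locked by `get_by_doc_id_missing_returns_none` is that "id missing" and "dataset missing" arrive as different variants, so a handler can branch on the type instead of matching error text. A sketch of that branch, with a hypothetical one-field `Row` and a `status_for` helper returning bare status codes; mapping `Err` to 500 is an assumption, since the source only states the 404 case.

```rust
/// Hypothetical stand-in for the store's row type.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    doc_id: String,
}

/// Map a lookup result to an HTTP status without reading error strings.
fn status_for(lookup: &Result<Option<Row>, String>) -> u16 {
    match lookup {
        Ok(Some(_)) => 200, // row found
        Ok(None) => 404,    // dataset readable, id absent
        Err(_) => 500,      // dataset missing or unreadable (assumed mapping)
    }
}
```

If the store folded "not found" into `Err`, the handler would have to inspect message text to pick a status, which is the coupling the test is there to prevent.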
/// Verifies the load-bearing structural-difference claim of
|
||||
/// ADR-019: Lance appends without rewriting the whole file. Row
|
||||
/// count grows; new rows are fetchable by their doc_ids.
|
||||
#[tokio::test]
|
||||
async fn append_grows_count_and_new_rows_fetchable() {
|
||||
let path = temp_path("append");
|
||||
let store = LanceVectorStore::new(path.clone());
|
||||
store.migrate_from_parquet_bytes(&synth_parquet_bytes(4, 4)).await.expect("migrate");
|
||||
assert_eq!(store.count().await.unwrap(), 4);
|
||||
|
||||
let stats = store.append(
|
||||
Some("appended".into()),
|
||||
vec!["NEW-A".into(), "NEW-B".into()],
|
||||
vec![0, 0],
|
||||
vec!["new chunk a".into(), "new chunk b".into()],
|
||||
vec![vec![0.1, 0.2, 0.3, 0.4], vec![0.5, 0.6, 0.7, 0.8]],
|
||||
).await.expect("append");
|
||||
|
||||
assert_eq!(stats.rows_appended, 2);
|
||||
assert_eq!(store.count().await.unwrap(), 6);
|
||||
|
||||
let new_a = store.get_by_doc_id("NEW-A").await.unwrap().expect("NEW-A");
|
||||
assert_eq!(new_a.chunk_text, "new chunk a");
|
||||
assert_eq!(new_a.source.as_deref(), Some("appended"));
|
||||
|
||||
let _ = std::fs::remove_dir_all(&path);
|
||||
}
|
||||
|
||||
/// Without this guard a dim-mismatch row would land on disk and
|
||||
/// silently break search at query time.
|
||||
#[tokio::test]
|
||||
async fn append_dim_mismatch_errors() {
|
||||
let path = temp_path("dim_mismatch");
|
||||
let store = LanceVectorStore::new(path.clone());
|
||||
store.migrate_from_parquet_bytes(&synth_parquet_bytes(4, 4)).await.expect("migrate");
|
||||
|
||||
let err = store.append(
|
||||
None, vec!["X".into(), "Y".into()], vec![0, 0],
|
||||
vec!["a".into(), "b".into()],
|
||||
vec![vec![1.0, 2.0, 3.0, 4.0], vec![1.0, 2.0]],
|
||||
).await;
|
||||
assert!(err.is_err(), "dim mismatch must error");
|
||||
let msg = err.unwrap_err();
|
||||
assert!(msg.contains("dim") || msg.contains("expected"),
|
||||
"error must mention the dimension problem; got: {msg}");
|
||||
|
||||
let _ = std::fs::remove_dir_all(&path);
|
||||
}
|
||||
|
||||
/// Search round-trip: query the exact vector for one row, top-1
|
||||
/// must be that row. Verifies the search path works on small
|
||||
/// datasets where IVF training would normally be skipped.
|
||||
#[tokio::test]
|
||||
async fn search_returns_nearest() {
|
||||
let path = temp_path("search");
|
||||
let store = LanceVectorStore::new(path.clone());
|
||||
store.migrate_from_parquet_bytes(&synth_parquet_bytes(8, 4)).await.expect("migrate");
|
||||
|
||||
let target: Vec<f32> = (0..4).map(|j| (5 * 4 + j) as f32 * 0.01).collect();
|
||||
let hits = store.search(&target, 3, None, None).await.expect("search");
|
||||
assert!(!hits.is_empty(), "search must return at least 1 hit");
|
||||
assert_eq!(hits[0].doc_id, "DOC-0005",
|
||||
"exact-vector match should be top-1; got {hits:?}");
|
||||
|
||||
let _ = std::fs::remove_dir_all(&path);
|
||||
}
|
||||
|
||||
/// stats() summarizes the dataset state in one call. Locks the
|
||||
/// field shape so downstream consumers don't break on a rename.
|
||||
#[tokio::test]
|
||||
async fn stats_reports_post_migrate_state() {
|
||||
let path = temp_path("stats");
|
||||
let store = LanceVectorStore::new(path.clone());
|
||||
store.migrate_from_parquet_bytes(&synth_parquet_bytes(5, 4)).await.expect("migrate");
|
||||
|
||||
let s = store.stats().await.expect("stats");
|
||||
assert_eq!(s.rows, 5);
|
||||
assert!(s.disk_bytes > 0);
|
||||
assert!(!s.has_vector_index, "no vector index built yet");
|
||||
|
||||
let _ = std::fs::remove_dir_all(&path);
|
||||
}
|
||||
}
|
||||
|
||||
@ -925,7 +925,7 @@ mod tests {
|
||||
reject_reason: None,
|
||||
}];
|
||||
let mut trace = PathwayTrace {
|
||||
pathway_id: pathway_id.clone(),
|
||||
pathway_id,
|
||||
task_class: "scrum_review".into(),
|
||||
file_path: format!("crates/{id_tag}/src/x.rs"),
|
||||
signal_class: Some("CONVERGING".into()),
|
||||
@ -954,14 +954,6 @@ mod tests {
|
||||
replay_count: replays,
|
||||
replays_succeeded: succ,
|
||||
retired: false,
|
||||
// Versioning fields added by Mem0 wave (commit 6ac7f61) — defaults
|
||||
// mirror "this trace is the live head with no parent/successor".
|
||||
trace_uid: format!("test-{pathway_id}"),
|
||||
version: 1,
|
||||
parent_trace_uid: None,
|
||||
superseded_at: None,
|
||||
superseded_by_trace_uid: None,
|
||||
retirement_reason: None,
|
||||
};
|
||||
trace.pathway_vec = build_pathway_vec(&trace);
|
||||
trace
|
||||
|
||||
@ -163,11 +163,7 @@ pub async fn query(
|
||||
// production caller of the Phase 21 primitives — see audit finding
|
||||
// "Phase 21 Rust primitives are wired but not CALLED by any
|
||||
// production surface" from 2026-04-21.
|
||||
// 2026-04-30 model bump: qwen2.5:latest → qwen3.5:latest to match
|
||||
// the small-model-pipeline local-tier default. Same JSON-clean
|
||||
// property, more capacity. think=Some(false) preserved — RAG hot
|
||||
// path doesn't need reasoning traces; direct answers only.
|
||||
let mut cont_opts = ContinuableOpts::new("qwen3.5:latest");
|
||||
let mut cont_opts = ContinuableOpts::new("qwen2.5:latest");
|
||||
cont_opts.max_tokens = Some(512);
|
||||
cont_opts.temperature = Some(0.2);
|
||||
cont_opts.shape = ResponseShape::Text;
|
||||
@ -180,7 +176,7 @@ pub async fn query(
|
||||
// echoes whatever Ollama loaded). Use the configured tier model
|
||||
// for now; if RAG needs to report the actual resolved model,
|
||||
// the runner can add a post-call ps probe later.
|
||||
model: "qwen3.5:latest".to_string(),
|
||||
model: "qwen2.5:latest".to_string(),
|
||||
sources: results,
|
||||
tokens_generated: None,
|
||||
})
|
||||
|
||||
@ -1855,10 +1855,10 @@ async fn lance_migrate(
|
||||
.map_err(|e| (StatusCode::NOT_FOUND, format!("read parquet: {e}")))?;
|
||||
|
||||
let lance_store = state.lance.store_for_new(&index_name, &bucket).await
|
||||
.map_err(|e| sanitize_lance_err(e, &index_name))?;
|
||||
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
|
||||
|
||||
let stats = lance_store.migrate_from_parquet_bytes(&bytes).await
|
||||
.map_err(|e| sanitize_lance_err(e, &index_name))?;
|
||||
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e))?;
|
||||
|
||||
tracing::info!(
|
||||
"lance migrate '{}': {} rows, {}d, {} bytes on disk, {:.2}s",
|
||||
@ -1866,40 +1866,11 @@ async fn lance_migrate(
|
||||
stats.disk_bytes, stats.duration_secs,
|
||||
);
|
||||
|
||||
// Auto-build the doc_id btree. The scalar index is what makes
|
||||
// get_doc_by_id O(log n) instead of a full table scan; ADR-019
|
||||
// calls this out as the load-bearing feature for hybrid lookup.
|
||||
// Verified 2026-05-02: skipping this on a 10M-row dataset turns
|
||||
// ~5ms doc-fetch into ~100ms (full scan over 35GB). Cheap to
|
||||
// build (~1.2s on 10M, +269MB on disk) and only runs once per
|
||||
// dataset since `has_scalar_index` short-circuits subsequent calls.
|
||||
let scalar_stats = if !lance_store.has_scalar_index("doc_id").await.unwrap_or(false) {
|
||||
match lance_store.build_scalar_index("doc_id").await {
|
||||
Ok(s) => {
|
||||
tracing::info!(
|
||||
"lance migrate '{}': doc_id btree built in {:.2}s (+{} bytes)",
|
||||
index_name, s.build_time_secs, s.disk_bytes_added,
|
||||
);
|
||||
Some(s)
|
||||
}
|
||||
Err(e) => {
|
||||
// Don't fail the whole migrate over a missing btree —
|
||||
// the dataset is still queryable, just slowly. Log it
|
||||
// so it's debuggable.
|
||||
tracing::warn!("lance migrate '{}': doc_id btree build failed (will fall back to scan): {e}", index_name);
|
||||
None
|
||||
}
|
||||
}
|
||||
} else {
|
||||
None
|
||||
};
|
||||
|
||||
Ok::<_, (StatusCode, String)>(Json(serde_json::json!({
|
||||
"index_name": index_name,
|
||||
"bucket": bucket,
|
||||
"lance_path": lance_store.path(),
|
||||
"stats": stats,
|
||||
"scalar_index": scalar_stats,
|
||||
})))
|
||||
}
|
||||
|
||||
@ -1917,300 +1888,6 @@ fn default_partitions() -> u32 { 316 } // ≈√100K — sane for the referenc
|
||||
fn default_bits() -> u32 { 8 }
|
||||
fn default_subvectors() -> u32 { 48 } // 768/48 = 16 dims per subvector
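The arithmetic behind these defaults can be checked with a short sketch. It assumes the reference dataset is about 100K rows of 768-dimensional vectors, as the inline comments suggest. `suggested_partitions` and `dims_per_subvector` are hypothetical helper names, not functions in this crate.

```rust
/// Rule-of-thumb IVF partition count: the rounded square root of the row count.
/// Hypothetical helper for illustration only.
fn suggested_partitions(rows: u64) -> u32 {
    (rows as f64).sqrt().round() as u32
}

/// Dimensions covered by each PQ sub-vector. Returns `None` when `dims` does
/// not divide evenly by `sub_vectors`, or when `sub_vectors` is zero.
fn dims_per_subvector(dims: u32, sub_vectors: u32) -> Option<u32> {
    if sub_vectors == 0 || dims % sub_vectors != 0 {
        None
    } else {
        Some(dims / sub_vectors)
    }
}
```

With 100,000 rows the first helper gives 316, and 768 dimensions split into 48 sub-vectors gives 16 dimensions each, matching the two comments.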
|
||||
|
||||
/// Sanitize a Lance backend error before returning it to the HTTP
|
||||
/// caller. Two responsibilities:
|
||||
///
|
||||
/// 1. Map "dataset not found" patterns to HTTP 404 instead of 500.
|
||||
/// A missing index isn't an internal failure — it's a resource
|
||||
/// lookup miss, and the response code should reflect that.
|
||||
/// 2. Strip server-side filesystem paths and Rust crate registry
|
||||
/// paths (`/root/.cargo/registry/src/index.crates.io-...`) from
|
||||
/// the message body. An attacker probing the surface shouldn't
|
||||
/// learn the server's directory layout or our exact dep versions.
|
||||
///
|
||||
/// Surfaced 2026-05-02 by the Lance backend audit: missing-index
|
||||
/// search returned 500 + leaked the lakehouse data path AND the
|
||||
/// .cargo/registry path with crate versions.
|
||||
fn sanitize_lance_err(err: String, index_name: &str) -> (StatusCode, String) {
|
||||
// 404 detection — narrowed across two 2026-05-02→03 scrum waves.
|
||||
// First wave (opus WARN service.rs:1908): the original `lower.contains
|
||||
// ("not found")` was too broad — caught "column not found" /
|
||||
// "field not found in schema" which are real 500s. Second wave (opus
|
||||
// WARN service.rs:1949): the looser `mentions_path_missing` branch I
|
||||
// added would 404 on a registry-file error like "/root/.cargo/.../x.rs:
|
||||
// no such file or directory" because it triggers without dataset
|
||||
// context. Drop the standalone path-missing branch; require dataset
|
||||
// context AND a missing-shape phrase. Lance's actual error format
|
||||
// ("Dataset at path X was not found") satisfies this.
|
||||
let lower = err.to_lowercase();
|
||||
let mentions_dataset = lower.contains("dataset");
|
||||
let lance_dataset_missing = mentions_dataset && (
|
||||
lower.contains("not found") || lower.contains("does not exist")
|
||||
);
|
||||
// Excluded shapes — these contain "not found" but are real 500s.
|
||||
let column_or_field = lower.contains("column not found")
|
||||
|| lower.contains("field not found")
|
||||
|| lower.contains("schema not found");
|
||||
let is_not_found = lance_dataset_missing && !column_or_field;
|
||||
if is_not_found {
|
||||
return (StatusCode::NOT_FOUND, format!("lance dataset not found: {index_name}"));
|
||||
}
|
||||
|
||||
// Path redaction — replace path-shaped substrings with [REDACTED]
|
||||
// rather than truncating, per opus BLOCK at service.rs:1914 from the
|
||||
// 2026-05-02 scrum. The previous `err.split("/home/").next()` returned
|
||||
// Some("") when the error string STARTED with "/home/", erasing the
|
||||
// entire message and falling back to a generic "lance backend error"
|
||||
// that lost all real error context. Replacing keeps the structural
|
||||
// error (the "what failed") while stripping the location.
|
||||
let cleaned = redact_paths(&err)
|
||||
.trim_end_matches([',', ' ', '\n', '\t'])
|
||||
.to_string();
|
||||
let msg = if cleaned.is_empty() {
|
||||
format!("lance backend error on {index_name}")
|
||||
} else {
|
||||
cleaned
|
||||
};
|
||||
(StatusCode::INTERNAL_SERVER_ERROR, msg)
|
||||
}
|
||||
|
||||
/// Replace absolute-path substrings (under known leak-prone roots) with
|
||||
/// "[REDACTED]". Walks the input once, identifying path-shaped runs that
|
||||
/// start with one of the configured prefixes and continue until a
|
||||
/// path-terminating character (whitespace, quote, comma, paren, EOL).
|
||||
///
|
||||
/// Linear time, no regex dep. Catches multi-occurrence cases that
|
||||
/// `String::split(p).next()` lost. The path-redaction surface intentionally
|
||||
/// includes /var, /tmp, /etc, /usr, /opt in addition to /home and
|
||||
/// /root/.cargo because Lance/Arrow errors surface system paths in
|
||||
/// addition to project paths.
|
||||
fn redact_paths(s: &str) -> String {
|
||||
// Two prefix sets:
|
||||
// - ABSOLUTE: paths starting with '/' (always safe to redact)
|
||||
// - RELATIVE: same path bodies but without leading '/' (Lance occasionally
|
||||
// strips the leading slash when echoing dataset paths back, observed
|
||||
// live 2026-05-02 — "Dataset at path home/profit/lakehouse/data/lance/x
|
||||
// was not found"). Match these only when preceded by a non-alpha char
|
||||
// (start of string, space, colon, etc.) so we don't redact innocent
|
||||
// tokens like "homecoming" or "etcetera".
|
||||
const ABSOLUTE: &[&str] = &[
|
||||
"/root/.cargo", "/home", "/var", "/tmp", "/etc", "/usr", "/opt",
|
||||
];
|
||||
const RELATIVE: &[&str] = &[
|
||||
"root/.cargo", "home/", "var/", "tmp/", "etc/", "usr/", "opt/",
|
||||
];
|
||||
fn is_path_term(b: u8) -> bool {
|
||||
matches!(b, b' ' | b'\t' | b'\n' | b'\r' | b'"' | b'\'' | b',' | b')' | b']' | b'}')
|
||||
}
|
||||
fn is_word_boundary_before(bytes: &[u8], i: usize) -> bool {
|
||||
// True if the byte at i-1 is not a token character (ASCII alphanumeric,
// '_', '.', or '-'), so this position starts a fresh token. Also true
// at start-of-input.
|
||||
if i == 0 { return true; }
|
||||
let b = bytes[i - 1];
|
||||
!(b.is_ascii_alphanumeric() || b == b'_' || b == b'.' || b == b'-')
|
||||
}
|
||||
// Walk by byte index but slice the original &str when emitting, never
|
||||
// cast bytes to char (that would corrupt multi-byte UTF-8 — opus WARN
|
||||
// at service.rs:2018 from the 2026-05-03 re-scrum). Path prefixes are
|
||||
// pure ASCII so byte-level matching is sound; what matters is that
|
||||
// we emit non-matched stretches as &str slices, not byte-by-byte.
|
||||
let bytes = s.as_bytes();
|
||||
let mut out = String::with_capacity(s.len());
|
||||
let mut i = 0;
|
||||
let mut copy_start = 0usize; // start of an in-progress unmatched run
|
||||
while i < bytes.len() {
|
||||
let mut matched_len: Option<usize> = None;
|
||||
// Try absolute prefixes first (always allowed).
|
||||
for p in ABSOLUTE {
|
||||
let pb = p.as_bytes();
|
||||
if i + pb.len() <= bytes.len() && &bytes[i..i + pb.len()] == pb {
|
||||
let after = i + pb.len();
|
||||
if after == bytes.len() || bytes[after] == b'/' || is_path_term(bytes[after]) {
|
||||
matched_len = Some(pb.len());
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
// Then relative prefixes — only at word boundaries.
|
||||
if matched_len.is_none() && is_word_boundary_before(bytes, i) {
|
||||
for p in RELATIVE {
|
||||
let pb = p.as_bytes();
|
||||
if i + pb.len() <= bytes.len() && &bytes[i..i + pb.len()] == pb {
|
||||
matched_len = Some(pb.len());
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
if let Some(prefix_len) = matched_len {
|
||||
// Flush any pending unmatched run as a UTF-8-safe slice.
|
||||
if copy_start < i {
|
||||
out.push_str(&s[copy_start..i]);
|
||||
}
|
||||
out.push_str("[REDACTED]");
|
||||
// Skip past the prefix and the path body (until terminator).
|
||||
let mut j = i + prefix_len;
|
||||
while j < bytes.len() && !is_path_term(bytes[j]) {
|
||||
j += 1;
|
||||
}
|
||||
i = j;
|
||||
copy_start = i;
|
||||
} else {
|
||||
// Advance one CHAR (not one byte) so multi-byte UTF-8 sequences
// stay intact in the eventual slice. The char's byte length is
// derived from its leading byte by `utf8_char_len`.
|
||||
i += utf8_char_len(bytes, i);
|
||||
}
|
||||
}
|
||||
if copy_start < bytes.len() {
|
||||
out.push_str(&s[copy_start..]);
|
||||
}
|
||||
out
|
||||
}
|
||||
|
||||
/// Length in bytes of the UTF-8 character starting at byte `i`. Callers must
/// ensure `i` is the first byte of a character (a char boundary).
|
||||
fn utf8_char_len(bytes: &[u8], i: usize) -> usize {
|
||||
let b = bytes[i];
|
||||
if b < 0x80 { 1 }
|
||||
else if b < 0xC0 { 1 } // continuation byte — defensive, shouldn't start here
|
||||
else if b < 0xE0 { 2 }
|
||||
else if b < 0xF0 { 3 }
|
||||
else { 4 }
|
||||
}
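Why `redact_paths` slices the original `&str` instead of pushing `bytes[i] as char` can be seen in a small standalone sketch. `copy_by_byte_cast`, `copy_by_slice`, and `lead_byte_len` are illustrative names, not part of `service.rs`; the first reproduces the bug described in the comments, the second the fix, and the third restates the leading-byte table so it can be compared against `char::len_utf8`.

```rust
/// Buggy approach: treats every byte as its own code point, so a
/// multi-byte UTF-8 sequence turns into several unrelated characters.
fn copy_by_byte_cast(s: &str) -> String {
    s.bytes().map(|b| b as char).collect()
}

/// Safe approach: copy a sub-slice of the original string. The range must
/// start and end on char boundaries, otherwise slicing panics.
fn copy_by_slice(s: &str, start: usize, end: usize) -> String {
    s[start..end].to_string()
}

/// Byte length of a UTF-8 character, derived from its leading byte.
/// Continuation bytes (0x80..=0xBF) map to 1 as a defensive fallback.
fn lead_byte_len(b: u8) -> usize {
    match b {
        0x00..=0xBF => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        _ => 4,
    }
}
```

The byte cast turns `café` into `cafÃ©`, the mojibake the regression test guards against, while the slice keeps the original bytes untouched.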
|
||||
|
||||
#[cfg(test)]
|
||||
mod sanitize_tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn redact_path_at_offset_zero() {
|
||||
// Regression: opus BLOCK 2026-05-02. Old impl returned Some("")
|
||||
// when err started with "/home/", erasing the whole message.
|
||||
let out = redact_paths("/home/profit/lakehouse/data/lance not a directory");
|
||||
assert_eq!(out, "[REDACTED] not a directory");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn redact_keeps_pre_and_post_text() {
|
||||
let out = redact_paths("failed to open /home/profit/lakehouse/data/x for read: ENOENT");
|
||||
assert_eq!(out, "failed to open [REDACTED] for read: ENOENT");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn redact_multiple_paths() {
|
||||
let out = redact_paths("at /root/.cargo/registry/src/index.crates.io-foo/lance-table-4.0.0/src/io/commit.rs:364:26 from /home/profit/lakehouse");
|
||||
assert!(!out.contains("/root/.cargo"));
|
||||
assert!(!out.contains("/home/"));
|
||||
assert!(out.contains("[REDACTED]"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn redact_preserves_quote_terminator() {
|
||||
let out = redact_paths("{\"path\":\"/home/profit/x\",\"err\":\"bad\"}");
|
||||
assert_eq!(out, "{\"path\":\"[REDACTED]\",\"err\":\"bad\"}");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn is_not_found_narrow_dataset_only() {
|
||||
// Regression: opus WARN 2026-05-02. Old impl 404'd on any "not
|
||||
// found" — including legitimate column/field-not-found 500s.
|
||||
let (status, _) = sanitize_lance_err(
|
||||
"column not found: vector".into(), "test_idx",
|
||||
);
|
||||
assert_eq!(status, StatusCode::INTERNAL_SERVER_ERROR);
|
||||
|
||||
let (status, _) = sanitize_lance_err(
|
||||
"dataset not found at /home/profit/lakehouse/data/lance/missing".into(), "test_idx",
|
||||
);
|
||||
assert_eq!(status, StatusCode::NOT_FOUND);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn redact_does_not_match_prefix_substring() {
|
||||
// "/etcd" should NOT trigger /etc redaction; bare "etcetera" is left alone too.
|
||||
let out = redact_paths("etcetera and /etcd");
|
||||
assert_eq!(out, "etcetera and /etcd");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn redact_relative_paths_lance_emits() {
|
||||
// 2026-05-02: live missing-index probe surfaced Lance error of the
|
||||
// form "Dataset at path home/profit/lakehouse/data/lance/x was not
|
||||
// found" — leading slash stripped. Need to redact the relative form
|
||||
// when preceded by a word boundary.
|
||||
let out = redact_paths("Dataset at path home/profit/lakehouse/data/lance/x was not found");
|
||||
assert!(!out.contains("home/profit"), "should redact: {out}");
|
||||
assert!(out.contains("Dataset at path"));
|
||||
assert!(out.contains("was not found"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn redact_does_not_eat_innocent_prefix_words() {
|
||||
// "homecoming" must NOT trigger "home/" redaction. "Etcetera" must
|
||||
// NOT trigger "etc/" redaction. The word-boundary guard handles this.
|
||||
let out = redact_paths("homecoming etcetera vary tmpfile");
|
||||
assert_eq!(out, "homecoming etcetera vary tmpfile");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn is_not_found_lance_actual_phrasing() {
|
||||
// Lance's actual error format observed live: "Dataset at path X was
|
||||
// not found: Not found: ...". Must 404, not 500.
|
||||
let (status, _) = sanitize_lance_err(
|
||||
"Dataset at path home/profit/lakehouse/data/lance/x was not found".into(),
|
||||
"x",
|
||||
);
|
||||
assert_eq!(status, StatusCode::NOT_FOUND);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn is_not_found_excludes_column_field_schema() {
|
||||
// Real 500s with the "not found" phrase that aren't dataset-missing.
|
||||
for err in [
|
||||
"column not found: vector",
|
||||
"field not found in schema: doc_id",
|
||||
"schema not found for dataset xyz",
|
||||
] {
|
||||
let (status, _) = sanitize_lance_err(err.into(), "test_idx");
|
||||
assert_eq!(status, StatusCode::INTERNAL_SERVER_ERROR, "{err}");
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn is_not_found_does_not_match_unrelated_path_missing() {
|
||||
// Regression: opus WARN at service.rs:1949 from the 2026-05-03
|
||||
// re-scrum. A registry-file error from inside a Lance internal
|
||||
// module should NOT be coerced to 404 just because it contains
|
||||
// "no such file or directory" — it's a real 500.
|
||||
let (status, _) = sanitize_lance_err(
|
||||
"/root/.cargo/registry/src/index.crates.io-foo/lance-table-4.0.0/src/io/commit.rs: no such file or directory".into(),
|
||||
"test_idx",
|
||||
);
|
||||
assert_eq!(status, StatusCode::INTERNAL_SERVER_ERROR);
|
||||
// (And the path is still redacted in the message.)
|
||||
let (_, msg) = sanitize_lance_err(
|
||||
"/root/.cargo/registry/src/lance-foo/x.rs: no such file or directory".into(),
|
||||
"test_idx",
|
||||
);
|
||||
assert!(!msg.contains("/root/.cargo"), "path leak: {msg}");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn redact_preserves_multibyte_utf8() {
|
||||
// Regression: opus WARN at service.rs:2018 from the 2026-05-03
|
||||
// re-scrum. Old impl did `out.push(bytes[i] as char)` which
|
||||
// corrupted multi-byte UTF-8 (e.g. a path containing user-supplied
|
||||
// names with non-ASCII characters) into Latin-1 mojibake.
|
||||
let input = "Failed to open /home/profit/工作/data — café not found";
|
||||
let out = redact_paths(input);
|
||||
// The path is redacted...
|
||||
assert!(!out.contains("/home/profit"), "path leak: {out}");
|
||||
// ...AND the multi-byte characters elsewhere are preserved verbatim.
|
||||
assert!(out.contains("café"), "lost UTF-8: {out}");
|
||||
assert!(out.contains("not found"), "lost trailing context: {out}");
|
||||
}
|
||||
}
|
||||
|
||||
/// Build the IVF_PQ index on the Lance dataset.
|
||||
async fn lance_build_index(
|
||||
State(state): State<VectorState>,
|
||||
@ -2218,10 +1895,10 @@ async fn lance_build_index(
|
||||
Json(req): Json<LanceIndexRequest>,
|
||||
) -> impl IntoResponse {
|
||||
let lance_store = state.lance.store_for(&index_name).await
|
||||
.map_err(|e| sanitize_lance_err(e, &index_name))?;
|
||||
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
|
||||
match lance_store.build_index(req.num_partitions, req.num_bits, req.num_sub_vectors).await {
|
||||
Ok(stats) => Ok(Json(stats)),
|
||||
Err(e) => Err(sanitize_lance_err(e, &index_name)),
|
||||
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
|
||||
}
|
||||
}
|
||||
|
||||
@ -2270,13 +1947,13 @@ async fn lance_search(
|
||||
let qv: Vec<f32> = embed_resp.embeddings[0].iter().map(|&x| x as f32).collect();
|
||||
|
||||
let lance_store = state.lance.store_for(&index_name).await
|
||||
.map_err(|e| sanitize_lance_err(e, &index_name))?;
|
||||
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
|
||||
|
||||
let t0 = std::time::Instant::now();
|
||||
let nprobes = req.nprobes.or(Some(LANCE_DEFAULT_NPROBES));
|
||||
let refine = req.refine_factor.or(Some(LANCE_DEFAULT_REFINE_FACTOR));
|
||||
let hits = lance_store.search(&qv, req.top_k, nprobes, refine).await
|
||||
.map_err(|e| sanitize_lance_err(e, &index_name))?;
|
||||
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e))?;
|
||||
|
||||
Ok(Json(serde_json::json!({
|
||||
"index_name": index_name,
|
||||
@ -2294,7 +1971,7 @@ async fn lance_get_doc(
|
||||
Path((index_name, doc_id)): Path<(String, String)>,
|
||||
) -> impl IntoResponse {
|
||||
let lance_store = state.lance.store_for(&index_name).await
|
||||
.map_err(|e| sanitize_lance_err(e, &index_name))?;
|
||||
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
|
||||
let t0 = std::time::Instant::now();
|
||||
match lance_store.get_by_doc_id(&doc_id).await {
|
||||
Ok(Some(row)) => Ok(Json(serde_json::json!({
|
||||
@ -2304,7 +1981,7 @@ async fn lance_get_doc(
|
||||
"row": row,
|
||||
}))),
|
||||
Ok(None) => Err((StatusCode::NOT_FOUND, format!("doc_id not found: {doc_id}"))),
|
||||
Err(e) => Err(sanitize_lance_err(e, &index_name)),
|
||||
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
|
||||
}
|
||||
}
|
||||
|
||||
@ -2336,7 +2013,7 @@ async fn lance_append(
|
||||
return Err((StatusCode::BAD_REQUEST, "rows array is empty".into()));
|
||||
}
|
||||
let lance_store = state.lance.store_for(&index_name).await
|
||||
.map_err(|e| sanitize_lance_err(e, &index_name))?;
|
||||
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
|
||||
|
||||
let mut doc_ids = Vec::with_capacity(req.rows.len());
|
||||
let mut chunk_idxs = Vec::with_capacity(req.rows.len());
|
||||
|
||||
@ -1,46 +0,0 @@
|
||||
# Lakehouse: Rust vs Go architecture comparison

> **Source of truth lives in the golangLAKEHOUSE repo:**
> [`/home/profit/golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md`](file:///home/profit/golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md)
>
> J's living document — pulled from there into this repo's docs as
> a pointer so the comparison is reachable from either side.

## Why the source lives in golangLAKEHOUSE

The Go rewrite was the trigger for the comparison. The doc updates as
J ships fixes on either side, and most of the open backlog items
(materializer port, replay port, validators network surface) land in
the Go repo. Keeping the source there means PR auditing on Go
commits also catches doc drift.

## When to update from this side

If a fix lands in the Rust repo that changes a comparison value
(e.g. embed cache change, sidecar drop, new validator), update both:

1. The source at `/home/profit/golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md`
2. The change log section at the bottom of the same file

This file is a pointer — **do not put authoritative content here.**
Drift between two copies wastes the discipline.

## Quick links

- **Decisions tracker** — section near the top of the source file.
  Lists actioned items + open backlog with LOC estimates.
- **Performance numbers** — Python dependency section. Updated each
  time a load test is rerun.
- **Distillation porting status** — table of phase-by-phase port
  state across runtimes.
- **Recommendation** — current working hypothesis on Go-primary vs
  Rust-primary. Subject to change as fixes ship.

## Last known state

- **2026-05-01**: Rust embed cache shipped (`150cc3b`), 236× RPS gain.
- **2026-05-01**: Go validator port shipped (`b03521a`), production
  safety net now on Go side.
- **Open**: Drop Rust Python sidecar (~200 LOC, universal-win).
- **Open**: Port Rust materializer to Go (~500-800 LOC, unblocks
  Go-only end-to-end pipeline).
|
||||
@ -1,107 +0,0 @@
|
||||
# Phase Audit Guidance for Claude Code

## Purpose
This document provides the proper workflow for auditing completed phases in the Lakehouse project.

## ⚠️ Important: Do NOT Skip Steps
Each phase requires BOTH:
1. PRD spec verification (check code exists)
2. Full SCRUM execution (6 commands)

## Proper Phase Audit Workflow

### Step 1: Read PRD Specification
For each phase, read the PRD to understand what's supposed to ship:
```bash
# Read from docs/PRD.md or docs/PHASES.md
cat docs/PHASES.md | grep -A20 "Phase N:"
```

### Step 2: Verify Code Exists
Check that each deliverable from the PRD spec has corresponding code:
```bash
# Example - check for specific implementations
grep -r "function_name" crates/*/src/
ls crates/*/src/*.rs
```

### Step 3: Run Full SCRUM (6 Commands)
In order, execute ALL of these for the phase's crates:

```bash
# 1. Build
cargo build -p <crate-name>

# 2. Test
cargo test -p <crate-name>

# 3. Clippy (if installed)
cargo clippy -p <crate-name> -- -D warnings

# 4. Format check
cargo fmt -p <crate-name> -- --check

# 5. Cargo check
cargo check -p <crate-name>

# 6. Doc check
cargo doc -p <crate-name> --no-deps
```

### Step 4: Fix Issues
If any SCRUM command fails:
- Fix the code
- Re-run the failing command
- Re-run ALL 6 commands to verify

### Step 5: Update Phase Documentation
Only mark as ✅ after ALL 6 SCRUM commands pass:
```markdown
## Phase N: [Name] ✅
- [x] spec item 1
- [x] spec item 2
- SCRUM: build ✅ test ✅ clippy ✅ fmt ✅ check ✅ doc ✅
```

## Current Phase Status

| Phase | Status | Notes |
|-------|--------|-------|
| 0 | ✅ | Bootstrap complete |
| 1 | ✅ | Storage + Catalog |
| 2 | ✅ | Query Engine |
| 3 | ✅ | AI Integration |
| 4 | ✅ | Frontend |
| 5 | ✅ | Hardening |
| 6-42 | ✅ | See docs/PHASES.md |

## Notes from Previous Session

- Clippy and rustfmt are NOT installed on this system
- Run `rustup component add clippy rustfmt` to install
- Some crates have 0 unit tests (expected for service crates)
- 28 warnings remain in unused code paths (ui/vectord)

## Key Files

- `docs/PHASES.md` - Phase tracker with checkboxes
- `docs/PRD.md` - Full product requirements
- `docs/CONTROL_PLANE_PRD.md` - Phases 38+ specifications
- `crates/*/` - All crate implementations

## Quick Reference

```bash
# Full workspace SCRUM
cargo build --workspace
cargo test --workspace
# (clippy if installed)
cargo fmt -- --check
cargo check --workspace
cargo doc --no-deps

# Per-crate
cargo build -p <crate>
cargo test -p <crate>
cargo check -p <crate>
```
|
||||
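The six per-crate checks from Step 3 of the audit guidance could be expressed as data, for example to drive a small runner. The sketch below only builds the argument lists and does not invoke `cargo`; `scrum_commands` is a hypothetical helper, not something defined in this repo.

```rust
/// Argument lists for the six SCRUM checks, in the order given in Step 3.
/// Each entry is meant to be passed to `cargo` for the named crate.
fn scrum_commands(crate_name: &str) -> Vec<Vec<String>> {
    // Builds `<subcommand> -p <crate> [extra...]` as owned strings.
    let pkg = |sub: &str, extra: &[&str]| -> Vec<String> {
        let mut args = vec![sub.to_string(), "-p".to_string(), crate_name.to_string()];
        args.extend(extra.iter().map(|s| s.to_string()));
        args
    };
    vec![
        pkg("build", &[]),
        pkg("test", &[]),
        pkg("clippy", &["--", "-D", "warnings"]),
        pkg("fmt", &["--", "--check"]),
        pkg("check", &[]),
        pkg("doc", &["--no-deps"]),
    ]
}
```

Each list could then be handed to `std::process::Command::new("cargo").args(...)`, stopping at the first non-zero exit status so that Step 4's "re-run ALL 6 commands" rule is easy to enforce.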
@ -3,15 +3,6 @@
|
||||
[gateway]
|
||||
host = "0.0.0.0"
|
||||
port = 3100
|
||||
# Coordinator session JSONL — one row per /v1/iterate session for
|
||||
# offline DuckDB analysis. Cross-runtime parity with the Go-side
|
||||
# [validatord].session_log_path. Set to the SAME path Go validatord
|
||||
# writes to so DuckDB queries see one unified longitudinal stream
|
||||
# across both runtimes (rows are tagged daemon="gateway" vs
|
||||
# daemon="validatord" so producers stay distinguishable). Append-write
|
||||
# is atomic at the row sizes both runtimes produce — both daemons
|
||||
# co-writing is safe.
|
||||
session_log_path = "/tmp/lakehouse-validator/sessions.jsonl"
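A minimal sketch of the append pattern the comment above relies on, assuming each session row is serialized to a single line and handed to one `write_all` call on a file opened in append mode. `append_session_row` is a hypothetical helper and does not reflect the gateway's or validatord's actual writer; whether concurrent appends stay un-interleaved also depends on the OS and filesystem.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Append one JSONL row to `path`, creating the file if needed.
/// The row and its trailing newline are buffered together and passed to a
/// single `write_all` call, so small rows normally reach the file in one write.
fn append_session_row(path: &Path, json_row: &str) -> io::Result<()> {
    // A row containing a newline would break the one-row-per-line format.
    if json_row.contains('\n') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "row must be a single line",
        ));
    }
    let mut line = String::with_capacity(json_row.len() + 1);
    line.push_str(json_row);
    line.push('\n');
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(line.as_bytes())
}
```

Two producers pointing at the same path would each call this once per session, tagging their rows (for example with a `daemon` field) so downstream queries can tell them apart.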
|
||||
|
||||
[storage]
|
||||
root = "./data"
|
||||
@ -53,22 +44,12 @@ manifest_prefix = "_catalog/manifests"
|
||||
# max_rows_per_query = 10000
|
||||
|
||||
[sidecar]
|
||||
# Post-2026-05-02: AiClient talks directly to Ollama; the Python
|
||||
# sidecar's hot-path role (~120 LOC of pure Ollama wrappers) was
|
||||
# retired. Field name kept for migration compat — value now points
|
||||
# at Ollama on :11434. Lab UI + pipeline_lab Python remains as a
|
||||
# dev-only tool, NOT on this URL.
|
||||
url = "http://localhost:11434"
|
||||
url = "http://localhost:3200"
|
||||
|
||||
[ai]
|
||||
embed_model = "nomic-embed-text"
|
||||
# Local-tier defaults bumped 2026-04-30: qwen3.5:latest is the
|
||||
# stronger local rung in the 5-loop substrate (per
|
||||
# project_small_model_pipeline_vision.md). Same JSON-clean property
|
||||
# as qwen2.5, more capacity. Ollama still serves both — bump back
|
||||
# in this file if a workload regressed.
|
||||
gen_model = "qwen3.5:latest"
|
||||
rerank_model = "qwen3.5:latest"
|
||||
gen_model = "qwen2.5"
|
||||
rerank_model = "qwen2.5"
|
||||
|
||||
[auth]
|
||||
enabled = false
|
||||
@ -91,9 +72,7 @@ min_recall = 0.9 # never promote below this
|
||||
max_trials_per_hour = 20 # hard budget cap
|
||||
|
||||
# Model roster — available for profile hot-swap
|
||||
# qwen3.5:latest: stronger local rung — JSON-clean, 8K+ context,
|
||||
# default for gen_model and rerank_model
|
||||
# qwen3: 8.2B, 40K context, thinking+tools, best for reasoning tasks
|
||||
# qwen2.5: 7B, 8K context, fast — kept loaded for the 2026-04 era
|
||||
# comparison runs; new defaults use qwen3.5:latest
|
||||
# qwen2.5: 7B, 8K context, fast, good for SQL generation
|
||||
# mistral: 7B, 8K context, good for general generation
|
||||
# nomic-embed-text: 137M, embedding-only, used by all profiles
|
||||
|
||||
@ -23,13 +23,6 @@ import { roleBand, SCENES, SCENES_VERSION, FACE_RENDER_DIM, type RoleBand } from
|
||||
import { ICONS, ICONS_VERSION, DEFAULT_NEGATIVE, certToSlug, type IconRecipe } from "./icon_recipes.js";
|
||||
|
||||
const BASE = process.env.LAKEHOUSE_URL || "http://localhost:3100";
|
||||
// G5 cutover prep (2026-05-01): when GO_LAKEHOUSE_URL is set, the
|
||||
// `/_go/*` pass-through routes to the Go gateway. Off-by-default
|
||||
// (empty = disabled, returns 503 on /_go/*). Reversible via unset.
|
||||
// Doesn't modify any existing tool — additive parallel path so
|
||||
// operators can validate Go-side handlers under real Bun-frontend
|
||||
// shape without touching the production Rust path.
|
||||
const GO_BASE = process.env.GO_LAKEHOUSE_URL || "";
|
||||
const PORT = parseInt(process.env.MCP_PORT || "3700");
|
||||
|
||||
// ─── Staffer roster — used by the per-staffer hot-swap index (G). ────
|
||||
@ -320,9 +313,9 @@ ${(buckets as any[] || []).map((b: any) => `- ${b.name}: ${b.backend} (${b.reach
|
||||
- Ollama: :11434
|
||||
|
||||
## Available Models
|
||||
- qwen3.5:latest: stronger local rung, JSON-clean (default for gen + rerank)
|
||||
- qwen3: 8.2B, 40K context, thinking+tools (best for reasoning)
|
||||
- qwen2.5: 7B, 8K context (legacy — 2026-04 era comparison runs only)
|
||||
- qwen2.5: 7B, 8K context (best for fast SQL generation)
|
||||
- mistral: 7B, 8K context (general generation)
|
||||
- nomic-embed-text: 137M (embedding, automatic)
|
||||
`;
|
||||
return { contents: [{ uri: uri.href, mimeType: "text/plain", text }] };
|
||||
@ -718,26 +711,6 @@ async function main() {
|
||||
return new Response(await r.text(), { status: r.status, headers: { "Content-Type": "application/json" } });
|
||||
}
|
||||
|
||||
// G5 cutover slice: pass-through to GO_BASE (Go gateway). Same
|
||||
// shape as /api/* but points at the Go side. Returns 503 when
|
||||
// GO_LAKEHOUSE_URL isn't set so callers know the cutover slice
|
||||
// is off rather than silently routing to Rust. No body
|
||||
// transformation — caller responsible for sending Go-shaped
|
||||
// requests (e.g. /v1/embed not /ai/embed).
|
||||
if (url.pathname.startsWith("/_go/")) {
|
||||
if (!GO_BASE) {
|
||||
return err("GO_LAKEHOUSE_URL not set; /_go/* cutover slice is disabled", 503);
|
||||
}
|
||||
const path = url.pathname.replace("/_go", "");
|
||||
const body = req.method !== "GET" ? await req.text() : undefined;
|
||||
try {
|
||||
const r = await fetch(`${GO_BASE}${path}`, { method: req.method, headers: { "Content-Type": "application/json" }, body });
|
||||
return new Response(await r.text(), { status: r.status, headers: { "Content-Type": "application/json" } });
|
||||
} catch (e) {
|
||||
return err(`Go gateway unreachable at ${GO_BASE}: ${e}`, 502);
|
||||
}
|
||||
}
|
||||
|
||||
// Proof — narrative HTML served from mcp-server/proof.html.
|
||||
// Live tests consumed client-side via /proof.json.
|
||||
if (url.pathname === "/proof") {
|
||||
|
||||
@ -146,16 +146,15 @@ async function persistOp(op: ObservedOp) {
|
||||
// ─── LLM Team escalation (code_review mode) ───
|
||||
//
|
||||
// When recent failures on a single sig_hash cross a threshold the
|
||||
// local-model analysis is probably insufficient. J's 2026-04-24
|
||||
// local qwen2.5 analysis is probably insufficient. J's 2026-04-24
|
||||
// direction: "the observer would trigger to give more context" —
|
||||
// route failure clusters to LLM Team's specialized code_review mode
|
||||
// (via /api/run) so richer structured signal lands in the KB for
|
||||
// scrum + auditor + playbook memory to consume next pass.
|
||||
//
|
||||
// Non-destructive: runs in parallel to the existing local diagnose
|
||||
// call (qwen3.5:latest after the 2026-04-30 bump), never replaces
|
||||
// it. Writes to data/_kb/observer_escalations.jsonl as a dedicated
|
||||
// audit surface.
|
||||
// Non-destructive: runs in parallel to the existing qwen2.5 analysis,
|
||||
// never replaces it. Writes to data/_kb/observer_escalations.jsonl
|
||||
// as a dedicated audit surface.
|
||||
|
||||
const LLM_TEAM = process.env.LH_LLM_TEAM_URL ?? "http://localhost:5000";
|
||||
const LLM_TEAM_ESCALATIONS = "/home/profit/lakehouse/data/_kb/observer_escalations.jsonl";
|
||||
@ -543,7 +542,7 @@ async function analyzeErrors() {
|
||||
if (failures.length === 0) return;
|
||||
|
||||
// NEW 2026-04-24: escalate recurring sig_hash clusters to LLM Team
|
||||
// code_review mode. Runs in parallel to the local diagnose call
|
||||
// code_review mode. Runs in parallel to the local qwen2.5 analysis
|
||||
// below — non-blocking, richer downstream signal for scrum/auditor.
|
||||
maybeEscalate(failures).catch(() => {});
|
||||
|
||||
@ -553,14 +552,13 @@ async function analyzeErrors() {
|
||||
|
||||
// Ask local model to diagnose. Phase 44 migration (2026-04-27):
|
||||
// /v1/chat instead of legacy /ai/generate so /v1/usage tracks the
|
||||
// call + Langfuse traces it. 2026-04-30 model bump: qwen2.5 →
|
||||
// qwen3.5:latest to match the small-model-pipeline local-tier default.
|
||||
// call + Langfuse traces it. Same upstream model (qwen2.5 local).
|
||||
try {
|
||||
const resp = await fetch(`${LAKEHOUSE}/v1/chat`, {
|
||||
method: "POST",
|
||||
headers: { "Content-Type": "application/json" },
|
||||
body: JSON.stringify({
|
||||
model: "qwen3.5:latest",
|
||||
model: "qwen2.5",
|
||||
provider: "ollama",
|
||||
messages: [{
|
||||
role: "user",
|
||||
@ -771,7 +769,7 @@ async function tailOverseerCorrections(): Promise<number> {
|
||||
try { row = JSON.parse(line); } catch { continue; }
|
||||
const op: ObservedOp = {
|
||||
timestamp: row.created_at ?? new Date().toISOString(),
|
||||
endpoint: `overseer:${row.model ?? "claude-opus-4-7"}`,
|
||||
endpoint: `overseer:${row.model ?? "gpt-oss:120b"}`,
|
||||
input_summary: `${row.task_class ?? "?"}: ${row.reason ?? "escalation"}`,
|
||||
// Correction itself is neither success nor failure — it's a
|
||||
// mitigation attempt. We mark success=true so analyzeErrors
|
||||
|
||||
@ -1,28 +0,0 @@
|
||||
[Unit]
|
||||
Description=Lakehouse Langfuse → observer bridge — forwards LLM trace metadata to :3800 so KB learns from cost/latency/provider deltas
|
||||
Documentation=file:///home/profit/lakehouse/mcp-server/langfuse_bridge.ts
|
||||
After=network.target
|
||||
# No hard dependency on either Langfuse or observer — if either is down,
|
||||
# the bridge retries on the next tick without crashing. That's the
|
||||
# whole point of the cursor state file.
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
WorkingDirectory=/home/profit/lakehouse
|
||||
ExecStart=/home/profit/.bun/bin/bun run /home/profit/lakehouse/mcp-server/langfuse_bridge.ts
|
||||
Restart=on-failure
|
||||
RestartSec=30
|
||||
# Credentials resolved from env. Matches how
|
||||
# crates/gateway/src/v1/langfuse_trace.rs reads them so both producer
|
||||
# (gateway emitter) and consumer (this bridge) share the same config.
|
||||
EnvironmentFile=-/etc/lakehouse/langfuse.env
|
||||
Environment=LANGFUSE_URL=http://localhost:3001
|
||||
Environment=OBSERVER_URL=http://localhost:3800
|
||||
Environment=LANGFUSE_POLL_MS=30000
|
||||
Environment=LANGFUSE_BATCH_LIMIT=50
|
||||
Environment=LANGFUSE_STATE_FILE=/var/lib/lakehouse-guard/langfuse_last_seen.json
|
||||
KillSignal=SIGTERM
|
||||
TimeoutStopSec=5
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
@ -1,5 +0,0 @@
|
||||
{
  "dependencies": {
    "langfuse": "^3.38.20"
  }
}
|
||||
@ -1,6 +1,6 @@
|
||||
# Phase 6 — Acceptance Gate Report
|
||||
|
||||
**Run:** 2026-04-27T15:43:37.943Z
|
||||
**Run:** 2026-04-27T04:54:32.225Z
|
||||
**Fixture:** `tests/fixtures/distillation/acceptance/`
|
||||
**Temp root:** `/tmp/distillation_phase6_acceptance`
|
||||
**Pipeline run_ids:** `acceptance-run-1-stable` (first) + `acceptance-run-2-stable` (second / hash reproducibility)
|
||||
@ -40,13 +40,13 @@
|
||||
| 19 | scratchpad/tree-split case: fixture row materialized into evidence | found | found | ✓ |
|
||||
| 20 | PRD drift case: fixture row materialized | found | found | ✓ |
|
||||
| 21 | hash reproducibility: per-stage output_hash identical across runs | 0 mismatches | all match | ✓ |
|
||||
| 22 | hash reproducibility: run_hash identical | 8dfdacee62380ec2... | 8dfdacee62380ec2... | ✓ |
|
||||
| 22 | hash reproducibility: run_hash identical | 3ea12b160ee9099a... | 3ea12b160ee9099a... | ✓ |
|
||||
|
||||
## Hash reproducibility detail
|
||||
|
||||
run 1 run_hash: `8dfdacee62380ec20b7420d8f8bad3c395822da6eb0b41eeecd356e88fe20bf0`
|
||||
run 1 run_hash: `3ea12b160ee9099a3c52fe6e7fffd3076de7920d2704d24c789260d63cb1a5a2`
|
||||
|
||||
run 2 run_hash: `8dfdacee62380ec20b7420d8f8bad3c395822da6eb0b41eeecd356e88fe20bf0`
|
||||
run 2 run_hash: `3ea12b160ee9099a3c52fe6e7fffd3076de7920d2704d24c789260d63cb1a5a2`
|
||||
|
||||
**Bit-for-bit identical.** Two runs of the entire pipeline on the same fixture with the same `recorded_at` produce the same outputs. Distillation is deterministic.
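
A minimal sketch of why pinning `recorded_at` makes this kind of reproducibility possible, assuming the run hash is a SHA-256 over a canonical (key-sorted) JSON encoding of the stage outputs. The helpers `canonicalJson` and `runHash` are hypothetical names for illustration and are not taken from the pipeline's source:

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: serialize a JSON-safe value with object keys sorted
// recursively, so structurally equal objects always yield the same string.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalJson).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(",")}}`;
  }
  // Primitives (string, number, boolean, null) use the standard encoding.
  return JSON.stringify(value);
}

// Hypothetical helper: the timestamp is an explicit input rather than a call
// to the system clock, so the same inputs always produce the same hash.
function runHash(stageOutputs: unknown, recordedAt: string): string {
  return createHash("sha256")
    .update(canonicalJson({ recorded_at: recordedAt, stages: stageOutputs }))
    .digest("hex");
}
```

With this shape, key order in the inputs does not affect the hash, while any change to `recordedAt` or to a stage output does.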
|
||||
|
||||
|
||||
@ -1,8 +1,8 @@
|
||||
# Phase 8 — Full System Audit Report
|
||||
|
||||
**Run:** 2026-04-27T15:43:38.021Z
|
||||
**Git commit:** ca7375ea2b178159a0c61bbf62788a2ffa2390e9
|
||||
**Baseline:** 2026-04-27T10:31:44.043Z (d11632a6fae6)
|
||||
**Run:** 2026-04-27T04:54:32.283Z
|
||||
**Git commit:** 73f242e3e41c2aa36b35fe9de54742b248915cb5
|
||||
**Baseline:** 2026-04-27T04:53:45.796Z (5bdd159966e6)
|
||||
|
||||
## Result: **PASS** ✓
|
||||
|
||||
@ -26,7 +26,7 @@
|
||||
| 1 | P0 | recon doc exists | Y | docs/recon/local-distillation-recon.md present | present | ✓ |
|
||||
| 2 | P0 | tier-1 source streams present | — | all 4 tier-1 jsonls on disk | all present | ✓ |
|
||||
| 3 | P1 | schema validators pass on fixtures | Y | ≥40 tests, 0 fail | 51 pass, 0 fail | ✓ |
|
||||
| 4 | P2 | materializer dry-run completes | Y | >=1 row from each tier-1 source | 1139 read · 82 written · 2 skipped | ✓ |
|
||||
| 4 | P2 | materializer dry-run completes | Y | >=1 row from each tier-1 source | 1073 read · 16 written · 2 skipped | ✓ |
|
||||
| 5 | P2 | tier-1 sources each materialize ≥1 row | — | 4/4: distilled_facts, scrum_reviews, audit_facts, mode_experiments | 1/4 hit (mode_experiments) | ✓ |
|
||||
| 6 | P3 | on-disk scored-runs distribution non-empty | Y | >=1 accepted | acc=386 part=132 rej=57 hum=480 | ✓ |
|
||||
| 7 | P3 | scored-runs distribution sums positive | — | >0 total | 1055 total | ✓ |
|
||||
@ -38,19 +38,19 @@
|
||||
| 13 | P5 | latest run (3fa51d66-784c-4c7d-843d-6c48328a608c) has all 5 stage receipts | Y | collect,score,export-rag,export-sft,export-preference | all present | ✓ |
|
||||
| 14 | P5 | every stage receipt validates against schema | Y | 0 invalid | 0 invalid | ✓ |
|
||||
| 15 | P5 | RunSummary validates | Y | valid | valid | ✓ |
|
||||
| 16 | P5 | summary.git_commit is 40-char hex | — | match | 68b6697bcb38... (HEAD: ca7375ea2b17...) | ✓ |
|
||||
| 16 | P5 | summary.git_commit is 40-char hex | — | match | 68b6697bcb38... (HEAD: 73f242e3e41c...) | ✓ |
|
||||
| 17 | P5 | run_hash is sha256 | Y | /^[0-9a-f]{64}$/ | 2336b96c3638982d... | ✓ |
|
||||
| 18 | P6 | acceptance gate passes 22/22 invariants on fixture | Y | PASS — 22/22 | 22/22 (exit=0) | ✓ |
|
||||
| 19 | P7 | replay validation passes on 3/3 dry-run sample tasks | Y | 3/3 | 3/3 | ✓ |
|
||||
| 20 | P7 | replay retrieval surfaces ≥1 playbook on each task (when corpus present) | — | ≥1 task with retrieval | 3/3 | ✓ |
|
||||
| 21 | P7 | escalation loop guard: no path > 2 models | Y | 0 loops | 0 | ✓ |
|
||||
| 22 | P7 | replay_runs.jsonl populated by audit run | — | exists with ≥3 rows added | 27 rows total | ✓ |
|
||||
| 22 | P7 | replay_runs.jsonl populated by audit run | — | exists with ≥3 rows added | 21 rows total | ✓ |
|
||||
|
||||
## Drift vs prior baseline
|
||||
|
||||
| Metric | Baseline | Current | Δ% | Flag |
|
||||
|---|---|---|---|---|
|
||||
| p2_evidence_rows | 25 | 82 | 228% | warn |
|
||||
| p2_evidence_rows | 15 | 16 | 7% | ok |
|
||||
| p2_evidence_skips | 2 | 2 | 0% | ok |
|
||||
| p3_accepted | 386 | 386 | 0% | ok |
|
||||
| p3_partial | 132 | 132 | 0% | ok |
|
||||
@ -61,7 +61,7 @@
|
||||
| p4_pref_pairs | 83 | 83 | 0% | ok |
|
||||
| p4_total_quarantined | 1325 | 1325 | 0% | ok |
|
||||
|
||||
**1 metric(s) drifted >20% from baseline.** Investigate before treating outputs as stable.
|
||||
All metrics within 20% of baseline — pipeline stable across runs.
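
The Δ% and Flag columns in the drift table can be reproduced with a short calculation. This is a sketch that assumes drift is the absolute percent change relative to baseline, rounded to a whole percent, and flagged `warn` above the 20% threshold named in the report; the function name `driftRow` and the zero-baseline handling are assumptions:

```typescript
type DriftFlag = "ok" | "warn";

interface DriftResult {
  deltaPct: number; // rounded to a whole percent for display
  flag: DriftFlag;
}

function driftRow(baseline: number, current: number, thresholdPct = 20): DriftResult {
  // Assumption: a zero baseline counts as no drift only if current is also zero.
  const raw =
    baseline === 0
      ? (current === 0 ? 0 : 100)
      : (Math.abs(current - baseline) / baseline) * 100;
  return { deltaPct: Math.round(raw), flag: raw > thresholdPct ? "warn" : "ok" };
}

console.log(driftRow(25, 82)); // the p2_evidence_rows row with baseline 25
console.log(driftRow(15, 16)); // the p2_evidence_rows row with baseline 15
```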
|
||||
|
||||
## System health status
|
||||
|
||||
|
||||
@ -1,45 +0,0 @@
|
||||
# Kimi Forensic Audit (FULL FILES) — distillation v1.0.0
|
||||
|
||||
**Generated:** 2026-04-27 by `kimi-for-coding` via gateway /v1/chat
|
||||
**Latency:** 270.6s | **finish:** stop | **usage:** {'prompt_tokens': 66338, 'completion_tokens': 10159, 'total_tokens': 76497}
|
||||
**Input:** /tmp/kimi-audit-full.md (238KB · 12 commits · 15 files · line-numbered, no truncation)
|
||||
|
||||
---
|
||||
|
||||
## Verdict
|
||||
**Hold**: the substrate’s TypeScript pipeline is architecturally coherent and the SFT firewall is genuine, but committed Rust tests fail to compile, drift detection hardcodes an unverified integrity assertion, and deterministic guarantees leak wall-clock time in multiple places.
|
||||
|
||||
## What's solid
|
||||
- **Three-layer SFT contamination firewall is real.** Schema enum restricts `quality_score` to `["accepted", "partially_accepted"]` (`sft_sample.ts:13,62`), exporter constant `SFT_NEVER` blocks rejected/needs_human_review before synthesis (`export_sft.ts:51,205`), and `receipts.ts` re-reads the output to fail loud if any forbidden score leaked (`receipts.ts:231-236`).
|
||||
- **Core scorer is pure and deterministic.** `scoreRecord` takes an `EvidenceRecord`, performs no I/O, no LLM calls, and uses no mutable state (`scorer.ts:1-5,257-273`).
|
||||
- **Quarantine is exhaustive and observable.** Every exporter routes skips to structured `exports/quarantine/<exporter>.jsonl` with typed reasons; silent drops are impossible by construction (`quarantine.ts:1-6,14-26`).
|
||||
- **Evidence provenance is mandatory on every row.** Every `EvidenceRecord` carries `source_file`, `line_offset`, `sig_hash`, and `recorded_at` (`build_evidence_index.ts:27-34`).
|
||||
- **Local-first replay reduces cloud calls.** `replay.ts` defaults to a local model, augments via RAG retrieval, and only escalates on validation failure, directly supporting the cloud-call reduction claim (`replay.ts:24,349-376`).
|
||||
|
||||
## What's risky
|
||||
1. **receipts.ts:495** hardcodes `input_hash_match: true` in drift reports while comments on lines 467-469 admit input-hash comparison is unimplemented; this is false telemetry in a forensic system.
|
||||
2. **score_runs.ts:159** deduplicates scored runs by `scored.provenance.sig_hash` (the *evidence* hash), not by a composite of evidence + scorer version, so scorer logic or `SCORER_VERSION` updates are silently ignored on re-runs against existing partition files.
|
||||
3. **transforms.ts:181** `auto_apply` transform falls back to `new Date().toISOString()` when `row.ts` is missing, injecting wall-clock time into the supposedly deterministic materialization layer.
|
||||
4. **mode.rs:1035,1042** Rust test code assigns `Some("...".into())` and `None` to a `Vec<String>` field (`matrix_corpus`), which would fail `cargo test` compilation; this contradicts the claim that the tag is fully tested.
|
||||
5. **export_sft.ts:109-133** synthesizes fake instruction templates per source stem instead of using actual historical prompts; the SFT firewall prevents category contamination but not prompt-fidelity distortion.
|
||||
|
||||
## Specific findings
|
||||
- **mode.rs:1035** — Compile error in test helper: `matrix_corpus: Some("distilled_procedural_v1".into())` mismatches the `Vec<String>` type declared at line 172. **Rationale:** Direct struct construction in the test module uses an `Option` where a `Vec` is required, so the Rust test suite cannot compile.
|
||||
- **receipts.ts:495** — Drift detection hardcodes `input_hash_match: true`. **Rationale:** The adjacent comment admits input-hash comparison is simplified and unimplemented (lines 467-469); asserting a verified match is misleading telemetry that will hide real input-side regressions.
|
||||
- **score_runs.ts:159** — Scored-run dedup ignores scorer version. **Rationale:** `loadSeenHashes` and the skip logic key only on the EvidenceRecord `sig_hash`, meaning an existing scored-run file from yesterday will block updated scores even if `SCORER_VERSION` or scorer logic changed today.
|
||||
- **transforms.ts:181** — Non-deterministic timestamp fallback in `auto_apply` transform. **Rationale:** `row.ts ?? new Date().toISOString()` injects wall-clock time when the source row lacks a timestamp, violating the header claim that transforms are “deterministic by construction” and breaking bit-identical reproducibility for that stream.
|
||||
- **export_sft.ts:126** — Unsafe property access via `as any`. **Rationale:** `(ev as any).contractor` bypasses the `EvidenceRecord` type contract; if the property is absent the template silently emits `"<contractor>"`, degrading SFT data quality without a type error.
|
||||
- **scorer.ts:30** — Environmental dependency in deterministic scorer. **Rationale:** `process.env.LH_SCORER_VERSION` means identical evidence inputs produce different `scorer_version` stamps (and different downstream receipts) depending on the runtime environment, undermining bit-identical claims.
|
||||
- **replay.ts:378** — Non-deterministic run identifier. **Rationale:** `` `replay:${task_hash.slice(0, 16)}:${Date.now()}` `` makes replay evidence rows non-reproducible and risks collision under rapid successive calls.
|
||||
- **export_sft.ts:109-133** — Synthetic instruction generation replaces ground-truth prompts. **Rationale:** The exporter fabricates instruction strings from metadata (e.g., hardcoded scrum review phrasing) rather than retrieving the actual historical prompt, so the resulting SFT dataset trains on reconstructed, not authentic, user instructions.
|
||||
|
||||
## Direction recommendation
|
||||
**Pause the staffing audit and harden the substrate first.** Before building the staffing inference mode (`staffing_inference_lakehouse` in `mode.rs:54`) on top of this substrate:
|
||||
|
||||
1. Fix the Rust test compile errors (`mode.rs:1035,1042`) and ensure `cargo test` runs in CI.
|
||||
2. Replace the hardcoded `input_hash_match: true` in drift detection (`receipts.ts:495`) with a real hash comparison or remove the field until it is implemented.
|
||||
3. Change scored-run dedup (`score_runs.ts:159`) to key on a composite hash of `evidence_sig_hash + SCORER_VERSION` so scorer updates force re-scoring.
|
||||
4. Remove the `new Date().toISOString()` fallback in `transforms.ts:181` or fail the row so determinism is preserved.
|
||||
5. Audit all `as any` casts in the export layer (`export_sft.ts:126`) for type-safe alternatives.
|
||||
|
||||
Once those fixes land and acceptance re-runs pass, proceed to the staffing audit wave; the architecture is sound enough to support it, but the forensic guarantees must be honest before downstream teams depend on them.
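
Recommendation 3 can be sketched as follows. This is an illustration under assumptions, not the code in `score_runs.ts`: the `ScoredRef` shape and the `dedupKey` and `shouldScore` helpers are hypothetical. The key combines the evidence `sig_hash` with the scorer version, so a version bump invalidates previously seen entries:

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: just the two fields the composite key needs.
interface ScoredRef {
  sigHash: string;
  scorerVersion: string;
}

function dedupKey(ref: ScoredRef): string {
  // The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
  return createHash("sha256")
    .update(`${ref.sigHash}\u0000${ref.scorerVersion}`)
    .digest("hex");
}

// Returns true (and records the key) only the first time a given
// evidence/scorer-version pair is seen.
function shouldScore(seen: Set<string>, ref: ScoredRef): boolean {
  const key = dedupKey(ref);
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
}
```

Keying on the pair rather than on `sig_hash` alone means the same evidence row is skipped on a plain re-run but re-scored after the scorer version changes.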
|
||||
@ -1,36 +0,0 @@
|
||||
# Kimi Forensic Audit — distillation v1.0.0 (last week)
|
||||
**Generated:** 2026-04-27 by `kimi-for-coding` via gateway /v1/chat
|
||||
**Latency:** 157.6s | **finish:** stop | **usage:** {'prompt_tokens': 14014, 'completion_tokens': 6356, 'total_tokens': 20370}
|
||||
**Input:** /tmp/kimi-audit-input.md (56k chars · 12 commits · 6 files)
|
||||
|
||||
---
|
||||
|
||||
## Verdict
|
||||
**hold** — Runtime lock-in, integration mismatches, and truncated source files in the v1.0.0 payload make the tag unshippable without rework.
|
||||
|
||||
## What's solid
|
||||
- `scorer.ts` is a pure, deterministic function with no I/O, no LLM calls, and an explicit version stamp (`scorer.ts:22`).
|
||||
- SFT export enforces defense-in-depth contamination firewalls via `SFT_NEVER` and schema validators (`export_sft.ts:49-50`; `sft_sample.ts:43-48`).
|
||||
- Evidence materialization is idempotent across reruns using `sig_hash` deduplication (`build_evidence_index.ts:114-126`).
|
||||
- Mode router falls back to a safe built-in default if config parsing fails (`mode.rs:208-228`).
|
||||
- Quarantine writer abstraction isolates bad records instead of failing the export (`export_sft.ts`).
|
||||
|
||||
## What's risky
|
||||
- **Integration mismatch**: `replay.ts` posts to `/v1/chat`, but the provided gateway only declares `/v1/mode` and `/v1/mode/execute` (`replay.ts:186` vs `mode.rs:13-18`), suggesting an undocumented or broken proxy contract.
|
||||
- **Bun runtime lock-in**: Multiple files depend on `Bun.CryptoHasher`, which throws in Node.js (`export_sft.ts:235`; `build_evidence_index.ts:89`).
|
||||
- **Unauditable files in scope**: Critical files listed in the diff—`transforms.ts`, `receipts.ts`, `quarantine.ts`, `score_runs.ts`—were not provided, so their logic is unseen.
|
||||
- **Every shown implementation file is truncated**: `scorer.ts`, `export_sft.ts`, `build_evidence_index.ts`, `replay.ts`, and `mode.rs` all end mid-block, hiding error handling, receipt finalization, and gateway dispatch code.
|
||||
- **Type safety escape**: `(ev as any).contractor` in SFT synthesis bypasses the schema layer (`export_sft.ts:138`).
|
||||
|
||||
## Specific findings
|
||||
1. `scripts/distillation/scorer.ts:22` — `SCORER_VERSION` reads from `process.env`, introducing environment-dependent output drift that contradicts the file’s “identical input → identical output forever” contract.
|
||||
2. `scripts/distillation/export_sft.ts:138` — `(ev as any).contractor` is an unguarded `any` cast; a malformed `EvidenceRecord` will inject the string `"undefined"` or crash at runtime inside the SFT instruction template.
|
||||
3. `scripts/distillation/export_sft.ts:235` — `new Bun.CryptoHasher("sha256")` is a Bun-only API; this path will fail under Node.js/Deno and makes the substrate non-portable.
|
||||
4. `scripts/distillation/build_evidence_index.ts:89` — Same Bun crypto lock-in in `sha256OfFile`, fragmenting the hashing implementation (here `Bun.CryptoHasher`, elsewhere `canonicalSha256`).
|
||||
5. `scripts/distillation/replay.ts:178` — Provider routing relies on fragile string heuristics (`model.includes("/")`, prefix lists); models with unexpected names will route to the wrong backend or hit the `ollama` default incorrectly.
|
||||
6. `scripts/distillation/replay.ts:186` — `` fetch(`${gatewayUrl()}/v1/chat` `` targets an endpoint absent from the provided `mode.rs` router; without the missing gateway dispatch code, this call will 404.
|
||||
7. `crates/gateway/src/v1/mode.rs:141` — `deserialize_string_or_vec` uses `serde_json::Value::deserialize` against a TOML source, which is non-idiomatic and risks mis-handling TOML-specific types (datetime, inline tables) compared to a native `toml::Value`.
|
||||
8. `scripts/distillation/build_evidence_index.ts:185` — `await canonicalSha256(row)` is async, yet `sha256OfFile` is sync; the mixing of sync/async crypto calls in the same module hints at inconsistent I/O boundaries.
|
||||
|
||||
## Direction recommendation
|
||||
Keep the substrate architecture, but **do not expand staffing audit work on top of v1.0.0 until three blockers are fixed**: (1) replace `Bun.CryptoHasher` with portable WebCrypto or Node `crypto` so the build is runtime-agnostic; (2) align `replay.ts` to the actual gateway contract (`/v1/mode/execute`) or document the `/v1/chat` proxy route; and (3) eliminate `any` casts in the export path. The schema firewalls, deterministic scorer, and receipt provenance are the right foundation—rework the runtime/contract gaps rather than rebuilding from scratch.
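
For blocker (1), one portable option is Node's `crypto` module, which Bun also implements. A minimal sketch, assuming the file fits comfortably in memory; the name `sha256OfFile` mirrors the function cited in the findings, but the bodies here are illustrative rather than the project's implementation:

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hash an in-memory string or byte array and return lowercase hex.
function sha256Hex(data: string | Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Hash a file's raw bytes. Reads the whole file at once, which is an
// assumption that suits small fixture files; large inputs would want streaming.
function sha256OfFile(path: string): string {
  return sha256Hex(readFileSync(path));
}
```

Routing every hash through one helper like `sha256Hex` would also address the fragmentation noted in finding 4.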
|
||||
@ -1,116 +0,0 @@
|
||||
# Lance backend re-benchmark — 10M vectors (scale_test_10m)
|
||||
|
||||
**Date:** 2026-05-02
|
||||
**Dataset:** `data/lance/scale_test_10m` (33 GB, ~10M vectors, 768d)
|
||||
**Driver:** live HTTP gateway `:3100/vectors/lance/*` (post sanitizer-fix binary)
|
||||
**Method tag on every search response:** `lance_ivf_pq` (confirms IVF_PQ, not brute-force)
|
||||
|
||||
ADR-019 deferred a 10M re-bench: *"at 10M we expect Lance to pull ahead because HNSW doesn't fit in RAM. Re-benchmark when we have a 10M-vector corpus to test against."* The corpus exists; this is that benchmark.
|
||||
|
||||
## Search latency, 10 diverse queries, top_k=10 (cold)
|
||||
|
||||
| Query | Latency |
|---|---:|
| warehouse forklift operator second shift | 50.5ms |
| senior software engineer kubernetes | 52.9ms |
| registered nurse pediatric | 37.6ms |
| welder TIG aluminum | **127.7ms** |
| data scientist python | 41.6ms |
| electrician journeyman commercial | 31.4ms |
| accountant CPA tax | 28.6ms |
| machine learning research | 32.1ms |
| construction site supervisor | 31.8ms |
| biomedical engineer | 25.0ms |
|
||||
|
||||
Median ~35ms (~32ms excluding the outlier), mean ~46ms, one ~128ms outlier (the TIG aluminum query; not investigated, could be a query-specific IVF traversal pattern or transient I/O).
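
The summary statistics can be recomputed directly from the table. The sketch below hard-codes the ten latencies listed above; the helper names and the 100ms outlier cutoff are illustrative choices, not part of the benchmark driver:

```typescript
// Cold-search latencies in milliseconds, copied from the table above.
const coldLatenciesMs = [50.5, 52.9, 37.6, 127.7, 41.6, 31.4, 28.6, 32.1, 31.8, 25.0];

function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

function median(xs: number[]): number {
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Even-length samples average the two middle values.
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Illustrative cutoff: treat anything at or above 100ms as an outlier.
const withoutOutlier = coldLatenciesMs.filter((x) => x < 100);

console.log("mean:", mean(coldLatenciesMs).toFixed(1));
console.log("median:", median(coldLatenciesMs).toFixed(1));
console.log("median without outlier:", median(withoutOutlier).toFixed(1));
```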
|
||||
|
||||
## Search latency, repeated query (warm cache)
|
||||
|
||||
Same query (`forklift operator`) hit 5 times in a row:
|
||||
|
||||
| Call | Latency |
|---|---:|
| 1 | 21.9ms |
| 2 | 20.2ms |
| 3 | 19.2ms |
| 4 | 22.4ms |
| 5 | 18.6ms |
|
||||
|
||||
**Warm-cache p50 ~20ms.** Stable across the 5 trials.
|
||||
|
||||
## Doc-fetch by id, 5 calls (post-warmup) — BEFORE scalar-index fix
|
||||
|
||||
Fetched the same doc_id (`VEC-2196862`) repeatedly:
|
||||
|
||||
| Call | Latency |
|---|---:|
| 1 | 68.2ms |
| 2 | 89.3ms |
| 3 | 153.9ms |
| 4 | 126.5ms |
| 5 | 140.7ms |
|
||||
|
||||
**Roughly 100ms (68-154ms across the 5 calls, median ~127ms), climbing under repeat.** Substantially slower than the 100K-corpus number from ADR-019 (311μs claimed; ~6ms measured today on workers_500k_v1).
|
||||
|
||||
### Root cause (investigated post-bench)
|
||||
|
||||
`/vectors/lance/stats/scale_test_10m` returned `has_doc_id_index: false`. The scalar btree on `doc_id` was **never built** for this dataset. Doc-fetch was running a full table scan over 35GB.
|
||||
|
||||
Cause: the auto-build code in `crates/vectord/src/service.rs:1492-1503` only fires for `IndexMeta`-registered indexes during `set_active_profile` warming. `scale_test_10m` was created by the `lance-bench` binary directly via the migrate HTTP route — it bypasses the IndexMeta registry, so warming never sees it, so neither the vector index nor the scalar index gets auto-built. (The vector index was built manually via `/vectors/lance/index/scale_test_10m`; the scalar index never was.)
|
||||
|
||||
### Doc-fetch by id, 5 calls — AFTER `POST /vectors/lance/scalar-index/scale_test_10m/doc_id`
|
||||
|
||||
Build took **1.22s** for 10M rows, added 269MB of btree on disk.
|
||||
|
||||
| Call | Latency |
|---|---:|
| 1 | 5.6ms |
| 2 | 5.0ms |
| 3 | 5.0ms |
| 4 | 4.9ms |
| 5 | 4.7ms |
|
||||
|
||||
**~5ms p50, stable.** ~20x improvement. Matches workers_500k_v1's ~6ms baseline.
|
||||
|
||||
ADR-019's "O(1) random access via btree" claim is structurally vindicated. The 311μs projection from the 100K bench was an in-process Rust call; the live HTTP/JSON round-trip floor is ~5ms regardless of dataset size.
|
||||
|
||||
### Followup: close the IndexMeta-bypass gap
|
||||
|
||||
The `lance-bench` binary writes datasets that the rest of the gateway can't see. Two reasonable fixes:
|
||||
1. **Auto-build scalar index inside `lance_migrate` HTTP handler** — every dataset created via the migrate route gets the btree before returning. Costs 1-2 seconds at ingest time, saves 100ms per doc-fetch forever after.
|
||||
2. **Have `lance-bench` register an IndexMeta entry** at the end of its run, so the existing warming code picks it up on next gateway start.
|
||||
|
||||
Recommendation: do (1). It's a one-line addition next to the existing `build_index` call inside the handler, and it makes the migrate route self-sufficient — no caller needs to remember a follow-up build call.
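
Until the handler fix lands, a caller can close the gap from outside. The sketch below uses only the two routes named in this report and assumes that the stats route answers a plain GET with a JSON body containing `has_doc_id_index`; the function name `ensureDocIdIndex` and the `FetchLike` type are hypothetical, and the fetch implementation is injected so the logic can be exercised without a live gateway:

```typescript
// Minimal structural type for the parts of fetch this sketch relies on.
type FetchLike = (
  url: string,
  init?: { method?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function ensureDocIdIndex(
  base: string,
  dataset: string,
  fetchImpl: FetchLike,
): Promise<"already-built" | "built"> {
  // Assumption: stats is served over GET and reports has_doc_id_index.
  const stats = await fetchImpl(`${base}/vectors/lance/stats/${dataset}`);
  if (!stats.ok) throw new Error(`stats request failed: ${stats.status}`);
  const body = (await stats.json()) as { has_doc_id_index?: boolean };
  if (body.has_doc_id_index === true) return "already-built";

  // Build the scalar btree on doc_id via the route used in this benchmark.
  const build = await fetchImpl(
    `${base}/vectors/lance/scalar-index/${dataset}/doc_id`,
    { method: "POST" },
  );
  if (!build.ok) throw new Error(`scalar-index build failed: ${build.status}`);
  return "built";
}
```

In a runtime with a global `fetch`, a call such as `ensureDocIdIndex("http://127.0.0.1:3100", "scale_test_10m", fetch)` would run the check after a migration; this remains a workaround, not a substitute for fix (1).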
|
||||
|
||||
## Compared to ADR-019 100K projections
|
||||
|
||||
| Op | 100K (ADR-019) | 10M (today) | Notes |
|---|---:|---:|---|
| Search (cold) | 2229μs | ~46ms | 21x slower at 100x scale → reasonable for IVF_PQ |
| Search (warm) | (not measured) | ~20ms | Warm cache converges nicely |
| Doc fetch (no btree) | n/a | ~100ms | full scan, 35GB |
| Doc fetch (post btree build) | 311μs | ~5ms | structural win confirmed; HTTP/JSON floor explains delta |
| Index method | lance_ivf_pq | lance_ivf_pq | confirmed via response tag |
|
||||
|
||||
## What this means
|
||||
|
||||
ADR-019's claim that "at 10M, Lance pulls ahead because HNSW doesn't fit in RAM" remains **unverified-but-not-refuted**. We can't directly compare to HNSW at 10M: the raw vectors alone need 10M × 768d × 4 bytes ≈ 30 GB of RAM, and double that once the graph is included, which is way past any single-node deployment. So Lance "wins" at 10M by being the only contender that operationally exists.
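
The RAM estimate is simple arithmetic and can be checked with a few lines. This sketch assumes 4-byte (float32) components, as the figure above does, and treats "double that for the graph" as the report's rule of thumb rather than a measured number; the function name `vectorBytes` is illustrative:

```typescript
// Bytes needed to hold `count` dense vectors of `dims` components each.
function vectorBytes(count: number, dims: number, bytesPerComponent = 4): number {
  return count * dims * bytesPerComponent;
}

const rawBytes = vectorBytes(10_000_000, 768);
const rawGB = rawBytes / 1e9; // decimal gigabytes
const rawGiB = rawBytes / 2 ** 30; // binary gibibytes
const withGraphGB = rawGB * 2; // rule of thumb from the report, not measured

console.log(`vectors only: ${rawGB.toFixed(2)} GB (${rawGiB.toFixed(1)} GiB)`);
console.log(`with graph (2x rule of thumb): ${withGraphGB.toFixed(2)} GB`);
```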
|
||||
|
||||
What the bench DID surface:
|
||||
- **Search at 10M works at production-shape latency** (~20ms warm). Acceptable for batch / async / non-conversational workloads. Too slow for sub-10ms voice or recommendation paths.
|
||||
- **Doc-fetch at 10M is fast (~5ms) once the scalar btree is built.** Pre-build was ~100ms (full scan). Built in 1.2s, +269MB on disk. ADR-019's structural claim holds.
|
||||
- **The auto-build only fires for IndexMeta-registered datasets.** `lance-bench` bypasses IndexMeta, so its datasets need either a manual `POST /vectors/lance/scalar-index/<name>/doc_id` after migration, or a one-line fix to the `lance_migrate` handler that builds the btree inline. Recommend the inline fix.
|
||||
- **Sanitizer fix held under load.** No 500-with-leak surfaced, even on a rare query pattern (TIG aluminum), which suggests the fix is robust to long-tail queries.
|
||||
|
||||
## Repro
|
||||
|
||||
```bash
# Search latency, single query
curl -sS -X POST http://127.0.0.1:3100/vectors/lance/search/scale_test_10m \
  -H 'Content-Type: application/json' \
  -d '{"query":"forklift operator","top_k":10}' | jq '.latency_us'

# Doc fetch by id
curl -sS http://127.0.0.1:3100/vectors/lance/doc/scale_test_10m/VEC-2196862 \
  | jq '.latency_us'
```
|
||||
@ -1,536 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# ------------------------------------------------------------
|
||||
# End-to-end pipeline verification for Lakehouse.
|
||||
#
|
||||
# Generates realistic staffing-style data, runs it through every
|
||||
# shipped pipeline stage, asserts correctness at each step, and
|
||||
# cleans up after itself.
|
||||
#
|
||||
# Stages exercised:
|
||||
# 0. Preflight — gateway + sidecar reachability
|
||||
# 1. Data generation — 1000 candidates, 200 placements, 10 resumes
|
||||
# 2. CSV ingest — Phase 6.1 (via ?name= query param)
|
||||
# 3. NDJSON ingest — Phase 6.2
|
||||
# 4. SQL queries + joins — Phase 2, Phase 8 hot cache
|
||||
# 5. Content-hash re-ingest dedup — Phase 6.4
|
||||
# 6. Idempotent register — ADR-020 (same-fingerprint path)
|
||||
# 7. Schema-drift rejection — ADR-020 (409 Conflict path)
|
||||
# 8. Catalog dedupe no-op — ADR-020 (clean state)
|
||||
# 9. Metadata enrichment — Phase 10 POST
|
||||
# 10. PII auto-detection audit — Phase 10
|
||||
# 11. Vector index + search — Phase 7 (documents pulled via SQL)
|
||||
# 12. Cleanup + baseline verify — no-orphan guarantee
|
||||
#
|
||||
# Usage:
|
||||
# ./scripts/e2e_pipeline_check.sh # run all stages
|
||||
# SKIP_VECTOR=1 ./scripts/e2e_pipeline_check.sh # skip Ollama-bound steps
|
||||
# KEEP_DATA=1 ./scripts/e2e_pipeline_check.sh # leave /tmp artifacts
|
||||
#
|
||||
# Exit codes:
|
||||
# 0 all assertions passed
|
||||
# 1 one or more assertions failed
|
||||
# 2 preflight failed (service unreachable)
|
||||
# ------------------------------------------------------------
|
||||
|
||||
set -u
|
||||
set -o pipefail
|
||||
|
||||
GATEWAY="${GATEWAY:-http://localhost:3100}"
|
||||
SIDECAR="${SIDECAR:-http://localhost:3200}"
|
||||
WORKDIR="${WORKDIR:-/tmp/lakehouse_e2e}"
|
||||
DATA_ROOT="${DATA_ROOT:-/home/profit/lakehouse/data}"
|
||||
SKIP_VECTOR="${SKIP_VECTOR:-0}"
|
||||
KEEP_DATA="${KEEP_DATA:-0}"
|
||||
|
||||
RUN_ID="e2e_$(date +%s)"
|
||||
CAND_DS="${RUN_ID}_candidates"
|
||||
PLACE_DS="${RUN_ID}_placements"
|
||||
RESUME_DS="${RUN_ID}_resumes"
|
||||
VEC_IDX="${RESUME_DS}_v1"
|
||||
|
||||
# Color names use a CC_ prefix so they can't be shadowed by single-letter
|
||||
# local variables like `R` that hold curl responses elsewhere in the script.
|
||||
if [[ -t 1 ]]; then
|
||||
CC_GRN=$'\033[0;32m'; CC_RED=$'\033[0;31m'; CC_YLW=$'\033[1;33m'
|
||||
CC_BLU=$'\033[1;34m'; CC_DIM=$'\033[2m'; CC_RST=$'\033[0m'
|
||||
else
|
||||
CC_GRN=''; CC_RED=''; CC_YLW=''; CC_BLU=''; CC_DIM=''; CC_RST=''
|
||||
fi
|
||||
|
||||
PASS=0; FAIL=0; WARN=0; STARTED_AT=$(date +%s)
|
||||
FAILURES=()
|
||||
|
||||
pass() { printf ' %s✓%s %s\n' "$CC_GRN" "$CC_RST" "$1"; PASS=$((PASS+1)); }
|
||||
fail() { printf ' %s✗%s %s\n' "$CC_RED" "$CC_RST" "$1"; FAIL=$((FAIL+1)); FAILURES+=("$1"); }
|
||||
warn() { printf ' %s!%s %s\n' "$CC_YLW" "$CC_RST" "$1"; WARN=$((WARN+1)); }
|
||||
step() { printf '\n%s== %s ==%s\n' "$CC_BLU" "$1" "$CC_RST"; }
|
||||
info() { printf ' %s%s%s\n' "$CC_DIM" "$1" "$CC_RST"; }
|
||||
die() { printf '%sFATAL: %s%s\n' "$CC_RED" "$1" "$CC_RST" >&2; cleanup; exit 2; }
|
||||
|
||||
assert_eq() {
|
||||
if [[ "$1" == "$2" ]]; then pass "$3 ($1)"; else fail "$3: got '$1', expected '$2'"; fi
|
||||
}
|
||||
|
||||
http_code() {
|
||||
local method="$1" path="$2" data="${3:-}"
|
||||
if [[ -n "$data" ]]; then
|
||||
curl -s -o /dev/null -w '%{http_code}' -X "$method" "$GATEWAY$path" \
|
||||
-H 'Content-Type: application/json' -d "$data"
|
||||
else
|
||||
curl -s -o /dev/null -w '%{http_code}' -X "$method" "$GATEWAY$path"
|
||||
fi
|
||||
}
|
||||
|
||||
# query_scalar <sql> -> first column of first row as string, sentinel on empty/error
|
||||
query_scalar() {
  local sql="$1"
  local payload
  payload=$(python3 -c 'import json,sys; print(json.dumps({"sql": sys.argv[1]}))' "$sql")
  curl -s -X POST "$GATEWAY/query/sql" \
    -H 'Content-Type: application/json' \
    -d "$payload" \
    | python3 -c '
import sys, json
try:
    r = json.load(sys.stdin)
except Exception:
    print("__PARSE_ERROR__"); sys.exit(0)
if isinstance(r, dict) and "error" in r:
    sys.stderr.write("query error: " + str(r["error"]) + "\n")
    print("__ERROR__"); sys.exit(0)
rows = r.get("rows") if isinstance(r, dict) else None
if not rows:
    print("__NO_ROWS__"); sys.exit(0)
row = rows[0]
print(next(iter(row.values())))
'
}
|
||||
|
||||
cleanup() {
|
||||
[[ "$KEEP_DATA" == "1" ]] && { info "KEEP_DATA=1 — leaving $WORKDIR"; return; }
|
||||
info "cleaning up test datasets for $RUN_ID"
|
||||
|
||||
# Catch any previous-run zombies too: any catalog entry whose name
|
||||
# starts with "e2e_" is definitionally ours. Using DELETE (added for
|
||||
# this script's needs) purges both the live registry and the manifest
|
||||
# file atomically, so the next run doesn't trip on zombie entries
|
||||
# pointing at parquets we've already rm'd.
|
||||
local names
|
||||
names=$(curl -s "$GATEWAY/catalog/datasets" 2>/dev/null \
  | python3 -c "
import sys, json
try: ds = json.load(sys.stdin)
except Exception: sys.exit(0)
for d in ds:
    if d['name'].startswith('e2e_'):
        print(d['name'])
" 2>/dev/null || true)
|
||||
local removed=0
|
||||
for n in $names; do
|
||||
curl -s -o /dev/null -X DELETE "$GATEWAY/catalog/datasets/by-name/$n" && removed=$((removed+1))
|
||||
done
|
||||
|
||||
# Delete any stray parquet + vector artifacts we can positively
|
||||
# attribute to an e2e_ prefix.
|
||||
rm -f "$DATA_ROOT/datasets/"e2e_*.parquet 2>/dev/null || true
|
||||
rm -f "$DATA_ROOT/vectors/"e2e_*.parquet 2>/dev/null || true
|
||||
rm -rf "$WORKDIR" 2>/dev/null || true
|
||||
info "deleted $removed e2e datasets (covers this run + any prior zombies)"
|
||||
}
|
||||
trap cleanup EXIT
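
The `trap cleanup EXIT` line is a safety net: cleanup runs even if the script dies mid-run, and the final step of the script calls `cleanup` explicitly and then disarms the trap with `trap - EXIT`. A minimal sketch of why the disarm matters, using a hypothetical `demo_cleanup` and throwaway temp directories (none of these names come from the script):

```shell
# Sketch: an EXIT trap as a safety net, disarmed after an explicit cleanup.
# demo_cleanup, run_demo and the marker file are illustrative names only.
scratch=$(mktemp -d)
marker="$scratch/cleanup.log"

run_demo() {
    # Each demo runs in its own bash so the trap fires when that process exits.
    bash -c '
        log=$1; mode=$2
        workdir=$(mktemp -d)
        demo_cleanup() { rm -rf "$workdir"; echo cleaned >> "$log"; }
        trap demo_cleanup EXIT
        if [ "$mode" = "explicit" ]; then
            demo_cleanup    # explicit call, e.g. as a final verification step
            trap - EXIT     # disarm so the handler does not run a second time
        fi
    ' _ "$marker" "$1"
}

run_demo implicit
implicit_runs=$(wc -l < "$marker" | tr -d ' ')
: > "$marker"
run_demo explicit
explicit_runs=$(wc -l < "$marker" | tr -d ' ')
rm -rf "$scratch"
echo "implicit=$implicit_runs explicit=$explicit_runs"
```

Both modes record exactly one cleanup. Without `trap - EXIT`, the explicit mode would run the handler twice.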
|
||||
|
||||
# ============================================================
|
||||
# 0. Preflight
|
||||
# ============================================================
|
||||
step "0. Preflight"
|
||||
|
||||
curl -sf -m 3 "$GATEWAY/health" >/dev/null 2>&1 || die "gateway not reachable at $GATEWAY"
|
||||
pass "gateway /health (200)"
|
||||
|
||||
SIDECAR_UP=0
|
||||
if curl -sf -m 3 "$SIDECAR/health" >/dev/null 2>&1; then
|
||||
SIDECAR_UP=1; pass "sidecar /health (200)"
|
||||
else
|
||||
warn "sidecar unreachable — vector stage will be skipped"
|
||||
SKIP_VECTOR=1
|
||||
fi
|
||||
|
||||
# Purge any e2e_* zombies from prior runs (stale registry entries that
|
||||
# would otherwise break DataFusion schema inference for every query).
|
||||
ZOMBIES=$(curl -s "$GATEWAY/catalog/datasets" 2>/dev/null \
| python3 -c "
import sys, json
try: ds = json.load(sys.stdin)
except Exception: sys.exit(0)
for d in ds:
    if d['name'].startswith('e2e_'):
        print(d['name'])
" 2>/dev/null || true)
|
||||
if [[ -n "$ZOMBIES" ]]; then
|
||||
ZCOUNT=$(echo "$ZOMBIES" | wc -l | tr -d ' ')
|
||||
for n in $ZOMBIES; do
|
||||
curl -s -o /dev/null -X DELETE "$GATEWAY/catalog/datasets/by-name/$n"
|
||||
done
|
||||
info "pre-cleaned $ZCOUNT e2e_ zombies from prior runs"
|
||||
fi
|
||||
|
||||
BASELINE=$(curl -s "$GATEWAY/catalog/datasets" | python3 -c 'import sys,json; print(len(json.load(sys.stdin)))')
|
||||
info "baseline dataset count: $BASELINE"
|
||||
|
||||
# ============================================================
|
||||
# 1. Generate realistic data
|
||||
# ============================================================
|
||||
step "1. Generate realistic staffing data"
|
||||
|
||||
mkdir -p "$WORKDIR"
|
||||
# Seed with RUN_ID (which embeds the wall-clock timestamp) so each run
|
||||
# produces different content. Otherwise the content-hash dedup from
|
||||
# Phase 6.4 keys off a stale hash that lingers in the live registry
|
||||
# until the next gateway restart, and subsequent runs silently dedupe.
|
||||
python3 - "$WORKDIR" "$RUN_ID" <<'PYEOF'
import csv, json, random, sys, os
workdir, run_id = sys.argv[1], sys.argv[2]
# Mix RUN_ID into the seed so content differs per run, but keep it
# deterministic within a single run.
random.seed(hash(run_id) & 0x7FFFFFFF)

FIRST = ['Aisha','Brandon','Carlos','Daria','Eli','Fiona','Gabriel','Hana','Ian','Julia',
         'Kofi','Lena','Mateo','Nadia','Oscar','Priya','Quinn','Raj','Sofia','Tomas',
         'Uma','Victor','Wendy','Xander','Yuki','Zara']
LAST = ['Adams','Brown','Chen','Davis','Evans','Fisher','Garcia','Hughes','Ibrahim','Johnson',
        'Kim','Lopez','Martinez','Nguyen','Ortiz','Patel','Rossi','Singh','Thomas','Umar',
        'Vargas','Williams','Xu','Young','Zhang','OConnor']
PLACES = [('Chicago','IL'),('New York','NY'),('San Francisco','CA'),('Austin','TX'),
          ('Seattle','WA'),('Denver','CO'),('Boston','MA'),('Atlanta','GA'),
          ('Miami','FL'),('Phoenix','AZ')]
SKILL_GROUPS = [
    ['Python','AWS','Docker'],['Java','Spring','Kubernetes'],
    ['React','TypeScript','Node'],['Go','PostgreSQL','gRPC'],
    ['Rust','DataFusion','Parquet'],['C#','.NET','Azure'],
    ['Ruby','Rails','Redis'],['Scala','Spark','Kafka'],
    ['Swift','iOS','CoreData'],['Kotlin','Android','Jetpack'],
]
STATUSES = ['active','placed','inactive','blocked']
STATUS_WEIGHTS = [60, 25, 10, 5]

with open(os.path.join(workdir, 'candidates.csv'), 'w', newline='') as f:
    w = csv.DictWriter(f, fieldnames=[
        'candidate_id','first_name','last_name','email','phone',
        'city','state','skills','years_experience','hourly_rate_usd','status'])
    w.writeheader()
    for i in range(1, 1001):
        fn, ln = random.choice(FIRST), random.choice(LAST)
        city, state = random.choice(PLACES)
        w.writerow({
            'candidate_id': f'CAND-{i:05d}',
            'first_name': fn, 'last_name': ln,
            'email': f'{fn.lower()}.{ln.lower()}{i}@example.com',
            'phone': f'({random.randint(200,999)}) {random.randint(200,999)}-{random.randint(1000,9999)}',
            'city': city, 'state': state,
            'skills': ','.join(random.choice(SKILL_GROUPS)),
            'years_experience': random.randint(0, 20),
            'hourly_rate_usd': random.randint(35, 185),
            'status': random.choices(STATUSES, weights=STATUS_WEIGHTS)[0],
        })

CLIENTS = ['Acme Corp','Globex','Initech','Umbrella','Wayne Enterprises',
           'Stark Industries','Tyrell','Cyberdyne','Massive Dynamic','Oscorp']
with open(os.path.join(workdir, 'placements.ndjson'), 'w') as f:
    for i in range(1, 201):
        f.write(json.dumps({
            'placement_id': f'PLACE-{i:04d}',
            'candidate_id': f'CAND-{random.randint(1,1000):05d}',
            'client': random.choice(CLIENTS),
            'start_date': f'2026-{random.randint(1,4):02d}-{random.randint(1,28):02d}',
            'weekly_hours': random.choice([20,25,30,35,40]),
            'bill_rate': random.randint(80, 250),
            'placement_status': random.choice(['active','completed','terminated']),
        }) + '\n')

RESUMES = [
    'Senior Python engineer with 8 years of cloud infrastructure experience. Expert in AWS, Docker, and distributed systems design. Led migration of monolithic legacy system to microservices.',
    'Full-stack React and TypeScript developer specializing in real-time dashboards. Built financial trading interfaces. GraphQL, WebSocket, performance optimization.',
    'Data engineer with deep Apache Spark and Kafka expertise. Seven years on streaming analytics pipelines processing billions of events per day. Scala and Python.',
    'Embedded systems engineer with C++ and Rust experience. Worked on automotive ADAS systems and industrial IoT devices. Low-level firmware, RTOS.',
    'DevOps engineer with Kubernetes and Terraform expertise. Six years at hypergrowth startups. Prometheus, Grafana, and observability tooling.',
    'Machine learning engineer specializing in NLP. Built production transformer-based systems. PyTorch, Hugging Face, fine-tuning large language models.',
    'iOS developer with Swift and SwiftUI. Four years building consumer apps at mid-size tech companies. Offline-first architectures and CoreData.',
    'Backend Go developer focused on high-throughput APIs. Built payment processing systems handling millions of transactions. PostgreSQL, gRPC, Redis.',
    'Security engineer with penetration testing and threat modeling experience. OSCP certified. Web application security, AppSec code review, SAST and DAST tooling.',
    'Site reliability engineer with Linux internals and performance tuning expertise. Ten years at large-scale infrastructure. Tracing, profiling, kernel-level debugging.',
]
with open(os.path.join(workdir, 'resumes.ndjson'), 'w') as f:
    for i, r in enumerate(RESUMES, 1):
        f.write(json.dumps({'doc_id': f'RES-{i:03d}', 'resume_text': r}) + '\n')
PYEOF
|
||||
|
||||
pass "candidates.csv (1000 rows, 11 cols)"
|
||||
pass "placements.ndjson (200 rows, 7 cols)"
|
||||
pass "resumes.ndjson (10 rows, 2 cols)"
|
||||
|
||||
# ============================================================
|
||||
# 2. CSV ingest
|
||||
# ============================================================
|
||||
step "2. CSV ingest (Phase 6.1)"
|
||||
|
||||
R=$(curl -s -X POST "$GATEWAY/ingest/file?name=$CAND_DS" -F "file=@$WORKDIR/candidates.csv")
|
||||
echo "$R" | python3 -c 'import sys,json; json.load(sys.stdin)' 2>/dev/null \
|
||||
|| { fail "ingest response was not JSON: $(echo "$R" | head -c 200)"; R='{}'; }
|
||||
|
||||
ROWS=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("rows",-1))' 2>/dev/null)
|
||||
DEDUP=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("deduplicated","?"))' 2>/dev/null)
|
||||
DS_NAME=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("dataset_name","?"))' 2>/dev/null)
|
||||
assert_eq "$DS_NAME" "$CAND_DS" "ingest respected ?name= query param"
|
||||
assert_eq "$ROWS" "1000" "ingest rows"
|
||||
assert_eq "$DEDUP" "False" "first upload not deduplicated"
|
||||
|
||||
REG_ROWS=$(curl -s "$GATEWAY/catalog/datasets/by-name/$CAND_DS" \
|
||||
| python3 -c 'import sys,json; print(json.load(sys.stdin).get("row_count","null"))')
|
||||
assert_eq "$REG_ROWS" "1000" "manifest row_count reflects ingest"
|
||||
|
||||
# ============================================================
|
||||
# 3. NDJSON ingest
|
||||
# ============================================================
|
||||
step "3. NDJSON ingest (Phase 6.2)"
|
||||
|
||||
R=$(curl -s -X POST "$GATEWAY/ingest/file?name=$PLACE_DS" -F "file=@$WORKDIR/placements.ndjson")
|
||||
ROWS=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("rows",-1))' 2>/dev/null)
|
||||
assert_eq "$ROWS" "200" "placements NDJSON ingest rows"
|
||||
|
||||
R=$(curl -s -X POST "$GATEWAY/ingest/file?name=$RESUME_DS" -F "file=@$WORKDIR/resumes.ndjson")
|
||||
ROWS=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("rows",-1))' 2>/dev/null)
|
||||
assert_eq "$ROWS" "10" "resumes NDJSON ingest rows"
|
||||
|
||||
# ============================================================
|
||||
# 4. SQL queries + JOIN + cache
|
||||
# ============================================================
|
||||
step "4. SQL queries (Phase 2, Phase 8)"
|
||||
|
||||
N=$(query_scalar "SELECT COUNT(*) FROM $CAND_DS")
|
||||
assert_eq "$N" "1000" "candidates COUNT(*)"
|
||||
|
||||
N=$(query_scalar "SELECT COUNT(*) FROM $CAND_DS WHERE status = 'active'")
|
||||
if [[ "$N" =~ ^[0-9]+$ ]] && (( N > 400 && N < 700 )); then
|
||||
pass "active candidates in plausible range ($N, expect ~600)"
|
||||
else
|
||||
fail "active candidates count out of range: $N"
|
||||
fi
|
||||
|
||||
N=$(query_scalar "
|
||||
SELECT COUNT(DISTINCT c.candidate_id)
|
||||
FROM $CAND_DS c
|
||||
JOIN $PLACE_DS p ON c.candidate_id = p.candidate_id
|
||||
WHERE p.placement_status = 'active'
|
||||
")
|
||||
if [[ "$N" =~ ^[0-9]+$ ]] && (( N > 0 && N <= 200 )); then
|
||||
pass "cross-dataset JOIN with filter returns $N rows"
|
||||
else
|
||||
fail "JOIN returned unexpected count: $N"
|
||||
fi
|
||||
|
||||
AVG=$(query_scalar "SELECT AVG(hourly_rate_usd) FROM $CAND_DS")
|
||||
if python3 -c "import sys; v=float('$AVG'); sys.exit(0 if 100 < v < 130 else 1)" 2>/dev/null; then
|
||||
pass "average hourly rate in plausible range ($AVG, expect ~110)"
|
||||
else
|
||||
fail "average hourly rate out of range: $AVG"
|
||||
fi
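
The plausibility windows in this step follow from the generator: `STATUS_WEIGHTS` gives `active` a weight of 60 out of 100, so about 600 of 1000 rows, and `random.randint(35, 185)` is uniform with mean (35 + 185) / 2 = 110. A small sketch of the same arithmetic plus an integer-only range check. `in_range` is a hypothetical helper, not part of the script, and unlike the script's float check for the average it handles integers only:

```shell
# in_range VALUE LOW HIGH: succeeds when VALUE is a non-negative integer
# strictly between LOW and HIGH. Hypothetical helper for illustration.
in_range() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # rejects sentinels such as __ERROR__ or __NO_ROWS__
    esac
    [ "$1" -gt "$2" ] && [ "$1" -lt "$3" ]
}

rows=1000
active_weight=60                                    # first entry of STATUS_WEIGHTS
expected_active=$(( rows * active_weight / 100 ))   # centre of the 400..700 window
expected_rate=$(( (35 + 185) / 2 ))                 # centre of the 100..130 window

echo "expected active: $expected_active"
echo "expected mean rate: $expected_rate"
in_range "$expected_active" 400 700 && echo "active centre inside window"
in_range "__ERROR__" 400 700 || echo "sentinel rejected"
```

Checking for digits first means a sentinel from a failed query fails the assertion cleanly and never reaches the numeric comparison.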
|
||||
|
||||
CODE=$(http_code POST "/query/cache/pin" "{\"dataset\":\"$CAND_DS\"}")
|
||||
assert_eq "$CODE" "200" "cache pin HTTP"
|
||||
|
||||
# ============================================================
|
||||
# 5. Content-hash re-ingest dedup (Phase 6.4)
|
||||
# ============================================================
|
||||
step "5. Content-hash re-ingest dedup"
|
||||
|
||||
R=$(curl -s -X POST "$GATEWAY/ingest/file?name=$CAND_DS" -F "file=@$WORKDIR/candidates.csv")
|
||||
DEDUP=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("deduplicated","?"))' 2>/dev/null)
|
||||
assert_eq "$DEDUP" "True" "re-upload same file is deduplicated"
|
||||
|
||||
# ============================================================
|
||||
# 6. Idempotent register — same fingerprint (ADR-020)
|
||||
# ============================================================
|
||||
step "6. Idempotent register (ADR-020 same-fp path)"
|
||||
|
||||
DS=$(curl -s "$GATEWAY/catalog/datasets/by-name/$CAND_DS")
|
||||
FP=$(echo "$DS" | python3 -c 'import sys,json; print(json.load(sys.stdin)["schema_fingerprint"])')
|
||||
OBJS=$(echo "$DS" | python3 -c 'import sys,json; print(json.dumps(json.load(sys.stdin)["objects"]))')
|
||||
ID_BEFORE=$(echo "$DS" | python3 -c 'import sys,json; print(json.load(sys.stdin)["id"])')
|
||||
|
||||
PAYLOAD=$(python3 -c "import json,sys; print(json.dumps({'name':sys.argv[1],'schema_fingerprint':sys.argv[2],'objects':json.loads(sys.argv[3])}))" "$CAND_DS" "$FP" "$OBJS")
|
||||
CODE=$(http_code POST "/catalog/datasets" "$PAYLOAD")
|
||||
assert_eq "$CODE" "201" "same-fp re-register returns 201"
|
||||
|
||||
ID_AFTER=$(curl -s "$GATEWAY/catalog/datasets/by-name/$CAND_DS" | python3 -c 'import sys,json; print(json.load(sys.stdin)["id"])')
|
||||
assert_eq "$ID_AFTER" "$ID_BEFORE" "same DatasetId after re-register"
|
||||
|
||||
COUNT=$(curl -s "$GATEWAY/catalog/datasets" | python3 -c "import sys,json; print(sum(1 for d in json.load(sys.stdin) if d['name']=='$CAND_DS'))")
|
||||
assert_eq "$COUNT" "1" "no duplicate manifest created"
|
||||
|
||||
# ============================================================
|
||||
# 7. Schema-drift rejection (409)
|
||||
# ============================================================
|
||||
step "7. Schema-drift rejection (ADR-020 409 path)"
|
||||
|
||||
PAYLOAD=$(python3 -c "import json,sys; print(json.dumps({'name':sys.argv[1],'schema_fingerprint':'deadbeefnotmatching','objects':json.loads(sys.argv[2])}))" "$CAND_DS" "$OBJS")
|
||||
CODE=$(http_code POST "/catalog/datasets" "$PAYLOAD")
|
||||
assert_eq "$CODE" "409" "different-fp rejected with 409"
|
||||
|
||||
# ============================================================
|
||||
# 8. Dedupe no-op on clean catalog
|
||||
# ============================================================
|
||||
step "8. Dedupe no-op on clean state"
|
||||
|
||||
R=$(curl -s -X POST "$GATEWAY/catalog/dedupe")
|
||||
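# Not named GROUPS: bash reserves that name for the caller's group IDs and
# silently ignores assignments to it, so the assert would compare a GID.
DEDUPE_GROUPS=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin)["groups"])')
REMOVED=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin)["removed"])')
assert_eq "$DEDUPE_GROUPS" "0" "dedupe groups (clean catalog)"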
|
||||
assert_eq "$REMOVED" "0" "dedupe removed count"
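
One naming hazard applies to counters like the dedupe group count: in bash, `GROUPS` is a special array holding the caller's group IDs, and assignments to it have no effect. A short sketch of the behavior, run in child bash processes. `DEDUPE_GROUPS` is just an illustrative alternative name:

```shell
# bash-specific: GROUPS is a reserved special variable, so the assignment is
# silently dropped and "$GROUPS" still expands to the first group ID.
observed=$(bash -c 'GROUPS="sentinel"; printf "%s" "$GROUPS"')

# An ordinary variable name holds the assigned value as expected.
renamed=$(bash -c 'DEDUPE_GROUPS="sentinel"; printf "%s" "$DEDUPE_GROUPS"')

echo "GROUPS        -> $observed"
echo "DEDUPE_GROUPS -> $renamed"
```

A comparison such as `"$GROUPS" == "0"` therefore depends on which account runs the script, not on the API response.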
|
||||
|
||||
# ============================================================
|
||||
# 9. Metadata enrichment (Phase 10)
|
||||
# ============================================================
|
||||
step "9. Metadata enrichment (Phase 10)"
|
||||
|
||||
CODE=$(http_code POST "/catalog/datasets/by-name/$CAND_DS/metadata" \
|
||||
"{\"owner\":\"e2e-test\",\"description\":\"$RUN_ID synthetic candidates\",\"tags\":[\"test\",\"synthetic\"]}")
|
||||
assert_eq "$CODE" "200" "POST metadata HTTP"
|
||||
|
||||
META=$(curl -s "$GATEWAY/catalog/datasets/by-name/$CAND_DS")
|
||||
OWNER=$(echo "$META" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("owner",""))')
|
||||
assert_eq "$OWNER" "e2e-test" "owner persisted"
|
||||
|
||||
# ============================================================
|
||||
# 10. PII auto-detection (Phase 10)
|
||||
# ============================================================
|
||||
step "10. PII auto-detection (Phase 10)"
|
||||
|
||||
PII_COLS=$(echo "$META" | python3 -c '
|
||||
import sys, json
|
||||
m = json.load(sys.stdin)
|
||||
pii = [c["name"] for c in m.get("columns",[]) if c.get("is_pii") or (isinstance(c.get("sensitivity"),str) and c["sensitivity"].lower()=="pii")]
|
||||
print(" ".join(pii) if pii else "__NONE__")')
|
||||
if [[ "$PII_COLS" == *"email"* ]] && [[ "$PII_COLS" == *"phone"* ]]; then
|
||||
pass "email and phone flagged as PII ($PII_COLS)"
|
||||
elif [[ "$PII_COLS" == "__NONE__" ]]; then
|
||||
warn "no PII flagged — auto-detection may not run on this path"
|
||||
else
|
||||
warn "partial PII detection: $PII_COLS"
|
||||
fi
|
||||
|
||||
# ============================================================
|
||||
# 11. Vector index + semantic search (Phase 7)
|
||||
# ============================================================
|
||||
step "11. Vector index + semantic search (Phase 7)"
|
||||
|
||||
if [[ "$SKIP_VECTOR" == "1" ]]; then
|
||||
warn "SKIP_VECTOR=1 — skipping vector pipeline"
|
||||
else
|
||||
# Pull documents out of the ingested resumes dataset via SQL,
|
||||
# then feed to the inline /vectors/index body. This exercises
|
||||
# the query→embed integration rather than pre-canned input.
|
||||
DOCS=$(curl -s -X POST "$GATEWAY/query/sql" \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d "$(python3 -c "import json; print(json.dumps({'sql': 'SELECT doc_id, resume_text FROM $RESUME_DS'}))")" \
|
||||
| python3 -c '
|
||||
import sys, json
|
||||
r = json.load(sys.stdin)
|
||||
docs = [{"id": row["doc_id"], "text": row["resume_text"]} for row in r.get("rows", [])]
|
||||
print(json.dumps(docs))')
|
||||
DOC_COUNT=$(echo "$DOCS" | python3 -c 'import sys,json; print(len(json.load(sys.stdin)))')
|
||||
assert_eq "$DOC_COUNT" "10" "pulled docs via SQL for embedding"
|
||||
|
||||
PAYLOAD=$(python3 -c "
|
||||
import json, sys
|
||||
print(json.dumps({
|
||||
'index_name': sys.argv[1],
|
||||
'source': sys.argv[2],
|
||||
'documents': json.loads(sys.argv[3]),
|
||||
'chunk_size': 500,
|
||||
'overlap': 50,
|
||||
}))" "$VEC_IDX" "$RESUME_DS" "$DOCS")
|
||||
|
||||
R=$(curl -s -X POST "$GATEWAY/vectors/index" -H 'Content-Type: application/json' -d "$PAYLOAD")
|
||||
JOB_ID=$(echo "$R" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(d.get("job_id","__NONE__"))' 2>/dev/null)
|
||||
|
||||
if [[ "$JOB_ID" == "__NONE__" || -z "$JOB_ID" ]]; then
|
||||
fail "vector index job rejected: $(echo "$R" | head -c 200)"
|
||||
else
|
||||
pass "embedding job accepted (job=$JOB_ID)"
|
||||
# Poll up to 90s for 10 short resumes; Ollama cold-start can be slow.
|
||||
JOB_STATUS="unknown"
|
||||
for _ in $(seq 1 45); do
|
||||
JOB_STATUS=$(curl -s "$GATEWAY/vectors/jobs/$JOB_ID" 2>/dev/null \
|
||||
| python3 -c '
|
||||
import sys, json
|
||||
try: print(json.load(sys.stdin).get("status","?"))
|
||||
except Exception: print("?")' 2>/dev/null)
|
||||
[[ "$JOB_STATUS" == "completed" || "$JOB_STATUS" == "Completed" ]] && break
|
||||
[[ "$JOB_STATUS" == "failed" || "$JOB_STATUS" == "Failed" ]] && break
|
||||
sleep 2
|
||||
done
|
||||
|
||||
case "$JOB_STATUS" in
|
||||
completed|Completed)
|
||||
pass "embedding job completed"
|
||||
R=$(curl -s -X POST "$GATEWAY/vectors/search" \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d "{\"index_name\":\"$VEC_IDX\",\"query\":\"fine-tuning large language models\",\"k\":3}")
|
||||
TOP_DOC=$(echo "$R" | python3 -c '
|
||||
import sys, json
|
||||
r = json.load(sys.stdin)
|
||||
if r.get("results"): print(r["results"][0].get("doc_id","?"))
|
||||
else: print("__NONE__")' 2>/dev/null)
|
||||
if [[ "$TOP_DOC" == "RES-006" ]]; then
|
||||
pass "top match is ML/NLP resume (semantically correct)"
|
||||
elif [[ "$TOP_DOC" == "__NONE__" ]]; then
|
||||
fail "search returned no results"
|
||||
else
|
||||
warn "top match is $TOP_DOC (expected RES-006 — ranking may vary)"
|
||||
fi ;;
|
||||
*)
|
||||
fail "embedding job did not complete (status=$JOB_STATUS)" ;;
|
||||
esac
|
||||
fi
|
||||
fi
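
The job poll above is a bounded loop: at most 45 attempts with a 2-second sleep, so roughly 90 seconds in the worst case, stopping early on a terminal status. A self-contained sketch of that shape, where `fake_job_status` is a stub invented for this example in place of the HTTP status lookup:

```shell
# Sketch of a bounded poll. fake_job_status and poll_job are illustrative names.
state_file=$(mktemp)
echo 0 > "$state_file"

fake_job_status() {
    # Reports "running" twice, then "completed". The counter lives in a file
    # because command substitution runs this function in a subshell.
    local n
    n=$(( $(cat "$state_file") + 1 ))
    echo "$n" > "$state_file"
    if [ "$n" -ge 3 ]; then echo completed; else echo running; fi
}

poll_job() {
    local max_attempts="$1" delay="$2" status="unknown" attempt=0
    while [ "$attempt" -lt "$max_attempts" ]; do
        attempt=$((attempt + 1))
        status=$(fake_job_status)
        case "$status" in
            completed|failed) break ;;   # terminal states end the loop early
        esac
        sleep "$delay"
    done
    echo "$status $attempt"
}

result=$(poll_job 45 0)   # delay 0 keeps the demo instant; the script uses 2
rm -f "$state_file"
echo "$result"
```

If the budget runs out, the last non-terminal status is what gets reported, which matches the script's `status=$JOB_STATUS` failure message.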
|
||||
|
||||
# ============================================================
|
||||
# 12. Cleanup + baseline verify
|
||||
# ============================================================
|
||||
step "12. Cleanup + baseline verify"
|
||||
|
||||
cleanup
|
||||
trap - EXIT
|
||||
|
||||
ON_DISK=$(ls "$DATA_ROOT/_catalog/manifests"/*.json 2>/dev/null | wc -l | tr -d ' ')
|
||||
info "manifest files on disk now: $ON_DISK"
|
||||
|
||||
DISK_ORPHANS=0
|
||||
if compgen -G "$DATA_ROOT/_catalog/manifests/*.json" > /dev/null; then
|
||||
DISK_ORPHANS=$(grep -l "\"$RUN_ID" "$DATA_ROOT/_catalog/manifests"/*.json 2>/dev/null | wc -l | tr -d ' ')
|
||||
fi
|
||||
assert_eq "$DISK_ORPHANS" "0" "no orphan manifest files on disk for $RUN_ID"
|
||||
|
||||
LIVE_ORPHANS=$(curl -s "$GATEWAY/catalog/datasets" \
|
||||
| python3 -c "import sys,json; print(sum(1 for d in json.load(sys.stdin) if d['name'].startswith('$RUN_ID')))")
|
||||
if [[ "$LIVE_ORPHANS" != "0" ]]; then
|
||||
warn "$LIVE_ORPHANS entries linger in live registry (clears on gateway restart; on-disk is ground truth)"
|
||||
fi
|
||||
|
||||
# ============================================================
|
||||
# Summary
|
||||
# ============================================================
|
||||
ELAPSED=$(( $(date +%s) - STARTED_AT ))
|
||||
printf '\n%s─── Summary ───%s\n' "$CC_BLU" "$CC_RST"
|
||||
printf ' run_id: %s\n' "$RUN_ID"
|
||||
printf ' elapsed: %ss\n' "$ELAPSED"
|
||||
printf ' passed: %s%d%s\n' "$CC_GRN" "$PASS" "$CC_RST"
|
||||
printf ' failed: %s%d%s\n' "$CC_RED" "$FAIL" "$CC_RST"
|
||||
printf ' warnings: %s%d%s\n' "$CC_YLW" "$WARN" "$CC_RST"
|
||||
|
||||
if (( FAIL > 0 )); then
|
||||
printf '\n%sfailures:%s\n' "$CC_RED" "$CC_RST"
|
||||
for f in "${FAILURES[@]}"; do printf ' - %s\n' "$f"; done
|
||||
exit 1
|
||||
fi
|
||||
exit 0
|
||||
@ -1,104 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# lance smoke — gates the 5 /vectors/lance/* HTTP routes (search, doc,
|
||||
# index, append, migrate). Only the read paths are exercised here so a
|
||||
# CI run doesn't mutate state. Migrate + index + append have shape
|
||||
# probes (request bodies are well-formed) but ride the not-found path
|
||||
# that the 2026-05-02 audit added.
|
||||
#
|
||||
# Targets the live gateway at $LH_GATEWAY (default :3100). Uses an
|
||||
# existing on-disk Lance dataset — `workers_500k_v1` — so no
|
||||
# migration setup is needed. If the dataset is missing the smoke
|
||||
# fails loudly with a clear message.
|
||||
#
|
||||
# Surfaced 2026-05-02: the lance crates had zero tests + no smoke;
|
||||
# substrate change to lance_backend.rs would silently break the live
|
||||
# surface. This smoke is the regression gate.
|
||||
#
|
||||
# Usage:
|
||||
# ./scripts/lance_smoke.sh
|
||||
# LH_GATEWAY=http://127.0.0.1:3100 ./scripts/lance_smoke.sh
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
GATEWAY="${LH_GATEWAY:-http://127.0.0.1:3100}"
|
||||
DATASET="${LH_LANCE_DATASET:-workers_500k_v1}"
|
||||
PREFIX="$GATEWAY/vectors/lance"
|
||||
PASS=0; FAIL=0
|
||||
PROBE() { local label="$1"; shift; "$@" && { echo " ✓ $label"; PASS=$((PASS+1)); } || { echo " ✗ $label"; FAIL=$((FAIL+1)); }; }
|
||||
|
||||
echo "[lance-smoke] gateway=$GATEWAY dataset=$DATASET"
|
||||
|
||||
# ── 0. Gateway alive ─────────────────────────────────────────────
|
||||
PROBE "gateway /v1/health responds" \
|
||||
bash -c "curl -sf -m 3 $GATEWAY/v1/health -o /dev/null"
|
||||
|
||||
# ── 1. Search returns IVF_PQ results on existing dataset ────────
|
||||
# Capture curl status separately so a transport-level failure (gateway
|
||||
# down, network broken, timeout) shows up as its own probe — instead of
|
||||
# being swallowed by `|| echo '{}'` which would surface as the next jq
|
||||
# probe failing with a misleading "no method field" message. Per opus
|
||||
# INFO at lance_smoke.sh:38 from the 2026-05-02 scrum.
|
||||
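# `|| CURL_RC=$?` records the status on the same line; with `set -e` active,
# a bare failing assignment would end the script before `$?` could be read.
CURL_RC=0
RESP=$(curl -sS -m 30 -X POST "$PREFIX/search/$DATASET" \
-H 'Content-Type: application/json' \
-d '{"query":"forklift operator","top_k":3}' 2>/dev/null) || CURL_RC=$?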
|
||||
PROBE "search/$DATASET curl reachable (exit 0)" \
|
||||
test "$CURL_RC" = "0"
|
||||
[ "$CURL_RC" != "0" ] && RESP='{}'
|
||||
PROBE "search/$DATASET returns top-3 lance_ivf_pq results" \
|
||||
bash -c "echo '$RESP' | jq -e '.method == \"lance_ivf_pq\" and (.results | length) == 3' >/dev/null"
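
Under `set -euo pipefail`, a failing command substitution in a plain assignment ends the script at that line, so an exit status can only be recorded if it is captured on the same line. A minimal sketch of the difference, with `false` standing in for a curl transport failure (an assumption for the demo, so no network is involved):

```shell
# Each case runs in its own bash so `set -e` cannot affect the caller.

# Plain assignment: the script stops at out=$(false), nothing is printed.
naive=$(bash -c 'set -e; out=$(false); rc=$?; echo "rc=$rc"' || true)

# Guarded assignment: the || list suppresses errexit and records the status.
guarded=$(bash -c 'set -e; rc=0; out=$(false) || rc=$?; echo "rc=$rc"')

echo "naive:   [$naive]"
echo "guarded: [$guarded]"
```

The guarded form lets a transport failure surface as its own probe result; otherwise the run ends with no output from that step.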
|
||||
|
||||
# Capture one doc_id from those results so the next probe has something real to fetch.
|
||||
DOC_ID=$(echo "$RESP" | jq -r '.results[0].doc_id // ""')
|
||||
|
||||
# ── 2. get_doc by id returns the row ────────────────────────────
|
||||
PROBE "doc/$DATASET/<known-id> returns full row" \
|
||||
bash -c "[ -n '$DOC_ID' ] && curl -sf -m 5 '$PREFIX/doc/$DATASET/$DOC_ID' | jq -e '.row.doc_id == \"$DOC_ID\"' >/dev/null"
|
||||
|
||||
# ── 3. get_doc with bogus id returns 404 (not 500) ──────────────
|
||||
STATUS=$(curl -sS -m 5 -o /tmp/lance_smoke_404.json -w '%{http_code}' \
|
||||
"$PREFIX/doc/$DATASET/W500K-NOT-A-REAL-ID-00000")
|
||||
PROBE "doc/$DATASET/<missing-id> → 404" \
|
||||
test "$STATUS" = "404"
|
||||
|
||||
# ── 4. search on missing dataset returns 404 + sanitized message ─
|
||||
STATUS=$(curl -sS -m 5 -o /tmp/lance_smoke_500.json -w '%{http_code}' \
|
||||
-X POST "$PREFIX/search/no-such-dataset-${RANDOM}" \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d '{"query":"x","top_k":1}')
|
||||
BODY=$(cat /tmp/lance_smoke_500.json)
|
||||
PROBE "search/<missing> → 404 (was 500 pre-2026-05-02)" \
|
||||
test "$STATUS" = "404"
|
||||
# Assert "pattern absent" — `! grep -qE` (NOT `grep -qvE` which is unsound:
|
||||
# -v -q exits 0 if ANY line lacks the pattern, so a multi-line body containing
|
||||
# both a leak line AND any clean line would false-PASS. Caught 2026-05-02 by
|
||||
# opus scrum on the lance backend wave.)
|
||||
PROBE "search/<missing> body sanitized — no filesystem leak" \
|
||||
bash -c "! echo '$BODY' | grep -qE '/home/|/root/\.cargo/|/var/|/tmp/'"
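
The comment above distinguishes `! grep -qE` from `grep -qvE`. A small sketch with a made-up two-line error body shows the false pass it describes (the path and message are invented for the demo):

```shell
# A body with one leaking line and one clean line.
body='error: open /home/svc/data/x.lance failed
hint: check the dataset name'

# Unsound: -v selects lines that do NOT match, so the clean second line
# makes grep exit 0 even though the first line leaks a path.
if printf '%s\n' "$body" | grep -qvE '/home/|/var/'; then unsound=pass; else unsound=fail; fi

# Sound: succeed only when no line matches at all.
if ! printf '%s\n' "$body" | grep -qE '/home/|/var/'; then sound=pass; else sound=fail; fi

echo "grep -qvE verdict:  $unsound"
echo "! grep -qE verdict: $sound"
```

Only the negated form reports the leak, so it is the right shape for a "pattern absent" assertion.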
|
||||
|
||||
# ── 5. build_index on missing dataset also sanitized ────────────
|
||||
STATUS=$(curl -sS -m 5 -o /tmp/lance_smoke_idx.json -w '%{http_code}' \
|
||||
-X POST "$PREFIX/index/no-such-dataset-${RANDOM}" \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d '{}')
|
||||
BODY=$(cat /tmp/lance_smoke_idx.json)
|
||||
PROBE "index/<missing> body sanitized" \
|
||||
bash -c "! echo '$BODY' | grep -qE '/home/|/root/\.cargo/|/var/|/tmp/'"
|
||||
|
||||
# ── 6. append validates input shape (rejects empty rows array) ──
|
||||
STATUS=$(curl -sS -m 5 -o /dev/null -w '%{http_code}' \
|
||||
-X POST "$PREFIX/append/$DATASET" \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d '{"rows":[]}')
|
||||
PROBE "append with empty rows[] → 400" \
|
||||
test "$STATUS" = "400"
|
||||
|
||||
# ── 7. migrate route is reachable (POST without body returns a real error, not 404) ──
|
||||
STATUS=$(curl -sS -m 5 -o /dev/null -w '%{http_code}' \
|
||||
-X POST "$PREFIX/migrate/probe-not-real-${RANDOM}?bucket=primary" 2>/dev/null)
|
||||
# Should be a 4xx other than 404 (bad request shape): a 404 would mean the
# route is not registered, and a 200 would mean the empty body was accepted.
|
||||
PROBE "migrate route registered (non-404, non-200 on empty body)" \
|
||||
bash -c "[ '$STATUS' != '404' ] && [ '$STATUS' != '200' ]"
|
||||
|
||||
echo "[lance-smoke] $PASS PASS / $FAIL FAIL"
|
||||
[ "$FAIL" -eq 0 ]
|
||||
@ -1,157 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# Production substrate smoke — single command that verifies every
|
||||
# production-critical surface end-to-end. Exits non-zero on the first
|
||||
# failure so an operator can run this before:
|
||||
# - Swapping workers_500k.parquet → real Chicago contractor data
|
||||
# - Spinning up the Asterisk voice agent against /v1/chat
|
||||
# - Running staffing inference loops via /v1/iterate
|
||||
# - Wiring the assistant against the gateway
|
||||
#
|
||||
# Usage:
|
||||
# ./scripts/production_smoke.sh
|
||||
#
|
||||
# Tunable via env:
|
||||
# GATEWAY=http://localhost:3100 # gateway base URL
|
||||
# FAIL_FAST=1 # exit on first failure (default 1)
|
||||
# VERBOSE=1 # print full responses on success too
|
||||
|
||||
set -e
|
||||
GATEWAY="${GATEWAY:-http://localhost:3100}"
|
||||
FAIL_FAST="${FAIL_FAST:-1}"
|
||||
VERBOSE="${VERBOSE:-0}"
|
||||
|
||||
PASS=0
|
||||
FAIL=0
|
||||
FAILURES=()
|
||||
|
||||
check() {
|
||||
local name="$1"
|
||||
local expected_status="$2"
|
||||
local cmd="$3"
|
||||
echo -n " [$(($PASS + $FAIL + 1))] $name ... "
|
||||
local resp
|
||||
resp=$(eval "$cmd" 2>&1) || true
|
||||
local status="${resp%%|||*}"
|
||||
local body="${resp#*|||}"
|
||||
if [ "$status" = "$expected_status" ]; then
|
||||
PASS=$((PASS + 1))
|
||||
echo "✓ ($status)"
|
||||
if [ "$VERBOSE" = "1" ]; then echo " $body" | head -3 | sed 's/^/ /'; fi
|
||||
else
|
||||
FAIL=$((FAIL + 1))
|
||||
FAILURES+=("$name: expected $expected_status, got $status")
|
||||
echo "✗ (got $status, expected $expected_status)"
|
||||
echo " $body" | head -3 | sed 's/^/ /'
|
||||
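# Explicit `if`: a trailing `[ ... ] && { ...; }` would make check() return 1
# when FAIL_FAST=0, and `set -e` would then abort the caller anyway.
if [ "$FAIL_FAST" = "1" ]; then print_summary; exit 1; fi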
|
||||
fi
|
||||
}
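
A function's return status is that of its last command, and with `set -e` a non-zero return from a plain function call ends the script. The sketch below contrasts a function ending in a `test && action` list with one ending in an explicit `if`. `report` is a hypothetical function used only here:

```shell
# Each case runs in its own bash so `set -e` cannot affect the caller.

# Ends in `[ ... ] && ...`: the failed test becomes the return status,
# so errexit stops the script before "continued" is printed.
trailing_and=$(bash -c '
    set -e
    report() { [ "$1" = "1" ] && echo "stopping"; }
    report 0
    echo "continued"
' || true)

# Ends in an `if` with no branch taken: the status is 0, execution continues.
explicit_if=$(bash -c '
    set -e
    report() { if [ "$1" = "1" ]; then echo "stopping"; fi; }
    report 0
    echo "continued"
')

echo "trailing &&: [$trailing_and]"
echo "explicit if: [$explicit_if]"
```

This matters for an opt-out flag such as `FAIL_FAST=0`: the script can only continue past a failure if the helper returns 0 when the flag is off.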
|
||||
|
||||
curl_with_status() {
|
||||
# Run curl, capture HTTP status + body, format as "status|||body"
|
||||
local args=("$@")
|
||||
curl -sS -w "\n%{http_code}" "${args[@]}" 2>&1 | awk '
|
||||
{ lines[NR]=$0 }
|
||||
END {
|
||||
status=lines[NR]
|
||||
body=""
|
||||
for (i=1; i<NR; i++) body=body lines[i] (i<NR-1?"\n":"")
|
||||
print status "|||" body
|
||||
}
|
||||
'
|
||||
}
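
`curl_with_status` emits a single `status|||body` string, which `check` splits with parameter expansion. A sketch of that split on a made-up response, including a body that itself contains the delimiter:

```shell
# Sample value in the "status|||body" shape; the JSON is invented for the demo.
resp='404|||{"error":"dataset not found","detail":"a|||b"}'

status="${resp%%|||*}"   # drop the longest suffix from a "|||": text before the FIRST delimiter
body="${resp#*|||}"      # drop the shortest prefix through a "|||": text after the FIRST delimiter

echo "status: $status"
echo "body:   $body"
```

Pairing `%%` with a single `#` keeps the split anchored on the first delimiter, so a `|||` inside the body survives intact.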
|
||||
|
||||
print_summary() {
|
||||
echo ""
|
||||
echo "═══════════════════════════════════════════════════════════════"
|
||||
echo " $PASS passed · $FAIL failed"
|
||||
if [ ${#FAILURES[@]} -gt 0 ]; then
|
||||
echo " failures:"
|
||||
for f in "${FAILURES[@]}"; do echo " - $f"; done
|
||||
fi
|
||||
echo "═══════════════════════════════════════════════════════════════"
|
||||
}
|
||||
|
||||
echo "Production substrate smoke test against $GATEWAY"
|
||||
echo ""
|
||||
|
||||
# ─── 1. Liveness ─────────────────────────────────────────────────────
|
||||
echo "▶ Liveness"
|
||||
check "gateway /health" "200" \
|
||||
'curl_with_status -m 5 "$GATEWAY/health"'
|
||||
|
||||
# ─── 2. Operational health ──────────────────────────────────────────
|
||||
echo "▶ Operational state"
|
||||
HEALTH_RESP=$(curl -sS -m 10 "$GATEWAY/v1/health" 2>&1) || HEALTH_RESP="{}"
|
||||
WORKERS_COUNT=$(echo "$HEALTH_RESP" | python3 -c "import sys,json; print(json.load(sys.stdin).get('workers_count',0))" 2>/dev/null || echo 0)
|
||||
PROVIDERS_OK=$(echo "$HEALTH_RESP" | python3 -c "import sys,json; d=json.load(sys.stdin).get('providers_configured',{}); print(sum(1 for v in d.values() if v))" 2>/dev/null || echo 0)
|
||||
echo " workers_count: $WORKERS_COUNT"
|
||||
echo " providers_configured (count): $PROVIDERS_OK"
|
||||
if [ "$WORKERS_COUNT" -lt 1 ]; then
|
||||
FAIL=$((FAIL + 1))
|
||||
FAILURES+=("workers_count=0 — parquet load failed or empty")
|
||||
echo " ✗ workers not loaded"
|
||||
[ "$FAIL_FAST" = "1" ] && { print_summary; exit 1; }
|
||||
else
|
||||
PASS=$((PASS + 1))
|
||||
echo " ✓ workers loaded"
|
||||
fi
|
||||
|
||||
# ─── 3. Truth Layer ──────────────────────────────────────────────────
|
||||
echo "▶ Truth Layer"
|
||||
check "/v1/context returns rules" "200" \
|
||||
'curl_with_status -m 10 "$GATEWAY/v1/context"'
|
||||
|
||||
# ─── 4. /v1/chat (provider=ollama) ──────────────────────────────────
|
||||
echo "▶ /v1/chat (provider=ollama, fast model)"
|
||||
check "/v1/chat ping" "200" \
|
||||
'curl_with_status -m 60 -X POST "$GATEWAY/v1/chat" \
|
||||
-H "content-type: application/json" \
|
||||
-d "{\"provider\":\"ollama\",\"model\":\"qwen3.5:latest\",\"messages\":[{\"role\":\"user\",\"content\":\"reply: PONG\"}],\"max_tokens\":30,\"temperature\":0,\"think\":false}"'
|
||||
|
||||
# ─── 5. /v1/validate (negative + positive) ──────────────────────────
|
||||
echo "▶ /v1/validate"
|
||||
check "phantom candidate_id → 422 Consistency" "422" \
|
||||
'curl_with_status -m 10 -X POST "$GATEWAY/v1/validate" \
|
||||
-H "content-type: application/json" \
|
||||
-d "{\"kind\":\"fill\",\"artifact\":{\"fills\":[{\"candidate_id\":\"W-FAKE-0\",\"name\":\"Fake\"}]},\"context\":{\"target_count\":1}}"'
|
||||
|
||||
check "real worker (W-1) → 200 OK" "200" \
|
||||
'curl_with_status -m 10 -X POST "$GATEWAY/v1/validate" \
|
||||
-H "content-type: application/json" \
|
||||
-d "{\"kind\":\"fill\",\"artifact\":{\"fills\":[{\"candidate_id\":\"W-1\",\"name\":\"Anyone\"}]},\"context\":{\"target_count\":1}}"'
|
||||
|
||||
check "SSN in body → 422 Policy" "422" \
|
||||
'curl_with_status -m 10 -X POST "$GATEWAY/v1/validate" \
|
||||
-H "content-type: application/json" \
|
||||
-d "{\"kind\":\"email\",\"artifact\":{\"to\":\"a@b.com\",\"body\":\"Your SSN 123-45-6789 is on file.\"}}"'
|
||||
|
||||
# ─── 6. /v1/iterate (bounded retry loop) ───────────────────────────
|
||||
# Phantom worker → expect 422 IterateFailure with history (not 200)
|
||||
echo "▶ /v1/iterate (bounded retry)"
|
||||
check "/v1/iterate phantom → bounded fail" "422" \
|
||||
'curl_with_status -m 240 -X POST "$GATEWAY/v1/iterate" \
|
||||
-H "content-type: application/json" \
|
||||
-d "{\"kind\":\"fill\",\"provider\":\"ollama\",\"model\":\"qwen3.5:latest\",\"system\":\"Reply with ONLY: {\\\"fills\\\":[{\\\"candidate_id\\\":\\\"W-99999999\\\",\\\"name\\\":\\\"X\\\"}]}\",\"prompt\":\"emit it\",\"context\":{\"target_count\":1},\"max_iterations\":1,\"max_tokens\":200,\"temperature\":0}"'
|
||||
|
||||
# ─── 7. Doc-drift batch ─────────────────────────────────────────────
|
||||
echo "▶ Doc-drift scan"
|
||||
check "/vectors/playbook_memory/doc_drift/scan" "200" \
|
||||
'curl_with_status -m 60 -X POST "$GATEWAY/vectors/playbook_memory/doc_drift/scan"'
|
||||
|
||||
# ─── 8. Usage tracking ──────────────────────────────────────────────
|
||||
echo "▶ Usage tracking"
|
||||
USAGE=$(curl -sS -m 10 "$GATEWAY/v1/usage" 2>&1) || USAGE="{}"
|
||||
USAGE_REQS=$(echo "$USAGE" | python3 -c "import sys,json; print(json.load(sys.stdin).get('requests',0))" 2>/dev/null || echo 0)
|
||||
echo " usage.requests: $USAGE_REQS (should be > 0 if /v1/chat fired)"
|
||||
if [ "$USAGE_REQS" -ge 1 ]; then
|
||||
PASS=$((PASS + 1))
|
||||
echo " ✓ /v1/usage tracking"
|
||||
else
|
||||
FAIL=$((FAIL + 1))
|
||||
FAILURES+=("/v1/usage didn't increment after /v1/chat call")
|
||||
echo " ✗ /v1/usage didn't increment"
|
||||
fi
|
||||
|
||||
print_summary
|
||||
|
||||
[ "$FAIL" -eq 0 ] && exit 0 || exit 1
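Review note: the health gate in this script pulls two numbers out of `/v1/health` with inline `python3 -c` one-liners. Below is a minimal standalone sketch of the same parsing, assuming the response shape the script implies (`workers_count` as an integer, `providers_configured` as a name-to-flag map). The function name `summarize_health` and the sample payload are hypothetical and not part of the repository.

```python
import json


def summarize_health(raw: str) -> tuple[int, int]:
    """Return (workers_count, number of configured providers) from a health body.

    Follows the script's fallback: an unparseable or unexpected body counts as 0.
    """
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return 0, 0
    if not isinstance(data, dict):
        return 0, 0
    workers = data.get("workers_count", 0)
    providers = data.get("providers_configured", {})
    # Count only providers whose flag is truthy, as the shell one-liner does.
    configured = (
        sum(1 for flag in providers.values() if flag)
        if isinstance(providers, dict)
        else 0
    )
    return (workers if isinstance(workers, int) else 0), configured


if __name__ == "__main__":
    # Illustrative payload; "other" is a made-up provider name.
    sample = '{"workers_count": 3, "providers_configured": {"ollama": true, "other": false}}'
    print(summarize_health(sample))
    print(summarize_health("curl: (7) connection refused"))
```

Keeping the parse in one function would make the fallback behaviour testable without a running gateway, though the inline one-liners keep the script dependency-free.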
|
||||
@@ -1,385 +0,0 @@
|
||||
"""Pipeline Lab notebook UI — served as a single HTML page.
|
||||
|
||||
Note: the UI is built with DOM methods (createElement, textContent), so
user-supplied and server-returned text is inserted as plain text and is
never parsed as HTML. The esc() helper is kept for any code path that
needs to build an HTML string; innerHTML is only read inside esc() itself.
|
||||
"""
|
||||
|
||||
from fastapi import APIRouter
|
||||
from fastapi.responses import HTMLResponse
|
||||
|
||||
router = APIRouter()
|
||||
|
||||
|
||||
def _get_lab_html() -> str:
|
||||
"""Return the Pipeline Lab HTML. Separated into a function for clarity."""
|
||||
# The HTML is a self-contained notebook UI.
|
||||
# User-facing text is inserted via textContent, so it is never parsed as HTML.
|
||||
return r"""<!DOCTYPE html>
|
||||
<html lang="en"><head>
|
||||
<meta charset="UTF-8"><meta name="viewport" content="width=device-width,initial-scale=1.0">
|
||||
<title>Pipeline Lab — Lakehouse</title>
|
||||
<style>
|
||||
:root{--bg:#08090c;--surface:rgba(14,16,22,0.9);--border:#2a2d35;--text:#e8e6e3;--text2:#7a7872;--accent:#4ade80;--gold:#e2b55a;--red:#e05252;--blue:#5b9cf5;--purple:#c084fc}
|
||||
*{box-sizing:border-box;margin:0;padding:0}
|
||||
body{font-family:'SF Mono','Menlo','Consolas',monospace;background:var(--bg);color:var(--text);min-height:100vh;padding:20px 28px;font-size:13px}
|
||||
h1{font-size:18px;font-weight:700;margin-bottom:4px}h1 span{color:var(--accent)}
|
||||
.subtitle{color:var(--text2);font-size:11px;margin-bottom:20px}
|
||||
.cells{display:flex;flex-direction:column;gap:12px;max-width:1100px}
|
||||
.cell{background:var(--surface);border:1px solid var(--border);border-radius:4px;overflow:hidden}
|
||||
.cell.running{border-color:var(--gold)}
|
||||
.cell-header{display:flex;align-items:center;gap:8px;padding:8px 12px;border-bottom:1px solid var(--border);font-size:10px;text-transform:uppercase;letter-spacing:1px;color:var(--text2)}
|
||||
.cell-type{font-weight:700}
|
||||
.cell-time{margin-left:auto;color:var(--text2)}
|
||||
.cell-input{padding:12px;background:rgba(0,0,0,0.3)}
|
||||
.cell-input textarea{width:100%;min-height:60px;background:transparent;border:none;color:var(--text);font-family:inherit;font-size:13px;resize:vertical;outline:none;line-height:1.6}
|
||||
.cell-output{padding:12px;font-size:12px;line-height:1.6;white-space:pre-wrap;max-height:400px;overflow-y:auto;display:none}
|
||||
.cell-output.has-data{display:block;border-top:1px solid var(--border)}
|
||||
.toolbar{display:flex;gap:6px;padding:8px 12px;border-top:1px solid var(--border);flex-wrap:wrap}
|
||||
.btn{font-family:inherit;font-size:10px;text-transform:uppercase;letter-spacing:0.5px;padding:5px 12px;border:1px solid var(--border);border-radius:3px;background:transparent;color:var(--text2);cursor:pointer}
|
||||
.btn:hover{border-color:var(--accent);color:var(--accent)}
|
||||
.btn.primary{border-color:var(--accent);color:var(--accent);background:rgba(74,222,128,0.06)}
|
||||
.btn.gold{border-color:var(--gold);color:var(--gold)}
|
||||
.btn.blue{border-color:var(--blue);color:var(--blue)}
|
||||
.btn.purple{border-color:var(--purple);color:var(--purple)}
|
||||
.btn.red{border-color:var(--red);color:var(--red)}
|
||||
.top-bar{display:flex;gap:8px;margin-bottom:16px;align-items:center;flex-wrap:wrap}
|
||||
.status-bar{display:flex;gap:12px;padding:8px 12px;background:var(--surface);border:1px solid var(--border);border-radius:4px;margin-bottom:16px;font-size:10px;color:var(--text2)}
|
||||
.stat{display:flex;align-items:center;gap:4px}.stat b{color:var(--text)}
|
||||
.result-row{display:flex;gap:8px;padding:6px 8px;border-bottom:1px solid rgba(42,45,53,0.3);align-items:center;font-size:11px}
|
||||
.result-row:last-child{border-bottom:none}
|
||||
.score-bar{width:60px;height:5px;background:rgba(0,0,0,0.2);border-radius:3px;overflow:hidden}
|
||||
.score-fill{height:100%;border-radius:3px}
|
||||
.benchmark-grid{display:grid;grid-template-columns:1fr 1fr;gap:12px;margin-top:8px}
|
||||
.bench-col{background:rgba(0,0,0,0.2);border-radius:3px;padding:10px}
|
||||
.bench-label{font-size:10px;text-transform:uppercase;letter-spacing:1px;margin-bottom:6px;font-weight:700}
|
||||
.threshold-slider{display:flex;align-items:center;gap:8px;padding:0 12px;margin:4px 0}
|
||||
.threshold-slider input[type=range]{flex:1;accent-color:var(--accent)}
|
||||
.threshold-slider .val{font-weight:700;min-width:36px;text-align:right}
|
||||
</style></head><body>
|
||||
<h1><span>Pipeline Lab</span> // Lakehouse</h1>
|
||||
<div class="subtitle">Embedding-based screening vs LLM classification — iterative experimentation</div>
|
||||
|
||||
<div class="status-bar" id="status-bar">
|
||||
<div class="stat"><span>Exemplars:</span> <b id="st-exemplars">0</b></div>
|
||||
<div class="stat"><span>Categories:</span> <b id="st-categories">0</b></div>
|
||||
<div class="stat"><span>Pipelines:</span> <b id="st-pipelines">0</b></div>
|
||||
<div class="stat" style="margin-left:auto"><span>Sidecar:</span> <b id="st-health" style="color:var(--text2)">...</b></div>
|
||||
</div>
|
||||
|
||||
<div class="top-bar">
|
||||
<button class="btn primary" onclick="addCell('exemplars')">+ Exemplars</button>
|
||||
<button class="btn gold" onclick="addCell('screen')">+ Screen</button>
|
||||
<button class="btn blue" onclick="addCell('classify')">+ Classify</button>
|
||||
<button class="btn purple" onclick="addCell('benchmark')">+ Benchmark</button>
|
||||
<button class="btn" onclick="addCell('similarity')">+ Similarity</button>
|
||||
<button class="btn" onclick="addCell('generate')">+ Generate</button>
|
||||
<button class="btn" onclick="addCell('pipeline')">+ Pipeline</button>
|
||||
<span style="flex:1"></span>
|
||||
<button class="btn red" onclick="clearCells()">Clear All</button>
|
||||
</div>
|
||||
|
||||
<div class="cells" id="cells"></div>
|
||||
|
||||
<script>
|
||||
var BASE = '';
|
||||
var cellCounter = 0;
|
||||
|
||||
function esc(t){var d=document.createElement('span');d.textContent=String(t);return d.innerHTML}
|
||||
|
||||
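async function api(path, body) {
  var opts = body ? {method:'POST', headers:{'Content-Type':'application/json'}, body:JSON.stringify(body)} : {};
  var r = await fetch(BASE + '/lab' + path, opts);
  // Surface HTTP errors (FastAPI returns {"detail": ...}) instead of handing an error body to the renderers.
  if (!r.ok) {
    var msg = 'HTTP ' + r.status;
    try { var err = await r.json(); if (err && err.detail) msg += ': ' + (typeof err.detail === 'string' ? err.detail : JSON.stringify(err.detail)); } catch (e) {}
    throw new Error(msg);
  }
  return r.json();
}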
|
||||
|
||||
async function refreshStatus() {
|
||||
try {
|
||||
var ex = await api('/exemplars');
|
||||
var pl = await api('/pipelines');
|
||||
var h = await fetch(BASE + '/health').then(function(r){return r.json()});
|
||||
document.getElementById('st-exemplars').textContent = ex.total || 0;
|
||||
document.getElementById('st-categories').textContent = Object.keys(ex.categories || {}).length;
|
||||
document.getElementById('st-pipelines').textContent = (pl.pipelines || []).length;
|
||||
document.getElementById('st-health').textContent = h.status || 'ok';
|
||||
document.getElementById('st-health').style.color = 'var(--accent)';
|
||||
} catch(e) {
|
||||
document.getElementById('st-health').textContent = 'error';
|
||||
document.getElementById('st-health').style.color = 'var(--red)';
|
||||
}
|
||||
}
|
||||
|
||||
function addCell(type) {
|
||||
var id = 'cell-' + (++cellCounter);
|
||||
var cells = document.getElementById('cells');
|
||||
var cell = document.createElement('div'); cell.className = 'cell'; cell.id = id;
|
||||
|
||||
var colors = {exemplars:'var(--accent)',screen:'var(--gold)',classify:'var(--blue)',benchmark:'var(--purple)',similarity:'var(--text2)',generate:'var(--text2)',pipeline:'var(--accent)'};
|
||||
var labels = {exemplars:'EXEMPLARS',screen:'SCREEN',classify:'CLASSIFY (LLM)',benchmark:'BENCHMARK A/B',similarity:'SIMILARITY',generate:'GENERATE',pipeline:'PIPELINE'};
|
||||
var placeholders = {
|
||||
exemplars:'Category: decision\n---\nWe decided to use Parquet for all storage\nThe team chose React over Vue\nArchitecture decision: microservices',
|
||||
screen:'# Enter texts to classify via embedding similarity (one per line):\n\nWe decided to migrate to PostgreSQL\nThe weather is nice today\nArchitecture: chose event sourcing over CRUD',
classify:'# Enter texts to classify via LLM (one per line):\n\nWe decided to migrate to PostgreSQL\nThe weather is nice today',
benchmark:'# Enter texts to benchmark (one per line):\n\nWe decided to use Kubernetes for orchestration\nThe new hire starts Monday\nTechnical debt: refactor the auth module\nLunch menu looks good today',
similarity:'# Enter texts to compare pairwise (one per line):\n\nWe chose React for the frontend\nReact was selected as our UI framework\nThe database uses PostgreSQL',
|
||||
generate:'Enter a prompt for the LLM...',
|
||||
pipeline:'Pipeline name: my-extraction\n---\nscreen | threshold=0.6\nclassify\nextract | prompt=Extract the key decision and its rationale\nvalidate | dedup_threshold=0.9'
|
||||
};
|
||||
|
||||
var color = colors[type] || 'var(--text2)';
|
||||
var label = labels[type] || type.toUpperCase();
|
||||
var ph = placeholders[type] || '';
|
||||
|
||||
// Build cell using DOM methods
|
||||
var header = document.createElement('div'); header.className = 'cell-header';
|
||||
var typeSpan = document.createElement('span'); typeSpan.className = 'cell-type'; typeSpan.style.color = color; typeSpan.textContent = label; header.appendChild(typeSpan);
|
||||
var numSpan = document.createElement('span'); numSpan.textContent = 'Cell #' + cellCounter; header.appendChild(numSpan);
|
||||
var timeSpan = document.createElement('span'); timeSpan.className = 'cell-time'; timeSpan.id = id + '-time'; header.appendChild(timeSpan);
|
||||
cell.appendChild(header);
|
||||
|
||||
var inputDiv = document.createElement('div'); inputDiv.className = 'cell-input';
|
||||
var textarea = document.createElement('textarea'); textarea.id = id + '-input'; textarea.placeholder = ph; textarea.value = ph;
|
||||
inputDiv.appendChild(textarea); cell.appendChild(inputDiv);
|
||||
|
||||
if (type === 'screen' || type === 'benchmark') {
|
||||
var slider = document.createElement('div'); slider.className = 'threshold-slider';
|
||||
var slLabel = document.createElement('span'); slLabel.style.cssText = 'font-size:10px;color:var(--text2)'; slLabel.textContent = 'Threshold:'; slider.appendChild(slLabel);
|
||||
var range = document.createElement('input'); range.type = 'range'; range.min = '0.3'; range.max = '0.95'; range.step = '0.05'; range.value = '0.65'; range.id = id + '-threshold';
|
||||
var valSpan = document.createElement('span'); valSpan.className = 'val'; valSpan.textContent = '0.65';
|
||||
range.oninput = function() { valSpan.textContent = this.value; };
|
||||
slider.appendChild(range); slider.appendChild(valSpan); cell.appendChild(slider);
|
||||
}
|
||||
|
||||
var outputDiv = document.createElement('div'); outputDiv.className = 'cell-output'; outputDiv.id = id + '-output';
|
||||
cell.appendChild(outputDiv);
|
||||
|
||||
var tb = document.createElement('div'); tb.className = 'toolbar';
|
||||
var runBtn = document.createElement('button'); runBtn.className = 'btn primary'; runBtn.textContent = 'Run';
|
||||
runBtn.onclick = function() { runCell(id, type); }; tb.appendChild(runBtn);
|
||||
var rmBtn = document.createElement('button'); rmBtn.className = 'btn red'; rmBtn.textContent = 'Remove';
|
||||
rmBtn.onclick = function() { removeCell(id); }; tb.appendChild(rmBtn);
|
||||
cell.appendChild(tb);
|
||||
|
||||
cells.appendChild(cell);
|
||||
textarea.focus();
|
||||
return id;
|
||||
}
|
||||
|
||||
function removeCell(id) { var el = document.getElementById(id); if (el) el.remove(); }
|
||||
function clearCells() { document.getElementById('cells').textContent = ''; cellCounter = 0; }
|
||||
function parseLines(text) { return text.split('\n').map(function(l){return l.trim()}).filter(function(l){return l && l.charAt(0) !== '#'}); }
|
||||
|
||||
async function runCell(id, type) {
|
||||
var cell = document.getElementById(id);
|
||||
var input = document.getElementById(id+'-input').value;
|
||||
var output = document.getElementById(id+'-output');
|
||||
var timeEl = document.getElementById(id+'-time');
|
||||
cell.classList.add('running');
|
||||
output.className = 'cell-output has-data';
|
||||
output.textContent = 'Running...';
|
||||
|
||||
try {
|
||||
var t0 = performance.now();
|
||||
var result;
|
||||
|
||||
if (type === 'exemplars') {
|
||||
var parts = input.split('---');
|
||||
var catLine = (parts[0] || '').trim();
|
||||
var category = catLine.replace(/^category:\s*/i, '').trim().toLowerCase();
|
||||
var texts = parseLines(parts.slice(1).join('\n'));
|
||||
if (!category || !texts.length) { output.textContent = 'Format: Category: name\\n---\\ntext1\\ntext2'; return; }
|
||||
result = await api('/exemplars', {category: category, texts: texts});
|
||||
output.textContent = 'Added ' + result.added + ' exemplars to "' + result.category + '" (total: ' + result.total + ')';
|
||||
output.style.color = 'var(--accent)';
|
||||
refreshStatus();
|
||||
}
|
||||
|
||||
else if (type === 'screen') {
|
||||
var texts = parseLines(input);
|
||||
var threshold = parseFloat((document.getElementById(id+'-threshold') || {}).value || '0.65');
|
||||
result = await api('/screen', {texts: texts, threshold: threshold});
|
||||
renderScreenResults(output, result, threshold);
|
||||
}
|
||||
|
||||
else if (type === 'classify') {
|
||||
var texts = parseLines(input);
|
||||
result = await api('/classify', {texts: texts});
|
||||
renderClassifyResults(output, result);
|
||||
}
|
||||
|
||||
else if (type === 'benchmark') {
|
||||
var texts = parseLines(input);
|
||||
var threshold = parseFloat((document.getElementById(id+'-threshold') || {}).value || '0.65');
|
||||
result = await api('/benchmark', {texts: texts, threshold: threshold});
|
||||
renderBenchmark(output, result);
|
||||
}
|
||||
|
||||
else if (type === 'similarity') {
|
||||
var texts = parseLines(input);
|
||||
result = await api('/cell', {action:'similarity', texts: texts});
|
||||
renderSimilarityMatrix(output, result);
|
||||
}
|
||||
|
||||
else if (type === 'generate') {
|
||||
result = await api('/cell', {action:'generate', text: input});
|
||||
output.textContent = result.text || '(empty)';
|
||||
}
|
||||
|
||||
else if (type === 'pipeline') {
|
||||
var parts = input.split('---');
|
||||
var nameLine = (parts[0] || '').trim();
|
||||
var pName = nameLine.replace(/^pipeline\s*name:\s*/i, '').trim();
|
||||
var stageLines = parseLines(parts.slice(1).join('\n'));
|
||||
var stages = stageLines.map(function(line) {
|
||||
var ps = line.split('|').map(function(s){return s.trim()});
|
||||
var mode = ps[0];
|
||||
var config = {};
|
||||
ps.slice(1).forEach(function(p) {
|
||||
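var eq = p.indexOf('='); if (eq > 0) {
  var v = p.slice(eq + 1).trim();
  // Split on the first '=' only, so values such as prompts may contain '='.
  // Number() rejects partial matches (e.g. "3 stages") that parseFloat would turn into 3.
  config[p.slice(0, eq).trim()] = (v !== '' && !isNaN(Number(v))) ? Number(v) : v;
}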
|
||||
});
|
||||
return {name: mode, mode: mode, config: config};
|
||||
});
|
||||
await api('/pipelines', {name: pName, stages: stages, description: 'Created in Pipeline Lab'});
|
||||
output.textContent = 'Pipeline "' + pName + '" saved (' + stages.length + ' stages). Use the API to run it: POST /lab/pipelines/run';
|
||||
output.style.color = 'var(--accent)';
|
||||
refreshStatus();
|
||||
}
|
||||
|
||||
var elapsed = Math.round(performance.now() - t0);
|
||||
timeEl.textContent = elapsed + 'ms' + (result && result.time_ms ? ' (server: '+result.time_ms+'ms)' : '');
|
||||
} catch(e) {
|
||||
output.textContent = 'Error: ' + e.message;
|
||||
output.style.color = 'var(--red)';
|
||||
} finally {
|
||||
cell.classList.remove('running');
|
||||
}
|
||||
}
|
||||
|
||||
function renderScreenResults(el, results, threshold) {
|
||||
el.textContent = '';
|
||||
results.forEach(function(r) {
|
||||
var row = document.createElement('div'); row.className = 'result-row';
|
||||
var cat = document.createElement('span');
|
||||
cat.style.cssText = 'min-width:80px;font-weight:700;color:' + (r.above_threshold ? 'var(--accent)' : 'var(--text2)');
|
||||
cat.textContent = r.best_category || 'none'; row.appendChild(cat);
|
||||
var sim = document.createElement('span'); sim.style.cssText = 'min-width:50px;font-weight:700';
|
||||
sim.textContent = (r.similarity * 100).toFixed(1) + '%';
|
||||
sim.style.color = r.similarity >= 0.7 ? 'var(--accent)' : r.similarity >= threshold ? 'var(--gold)' : 'var(--text2)';
|
||||
row.appendChild(sim);
|
||||
var bar = document.createElement('div'); bar.className = 'score-bar';
|
||||
var fill = document.createElement('div'); fill.className = 'score-fill';
|
||||
fill.style.width = (r.similarity * 100) + '%';
|
||||
fill.style.background = r.similarity >= 0.7 ? 'var(--accent)' : r.similarity >= threshold ? 'var(--gold)' : 'var(--red)';
|
||||
bar.appendChild(fill); row.appendChild(bar);
|
||||
var txt = document.createElement('span'); txt.style.cssText = 'flex:1;overflow:hidden;text-overflow:ellipsis;white-space:nowrap';
|
||||
txt.textContent = r.text; row.appendChild(txt);
|
||||
var badge = document.createElement('span');
|
||||
badge.style.cssText = 'font-size:9px;padding:2px 6px;border-radius:2px;border:1px solid;' +
|
||||
(r.above_threshold ? 'color:var(--accent);border-color:var(--accent)' : 'color:var(--text2);border-color:var(--border)');
|
||||
badge.textContent = r.above_threshold ? 'PASS' : 'FILTERED'; row.appendChild(badge);
|
||||
el.appendChild(row);
|
||||
});
|
||||
}
|
||||
|
||||
function renderClassifyResults(el, results) {
|
||||
el.textContent = '';
|
||||
results.forEach(function(r) {
|
||||
var row = document.createElement('div'); row.className = 'result-row';
|
||||
var cat = document.createElement('span'); cat.style.cssText = 'min-width:80px;font-weight:700;color:var(--blue)';
|
||||
cat.textContent = r.category; row.appendChild(cat);
|
||||
var conf = document.createElement('span');
|
||||
conf.style.cssText = 'min-width:50px;font-size:10px;color:' + (r.confidence==='high'?'var(--accent)':r.confidence==='medium'?'var(--gold)':'var(--text2)');
|
||||
conf.textContent = r.confidence; row.appendChild(conf);
|
||||
var txt = document.createElement('span'); txt.style.flex = '1'; txt.textContent = r.text; row.appendChild(txt);
|
||||
el.appendChild(row);
|
||||
});
|
||||
}
|
||||
|
||||
function renderBenchmark(el, result) {
|
||||
el.textContent = '';
|
||||
// Summary stats (using safe DOM construction)
|
||||
var summary = document.createElement('div'); summary.style.cssText = 'display:flex;gap:16px;margin-bottom:12px;flex-wrap:wrap';
|
||||
var stats = [
|
||||
['Agreement', (result.agreement_rate*100).toFixed(1)+'%', result.agreement_rate>=0.8?'var(--accent)':'var(--gold)'],
|
||||
['Speedup', result.speedup+'x', result.speedup>=2?'var(--accent)':'var(--text)'],
|
||||
['Embed', result.embed_time_ms+'ms', 'var(--gold)'],
|
||||
['LLM', result.llm_time_ms+'ms', 'var(--blue)'],
|
||||
['Hybrid est.', result.hybrid_estimated_ms+'ms', 'var(--accent)'],
|
||||
['Screened out', result.texts_screened_out+'/'+result.total_texts, 'var(--purple)']
|
||||
];
|
||||
stats.forEach(function(s) {
|
||||
var box = document.createElement('div'); box.style.cssText = 'background:rgba(0,0,0,0.2);padding:6px 10px;border-radius:3px;text-align:center';
|
||||
var lbl = document.createElement('div'); lbl.style.cssText = 'font-size:9px;color:var(--text2);text-transform:uppercase;letter-spacing:0.5px'; lbl.textContent = s[0]; box.appendChild(lbl);
|
||||
var val = document.createElement('div'); val.style.cssText = 'font-size:16px;font-weight:700;color:'+s[2]; val.textContent = s[1]; box.appendChild(val);
|
||||
summary.appendChild(box);
|
||||
});
|
||||
el.appendChild(summary);
|
||||
// Side-by-side comparison
|
||||
var grid = document.createElement('div'); grid.style.cssText = 'display:grid;grid-template-columns:1fr 1fr;gap:12px;margin-top:8px';
|
||||
// Embed column
|
||||
var leftCol = document.createElement('div'); leftCol.style.cssText = 'background:rgba(0,0,0,0.2);border-radius:3px;padding:10px';
|
||||
var leftTitle = document.createElement('div'); leftTitle.style.cssText = 'font-size:10px;text-transform:uppercase;letter-spacing:1px;margin-bottom:6px;font-weight:700;color:var(--gold)';
|
||||
leftTitle.textContent = 'EMBEDDING SCREENING (' + result.embed_time_ms + 'ms)'; leftCol.appendChild(leftTitle);
|
||||
(result.embed_results||[]).forEach(function(r) {
|
||||
var row = document.createElement('div'); row.style.cssText = 'font-size:11px;padding:3px 0;display:flex;gap:6px;align-items:center';
|
||||
var c = document.createElement('span'); c.style.cssText = 'min-width:60px;font-weight:700;color:'+(r.above_threshold?'var(--accent)':'var(--text2)'); c.textContent = r.best_category||'none'; row.appendChild(c);
|
||||
var s = document.createElement('span'); s.style.cssText = 'min-width:40px;color:var(--text2)'; s.textContent = (r.similarity*100).toFixed(0)+'%'; row.appendChild(s);
|
||||
var t = document.createElement('span'); t.style.cssText = 'flex:1;overflow:hidden;text-overflow:ellipsis;white-space:nowrap'; t.textContent = r.text; row.appendChild(t);
|
||||
leftCol.appendChild(row);
|
||||
});
|
||||
grid.appendChild(leftCol);
|
||||
// LLM column
|
||||
var rightCol = document.createElement('div'); rightCol.style.cssText = 'background:rgba(0,0,0,0.2);border-radius:3px;padding:10px';
|
||||
var rightTitle = document.createElement('div'); rightTitle.style.cssText = 'font-size:10px;text-transform:uppercase;letter-spacing:1px;margin-bottom:6px;font-weight:700;color:var(--blue)';
|
||||
rightTitle.textContent = 'LLM CLASSIFICATION (' + result.llm_time_ms + 'ms)'; rightCol.appendChild(rightTitle);
|
||||
(result.llm_results||[]).forEach(function(r) {
|
||||
var row = document.createElement('div'); row.style.cssText = 'font-size:11px;padding:3px 0;display:flex;gap:6px;align-items:center';
|
||||
var c = document.createElement('span'); c.style.cssText = 'min-width:60px;font-weight:700;color:var(--blue)'; c.textContent = r.category; row.appendChild(c);
|
||||
var s = document.createElement('span'); s.style.cssText = 'min-width:40px;color:'+(r.confidence==='high'?'var(--accent)':'var(--text2)'); s.textContent = r.confidence; row.appendChild(s);
|
||||
var t = document.createElement('span'); t.style.cssText = 'flex:1;overflow:hidden;text-overflow:ellipsis;white-space:nowrap'; t.textContent = r.text; row.appendChild(t);
|
||||
rightCol.appendChild(row);
|
||||
});
|
||||
grid.appendChild(rightCol);
|
||||
el.appendChild(grid);
|
||||
}
|
||||
|
||||
function renderSimilarityMatrix(el, result) {
|
||||
el.textContent = '';
|
||||
var matrix = result.matrix || [];
|
||||
var texts = result.texts || [];
|
||||
if (!matrix.length) { el.textContent = 'No results'; return; }
|
||||
var tbl = document.createElement('table'); tbl.style.cssText = 'border-collapse:collapse;font-size:11px;width:100%';
|
||||
var hdr = document.createElement('tr');
|
||||
var corner = document.createElement('th'); hdr.appendChild(corner);
|
||||
texts.forEach(function(t) {
|
||||
var th = document.createElement('th'); th.style.cssText = 'padding:4px;color:var(--text2);font-size:9px;max-width:100px;overflow:hidden;text-overflow:ellipsis;white-space:nowrap';
|
||||
th.textContent = t.substring(0, 20); th.title = t; hdr.appendChild(th);
|
||||
});
|
||||
tbl.appendChild(hdr);
|
||||
matrix.forEach(function(row, i) {
|
||||
var tr = document.createElement('tr');
|
||||
var td0 = document.createElement('td'); td0.style.cssText = 'padding:4px;color:var(--text2);font-size:9px;max-width:100px;overflow:hidden;text-overflow:ellipsis;white-space:nowrap';
|
||||
td0.textContent = texts[i].substring(0, 20); tr.appendChild(td0);
|
||||
row.forEach(function(v, j) {
|
||||
var td = document.createElement('td');
|
||||
var bg = i===j ? 'rgba(74,222,128,0.1)' : v>=0.8 ? 'rgba(74,222,128,0.15)' : v>=0.6 ? 'rgba(226,181,90,0.1)' : 'transparent';
|
||||
td.style.cssText = 'padding:4px;text-align:center;font-weight:'+(v>=0.7?'700':'400')+';color:'+(v>=0.8?'var(--accent)':v>=0.6?'var(--gold)':'var(--text2)')+';background:'+bg;
|
||||
td.textContent = v.toFixed(2); tr.appendChild(td);
|
||||
});
|
||||
tbl.appendChild(tr);
|
||||
});
|
||||
el.appendChild(tbl);
|
||||
}
|
||||
|
||||
refreshStatus();
|
||||
</script>
|
||||
</body></html>"""
|
||||
|
||||
|
||||
@router.get("", response_class=HTMLResponse)
|
||||
async def lab_page():
|
||||
return _get_lab_html()
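Review note: the Pipeline cell in this page accepts a small text format: a `Pipeline name:` line, a `---` separator, then one stage per line written as `mode | key=value | ...`. Below is a rough Python sketch of that parsing, which may help when exercising the format outside the browser. `parse_pipeline` is a hypothetical helper, not part of the repository, and it makes two assumptions that differ from the page's JavaScript: it splits each pair on the first `=` only, and it converts numbers with the stricter `float()` rather than `parseFloat`.

```python
import re


def parse_pipeline(text: str) -> dict:
    """Parse the Pipeline cell text format into a name and a list of stage dicts."""
    head, _, body = text.partition("---")
    name = re.sub(r"^pipeline\s*name:\s*", "", head.strip(), flags=re.IGNORECASE).strip()
    stages = []
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and '#' comments are skipped, as in parseLines()
        mode, *pairs = [part.strip() for part in line.split("|")]
        config = {}
        for pair in pairs:
            key, sep, value = pair.partition("=")
            if not sep:
                continue  # ignore fragments that have no '='
            value = value.strip()
            try:
                config[key.strip()] = float(value)
            except ValueError:
                config[key.strip()] = value  # non-numeric values stay strings
        stages.append({"name": mode, "mode": mode, "config": config})
    return {"name": name, "stages": stages}


if __name__ == "__main__":
    cell = (
        "Pipeline name: my-extraction\n---\n"
        "screen | threshold=0.6\nclassify\n"
        "extract | prompt=Extract the key decision"
    )
    print(parse_pipeline(cell))
```

The returned dict has the same `name` and `stages` keys that the page posts to `/lab/pipelines`; the `description` field the page adds is left out here.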
|
||||
@@ -1,503 +0,0 @@
|
||||
"""Pipeline Lab — iterative embedding/LLM pipeline experimentation.
|
||||
|
||||
Provides:
|
||||
- Exemplar-based embedding classification (fast screening)
|
||||
- LLM-based classification (accurate but slow)
|
||||
- A/B benchmarking between the two
|
||||
- Pipeline definition and execution
|
||||
- Notebook-style API for interactive experimentation
|
||||
"""
|
||||
|
||||
import json
|
||||
import math
|
||||
import os
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
from fastapi import APIRouter, HTTPException
|
||||
from fastapi.responses import HTMLResponse
|
||||
from pydantic import BaseModel
|
||||
|
||||
from .ollama import client
|
||||
|
||||
router = APIRouter()
|
||||
|
||||
EMBED_MODEL = os.environ.get("EMBED_MODEL", "nomic-embed-text")
|
||||
GEN_MODEL = os.environ.get("GEN_MODEL", "qwen2.5")
|
||||
LAB_DIR = Path(os.environ.get("LAB_DIR", "./data/_pipeline_lab"))
|
||||
LAB_DIR.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
|
||||
# ─── Vector math ─────────────────────────────────────────────
|
||||
|
||||
def cosine_similarity(a: list[float], b: list[float]) -> float:
|
||||
dot = sum(x * y for x, y in zip(a, b))
|
||||
norm_a = math.sqrt(sum(x * x for x in a))
|
||||
norm_b = math.sqrt(sum(x * x for x in b))
|
||||
if norm_a == 0 or norm_b == 0:
|
||||
return 0.0
|
||||
return dot / (norm_a * norm_b)
|
||||
|
||||
|
||||
# ─── Exemplar store ──────────────────────────────────────────
|
||||
# Exemplars are labeled text+embedding pairs used for classification.
|
||||
# e.g. category="decision" texts=["We decided to use Parquet", "The team chose React"]
|
||||
|
||||
_exemplars: dict[str, list[dict]] = {} # category -> [{text, embedding}]
|
||||
|
||||
|
||||
def _exemplar_file() -> Path:
|
||||
return LAB_DIR / "exemplars.json"
|
||||
|
||||
|
||||
def _load_exemplars():
|
||||
global _exemplars
|
||||
fp = _exemplar_file()
|
||||
if fp.exists():
|
||||
data = json.loads(fp.read_text())
|
||||
_exemplars = data
|
||||
return _exemplars
|
||||
|
||||
|
||||
def _save_exemplars():
|
||||
_exemplar_file().write_text(json.dumps(_exemplars, indent=2))
|
||||
|
||||
|
||||
_load_exemplars()
|
||||
|
||||
|
||||
# ─── Pipeline store ──────────────────────────────────────────
|
||||
|
||||
def _pipelines_dir() -> Path:
|
||||
d = LAB_DIR / "pipelines"
|
||||
d.mkdir(exist_ok=True)
|
||||
return d
|
||||
|
||||
|
||||
# ─── Embedding helper ────────────────────────────────────────
|
||||
|
||||
async def _embed_texts(texts: list[str], model: str = EMBED_MODEL) -> list[list[float]]:
|
||||
embeddings = []
|
||||
async with client() as c:
|
||||
for text in texts:
|
||||
resp = await c.post("/api/embed", json={"model": model, "input": text})
|
||||
if resp.status_code != 200:
|
||||
raise HTTPException(502, f"Ollama embed error: {resp.text}")
|
||||
data = resp.json()
|
||||
embeddings.extend(data.get("embeddings", []))
|
||||
return embeddings
|
||||
|
||||
|
||||
async def _generate(prompt: str, model: str = GEN_MODEL, temperature: float = 0.3) -> str:
|
||||
async with client() as c:
|
||||
resp = await c.post("/api/generate", json={
|
||||
"model": model, "prompt": prompt, "stream": False,
|
||||
"options": {"temperature": temperature, "num_predict": 1024}
|
||||
})
|
||||
if resp.status_code != 200:
|
||||
raise HTTPException(502, f"Ollama generate error: {resp.text}")
|
||||
return resp.json().get("response", "")
|
||||
|
||||
|
||||
# ─── API: Exemplars ──────────────────────────────────────────
|
||||
|
||||
class ExemplarAdd(BaseModel):
|
||||
category: str
|
||||
texts: list[str]
|
||||
|
||||
|
||||
class ExemplarList(BaseModel):
|
||||
categories: dict[str, int] # category -> count
|
||||
|
||||
|
||||
@router.post("/exemplars")
|
||||
async def add_exemplars(req: ExemplarAdd):
|
||||
"""Add labeled exemplar texts for a category. Embeddings generated automatically."""
|
||||
category = req.category.strip().lower()
|
||||
if not category or not req.texts:
|
||||
raise HTTPException(400, "category and texts required")
|
||||
|
||||
embeddings = await _embed_texts(req.texts)
|
||||
|
||||
if category not in _exemplars:
|
||||
_exemplars[category] = []
|
||||
|
||||
for text, emb in zip(req.texts, embeddings):
|
||||
_exemplars[category].append({"text": text, "embedding": emb})
|
||||
|
||||
_save_exemplars()
|
||||
return {"ok": True, "category": category, "added": len(req.texts),
|
||||
"total": len(_exemplars[category])}
|
||||
|
||||
|
||||
@router.get("/exemplars")
|
||||
async def list_exemplars():
|
||||
"""List all exemplar categories and counts."""
|
||||
return {"categories": {k: len(v) for k, v in _exemplars.items()},
|
||||
"total": sum(len(v) for v in _exemplars.values())}
|
||||
|
||||
|
||||
@router.delete("/exemplars/{category}")
|
||||
async def delete_exemplar_category(category: str):
|
||||
category = category.strip().lower()  # match the normalization applied in add_exemplars
if category in _exemplars:
|
||||
del _exemplars[category]
|
||||
_save_exemplars()
|
||||
return {"ok": True}
|
||||
|
||||
|
||||
# ─── API: Screen (embedding-based classification) ────────────
|
||||
|
||||
class ScreenRequest(BaseModel):
|
||||
texts: list[str]
|
||||
threshold: float = 0.65
|
||||
top_k: int = 1
|
||||
|
||||
|
||||
class ScreenResult(BaseModel):
|
||||
text: str
|
||||
best_category: str | None
|
||||
similarity: float
|
||||
above_threshold: bool
|
||||
all_scores: dict[str, float]
|
||||
|
||||
|
||||
@router.post("/screen", response_model=list[ScreenResult])
|
||||
async def screen_texts(req: ScreenRequest):
|
||||
"""Classify texts by cosine similarity to exemplar embeddings (fast path)."""
|
||||
if not _exemplars:
|
||||
raise HTTPException(400, "No exemplars defined. Add exemplars first.")
|
||||
|
||||
embeddings = await _embed_texts(req.texts)
|
||||
results = []
|
||||
|
||||
for text, emb in zip(req.texts, embeddings):
|
||||
category_scores = {}
|
||||
for category, exemplar_list in _exemplars.items():
|
||||
sims = [cosine_similarity(emb, ex["embedding"]) for ex in exemplar_list]
|
||||
category_scores[category] = max(sims) if sims else 0.0
|
||||
|
||||
best_cat = max(category_scores, key=category_scores.get) if category_scores else None
|
||||
best_sim = category_scores.get(best_cat, 0.0) if best_cat else 0.0
|
||||
|
||||
results.append(ScreenResult(
|
||||
text=text[:200],
|
||||
best_category=best_cat if best_sim >= req.threshold else None,
|
||||
similarity=round(best_sim, 4),
|
||||
above_threshold=best_sim >= req.threshold,
|
||||
all_scores={k: round(v, 4) for k, v in sorted(category_scores.items(),
|
||||
key=lambda x: x[1], reverse=True)},
|
||||
))
|
||||
|
||||
return results
|
||||
|
||||
|
||||
# ─── API: Classify (LLM-based classification) ────────────────
|
||||
|
||||
class ClassifyRequest(BaseModel):
|
||||
texts: list[str]
|
||||
categories: list[str] | None = None # if None, use exemplar category names
|
||||
model: str | None = None
|
||||
|
||||
|
||||
class ClassifyResult(BaseModel):
|
||||
text: str
|
||||
category: str
|
||||
confidence: str
|
||||
reasoning: str
|
||||
|
||||
|
||||
@router.post("/classify", response_model=list[ClassifyResult])
|
||||
async def classify_texts(req: ClassifyRequest):
|
||||
"""Classify texts using LLM (slow but accurate path)."""
|
||||
categories = req.categories or list(_exemplars.keys())
|
||||
if not categories:
|
||||
raise HTTPException(400, "No categories. Provide categories or add exemplars.")
|
||||
|
||||
model = req.model or GEN_MODEL
|
||||
results = []
|
||||
|
||||
for text in req.texts:
|
||||
prompt = (
|
||||
f"Classify this text into exactly ONE of these categories: {', '.join(categories)}\n\n"
|
||||
f"TEXT: {text[:500]}\n\n"
|
||||
f"Respond with JSON: {{\"category\": \"...\", \"confidence\": \"high|medium|low\", "
|
||||
f"\"reasoning\": \"one sentence\"}}"
|
||||
)
|
||||
raw = await _generate(prompt, model=model, temperature=0.1)
|
||||
|
||||
# Parse
|
||||
try:
|
||||
j_s, j_e = raw.find("{"), raw.rfind("}") + 1
|
||||
parsed = json.loads(raw[j_s:j_e]) if j_s >= 0 and j_e > j_s else {}
|
||||
except Exception:
|
||||
parsed = {}
|
||||
|
||||
results.append(ClassifyResult(
|
||||
text=text[:200],
|
||||
category=parsed.get("category", "unknown"),
|
||||
confidence=parsed.get("confidence", "low"),
|
||||
reasoning=parsed.get("reasoning", raw[:200]),
|
||||
))
|
||||
|
||||
return results
|
||||
|
||||
|
||||
# ─── API: Benchmark (A/B comparison) ─────────────────────────
|
||||
|
||||
class BenchmarkRequest(BaseModel):
|
||||
texts: list[str]
|
||||
threshold: float = 0.65
|
||||
model: str | None = None
|
||||
|
||||
|
||||
class BenchmarkResult(BaseModel):
|
||||
total_texts: int
|
||||
# Embedding path
|
||||
embed_time_ms: int
|
||||
embed_results: list[dict]
|
||||
# LLM path
|
||||
llm_time_ms: int
|
||||
llm_results: list[dict]
|
||||
# Comparison
|
||||
agreement_rate: float
|
||||
speedup: float
|
||||
texts_screened_out: int
|
||||
texts_needing_llm: int
|
||||
hybrid_estimated_ms: int
|
||||
|
||||
|
||||
@router.post("/benchmark", response_model=BenchmarkResult)
|
||||
async def benchmark(req: BenchmarkRequest):
|
||||
"""Run same texts through embedding screening and LLM classification. Compare."""
|
||||
if not _exemplars:
|
||||
raise HTTPException(400, "No exemplars. Add exemplars first.")
|
||||
|
||||
categories = list(_exemplars.keys())
|
||||
|
||||
# Embedding path
|
||||
t0 = time.monotonic()
|
||||
embed_results = await screen_texts(ScreenRequest(
|
||||
texts=req.texts, threshold=req.threshold
|
||||
))
|
||||
embed_ms = int((time.monotonic() - t0) * 1000)
|
||||
|
||||
# LLM path
|
||||
t0 = time.monotonic()
|
||||
llm_results = await classify_texts(ClassifyRequest(
|
||||
texts=req.texts, categories=categories, model=req.model
|
||||
))
|
||||
llm_ms = int((time.monotonic() - t0) * 1000)
|
||||
|
||||
# Compare
|
||||
agreements = 0
|
||||
screened_out = 0
|
||||
for er, lr in zip(embed_results, llm_results):
|
||||
if not er.above_threshold:
|
||||
screened_out += 1
|
||||
if er.best_category == lr.category:
|
||||
agreements += 1
|
||||
|
||||
needing_llm = len(req.texts) - screened_out
|
||||
# Hybrid estimate: embed all + LLM only the uncertain ones
|
||||
per_text_embed_ms = embed_ms / max(len(req.texts), 1)
|
||||
per_text_llm_ms = llm_ms / max(len(req.texts), 1)
|
||||
hybrid_ms = int(embed_ms + needing_llm * per_text_llm_ms)
|
||||
|
||||
return BenchmarkResult(
|
||||
total_texts=len(req.texts),
|
||||
embed_time_ms=embed_ms,
|
||||
embed_results=[r.model_dump() for r in embed_results],
|
||||
llm_time_ms=llm_ms,
|
||||
llm_results=[r.model_dump() for r in llm_results],
|
||||
agreement_rate=round(agreements / max(len(req.texts), 1), 3),
|
||||
speedup=round(llm_ms / max(hybrid_ms, 1), 2),
|
||||
texts_screened_out=screened_out,
|
||||
texts_needing_llm=needing_llm,
|
||||
hybrid_estimated_ms=hybrid_ms,
|
||||
)
|
||||
|
||||
|
||||
# ─── API: Pipeline definition & execution ────────────────────
|
||||
|
||||
class PipelineStage(BaseModel):
|
||||
name: str
|
||||
mode: str # "screen", "classify", "extract", "validate", "custom"
|
||||
config: dict = {} # stage-specific config (threshold, prompt, etc.)
|
||||
|
||||
|
||||
class PipelineDef(BaseModel):
|
||||
name: str
|
||||
stages: list[PipelineStage]
|
||||
description: str = ""
|
||||
|
||||
|
||||
class PipelineRunRequest(BaseModel):
|
||||
pipeline_name: str
|
||||
texts: list[str]
|
||||
|
||||
|
||||
@router.post("/pipelines")
|
||||
async def save_pipeline(pipeline: PipelineDef):
|
||||
"""Save a pipeline definition."""
|
||||
fp = _pipelines_dir() / f"{pipeline.name}.json"
|
||||
fp.write_text(pipeline.model_dump_json(indent=2))
|
||||
return {"ok": True, "name": pipeline.name}
|
||||
|
||||
|
||||
@router.get("/pipelines")
|
||||
async def list_pipelines():
|
||||
"""List saved pipeline definitions."""
|
||||
pipelines = []
|
||||
for fp in _pipelines_dir().glob("*.json"):
|
||||
try:
|
||||
data = json.loads(fp.read_text())
|
||||
pipelines.append({"name": data["name"], "stages": len(data["stages"]),
|
||||
"description": data.get("description", "")})
|
||||
except Exception:
|
||||
pass
|
||||
return {"pipelines": pipelines}
|
||||
|
||||
|
||||
@router.get("/pipelines/{name}")
|
||||
async def get_pipeline(name: str):
|
||||
fp = _pipelines_dir() / f"{name}.json"
|
||||
if not fp.exists():
|
||||
raise HTTPException(404, "Pipeline not found")
|
||||
return json.loads(fp.read_text())
|
||||
|
||||
|
||||
@router.post("/pipelines/run")
|
||||
async def run_pipeline(req: PipelineRunRequest):
|
||||
"""Execute a pipeline on a set of texts. Returns per-stage results and timing."""
|
||||
fp = _pipelines_dir() / f"{req.pipeline_name}.json"
|
||||
if not fp.exists():
|
||||
raise HTTPException(404, f"Pipeline '{req.pipeline_name}' not found")
|
||||
|
||||
pipeline = json.loads(fp.read_text())
|
||||
results = {"pipeline": req.pipeline_name, "stages": [], "total_ms": 0}
|
||||
current_texts = req.texts[:]
|
||||
|
||||
for stage_def in pipeline["stages"]:
|
||||
stage_name = stage_def["name"]
|
||||
mode = stage_def["mode"]
|
||||
config = stage_def.get("config", {})
|
||||
t0 = time.monotonic()
|
||||
stage_result = {"name": stage_name, "mode": mode, "input_count": len(current_texts)}
|
||||
|
||||
        if mode == "screen":
            threshold = config.get("threshold", 0.65)
            screen_res = await screen_texts(ScreenRequest(
                texts=current_texts, threshold=threshold
            ))
            passed = [r for r in screen_res if r.above_threshold]
            stage_result["output_count"] = len(passed)
            stage_result["filtered_out"] = len(current_texts) - len(passed)
            stage_result["results"] = [r.model_dump() for r in screen_res]
            # Pass only above-threshold texts to the next stage. ScreenResult.text
            # is truncated to 200 chars, so filter the original inputs instead.
            current_texts = [t for t, r in zip(current_texts, screen_res)
                             if r.above_threshold]
|
||||
|
||||
elif mode == "classify":
|
||||
cls_res = await classify_texts(ClassifyRequest(
|
||||
texts=current_texts,
|
||||
categories=config.get("categories"),
|
||||
model=config.get("model"),
|
||||
))
|
||||
stage_result["output_count"] = len(cls_res)
|
||||
stage_result["results"] = [r.model_dump() for r in cls_res]
|
||||
|
||||
elif mode == "extract":
|
||||
extract_prompt = config.get("prompt", "Extract key information from this text:")
|
||||
extractions = []
|
||||
for text in current_texts:
|
||||
raw = await _generate(f"{extract_prompt}\n\nTEXT: {text[:800]}")
|
||||
extractions.append({"text": text[:200], "extracted": raw})
|
||||
stage_result["output_count"] = len(extractions)
|
||||
stage_result["results"] = extractions
|
||||
|
||||
elif mode == "validate":
|
||||
# Embedding-based dedup: find near-duplicate results
|
||||
if len(current_texts) > 1:
|
||||
embs = await _embed_texts(current_texts)
|
||||
dupes = []
|
||||
threshold = config.get("dedup_threshold", 0.92)
|
||||
for i in range(len(embs)):
|
||||
for j in range(i + 1, len(embs)):
|
||||
sim = cosine_similarity(embs[i], embs[j])
|
||||
if sim >= threshold:
|
||||
dupes.append({"i": i, "j": j, "similarity": round(sim, 4),
|
||||
"text_a": current_texts[i][:100],
|
||||
"text_b": current_texts[j][:100]})
|
||||
stage_result["duplicates_found"] = len(dupes)
|
||||
stage_result["results"] = dupes
|
||||
else:
|
||||
stage_result["duplicates_found"] = 0
|
||||
stage_result["results"] = []
|
||||
stage_result["output_count"] = len(current_texts)
|
||||
|
||||
else:
|
||||
stage_result["error"] = f"Unknown mode: {mode}"
|
||||
stage_result["output_count"] = len(current_texts)
|
||||
|
||||
stage_ms = int((time.monotonic() - t0) * 1000)
|
||||
stage_result["time_ms"] = stage_ms
|
||||
results["stages"].append(stage_result)
|
||||
results["total_ms"] += stage_ms
|
||||
|
||||
return results
|
||||
|
||||
|
||||
# ─── API: REPL cell (free-form eval) ─────────────────────────
|
||||
|
||||
class CellRequest(BaseModel):
|
||||
action: str # "embed", "generate", "similarity", "screen", "classify"
|
||||
text: str = ""
|
||||
texts: list[str] = []
|
||||
params: dict = {}
|
||||
|
||||
|
||||
@router.post("/cell")
|
||||
async def run_cell(req: CellRequest):
|
||||
"""Execute a single notebook cell. Flexible entry point for ad-hoc operations."""
|
||||
t0 = time.monotonic()
|
||||
result = {}
|
||||
|
||||
if req.action == "embed":
|
||||
texts = req.texts or ([req.text] if req.text else [])
|
||||
embs = await _embed_texts(texts)
|
||||
result = {"embeddings_count": len(embs), "dimensions": len(embs[0]) if embs else 0,
|
||||
"texts": texts}
|
||||
|
||||
elif req.action == "generate":
|
||||
text = await _generate(req.text, **{k: v for k, v in req.params.items()
|
||||
if k in ("model", "temperature")})
|
||||
result = {"text": text}
|
||||
|
||||
elif req.action == "similarity":
|
||||
if len(req.texts) < 2:
|
||||
raise HTTPException(400, "Need at least 2 texts for similarity")
|
||||
embs = await _embed_texts(req.texts)
|
||||
matrix = []
|
||||
for i in range(len(embs)):
|
||||
row = []
|
||||
for j in range(len(embs)):
|
||||
row.append(round(cosine_similarity(embs[i], embs[j]), 4))
|
||||
matrix.append(row)
|
||||
result = {"matrix": matrix, "texts": [t[:80] for t in req.texts]}
|
||||
|
||||
elif req.action == "screen":
|
||||
texts = req.texts or ([req.text] if req.text else [])
|
||||
threshold = req.params.get("threshold", 0.65)
|
||||
res = await screen_texts(ScreenRequest(texts=texts, threshold=threshold))
|
||||
result = {"results": [r.model_dump() for r in res]}
|
||||
|
||||
elif req.action == "classify":
|
||||
texts = req.texts or ([req.text] if req.text else [])
|
||||
res = await classify_texts(ClassifyRequest(texts=texts))
|
||||
result = {"results": [r.model_dump() for r in res]}
|
||||
|
||||
else:
|
||||
raise HTTPException(400, f"Unknown action: {req.action}")
|
||||
|
||||
result["time_ms"] = int((time.monotonic() - t0) * 1000)
|
||||
return result
|
||||
@@ -1,90 +0,0 @@
|
||||
# PRD: Chicago Permit Staffing Recommendation

## Mission

You are a staffing-intelligence assistant. Your job is to **analyze a Chicago building permit and produce a one-page staffing recommendation** for our staffing company.

The output is a markdown document that a human staffing coordinator will read in under 2 minutes to decide whether to pursue the contract for staffing fit.

## Critical rules

1. **DO NOT START WRITING THE FINAL ANALYSIS YET.**
   - First, READ this PRD fully.
   - Then, PLAN your approach in `note()` — what steps will you take, what tools will you call, what evidence will you need.
   - Only after planning, begin executing.

2. **Never invent facts.** If you don't have evidence for a claim (from a tool call), do not make the claim. Say "no evidence available" instead.

3. **Cite your sources.** Every factual claim in the final output should reference either:
   - The permit data you read (cite the permit ID)
   - A matrix-retrieved chunk (cite as `[matrix:source:doc_id]`)

4. **Stay focused.** This is a one-page deliverable, not a research paper. Aim for 600-1000 words total.

## Tools available

- `list_permits(min_cost?: number, permit_type?: string)` — list permits matching filter; default returns top 5 by cost
- `read_permit(permit_id: string)` — get full details for one permit
- `query_matrix(query: string, top_k?: number)` — search the knowledge base for relevant context (contractor entities, prior permits, SEC tickers, LLM team patterns)
- `note(text: string)` — append to your working scratchpad (visible to you across iterations)
- `read_scratchpad()` — read your full scratchpad
- `done(summary: string)` — finish; pass your final markdown analysis as `summary`

## Required output structure

When you call `done(summary=...)`, the summary should contain:

```markdown
# Staffing Recommendation: Permit <ID>

## Permit Summary
[2-3 sentences: type, cost, address, scope of work]

## Contractor Profile
[What we know about the contractor(s) from matrix evidence. If no matrix hits, say so explicitly.]

## Staffing Implications
[What trades + headcount this permit implies. Ground in the work description.]

## Risk Signals
[Any matrix hits suggesting caution: debarment, prior incidents, low-quality history. If none, say so.]

## Recommendation
[Pursue / Pass / Investigate-Further, with one-sentence rationale.]
```

## Example workflow (do not copy verbatim)

1. Note your plan: "I will list 5 mid-range permits, pick one with a private contractor, read it fully, query the matrix for the contractor name, then write the recommendation."
2. Call `list_permits(min_cost=100000)` → see candidates
3. **PICK A PERMIT WITH A PRIVATE CONTRACTOR (a person's name or a private LLC), NOT a government agency** like CDOT, City of Chicago, etc. Government permits have no useful contractor profile to recommend on.
4. `read_permit(id)` → see all fields
5. Call `query_matrix("<contractor name> contractor Chicago renovation")` → see what the matrix has
6. Note any evidence found, gaps, surprises
7. Call `done(summary="<final markdown>")`
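
The workflow steps above can be sketched as a short driver. This is only an illustration: it assumes the tools are exposed as plain Python callables, and the in-memory `PERMITS` and `MATRIX` fixtures, the permit fields, and the `looks_private` heuristic are all hypothetical stand-ins for the real tool backends. It returns the summary where the agent would call `done(summary=...)`, and it builds only two of the required sections to stay short.

```python
# Hypothetical in-memory stand-ins for the PRD's tools. The real tools are
# provided by the agent harness; their return shapes are assumed here.
PERMITS = {
    "P-001": {"id": "P-001", "cost": 250000, "contractor": "CDOT",
              "work": "street resurfacing"},
    "P-002": {"id": "P-002", "cost": 180000, "contractor": "Acme Builders LLC",
              "work": "interior renovation"},
}
MATRIX = {}  # empty knowledge base: every query returns no hits
SCRATCHPAD = []
GOVERNMENT = {"CDOT", "CITY OF CHICAGO"}


def list_permits(min_cost=0, permit_type=None):
    rows = [p for p in PERMITS.values() if p["cost"] >= min_cost]
    return sorted(rows, key=lambda p: p["cost"], reverse=True)[:5]


def read_permit(permit_id):
    return PERMITS[permit_id]


def query_matrix(query, top_k=5):
    return MATRIX.get(query, [])[:top_k]


def note(text):
    SCRATCHPAD.append(text)


def looks_private(contractor):
    # Crude illustrative filter for step 3; not a rule taken from the PRD.
    return contractor.upper() not in GOVERNMENT


def run_workflow():
    # Step 1: record the plan before doing anything else.
    note("Plan: list permits, pick a private contractor, read, query matrix, write.")
    # Steps 2-4: list candidates, skip government agencies, read the pick.
    candidates = list_permits(min_cost=100000)
    pick = next(p for p in candidates if looks_private(p["contractor"]))
    permit = read_permit(pick["id"])
    # Steps 5-6: query the matrix and note what came back, including gaps.
    hits = query_matrix(f"{permit['contractor']} contractor Chicago renovation")
    note(f"Matrix hits for {permit['contractor']}: {len(hits)}")
    profile = "; ".join(hits) if hits else "No matrix evidence available."
    # Step 7: the agent would pass this string to done(summary=...).
    return (
        f"# Staffing Recommendation: Permit {permit['id']}\n\n"
        f"## Contractor Profile\n{profile}\n"
    )


summary = run_workflow()
print(summary)
```

With an empty matrix the sketch reports the gap explicitly, which is the behavior rule 2 asks for.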

## Success criteria

- You called `done()` with a summary that follows the required structure
- Every factual claim has a source (permit ID or matrix citation)
- Total output is 600-1000 words
- You did not invent contractor names, prior incidents, or capabilities
- Plan was noted BEFORE execution started
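
Some of these criteria (required headings, word count, presence of a matrix citation) could be checked mechanically. The following is a minimal sketch, assuming the final summary is available as a string; `check_summary` and the citation regex are illustrative names and patterns, not part of the test harness.

```python
import re

REQUIRED_SECTIONS = [
    "## Permit Summary",
    "## Contractor Profile",
    "## Staffing Implications",
    "## Risk Signals",
    "## Recommendation",
]
# Matches citations of the form [matrix:source:doc_id] described in the rules.
MATRIX_CITATION = re.compile(r"\[matrix:[^:\]]+:[^\]]+\]")


def check_summary(summary: str) -> dict:
    """Return a pass/fail flag for each mechanically checkable criterion."""
    words = len(summary.split())
    return {
        "has_title": summary.lstrip().startswith("# Staffing Recommendation: Permit "),
        "has_all_sections": all(s in summary for s in REQUIRED_SECTIONS),
        "word_count_ok": 600 <= words <= 1000,
        # Informational only: a summary with no matrix hits may legitimately
        # contain no matrix citation.
        "has_matrix_citation": bool(MATRIX_CITATION.search(summary)),
    }


# A deliberately short draft: structure passes, word count does not.
draft = (
    "# Staffing Recommendation: Permit X\n\n"
    + "\n\n".join(f"{s}\nSee [matrix:permits:doc_1]." for s in REQUIRED_SECTIONS)
)
report = check_summary(draft)
print(report)
```

A check like this cannot tell whether a claim was invented or whether the plan was noted first; those criteria still need the run log or a human reader.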

## What "good" looks like

- Plan is concrete (which permit, which queries)
- Matrix queries are specific (contractor name + work type, not "find anything about this")
- When matrix returns nothing useful, you say so honestly
- Recommendation reflects the actual evidence, not boilerplate

## What "bad" looks like

- Skipping the plan and jumping to execution
- Making up contractor histories with no matrix evidence
- Generic recommendations that don't reference the actual permit
- Walls of text or structured padding to look thorough

## Begin

Start by acknowledging you've read this PRD and noting your plan via `note()`. Then proceed.
|
||||
@@ -1,404 +0,0 @@
|
||||
// Compounding Stress Battery — the rigorous smoke test.
|
||||
//
|
||||
// Three iterations against /v1/respond, each running:
|
||||
// α baseline (3 easy tasks) — should complete local-only with boost
|
||||
// β drift (3 niche tasks) — forces executor miss → overseer fires
|
||||
// γ impossible (2 zero-supply) — must fail honestly, no token explosion
|
||||
// δ distill outcomes — writes distilled_*.jsonl + vector indexes
|
||||
// ε overseer meta-review — gpt-oss:120b judges the iteration
|
||||
// ζ scrum judgment — gpt-oss:120b reviews overseer proposals
|
||||
//
|
||||
// Iteration N+1 runs the same tasks as iteration N. We measure compounding:
|
||||
// does turns_per_task drop? does overseer_called_rate drop? does
|
||||
// ok_tasks rise? If 3/5 metrics trend favorably, architecture
|
||||
// validated; otherwise the scrum verdict points at what to fix.
|
||||
//
|
||||
// Fail-fast: every error bubbles. No silent catches — the run ABORTS with
|
||||
// the underlying stack so we see exactly where the architecture broke.
|
||||
//
|
||||
// Runtime: ~60-90 min. Cloud cost: ~24-32 gpt-oss calls (well under daily cap).
|
||||
|
||||
import { writeFile, mkdir, readFile } from "node:fs/promises";
|
||||
import { join } from "node:path";
|
||||
|
||||
const GATEWAY = process.env.GATEWAY_URL ?? "http://localhost:3100";
|
||||
const LLM_TEAM = process.env.LLM_TEAM_URL ?? "http://localhost:5000";
|
||||
const BATTERY_DIR = process.env.BATTERY_DIR
|
||||
?? "/home/profit/lakehouse/data/_kb/battery";
|
||||
|
||||
// 10-minute timeout per /v1/respond call — cloud executor on a hard task
|
||||
// can chew for a while, and we want to see real behavior, not premature aborts.
|
||||
const RESPOND_TIMEOUT_MS = 10 * 60 * 1000;
|
||||
const META_TIMEOUT_MS = 5 * 60 * 1000;
|
||||
|
||||
interface Task {
|
||||
task_class: string;
|
||||
operation: string;
|
||||
spec: Record<string, any>;
|
||||
}
|
||||
|
||||
interface Tasks {
|
||||
phases: {
|
||||
alpha_baseline: Task[];
|
||||
beta_drift: Task[];
|
||||
gamma_impossible: Task[];
|
||||
};
|
||||
models: {
|
||||
executor_cloud: string;
|
||||
reviewer_cloud: string;
|
||||
overseer_cloud: string;
|
||||
};
|
||||
}
|
||||
|
||||
interface RunResult {
|
||||
status: "ok" | "failed" | "blocked";
|
||||
iterations: number;
|
||||
artifact: any;
|
||||
log: any[];
|
||||
error?: string | null;
|
||||
_elapsed_ms: number;
|
||||
}
|
||||
|
||||
interface TaskRun {
|
||||
task: Task;
|
||||
phase: "alpha" | "beta" | "gamma";
|
||||
result: RunResult;
|
||||
}
|
||||
|
||||
// ─── HTTP helpers ───
|
||||
|
||||
async function runRespond(task: Task, models: Tasks["models"]): Promise<RunResult> {
|
||||
const body = {
|
||||
task_class: task.task_class,
|
||||
operation: task.operation,
|
||||
spec: task.spec,
|
||||
executor_model: models.executor_cloud,
|
||||
reviewer_model: models.reviewer_cloud,
|
||||
};
|
||||
const start = Date.now();
|
||||
const resp = await fetch(`${GATEWAY}/v1/respond`, {
|
||||
method: "POST",
|
||||
headers: { "content-type": "application/json" },
|
||||
body: JSON.stringify(body),
|
||||
signal: AbortSignal.timeout(RESPOND_TIMEOUT_MS),
|
||||
});
|
||||
if (!resp.ok) {
|
||||
const txt = await resp.text();
|
||||
throw new Error(`/v1/respond HTTP ${resp.status}: ${txt.slice(0, 500)}`);
|
||||
}
|
||||
const j = (await resp.json()) as RunResult;
|
||||
j._elapsed_ms = Date.now() - start;
|
||||
return j;
|
||||
}
|
||||
|
||||
async function runDistill(source: string): Promise<any[]> {
|
||||
const body = { mode: "distill", prompt: "battery iteration distill", source };
|
||||
const resp = await fetch(`${LLM_TEAM}/api/run?mode=distill`, {
|
||||
method: "POST",
|
||||
headers: { "content-type": "application/json" },
|
||||
body: JSON.stringify(body),
|
||||
signal: AbortSignal.timeout(META_TIMEOUT_MS),
|
||||
});
|
||||
if (!resp.ok) throw new Error(`distill HTTP ${resp.status}`);
|
||||
const text = await resp.text();
|
||||
// SSE stream — parse data: lines, return parsed event objects
|
||||
const events: any[] = [];
|
||||
for (const line of text.split("\n")) {
|
||||
if (!line.startsWith("data: ")) continue;
|
||||
try { events.push(JSON.parse(line.slice(6))); } catch { /* skip */ }
|
||||
}
|
||||
return events;
|
||||
}
|
||||
|
||||
async function cloudChat(
|
||||
model: string,
|
||||
prompt: string,
|
||||
temperature: number,
|
||||
think: boolean,
|
||||
): Promise<string> {
|
||||
const resp = await fetch(`${GATEWAY}/v1/chat`, {
|
||||
method: "POST",
|
||||
headers: { "content-type": "application/json" },
|
||||
body: JSON.stringify({
|
||||
model,
|
||||
messages: [{ role: "user", content: prompt }],
|
||||
temperature,
|
||||
think,
|
||||
provider: "ollama_cloud",
|
||||
}),
|
||||
signal: AbortSignal.timeout(META_TIMEOUT_MS),
|
||||
});
|
||||
if (!resp.ok) {
|
||||
const txt = await resp.text();
|
||||
throw new Error(`/v1/chat ${model} HTTP ${resp.status}: ${txt.slice(0, 500)}`);
|
||||
}
|
||||
const j = await resp.json() as any;
|
||||
return j.choices?.[0]?.message?.content ?? "";
|
||||
}
|
||||
|
||||
// ─── Meta-review + scrum ───
|
||||
|
||||
async function overseerReview(
|
||||
iterNum: number,
|
||||
artifacts: any,
|
||||
models: Tasks["models"],
|
||||
): Promise<string> {
|
||||
const prompt = `You are the OVERSEER reviewing iteration ${iterNum} of a stress battery run against Lakehouse /v1/respond.
|
||||
|
||||
For each task in the battery below, examine: status (ok/failed/blocked), iterations used, error signature, whether the in-loop overseer fired, total tokens.
|
||||
|
||||
Produce a PR-style meta-review in markdown with these sections:
|
||||
|
||||
## What worked
|
||||
List specific tasks (by operation string) that completed correctly and the evidence — turns_used, citations, tokens. Be concrete.
|
||||
|
||||
## What failed
|
||||
List specific tasks that failed or needed overseer correction. Classify: was it a real failure (impossible task), a drift we should repair, or a false positive from the test?
|
||||
|
||||
## Proposed changes for iteration ${iterNum + 1}
|
||||
At least 3 concrete architectural changes, each with:
|
||||
- **Target file** (e.g. \`crates/gateway/src/execution_loop/mod.rs\`)
|
||||
- **Rationale** (what the metrics show)
|
||||
- **Expected impact** (which metric should move in iter ${iterNum + 1})
|
||||
|
||||
Be honest about weaknesses. Do NOT propose generic best practices — reference specific observations from the artifacts below.
|
||||
|
||||
ARTIFACTS (iteration ${iterNum}):
|
||||
${JSON.stringify(artifacts, null, 2).slice(0, 30000)}`;
|
||||
|
||||
return cloudChat(models.overseer_cloud, prompt, 0.2, true);
|
||||
}
|
||||
|
||||
async function scrumJudge(
|
||||
iterNum: number,
|
||||
review: string,
|
||||
models: Tasks["models"],
|
||||
): Promise<string> {
|
||||
const prompt = `You are the SCRUM MASTER. The OVERSEER proposed these architectural changes for iteration ${iterNum + 1} based on iteration ${iterNum}'s results.
|
||||
|
||||
For each proposal, produce a verdict in markdown:
|
||||
|
||||
- **Proposal N**: <short name>
|
||||
- **Verdict**: APPROVE | REVISE | REJECT
|
||||
- **Reason**: why
|
||||
- **If APPROVE**: is the expected impact realistic? what's the blast radius? is the target file correct?
|
||||
- **If REVISE**: what should change about the proposal before applying?
|
||||
- **If REJECT**: why is the proposal wrong or out of scope?
|
||||
|
||||
Final section:
|
||||
## PR-ready changes
|
||||
Bulleted list of only the APPROVE proposals, ready to apply.
|
||||
|
||||
Be rigorous. Don't rubber-stamp. If a proposal references a file that probably doesn't exist, REJECT and say so. If a proposal is a generic "improve X" without concrete plan, REVISE.
|
||||
|
||||
OVERSEER PROPOSED:
|
||||
${review.slice(0, 15000)}`;
|
||||
|
||||
return cloudChat(models.overseer_cloud, prompt, 0.1, true);
|
||||
}
|
||||
|
||||
// ─── Iteration driver ───
|
||||
|
||||
async function runIteration(iterNum: number, tasks: Tasks): Promise<any> {
|
||||
console.log(`\n${"═".repeat(60)}`);
|
||||
console.log(`▶ ITERATION ${iterNum}`);
|
||||
console.log(`${"═".repeat(60)}\n`);
|
||||
|
||||
const iterDir = join(BATTERY_DIR, `iter_${iterNum}`);
|
||||
await mkdir(iterDir, { recursive: true });
|
||||
|
||||
const runs: TaskRun[] = [];
|
||||
|
||||
for (const [phaseKey, phaseName] of [
|
||||
["alpha_baseline", "alpha"],
|
||||
["beta_drift", "beta"],
|
||||
["gamma_impossible", "gamma"],
|
||||
] as const) {
|
||||
console.log(`\n── Phase ${phaseName} ──`);
|
||||
for (const task of tasks.phases[phaseKey]) {
|
||||
console.log(` ▶ ${task.operation}`);
|
||||
const result = await runRespond(task, tasks.models);
|
||||
const overseerFired = (result.log ?? []).some(e => e.kind === "overseer_correction");
|
||||
console.log(
|
||||
` status=${result.status} turns=${result.iterations}` +
|
||||
` tokens=${result.artifact?.usage?.total_tokens ?? 0}` +
|
||||
` overseer=${overseerFired}` +
|
||||
` elapsed=${Math.round(result._elapsed_ms / 1000)}s`
|
||||
);
|
||||
if (result.error) console.log(` error: ${result.error.slice(0, 200)}`);
|
||||
runs.push({ task, phase: phaseName, result });
|
||||
}
|
||||
}
|
||||
|
||||
// Phase δ
|
||||
console.log(`\n── Phase δ: distill outcomes_tail:20 ──`);
|
||||
const distillEvents = await runDistill("outcomes_tail:20");
|
||||
const distillFinal = [...distillEvents].reverse()
|
||||
.find(e => e.role === "final") ?? distillEvents[distillEvents.length - 1];
|
||||
const distillText = distillFinal?.text ?? JSON.stringify(distillFinal ?? {}).slice(0, 200);
|
||||
console.log(` ${distillText.split("\n")[0]}`);
|
||||
await writeFile(join(iterDir, "distill_output.txt"), distillText);
|
||||
|
||||
// Metrics
|
||||
const collectPhase = (p: string) => runs.filter(r => r.phase === p);
|
||||
const phaseMetrics = (p: string) => {
|
||||
const ps = collectPhase(p);
|
||||
if (ps.length === 0) return { count: 0 };
|
||||
return {
|
||||
count: ps.length,
|
||||
ok: ps.filter(r => r.result.status === "ok").length,
|
||||
failed: ps.filter(r => r.result.status === "failed").length,
|
||||
avg_turns: ps.reduce((s, r) => s + (r.result.iterations || 0), 0) / ps.length,
|
||||
total_tokens: ps.reduce((s, r) => s + (r.result.artifact?.usage?.total_tokens ?? 0), 0),
|
||||
overseer_called: ps.filter(r => (r.result.log ?? []).some(e => e.kind === "overseer_correction")).length,
|
||||
avg_elapsed_s: ps.reduce((s, r) => s + (r.result._elapsed_ms || 0), 0) / ps.length / 1000,
|
||||
};
|
||||
};
|
||||
|
||||
const metrics = {
|
||||
iteration: iterNum,
|
||||
total_tasks: runs.length,
|
||||
ok_tasks: runs.filter(r => r.result.status === "ok").length,
|
||||
failed_tasks: runs.filter(r => r.result.status === "failed").length,
|
||||
blocked_tasks: runs.filter(r => r.result.status === "blocked").length,
|
||||
total_tokens: runs.reduce((s, r) => s + (r.result.artifact?.usage?.total_tokens ?? 0), 0),
|
||||
avg_turns_per_task: runs.reduce((s, r) => s + (r.result.iterations || 0), 0) / runs.length,
|
||||
overseer_called_rate: runs.filter(r => (r.result.log ?? []).some(e => e.kind === "overseer_correction")).length / runs.length,
|
||||
total_elapsed_s: runs.reduce((s, r) => s + (r.result._elapsed_ms || 0), 0) / 1000,
|
||||
by_phase: {
|
||||
alpha: phaseMetrics("alpha"),
|
||||
beta: phaseMetrics("beta"),
|
||||
gamma: phaseMetrics("gamma"),
|
||||
},
|
||||
};
|
||||
|
||||
console.log(`\n── Metrics ──`);
|
||||
console.log(` total_tokens: ${metrics.total_tokens}`);
|
||||
console.log(` avg_turns_per_task: ${metrics.avg_turns_per_task.toFixed(2)}`);
|
||||
console.log(` overseer_called_rate: ${(metrics.overseer_called_rate * 100).toFixed(1)}%`);
|
||||
console.log(` ok/total: ${metrics.ok_tasks}/${metrics.total_tasks}`);
|
||||
|
||||
await writeFile(join(iterDir, "runs.json"), JSON.stringify(runs, null, 2));
|
||||
await writeFile(join(iterDir, "metrics.json"), JSON.stringify(metrics, null, 2));
|
||||
|
||||
// Phase ε: overseer review
|
||||
console.log(`\n── Phase ε: overseer meta-review ──`);
|
||||
const reviewInput = {
|
||||
metrics,
|
||||
task_summary: runs.map(r => ({
|
||||
operation: r.task.operation,
|
||||
phase: r.phase,
|
||||
status: r.result.status,
|
||||
iterations: r.result.iterations,
|
||||
tokens: r.result.artifact?.usage?.total_tokens ?? 0,
|
||||
overseer_called: (r.result.log ?? []).some(e => e.kind === "overseer_correction"),
|
||||
error: r.result.error ?? null,
|
||||
elapsed_s: Math.round((r.result._elapsed_ms || 0) / 1000),
|
||||
})),
|
||||
};
|
||||
const review = await overseerReview(iterNum, reviewInput, tasks.models);
|
||||
await writeFile(join(iterDir, "overseer_review.md"), review);
|
||||
console.log(` ✓ ${review.length} chars`);
|
||||
|
||||
// Phase ζ: scrum
|
||||
console.log(`\n── Phase ζ: scrum judgment ──`);
|
||||
const verdict = await scrumJudge(iterNum, review, tasks.models);
|
||||
await writeFile(join(iterDir, "scrum_findings.md"), verdict);
|
||||
console.log(` ✓ ${verdict.length} chars`);
|
||||
|
||||
return metrics;
|
||||
}
|
||||
|
||||
// ─── Main ───
|
||||
|
||||
async function main() {
|
||||
const tasks = JSON.parse(
|
||||
await readFile("/home/profit/lakehouse/tests/battery/tasks.json", "utf8"),
|
||||
) as Tasks;
|
||||
|
||||
await mkdir(BATTERY_DIR, { recursive: true });
|
||||
|
||||
const iterations: any[] = [];
|
||||
const batteryStart = Date.now();
|
||||
|
||||
for (let i = 1; i <= 3; i++) {
|
||||
const m = await runIteration(i, tasks);
|
||||
iterations.push(m);
|
||||
}
|
||||
|
||||
const batteryElapsed = (Date.now() - batteryStart) / 1000;
|
||||
|
||||
// Summary
|
||||
  const delta = (k: keyof any, inverted = false) => {
    const vals = iterations.map((m: any) => m[k]);
    if (vals.some(v => v === undefined)) return "—";
    const diff = vals[2] - vals[0];
    const pct = vals[0] !== 0 ? (diff / vals[0]) * 100 : 0;
    // A zero diff is neither better nor worse; report it as flat.
    const arrow = diff === 0
      ? "→ flat"
      : inverted ? (diff < 0 ? "↓ better" : "↑ worse") : (diff > 0 ? "↑ better" : "↓ worse");
    return `${arrow} (${diff > 0 ? "+" : ""}${diff.toFixed?.(2) ?? diff}, ${pct.toFixed(1)}%)`;
  };
|
||||
|
||||
  const rows = [
    ["total_tokens", "inverted", "want ↓ — fewer tokens for same work"],
    ["avg_turns_per_task", "inverted", "want ↓ — executor gets smarter"],
    ["overseer_called_rate", "inverted", "want ↓ — fewer cloud escalations"],
    ["ok_tasks", "normal", "want ↑ — more successes"],
    ["total_elapsed_s", "inverted", "want ↓ — faster iterations"],
  ];

  let summary = `# Compounding Stress Battery — Summary\n\n`;
  summary += `**Run:** ${new Date().toISOString()}\n`;
  summary += `**Elapsed:** ${Math.round(batteryElapsed)}s (${(batteryElapsed/60).toFixed(1)} min)\n`;
  summary += `**Models:** executor=${tasks.models.executor_cloud}, reviewer=${tasks.models.reviewer_cloud}, overseer=${tasks.models.overseer_cloud}\n\n`;

  summary += `## Compounding Metrics\n\n`;
  summary += `| Metric | iter 1 | iter 2 | iter 3 | Trend (1→3) | Goal |\n`;
  summary += `|---|---|---|---|---|---|\n`;
  for (const [key, inv, goal] of rows) {
    const vals = iterations.map((m: any) => {
      const v = m[key as string];
      return typeof v === "number" ? v.toFixed(2) : String(v);
    });
    summary += `| ${key} | ${vals[0]} | ${vals[1]} | ${vals[2]} | ${delta(key as any, inv === "inverted")} | ${goal} |\n`;
  }
  summary += "\n";

  // Count trending metrics
  const trends = rows.map(([k, inv]) => {
    const vs = iterations.map((m: any) => m[k as string]) as number[];
    const improved = inv === "inverted" ? vs[2] < vs[0] : vs[2] > vs[0];
    return { metric: k, improved };
  });
  const improvedCount = trends.filter(t => t.improved).length;

  summary += `## Verdict\n\n`;
  if (improvedCount >= 3) {
    summary += `**✓ Architecture validated** — ${improvedCount}/${trends.length} compounding metrics improved from iteration 1 to 3.\n\n`;
  } else {
    summary += `**✗ Compounding NOT demonstrated** — only ${improvedCount}/${trends.length} metrics improved. See scrum_findings.md in each iter_N/ directory for the overseer's proposals and the scrum master's review of what to change.\n\n`;
  }

  // The list below covers every metric, so the heading must not claim that
  // all of them improved or all of them regressed.
  summary += `Per-metric trend (iteration 1 to 3):\n`;
  for (const t of trends) {
    summary += `- ${t.metric}: ${t.improved ? "✓ improved" : "✗ flat or worse"}\n`;
  }

  summary += `\n## Artifacts\n\n`;
  summary += `- \`iter_1/\`, \`iter_2/\`, \`iter_3/\` — per-iteration runs.json, metrics.json, overseer_review.md, scrum_findings.md, distill_output.txt\n`;
  summary += `- \`summary.md\` — this file\n`;

  await writeFile(join(BATTERY_DIR, "summary.md"), summary);
  console.log(`\n${"═".repeat(60)}`);
  console.log(`✓ BATTERY COMPLETE — ${Math.round(batteryElapsed)}s`);
  console.log(` Summary: ${join(BATTERY_DIR, "summary.md")}`);
  console.log(`${"═".repeat(60)}\n`);
  console.log(summary);
}

main().catch(e => {
  console.error(`\n${"═".repeat(60)}`);
  console.error(`✗ BATTERY FAILED: ${e.message}`);
  console.error(`${"═".repeat(60)}\n`);
  if (e.stack) console.error(e.stack);
  process.exit(1);
});
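The summary logic in `main()` compares iteration 1 with iteration 3 for each metric. As a minimal sketch, assuming a hypothetical pure helper named `classifyTrend` that is not part of the battery script, the same comparison can be isolated so that the flat case and the zero-baseline case are handled explicitly:

```typescript
// Hypothetical helper, not part of the battery script: classifies the change
// in one metric between the first and the last iteration.
type Direction = "improved" | "flat" | "worse";

interface Trend {
  direction: Direction;
  diff: number;
  // null when the baseline is 0, because a percentage change is undefined there
  pct: number | null;
}

function classifyTrend(first: number, last: number, lowerIsBetter: boolean): Trend {
  const diff = last - first;
  const pct = first !== 0 ? (diff / first) * 100 : null;
  if (diff === 0) return { direction: "flat", diff, pct };
  const improved = lowerIsBetter ? diff < 0 : diff > 0;
  return { direction: improved ? "improved" : "worse", diff, pct };
}

// Illustrative values only, not measurements from a real run.
const tokens = classifyTrend(1000, 500, true);
const okTasks = classifyTrend(4, 4, false);
console.log(tokens.direction, tokens.pct, okTasks.direction);
```

Keeping the classification separate from the string formatting makes the verdict count and the table column agree by construction, since both can read the same `direction` value.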
@ -1,57 +0,0 @@
{
  "description": "Compounding stress battery tasks. Each iteration runs α (baseline) + β (drift) + γ (impossible) phases. The SAME tasks repeat across iterations so we can measure compounding (turns_used, overseer_called_rate, correction_effective).",
  "phases": {
    "alpha_baseline": [
      {
        "task_class": "staffing.fill",
        "operation": "fill: Warehouse Associate x3 in Columbus, OH",
        "spec": { "target_role": "Warehouse Associate", "target_count": 3, "target_city": "Columbus", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1" }
      },
      {
        "task_class": "staffing.fill",
        "operation": "fill: Forklift Operator x2 in Toledo, OH",
        "spec": { "target_role": "Forklift Operator", "target_count": 2, "target_city": "Toledo", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1" }
      },
      {
        "task_class": "staffing.fill",
        "operation": "fill: Packer x4 in Cleveland, OH",
        "spec": { "target_role": "Packer", "target_count": 4, "target_city": "Cleveland", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1" }
      }
    ],
    "beta_drift": [
      {
        "task_class": "staffing.fill",
        "operation": "fill: Machine Operator x2 in Youngstown, OH (requires OSHA 30 + bilingual Spanish)",
        "spec": { "target_role": "Machine Operator", "target_count": 2, "target_city": "Youngstown", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1; prefer candidates with OSHA certification and Spanish" }
      },
      {
        "task_class": "staffing.fill",
        "operation": "fill: Welder x2 in Dayton, OH (AWS D1.1 certified, night shift)",
        "spec": { "target_role": "Welder", "target_count": 2, "target_city": "Dayton", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1; filter by certification and shift flexibility" }
      },
      {
        "task_class": "staffing.fill",
        "operation": "fill: Assembler x5 in Akron, OH (SMT experience, cleanroom)",
        "spec": { "target_role": "Assembler", "target_count": 5, "target_city": "Akron", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1" }
      }
    ],
    "gamma_impossible": [
      {
        "task_class": "staffing.fill",
        "operation": "fill: Underwater Welder x2 in Toledo, OH",
        "spec": { "target_role": "Underwater Welder", "target_count": 2, "target_city": "Toledo", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1 (expected to fail — no supply)" }
      },
      {
        "task_class": "staffing.fill",
        "operation": "fill: Astronaut x1 in Springfield, OH",
        "spec": { "target_role": "Astronaut", "target_count": 1, "target_city": "Springfield", "target_state": "OH", "approach_hint": "(expected to fail — out-of-domain role)" }
      }
    ]
  },
  "models": {
    "executor_cloud": "gpt-oss:20b",
    "reviewer_cloud": "gpt-oss:20b",
    "overseer_cloud": "gpt-oss:120b",
    "notes": "gpt-oss:20b for hot path (faster, cheaper per call), gpt-oss:120b for meta-reviews. All cloud per 2026-04-23 'cloud modes are on' directive."
  }
}
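The battery script casts the parsed file with `as Tasks` and never checks it at runtime. The sketch below is one way such a check could look. The interfaces are inferred from the JSON above and may not match the real `Tasks` type, and `validateTasks` is a hypothetical name.

```typescript
// Assumed shape, inferred from the JSON above; the real `Tasks` type may differ.
interface TaskSpec {
  target_role: string;
  target_count: number;
  target_city: string;
  target_state: string;
  approach_hint: string;
}

interface Task {
  task_class: string;
  operation: string;
  spec: TaskSpec;
}

interface Tasks {
  description: string;
  phases: Record<string, Task[]>;
  models: { executor_cloud: string; reviewer_cloud: string; overseer_cloud: string; notes?: string };
}

// Hypothetical validator: collects problems instead of throwing on the first one.
function validateTasks(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) return ["root is not an object"];
  const t = raw as Partial<Tasks>;
  const problems: string[] = [];

  const phases: Record<string, Task[]> = t.phases ?? {};
  if (Object.keys(phases).length === 0) problems.push("no phases defined");
  for (const [phase, list] of Object.entries(phases)) {
    list.forEach((task: Task, i: number) => {
      if (typeof task.operation !== "string") problems.push(`${phase}[${i}]: missing operation`);
      const count = task.spec?.target_count;
      if (!Number.isInteger(count) || count < 1) problems.push(`${phase}[${i}]: bad target_count`);
    });
  }

  for (const key of ["executor_cloud", "reviewer_cloud", "overseer_cloud"] as const) {
    if (typeof t.models?.[key] !== "string") problems.push(`models.${key} missing`);
  }
  return problems;
}

const sample = {
  description: "demo",
  phases: {
    alpha_baseline: [
      {
        task_class: "staffing.fill",
        operation: "fill: Packer x4 in Cleveland, OH",
        spec: { target_role: "Packer", target_count: 4, target_city: "Cleveland", target_state: "OH", approach_hint: "hybrid search against workers_500k_v1" },
      },
    ],
  },
  models: { executor_cloud: "gpt-oss:20b", reviewer_cloud: "gpt-oss:20b", overseer_cloud: "gpt-oss:120b" },
};
console.log(validateTasks(sample));
```

Returning a list of problems rather than throwing lets a caller print every defect in the task file before the first iteration starts.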
@ -1,45 +0,0 @@
{
  "generated_at": "2026-04-21T00:44:59.486489Z",
  "runs": [
    {
      "label": "A(no-T3)",
      "path": "tests/multi-agent/playbooks/scenario-2026-04-21T00-30-54",
      "ok_events": 0,
      "total_events": 5,
      "total_turns": 0,
      "total_gaps": 5,
      "total_citations": 0,
      "prior_lessons_loaded": 0
    },
    {
      "label": "B(T3-seed)",
      "path": "tests/multi-agent/playbooks/scenario-2026-04-21T00-37-04",
      "ok_events": 0,
      "total_events": 5,
      "total_turns": 0,
      "total_gaps": 5,
      "total_citations": 0,
      "prior_lessons_loaded": 1
    },
    {
      "label": "C(T3-read)",
      "path": "tests/multi-agent/playbooks/scenario-2026-04-21T00-39-54",
      "ok_events": 0,
      "total_events": 5,
      "total_turns": 0,
      "total_gaps": 5,
      "total_citations": 0,
      "prior_lessons_loaded": 2
    },
    {
      "label": "D(T3-cloud)",
      "path": "tests/multi-agent/playbooks/scenario-2026-04-21T00-43-44",
      "ok_events": 0,
      "total_events": 5,
      "total_turns": 0,
      "total_gaps": 5,
      "total_citations": 0,
      "prior_lessons_loaded": 3
    }
  ]
}
@ -1,25 +0,0 @@
# KB Measurement Report

Generated from 26 runs across 24 distinct signatures.

## Recommender confidence
- high: 23
- medium: 1
- low: 3

## Overall fill + citation
- Fill rate: **60/86** (69.8%)
- Avg citations per run: **1.38**
- Avg turns per run: 6.6

## Citation coverage by (role, city, state)
- Combos with ≥1 citation: 9
- Combos with ok fills but 0 citations: 31

## Item 3 decision signal
Non-zero: there are **combos that succeeded but never triggered playbook_memory boost**. Candidates for item 3 investigation:
- Machine Operator in Indianapolis, IN: 1/1 ok, 0 cites
- Assembler in Indianapolis, IN: 2/2 ok, 0 cites
- Warehouse Associate in Indianapolis, IN: 1/1 ok, 0 cites
- Forklift Operator in Cleveland, OH: 1/1 ok, 0 cites
- Assembler in Cleveland, OH: 2/2 ok, 0 cites
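The report groups runs by (role, city, state) and flags combinations that filled successfully without any playbook citation. The sketch below shows how that grouping could be computed. The `RunRecord` shape and the function names are assumptions for illustration, and the demo records are invented rather than taken from the 26 measured runs.

```typescript
// Hypothetical record shape; the real KB run records may carry more fields.
interface RunRecord {
  role: string;
  city: string;
  state: string;
  ok: boolean;
  citations: number;
}

interface ComboStats {
  ok: number;
  total: number;
  cites: number;
}

// Aggregates runs per "role in city, state" combination.
function groupByCombo(runs: RunRecord[]): Map<string, ComboStats> {
  const combos = new Map<string, ComboStats>();
  for (const r of runs) {
    const key = `${r.role} in ${r.city}, ${r.state}`;
    const s = combos.get(key) ?? { ok: 0, total: 0, cites: 0 };
    s.total += 1;
    if (r.ok) s.ok += 1;
    s.cites += r.citations;
    combos.set(key, s);
  }
  return combos;
}

// Combos that filled at least once but never cited a playbook.
function uncitedSuccesses(combos: Map<string, ComboStats>): string[] {
  return Array.from(combos.entries())
    .filter(([, s]) => s.ok > 0 && s.cites === 0)
    .map(([key, s]) => `${key}: ${s.ok}/${s.total} ok, 0 cites`);
}

// Formats a fill rate the way the report does, guarding against division by zero.
const fillRatePct = (filled: number, requested: number): string =>
  requested === 0 ? "n/a" : ((filled / requested) * 100).toFixed(1) + "%";

// Invented demo records.
const demoRuns: RunRecord[] = [
  { role: "Assembler", city: "Cleveland", state: "OH", ok: true, citations: 0 },
  { role: "Assembler", city: "Cleveland", state: "OH", ok: true, citations: 0 },
  { role: "Welder", city: "Dayton", state: "OH", ok: true, citations: 2 },
];
console.log(uncitedSuccesses(groupByCombo(demoRuns)), fillRatePct(60, 86));
```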
@ -1 +0,0 @@
{"at":"10:30","kind":"recurring","operation":"fill: Machine Operator x2 in Toledo, OH","fills":[{"candidate_id":"W500K-29048","name":"Raymond G. Ward","reason":"Raymond has a high availability score (0.9033) and relevant skills for the Machine Operator position in Toledo, OH. He is also endorsed by past similar fills."},{"candidate_id":"W500K-20613","name":"Pamela V. Green","reason":"Pamela has a high availability score (0.8496) and relevant skills for the Machine Operator position in Toledo, OH. She is also endorsed by past similar fills."}],"turns":5,"duration_secs":12.051,"pool_size":997,"playbook_citations":[]}
@ -1,17 +0,0 @@
# Client emails — Riverfront Steel, 2026-04-21

## 10:30 recurring — Machine Operator x2

Subject: 2 Filled

Dear Riverfront Steel Team,

We are pleased to confirm that we have filled both Machine Operator roles for your Tuesday/Thursday shifts starting at 11:00 AM.

- Raymond G. Ward
- Pamela V. Green

Both candidates have high availability scores and relevant experience. Please note this is a recurring slot, and prior workers may still be available.

Best regards,
Dispatch Team Lakehouse
@ -1,45 +0,0 @@
# Scenario retrospective — Riverfront Steel, 2026-04-21

Executor: `mistral:latest` Reviewer: `qwen2.5:latest` Draft: `qwen2.5:latest`

## Events

| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 28.9 | 0 | 1 |
| 10:30 | recurring | Machine Operator × 2 | 997 | ✓ 2 | 5 | 12.1 | 0 | 1 |
| 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 20.3 | 0 | 1 |
| 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 35.7 | 0 | 1 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 11.5 | 0 | 1 |

## Final roster

| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
| undefined Raymond G. Ward | 10:30 | Machine Operator | Toledo, OH | confirmed |
| undefined Pamela V. Green | 10:30 | Machine Operator | Toledo, OH | confirmed |

## Gap signals

### drift_or_tool
- **08:00** — invalid JSON from executor: JSON Parse error: Unable to parse JSON string | raw: {"kind":"plan","steps":["Verify one candidate from the current list using sql tool for SQL verification.","Propose_done with top 3 candidates who are Warehouse Associates in Toledo, OH."]}
{"kind":"tool_call","tool":"sql","args":{"query":"SELECT worker_id, name, role, city, state, availability FROM
- **12:15** — invalid JSON from executor: JSON Parse error: Expected ']' | raw: {"kind":"plan","steps":["1. Use hybrid_search to find Forklift Operators in Toledo, OH with high engagement and communications scores who are available for work at Riverfront Steel starting at 01:00 PM. The search should prioritize workers with team/collaboration signals (engagement, communications
- **14:00** — no consensus after 14 turns
- **15:45** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search", "args":{"index_name":"workers_500k_v1","sql_filter":"LOWER(role) LIKE '%warehouse%' AND city = 'Toledo' AND state = 'OH' AND availability > 0.5 AND shift = '08:00' AND worker_id NOT IN [, ] AND worker_id NOT IN ["EXCLUDE_WORKERS_ID1", "EXCLUDE_WORKERS_ID2"

### double_book
- **10:30** — undefined Pamela V. Green already booked for 10:30

### fairness
- _cross-event_ — Raymond G. Ward (undefined) booked 2 times today

### write_through_audit
- _post-run_ — playbook_memory has 33 entries (ran 5 events, expected ≥ 1 new entries from this run)

## Narrative

- 1/5 events reached consensus.
- Final roster: 2 bookings across 1 distinct workers.
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
- Dropped events: 08:00 baseline_fill, 12:15 expansion, 14:00 emergency, 15:45 misplacement.
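Three of the four dropped events in this retrospective failed with `invalid JSON from executor`. In the 08:00 case the raw output holds a complete `plan` object followed by a second message, so a parser that expects exactly one JSON document rejects it. The sketch below, with the hypothetical name `firstJsonObject`, shows one tolerant approach: scan for the first balanced top-level object and parse only that. It is not the project's parser, and it does not rescue truncated output or invalid escapes such as those in the 12:15 and 15:45 cases, which still yield `null`.

```typescript
// Hypothetical helper: returns the first complete top-level JSON object found
// in `raw`, or null when none can be parsed (for example, truncated output).
function firstJsonObject(raw: string): unknown {
  const start = raw.indexOf("{");
  if (start < 0) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = start; i < raw.length; i++) {
    const ch = raw[i];
    if (inString) {
      // Braces inside string literals must not affect the depth counter.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) {
      try {
        return JSON.parse(raw.slice(start, i + 1));
      } catch {
        return null;
      }
    }
  }
  return null; // input ended before the object closed
}

// Two messages in one response, modelled on the 08:00 failure above.
const twoMessages =
  '{"kind":"plan","steps":["verify one candidate"]}\n{"kind":"tool_call","tool":"sql"}';
const first = firstJsonObject(twoMessages) as { kind: string };
console.log(first.kind);

// Truncated output, modelled on the 12:15 failure above.
const truncated = '{"kind":"plan","steps":["1. Use hybrid_search to find';
console.log(firstJsonObject(truncated));
```

Whether dropping the second message is acceptable depends on the executor protocol. If a plan and a tool call may legitimately arrive together, the scan would need to continue past the first object instead of returning.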
@ -1,118 +0,0 @@
[
  {
    "event": {
      "kind": "baseline_fill",
      "at": "08:00",
      "role": "Warehouse Associate",
      "count": 3,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "08:00 AM",
      "scenario_note": "Regular Monday morning shift, 8-hour."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 28.888,
    "error": "invalid JSON from executor: JSON Parse error: Unable to parse JSON string | raw: {\"kind\":\"plan\",\"steps\":[\"Verify one candidate from the current list using sql tool for SQL verification.\",\"Propose_done with top 3 candidates who are Warehouse Associates in Toledo, OH.\"]}\n{\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id, name, role, city, state, availability FROM ",
    "gap_signals": [
      "drift_or_tool: invalid JSON from executor: JSON Parse error: Unable to parse JSON string | raw: {\"kind\":\"plan\",\"steps\":[\"Verify one candidate from the current list using sql tool for SQL verification.\",\"Propose_done with top 3 candidates who are Warehouse Associates in Toledo, OH.\"]}\n{\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id, name, role, city, state, availability FROM "
    ]
  },
  {
    "event": {
      "kind": "recurring",
      "at": "10:30",
      "role": "Machine Operator",
      "count": 2,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "11:00 AM",
      "scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
    },
    "ok": true,
    "fills": [
      {
        "candidate_id": "W500K-29048",
        "name": "Raymond G. Ward",
        "reason": "Raymond has a high availability score (0.9033) and relevant skills for the Machine Operator position in Toledo, OH. He is also endorsed by past similar fills."
      },
      {
        "candidate_id": "W500K-20613",
        "name": "Pamela V. Green",
        "reason": "Pamela has a high availability score (0.8496) and relevant skills for the Machine Operator position in Toledo, OH. She is also endorsed by past similar fills."
      }
    ],
    "turns": 5,
    "duration_secs": 12.051,
    "gap_signals": [
      "double_book: undefined Pamela V. Green already booked for 10:30"
    ],
    "sources_first_score": 0.6692528,
    "sources_last_score": 0.64494026,
    "pool_size": 997,
    "playbook_citations": []
  },
  {
    "event": {
      "kind": "expansion",
      "at": "12:15",
      "role": "Forklift Operator",
      "count": 5,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "01:00 PM",
      "scenario_note": "New warehouse location opening, five-worker team needed."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 20.342,
    "error": "invalid JSON from executor: JSON Parse error: Expected ']' | raw: {\"kind\":\"plan\",\"steps\":[\"1. Use hybrid_search to find Forklift Operators in Toledo, OH with high engagement and communications scores who are available for work at Riverfront Steel starting at 01:00 PM. The search should prioritize workers with team/collaboration signals (engagement, communications ",
    "gap_signals": [
      "drift_or_tool: invalid JSON from executor: JSON Parse error: Expected ']' | raw: {\"kind\":\"plan\",\"steps\":[\"1. Use hybrid_search to find Forklift Operators in Toledo, OH with high engagement and communications scores who are available for work at Riverfront Steel starting at 01:00 PM. The search should prioritize workers with team/collaboration signals (engagement, communications "
    ]
  },
  {
    "event": {
      "kind": "emergency",
      "at": "14:00",
      "role": "Loader",
      "count": 4,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "04:00 PM same day",
      "deadline": "16:00",
      "scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 35.727,
    "error": "no consensus after 14 turns",
    "gap_signals": [
      "drift_or_tool: no consensus after 14 turns"
    ]
  },
  {
    "event": {
      "kind": "misplacement",
      "at": "15:45",
      "role": "Warehouse Associate",
      "count": 1,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "remainder of 08:00 shift",
      "scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
      "replaces_event": "08:00"
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 11.518,
    "error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\", \"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"LOWER(role) LIKE '%warehouse%' AND city = 'Toledo' AND state = 'OH' AND availability > 0.5 AND shift = '08:00' AND worker_id NOT IN [, ] AND worker_id NOT IN [\"EXCLUDE_WORKERS_ID1\", \"EXCLUDE_WORKERS_ID2\"",
    "gap_signals": [
      "drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\", \"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"LOWER(role) LIKE '%warehouse%' AND city = 'Toledo' AND state = 'OH' AND availability > 0.5 AND shift = '08:00' AND worker_id NOT IN [, ] AND worker_id NOT IN [\"EXCLUDE_WORKERS_ID1\", \"EXCLUDE_WORKERS_ID2\""
    ]
  }
]
@ -1,18 +0,0 @@
[
  {
    "name": "Raymond G. Ward",
    "booked_for": "10:30",
    "role": "Machine Operator",
    "city": "Toledo",
    "state": "OH",
    "status": "confirmed"
  },
  {
    "name": "Pamela V. Green",
    "booked_for": "10:30",
    "role": "Machine Operator",
    "city": "Toledo",
    "state": "OH",
    "status": "confirmed"
  }
]
@ -1,11 +0,0 @@
# SMS drafts — Riverfront Steel, 2026-04-21

## 10:30 recurring — Machine Operator x2 in Toledo, OH

TO: Raymond G. Ward
Confirming your Machine Operator shift at Riverfront Steel in Toledo, OH starting at 11:00 AM on Tuesday/Thursday. Still available!

---

TO: Pamela V. Green
Your Machine Operator shift at Riverfront Steel in Toledo, OH starts at 11:00 AM on Tuesday/Thursday. Confirm your availability please.
@ -1 +0,0 @@
# Client emails — Riverfront Steel, 2026-04-21
@ -1 +0,0 @@
# SMS drafts — Riverfront Steel, 2026-04-21
@ -1,22 +0,0 @@
# Client emails — Riverfront Steel, 2026-04-21

## 12:15 expansion — Forklift Operator x5

Subject: 5 Workers Confirmed

Dear Riverfront Steel Team,

I am pleased to confirm that we have filled all five positions for Forklift Operators at your new warehouse location opening today starting at 1:00 PM. The workers are:

- Laura F. Morales
- Kyle F. Brooks
- Maria K. Cruz
- Jeffrey D. Taylor
- Charles T. Walker

All meet the criteria of being Forklift Operators in Toledo, OH.

Looking forward to a successful shift!

Best regards,
Dispatch Team Lakehouse
@ -1,26 +0,0 @@
# SMS drafts — Riverfront Steel, 2026-04-21

## 12:15 expansion — Forklift Operator x5 in Toledo, OH

TO: Laura F. Morales
Confirming your shift as a Forklift Operator at Riverfront Steel in Toledo, OH starting at 1:00 PM for the new warehouse location opening. W500K-22625

---

TO: Kyle F. Brooks
You're scheduled to start your Forklift Operator shift at Riverfront Steel in Toledo, OH at 1:00 PM for the new warehouse location opening. Confirm W500K-33961

---

TO: Maria K. Cruz
Your shift as a Forklift Operator at Riverfront Steel in Toledo, OH starts at 1:00 PM for the new warehouse location opening. Please confirm your attendance. W500K-19588

---

TO: Jeffrey D. Taylor
Confirming your shift as a Forklift Operator at Riverfront Steel in Toledo, OH starting at 1:00 PM for the new warehouse location opening. W500K-37729

---

TO: Charles T. Walker
You're scheduled to start your Forklift Operator shift at Riverfront Steel in Toledo, OH at 1:00 PM for the new warehouse location opening. Confirm your attendance. W500K-17543
@ -1,2 +0,0 @@
{"at":"08:00","kind":"baseline_fill","operation":"fill: Warehouse Associate x3 in Toledo, OH","fills":[{"candidate_id":"W500K-49164","name":"Christopher Y. Phillips","reason":"Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."},{"candidate_id":"W500K-40928","name":"Janet E. Hill","reason":"Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."},{"candidate_id":"W500K-34704","name":"Fatima U. Rivera","reason":"Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."}],"turns":7,"duration_secs":20.128,"pool_size":770,"playbook_citations":[],"discovered_pattern":"Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.66–0.96)"}
{"at":"10:30","kind":"recurring","operation":"fill: Machine Operator x2 in Toledo, OH","fills":[{"candidate_id":"W500K-19759","name":"Carmen Z. Rodriguez","reason":"Recurring Machine Operator in Toledo, OH with a score of 0.75, verified via sql tool."},{"candidate_id":"W500K-29298","name":"Robert W. Gonzalez","reason":"Recurring Machine Operator in Toledo, OH with a score of 0.74, not yet SQL verified but highly likely to meet requirements."}],"turns":5,"duration_secs":17.426,"pool_size":997,"playbook_citations":[],"discovered_pattern":"Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.66–0.96)"}
@ -1,40 +0,0 @@
# Client emails — Riverfront Steel, 2026-04-21

## 08:00 baseline_fill — Warehouse Associate x3

Subject: 3 Filled

Dear Riverfront Steel Staffing Team,

I am pleased to confirm that we have filled all three roles of Warehouse Associate for your Monday morning shift starting at 08:00 AM.

The workers assigned are:
- Christopher Y. Phillips
- Janet E. Hill
- Fatima U. Rivera

All three have confirmed their availability and are reliable team members.

Best regards,
Dispatch Team Lakehouse

## 10:30 recurring — Machine Operator x2

To: staffing@riverfrontsteel.example
From: dispatch@lakehouse.example
Subject: Confirmed

Dear Riverfront Steel Team,

We are pleased to confirm that we have filled both Machine Operator roles for your Tuesday/Thursday shifts starting at 11:00 AM. The workers assigned are:

- Carmen Z. Rodriguez
- Robert W. Gonzalez

Both are recurring Machine Operators in Toledo, OH with a score of 0.7.

Please note this is a recurring slot; prior workers may still be available.

Best regards,

Dispatch Team
@ -1,74 +0,0 @@
# Scenario retrospective — Riverfront Steel, 2026-04-21

Executor: `mistral:latest` Reviewer: `qwen2.5:latest` Draft: `qwen2.5:latest`

## Events

| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | 770 | ✓ 3 | 7 | 20.1 | 0 | 2 |
| 10:30 | recurring | Machine Operator × 2 | 997 | ✓ 2 | 5 | 17.4 | 0 | 2 |
| 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 46.4 | 0 | 1 |
| 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 54.1 | 0 | 1 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 59.6 | 0 | 1 |

## Final roster

| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
| undefined Christopher Y. Phillips | 08:00 | Warehouse Associate | Toledo, OH | no_show |
| undefined Janet E. Hill | 08:00 | Warehouse Associate | Toledo, OH | confirmed |
| undefined Fatima U. Rivera | 08:00 | Warehouse Associate | Toledo, OH | confirmed |
| undefined Carmen Z. Rodriguez | 10:30 | Machine Operator | Toledo, OH | confirmed |
| undefined Robert W. Gonzalez | 10:30 | Machine Operator | Toledo, OH | confirmed |

## Gap signals

### double_book
- **08:00** — undefined Janet E. Hill already booked for 08:00
- **08:00** — undefined Fatima U. Rivera already booked for 08:00
- **10:30** — undefined Carmen Z. Rodriguez already booked for 08:00
- **10:30** — undefined Robert W. Gonzalez already booked for 08:00

### drift_or_tool
- **12:15** — invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {"kind":"plan", "steps":["TOOL_CALL hybrid_search({'index_name':'workers_500k_v1', 'sql_filter':'LOWER(role) LIKE '%forklift%' AND city = \'Toledo\' AND state = \'OH\' AND availability > CAST(0.5 AS DOUBLE) AND reliability > CAST(0.75 AS DOUBLE)', 'question':'reliable forklift operators in Toledo, O
- **14:00** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search",
"args":{"index_name":"workers_500k_v1","sql_filter":"LOWER(role) LIKE '%loader%' AND city = 'Toledo' AND state = 'OH' AND availability > CAST(0.7 AS DOUBLE) AND worker_id NOT IN ('W500K-4321', 'W500K-8963', 'W500K-2345', 'W500K-6789', 'W500K-9876') AND wor
- **15:45** — no consensus after 14 turns

### fairness
- _cross-event_ — Christopher Y. Phillips (undefined) booked 4 times today

### write_through_audit
- _post-run_ — playbook_memory has 165 entries (ran 5 events, expected ≥ 2 new entries from this run)

## Workers touched across the week

6 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.

| Worker ID | Name | Events | Outcome |
|---|---|---|---|
| W500K-49164 | Christopher Y. Phillips | 08:00 baseline_fill | booked |
| W500K-40928 | Janet E. Hill | 08:00 baseline_fill | booked |
| W500K-34704 | Fatima U. Rivera | 08:00 baseline_fill | booked |
| W500K-19759 | Carmen Z. Rodriguez | 10:30 recurring | booked |
| W500K-29298 | Robert W. Gonzalez | 10:30 recurring | booked |
| undefined | Christopher Y. Phillips | 08:00 | no_show |

## Discovered patterns (meta-index)

What the system identified across semantically-similar past fills as each event ran:

- **08:00 baseline_fill** (Warehouse Associate): Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.66–0.96)
- **10:30 recurring** (Machine Operator): Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.66–0.96)
- **12:15 expansion** (Forklift Operator): —
- **14:00 emergency** (Loader): —
- **15:45 misplacement** (Warehouse Associate): —

## Narrative

- 2/5 events reached consensus.
- Final roster: 5 bookings across 1 distinct workers.
- Workers touched (booked, failed, or otherwise decided): 6.
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
- Dropped events: 12:15 expansion, 14:00 emergency, 15:45 misplacement.
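The roster and gap signals above print `undefined` where a worker identifier would be expected, report five bookings across "1 distinct workers", and flag workers as double-booked within the event that first booked them. The diff does not show the roster code, so the cause is not confirmed. One possibility is that bookings are keyed on a field that is never populated. The sketch below, a hypothetical `Roster` class that is not the scenario runner's implementation, keys bookings on `candidate_id` and raises a double-book signal only when an earlier call already booked the same worker for the same time.

```typescript
interface Fill {
  candidate_id: string;
  name: string;
}

interface Booking extends Fill {
  booked_for: string;
}

// Hypothetical roster: the real scenario runner may track bookings differently.
class Roster {
  private byWorker = new Map<string, Booking[]>();

  // Books every fill for the event at `at`. Returns double-book signals for
  // workers who already hold a booking at the same time from an earlier call.
  book(at: string, fills: Fill[]): string[] {
    const signals: string[] = [];
    for (const f of fills) {
      const existing = this.byWorker.get(f.candidate_id) ?? [];
      if (existing.some(b => b.booked_for === at)) {
        signals.push(`double_book: ${f.candidate_id} ${f.name} already booked for ${at}`);
        continue;
      }
      existing.push({ ...f, booked_for: at });
      this.byWorker.set(f.candidate_id, existing);
    }
    return signals;
  }

  distinctWorkers(): number {
    return this.byWorker.size;
  }
}

const roster = new Roster();
const firstPass = roster.book("08:00", [
  { candidate_id: "W500K-40928", name: "Janet E. Hill" },
  { candidate_id: "W500K-34704", name: "Fatima U. Rivera" },
]);
const secondPass = roster.book("10:30", [
  { candidate_id: "W500K-19759", name: "Carmen Z. Rodriguez" },
]);
const repeat = roster.book("08:00", [{ candidate_id: "W500K-40928", name: "Janet E. Hill" }]);
console.log(firstPass.length, secondPass.length, repeat, roster.distinctWorkers());
```

With an identifier as the key, the distinct-worker count and the fairness tally both fall out of the map size and the per-worker booking lists.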
@ -1,146 +0,0 @@
[
  {
    "event": {
      "kind": "baseline_fill",
      "at": "08:00",
      "role": "Warehouse Associate",
      "count": 3,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "08:00 AM",
      "scenario_note": "Regular Monday morning shift, 8-hour."
    },
    "ok": true,
    "fills": [
      {
        "candidate_id": "W500K-49164",
        "name": "Christopher Y. Phillips",
        "reason": "Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."
      },
      {
        "candidate_id": "W500K-40928",
        "name": "Janet E. Hill",
        "reason": "Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."
      },
      {
        "candidate_id": "W500K-34704",
        "name": "Fatima U. Rivera",
        "reason": "Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."
      }
    ],
    "turns": 7,
    "duration_secs": 20.128,
    "gap_signals": [
      "double_book: undefined Janet E. Hill already booked for 08:00",
      "double_book: undefined Fatima U. Rivera already booked for 08:00"
    ],
    "sources_first_score": 0.7124013,
    "sources_last_score": 0.66623676,
    "pool_size": 770,
    "playbook_citations": [],
    "discovered_pattern": "Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.66–0.96)"
  },
  {
    "event": {
      "kind": "recurring",
      "at": "10:30",
      "role": "Machine Operator",
      "count": 2,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "11:00 AM",
      "scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
    },
    "ok": true,
    "fills": [
      {
        "candidate_id": "W500K-19759",
        "name": "Carmen Z. Rodriguez",
        "reason": "Recurring Machine Operator in Toledo, OH with a score of 0.75, verified via sql tool."
      },
      {
        "candidate_id": "W500K-29298",
        "name": "Robert W. Gonzalez",
        "reason": "Recurring Machine Operator in Toledo, OH with a score of 0.74, not yet SQL verified but highly likely to meet requirements."
      }
    ],
    "turns": 5,
    "duration_secs": 17.426,
    "gap_signals": [
      "double_book: undefined Carmen Z. Rodriguez already booked for 08:00",
      "double_book: undefined Robert W. Gonzalez already booked for 08:00"
    ],
    "sources_first_score": 0.72546995,
    "sources_last_score": 0.6690281,
    "pool_size": 997,
    "playbook_citations": [],
    "discovered_pattern": "Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.66–0.96)"
  },
  {
    "event": {
      "kind": "expansion",
      "at": "12:15",
      "role": "Forklift Operator",
      "count": 5,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "01:00 PM",
      "scenario_note": "New warehouse location opening, five-worker team needed."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 46.391,
    "error": "invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {\"kind\":\"plan\", \"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1', 'sql_filter':'LOWER(role) LIKE '%forklift%' AND city = \\'Toledo\\' AND state = \\'OH\\' AND availability > CAST(0.5 AS DOUBLE) AND reliability > CAST(0.75 AS DOUBLE)', 'question':'reliable forklift operators in Toledo, O",
    "gap_signals": [
      "drift_or_tool: invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {\"kind\":\"plan\", \"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1', 'sql_filter':'LOWER(role) LIKE '%forklift%' AND city = \\'Toledo\\' AND state = \\'OH\\' AND availability > CAST(0.5 AS DOUBLE) AND reliability > CAST(0.75 AS DOUBLE)', 'question':'reliable forklift operators in Toledo, O"
    ]
  },
  {
    "event": {
      "kind": "emergency",
      "at": "14:00",
      "role": "Loader",
      "count": 4,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "04:00 PM same day",
      "deadline": "16:00",
      "scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 54.123,
    "error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"LOWER(role) LIKE '%loader%' AND city = 'Toledo' AND state = 'OH' AND availability > CAST(0.7 AS DOUBLE) AND worker_id NOT IN ('W500K-4321', 'W500K-8963', 'W500K-2345', 'W500K-6789', 'W500K-9876') AND wor",
    "gap_signals": [
      "drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"LOWER(role) LIKE '%loader%' AND city = 'Toledo' AND state = 'OH' AND availability > CAST(0.7 AS DOUBLE) AND worker_id NOT IN ('W500K-4321', 'W500K-8963', 'W500K-2345', 'W500K-6789', 'W500K-9876') AND wor"
    ]
  },
  {
    "event": {
      "kind": "misplacement",
      "at": "15:45",
      "role": "Warehouse Associate",
      "count": 1,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "remainder of 08:00 shift",
      "scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
|
||||
"replaces_event": "08:00",
|
||||
"exclude_worker_ids": [
|
||||
null,
|
||||
null,
|
||||
null
|
||||
]
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 59.593,
|
||||
"error": "no consensus after 14 turns",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: no consensus after 14 turns"
|
||||
]
|
||||
}
|
||||
]
|
||||
@ -1,42 +0,0 @@
|
||||
[
|
||||
{
|
||||
"name": "Christopher Y. Phillips",
|
||||
"booked_for": "08:00",
|
||||
"role": "Warehouse Associate",
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"status": "no_show"
|
||||
},
|
||||
{
|
||||
"name": "Janet E. Hill",
|
||||
"booked_for": "08:00",
|
||||
"role": "Warehouse Associate",
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"status": "confirmed"
|
||||
},
|
||||
{
|
||||
"name": "Fatima U. Rivera",
|
||||
"booked_for": "08:00",
|
||||
"role": "Warehouse Associate",
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"status": "confirmed"
|
||||
},
|
||||
{
|
||||
"name": "Carmen Z. Rodriguez",
|
||||
"booked_for": "10:30",
|
||||
"role": "Machine Operator",
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"status": "confirmed"
|
||||
},
|
||||
{
|
||||
"name": "Robert W. Gonzalez",
|
||||
"booked_for": "10:30",
|
||||
"role": "Machine Operator",
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"status": "confirmed"
|
||||
}
|
||||
]
|
||||
@ -1,26 +0,0 @@
|
||||
# SMS drafts — Riverfront Steel, 2026-04-21
|
||||
|
||||
## 08:00 baseline_fill — Warehouse Associate x3 in Toledo, OH
|
||||
|
||||
TO: Christopher Y. Phillips
|
||||
Confirming your shift as a Warehouse Associate at Riverfront Steel in Toledo, OH starting 8 AM today.
|
||||
|
||||
---
|
||||
|
||||
TO: Janet E. Hill
|
||||
Good morning! Confirming your shift as a Warehouse Associate from 8 AM onwards at our Toledo, OH location.
|
||||
|
||||
---
|
||||
|
||||
TO: Fatima U. Rivera
|
||||
Morning Fatima! Just confirming your shift as a Warehouse Associate at Riverfront Steel in Toledo, OH starting at 8 AM.
|
||||
|
||||
## 10:30 recurring — Machine Operator x2 in Toledo, OH
|
||||
|
||||
TO: Carmen Z. Rodriguez
|
||||
Confirming your shift as a Machine Operator at Riverfront Steel in Toledo, OH starting 11:00 AM on Tuesday/Thursday. Still available!
|
||||
|
||||
---
|
||||
|
||||
TO: Robert W. Gonzalez
|
||||
Your recurring Tuesday/Thursday Machine Operator shift at Riverfront Steel in Toledo, OH starts at 11:00 AM. Confirm your availability please.
|
||||
@ -1 +0,0 @@
|
||||
# Client emails — Riverfront Steel, 2026-04-21
|
||||
@ -1,57 +0,0 @@
|
||||
# Scenario retrospective — Riverfront Steel, 2026-04-21
|
||||
|
||||
Executor: `mistral:latest` Reviewer: `qwen2.5:latest` Draft: `qwen2.5:latest`
|
||||
|
||||
## Events
|
||||
|
||||
| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 63.8 | 0 | 1 |
| 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 9.5 | 0 | 1 |
| 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 47.8 | 0 | 1 |
| 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 60.1 | 0 | 1 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 62.3 | 0 | 1 |
|
||||
|
||||
## Final roster
|
||||
|
||||
| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
|
||||
|
||||
## Gap signals
|
||||
|
||||
### drift_or_tool
|
||||
- **08:00** — aborted — 3 consecutive drift flags
|
||||
- **10:30** — invalid JSON from executor: JSON Parse error: Unterminated string | raw: {"kind":"plan","steps":["TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5'})",
|
||||
"TOOL_CALL sql({'query':'SELECT worker_id, name, role, city, state, CAST(availability AS DOUBLE) A
|
||||
- **12:15** — aborted — 3 consecutive drift flags
|
||||
- **14:00** — aborted — 3 consecutive drift flags
|
||||
- **15:45** — invalid JSON from executor: JSON Parse error: Unterminated string | raw: {"kind": "plan", "steps": ["1.1. TOOL_CALL hybrid_search({'index_name': 'workers_500k_v1', 'sql_filter': 'role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND worker_id NOT IN (49164, 1181, 7239, 299, 30930, 33212)'})",
|
||||
"2.2. TOOL_CALL sql({'qu
|
||||
|
||||
### write_through_audit
|
||||
- _post-run_ — playbook_memory has 165 entries (ran 5 events, expected ≥ 0 new entries from this run)
|
||||
|
||||
## Workers touched across the week
|
||||
|
||||
0 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
|
||||
|
||||
| Worker ID | Name | Events | Outcome |
|---|---|---|---|
|
||||
|
||||
## Discovered patterns (meta-index)
|
||||
|
||||
What the system identified across semantically-similar past fills as each event ran:
|
||||
|
||||
- **08:00 baseline_fill** (Warehouse Associate): —
|
||||
- **10:30 recurring** (Machine Operator): —
|
||||
- **12:15 expansion** (Forklift Operator): —
|
||||
- **14:00 emergency** (Loader): —
|
||||
- **15:45 misplacement** (Warehouse Associate): —
|
||||
|
||||
## Narrative
|
||||
|
||||
- 0/5 events reached consensus.
|
||||
- Final roster: 0 bookings across 0 distinct workers.
|
||||
- Workers touched (booked, failed, or otherwise decided): 0.
|
||||
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
|
||||
- Dropped events: 08:00 baseline_fill, 10:30 recurring, 12:15 expansion, 14:00 emergency, 15:45 misplacement.
|
||||
@ -1,104 +0,0 @@
|
||||
[
|
||||
{
|
||||
"event": {
|
||||
"kind": "baseline_fill",
|
||||
"at": "08:00",
|
||||
"role": "Warehouse Associate",
|
||||
"count": 3,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "08:00 AM",
|
||||
"scenario_note": "Regular Monday morning shift, 8-hour."
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 63.815,
|
||||
"error": "aborted — 3 consecutive drift flags",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: aborted — 3 consecutive drift flags"
|
||||
]
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "recurring",
|
||||
"at": "10:30",
|
||||
"role": "Machine Operator",
|
||||
"count": 2,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "11:00 AM",
|
||||
"scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 9.538,
|
||||
"error": "invalid JSON from executor: JSON Parse error: Unterminated string | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5'})\",\n\"TOOL_CALL sql({'query':'SELECT worker_id, name, role, city, state, CAST(availability AS DOUBLE) A",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: invalid JSON from executor: JSON Parse error: Unterminated string | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5'})\",\n\"TOOL_CALL sql({'query':'SELECT worker_id, name, role, city, state, CAST(availability AS DOUBLE) A"
|
||||
]
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "expansion",
|
||||
"at": "12:15",
|
||||
"role": "Forklift Operator",
|
||||
"count": 5,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "01:00 PM",
|
||||
"scenario_note": "New warehouse location opening, five-worker team needed."
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 47.797,
|
||||
"error": "aborted — 3 consecutive drift flags",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: aborted — 3 consecutive drift flags"
|
||||
]
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "emergency",
|
||||
"at": "14:00",
|
||||
"role": "Loader",
|
||||
"count": 4,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "04:00 PM same day",
|
||||
"deadline": "16:00",
|
||||
"scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 60.115,
|
||||
"error": "aborted — 3 consecutive drift flags",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: aborted — 3 consecutive drift flags"
|
||||
]
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "misplacement",
|
||||
"at": "15:45",
|
||||
"role": "Warehouse Associate",
|
||||
"count": 1,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "remainder of 08:00 shift",
|
||||
"scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
|
||||
"replaces_event": "08:00"
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 62.283,
|
||||
"error": "invalid JSON from executor: JSON Parse error: Unterminated string | raw: {\"kind\": \"plan\", \"steps\": [\"1.1. TOOL_CALL hybrid_search({'index_name': 'workers_500k_v1', 'sql_filter': 'role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND worker_id NOT IN (49164, 1181, 7239, 299, 30930, 33212)'})\",\n\"2.2. TOOL_CALL sql({'qu",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: invalid JSON from executor: JSON Parse error: Unterminated string | raw: {\"kind\": \"plan\", \"steps\": [\"1.1. TOOL_CALL hybrid_search({'index_name': 'workers_500k_v1', 'sql_filter': 'role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND worker_id NOT IN (49164, 1181, 7239, 299, 30930, 33212)'})\",\n\"2.2. TOOL_CALL sql({'qu"
|
||||
]
|
||||
}
|
||||
]
|
||||
@ -1 +0,0 @@
|
||||
# SMS drafts — Riverfront Steel, 2026-04-21
|
||||
@ -1 +0,0 @@
|
||||
# Client emails — Riverfront Steel, 2026-04-21
|
||||
@ -1,55 +0,0 @@
|
||||
# Scenario retrospective — Riverfront Steel, 2026-04-21
|
||||
|
||||
Executor: `qwen2.5:latest` Reviewer: `qwen2.5:latest` Draft: `qwen2.5:latest`
|
||||
|
||||
## Events
|
||||
|
||||
| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 6.4 | 0 | 1 |
| 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 16.8 | 0 | 1 |
| 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 7.2 | 0 | 1 |
| 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 54.0 | 0 | 1 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 49.3 | 0 | 1 |
|
||||
|
||||
## Final roster
|
||||
|
||||
| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
|
||||
|
||||
## Gap signals
|
||||
|
||||
### drift_or_tool
|
||||
- **08:00** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"sql","args":{"query":"SELECT worker_id, name FROM workers_500k_v1 WHERE role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 LIMIT 3"},"rationale":"verify top candidates via SQL query")}
|
||||
- **10:30** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search","args":{"index_name":"workers_500k_v1","sql_filter":"role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND reliability >= 0.7","question":"machine operator Toledo OH high reliability","k":2}
|
||||
- **12:15** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"sql","args":{"query":"SELECT worker_id FROM workers_500k_v1 WHERE role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 LIMIT 5"},"rationale":"verify top candidates via SQL query to me
|
||||
- **14:00** — no consensus after 14 turns
|
||||
- **15:45** — no consensus after 14 turns
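
As logged, the 10:30 payload above is well formed except that the outer object is never closed (it ends at `"k":2}`). A minimal sketch of a conservative repair pass follows, assuming the harness is TypeScript/JavaScript and that appending missing closers before rejecting a turn is acceptable. `balanceClosers` and `tryParseExecutorOutput` are hypothetical names, not functions from this repository.

```typescript
type ParseOutcome =
  | { ok: true; value: unknown; repaired: boolean }
  | { ok: false; error: string };

// Append any closing braces/brackets that are still open at end of input.
// Characters inside JSON strings are ignored; an unterminated string is
// returned unchanged because guessing where it ends would invent content.
function balanceClosers(text: string): string {
  const stack: string[] = [];
  let inString = false;
  let escaped = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") stack.push("}");
    else if (ch === "[") stack.push("]");
    else if (ch === "}" || ch === "]") stack.pop();
  }
  if (inString) return text;
  return text + stack.reverse().join("");
}

// Try a strict parse first; fall back to the balanced text only on failure.
function tryParseExecutorOutput(raw: string): ParseOutcome {
  try {
    return { ok: true, value: JSON.parse(raw), repaired: false };
  } catch (_strictError) {
    // fall through to the repair attempt
  }
  try {
    const value: unknown = JSON.parse(balanceClosers(raw.trim()));
    return { ok: true, value, repaired: true };
  } catch (e) {
    return { ok: false, error: String(e) };
  }
}
```

This sketch deliberately does not touch the stray `)` in the 08:00 payload or any payload whose log entry is cut off mid-string; those still fail and would need a retry or a stricter output format on the executor side.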
|
||||
|
||||
### write_through_audit
|
||||
- _post-run_ — playbook_memory has 165 entries (ran 5 events, expected ≥ 0 new entries from this run)
|
||||
|
||||
## Workers touched across the week
|
||||
|
||||
0 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
|
||||
|
||||
| Worker ID | Name | Events | Outcome |
|---|---|---|---|
|
||||
|
||||
## Discovered patterns (meta-index)
|
||||
|
||||
What the system identified across semantically-similar past fills as each event ran:
|
||||
|
||||
- **08:00 baseline_fill** (Warehouse Associate): —
|
||||
- **10:30 recurring** (Machine Operator): —
|
||||
- **12:15 expansion** (Forklift Operator): —
|
||||
- **14:00 emergency** (Loader): —
|
||||
- **15:45 misplacement** (Warehouse Associate): —
|
||||
|
||||
## Narrative
|
||||
|
||||
- 0/5 events reached consensus.
|
||||
- Final roster: 0 bookings across 0 distinct workers.
|
||||
- Workers touched (booked, failed, or otherwise decided): 0.
|
||||
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
|
||||
- Dropped events: 08:00 baseline_fill, 10:30 recurring, 12:15 expansion, 14:00 emergency, 15:45 misplacement.
|
||||
@ -1,104 +0,0 @@
|
||||
[
|
||||
{
|
||||
"event": {
|
||||
"kind": "baseline_fill",
|
||||
"at": "08:00",
|
||||
"role": "Warehouse Associate",
|
||||
"count": 3,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "08:00 AM",
|
||||
"scenario_note": "Regular Monday morning shift, 8-hour."
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 6.434,
|
||||
"error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id, name FROM workers_500k_v1 WHERE role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 LIMIT 3\"},\"rationale\":\"verify top candidates via SQL query\")}",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id, name FROM workers_500k_v1 WHERE role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 LIMIT 3\"},\"rationale\":\"verify top candidates via SQL query\")}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "recurring",
|
||||
"at": "10:30",
|
||||
"role": "Machine Operator",
|
||||
"count": 2,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "11:00 AM",
|
||||
"scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 16.752,
|
||||
"error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND reliability >= 0.7\",\"question\":\"machine operator Toledo OH high reliability\",\"k\":2}",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND reliability >= 0.7\",\"question\":\"machine operator Toledo OH high reliability\",\"k\":2}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "expansion",
|
||||
"at": "12:15",
|
||||
"role": "Forklift Operator",
|
||||
"count": 5,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "01:00 PM",
|
||||
"scenario_note": "New warehouse location opening, five-worker team needed."
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 7.181,
|
||||
"error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id FROM workers_500k_v1 WHERE role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 LIMIT 5\"},\"rationale\":\"verify top candidates via SQL query to me",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id FROM workers_500k_v1 WHERE role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 LIMIT 5\"},\"rationale\":\"verify top candidates via SQL query to me"
|
||||
]
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "emergency",
|
||||
"at": "14:00",
|
||||
"role": "Loader",
|
||||
"count": 4,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "04:00 PM same day",
|
||||
"deadline": "16:00",
|
||||
"scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 54.028,
|
||||
"error": "no consensus after 14 turns",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: no consensus after 14 turns"
|
||||
]
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "misplacement",
|
||||
"at": "15:45",
|
||||
"role": "Warehouse Associate",
|
||||
"count": 1,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "remainder of 08:00 shift",
|
||||
"scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
|
||||
"replaces_event": "08:00"
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 49.298,
|
||||
"error": "no consensus after 14 turns",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: no consensus after 14 turns"
|
||||
]
|
||||
}
|
||||
]
|
||||
@ -1 +0,0 @@
|
||||
# SMS drafts — Riverfront Steel, 2026-04-21
|
||||
@ -1 +0,0 @@
|
||||
# Client emails — Riverfront Steel, 2026-04-21
|
||||
@ -1,55 +0,0 @@
|
||||
# Scenario retrospective — Riverfront Steel, 2026-04-21
|
||||
|
||||
Executor: `mistral:latest` Reviewer: `qwen2.5:latest` Draft: `qwen2.5:latest`
|
||||
|
||||
## Events
|
||||
|
||||
| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 47.4 | 0 | 1 |
| 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 40.4 | 0 | 1 |
| 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 9.4 | 0 | 1 |
| 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 44.7 | 0 | 1 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 45.1 | 0 | 1 |
|
||||
|
||||
## Final roster
|
||||
|
||||
| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
|
||||
|
||||
## Gap signals
|
||||
|
||||
### drift_or_tool
|
||||
- **08:00** — no consensus after 14 turns
|
||||
- **10:30** — aborted — 3 consecutive drift flags
|
||||
- **12:15** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"propose_done","args":{"fills":[{"candidate_id":"W500K-37736","name":"Jennifer K. Robinson","reason":"verified Toledo forklift op, reliability 0.9"}],"rationale":"one SQL-verified candidate from surfaced candidates"}
|
||||
- **14:00** — aborted — 3 consecutive drift flags
|
||||
- **15:45** — no consensus after 14 turns
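
The two non-JSON failure messages in this list suggest two stop conditions: a turn cap of 14 and an abort after three drift flags in a row. The sketch below shows one way such a loop could produce those messages. It is an assumption reconstructed from the error strings only; `runConsensusLoop`, `TurnVerdict`, and the constants are hypothetical and are not taken from this repository.

```typescript
type TurnVerdict = "accept" | "drift" | "continue";

interface LoopResult {
  ok: boolean;
  turns: number;
  error?: string;
}

const MAX_TURNS = 14; // from "no consensus after 14 turns"
const MAX_DRIFT_STREAK = 3; // from "3 consecutive drift flags"

// `review` stands in for one executor + reviewer exchange and returns the
// reviewer's verdict for that turn.
function runConsensusLoop(review: (turn: number) => TurnVerdict): LoopResult {
  let driftStreak = 0;
  for (let turn = 1; turn <= MAX_TURNS; turn++) {
    const verdict = review(turn);
    if (verdict === "accept") {
      return { ok: true, turns: turn };
    }
    // A non-drift turn resets the streak, so only consecutive flags abort.
    driftStreak = verdict === "drift" ? driftStreak + 1 : 0;
    if (driftStreak >= MAX_DRIFT_STREAK) {
      return {
        ok: false,
        turns: turn,
        error: `aborted after ${MAX_DRIFT_STREAK} consecutive drift flags`,
      };
    }
  }
  return { ok: false, turns: MAX_TURNS, error: `no consensus after ${MAX_TURNS} turns` };
}
```

Unlike the logged results, which report `"turns": 0` even for "no consensus after 14 turns", this sketch returns the number of turns actually used on failure, which makes failed events easier to compare.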
|
||||
|
||||
### write_through_audit
|
||||
- _post-run_ — playbook_memory has 165 entries (ran 5 events, expected ≥ 0 new entries from this run)
|
||||
|
||||
## Workers touched across the week
|
||||
|
||||
0 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
|
||||
|
||||
| Worker ID | Name | Events | Outcome |
|---|---|---|---|
|
||||
|
||||
## Discovered patterns (meta-index)
|
||||
|
||||
What the system identified across semantically-similar past fills as each event ran:
|
||||
|
||||
- **08:00 baseline_fill** (Warehouse Associate): —
|
||||
- **10:30 recurring** (Machine Operator): —
|
||||
- **12:15 expansion** (Forklift Operator): —
|
||||
- **14:00 emergency** (Loader): —
|
||||
- **15:45 misplacement** (Warehouse Associate): —
|
||||
|
||||
## Narrative
|
||||
|
||||
- 0/5 events reached consensus.
|
||||
- Final roster: 0 bookings across 0 distinct workers.
|
||||
- Workers touched (booked, failed, or otherwise decided): 0.
|
||||
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
|
||||
- Dropped events: 08:00 baseline_fill, 10:30 recurring, 12:15 expansion, 14:00 emergency, 15:45 misplacement.
|
||||
@ -1,104 +0,0 @@
|
||||
[
|
||||
{
|
||||
"event": {
|
||||
"kind": "baseline_fill",
|
||||
"at": "08:00",
|
||||
"role": "Warehouse Associate",
|
||||
"count": 3,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "08:00 AM",
|
||||
"scenario_note": "Regular Monday morning shift, 8-hour."
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 47.404,
|
||||
"error": "no consensus after 14 turns",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: no consensus after 14 turns"
|
||||
]
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "recurring",
|
||||
"at": "10:30",
|
||||
"role": "Machine Operator",
|
||||
"count": 2,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "11:00 AM",
|
||||
"scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 40.374,
|
||||
"error": "aborted — 3 consecutive drift flags",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: aborted — 3 consecutive drift flags"
|
||||
]
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "expansion",
|
||||
"at": "12:15",
|
||||
"role": "Forklift Operator",
|
||||
"count": 5,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "01:00 PM",
|
||||
"scenario_note": "New warehouse location opening, five-worker team needed."
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 9.414,
|
||||
"error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"propose_done\",\"args\":{\"fills\":[{\"candidate_id\":\"W500K-37736\",\"name\":\"Jennifer K. Robinson\",\"reason\":\"verified Toledo forklift op, reliability 0.9\"}],\"rationale\":\"one SQL-verified candidate from surfaced candidates\"}",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"propose_done\",\"args\":{\"fills\":[{\"candidate_id\":\"W500K-37736\",\"name\":\"Jennifer K. Robinson\",\"reason\":\"verified Toledo forklift op, reliability 0.9\"}],\"rationale\":\"one SQL-verified candidate from surfaced candidates\"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "emergency",
|
||||
"at": "14:00",
|
||||
"role": "Loader",
|
||||
"count": 4,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "04:00 PM same day",
|
||||
"deadline": "16:00",
|
||||
"scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 44.673,
|
||||
"error": "aborted — 3 consecutive drift flags",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: aborted — 3 consecutive drift flags"
|
||||
]
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "misplacement",
|
||||
"at": "15:45",
|
||||
"role": "Warehouse Associate",
|
||||
"count": 1,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "remainder of 08:00 shift",
|
||||
"scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
|
||||
"replaces_event": "08:00"
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 45.149,
|
||||
"error": "no consensus after 14 turns",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: no consensus after 14 turns"
|
||||
]
|
||||
}
|
||||
]
|
||||
@ -1 +0,0 @@
|
||||
# SMS drafts — Riverfront Steel, 2026-04-21
|
||||
@ -1,2 +0,0 @@
|
||||
{"at":"12:15","kind":"expansion","operation":"fill: Forklift Operator x5 in Toledo, OH","fills":[{"candidate_id":"W500K-37736","name":"Jennifer K. Robinson","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."},{"candidate_id":"W500K-33961","name":"Kyle F. Brooks","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."},{"candidate_id":"W500K-31297","name":"Jacob T. Diaz","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."},{"candidate_id":"W500K-40884","name":"Jerry M. Jones","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."},{"candidate_id":"W500K-37729","name":"Jeffrey D. Taylor","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."}],"turns":7,"duration_secs":28.23,"pool_size":687,"playbook_citations":[],"discovered_pattern":"Across 25 similar past playbooks (10 workers examined) · recurring certifications: Forklift (40%), OSHA-10 (40%) · recurring skills: mill (40%) · archetype mostly: leader · reliability median 0.83 (range 0.66–0.96)"}
|
||||
{"at":"14:00","kind":"emergency","operation":"fill: Loader x4 in Toledo, OH","fills":[{"candidate_id":"W500K-15305","name":"Mary R. Richardson","reason":"Verified availability score of 0.988 via SQL and ranked highest among the candidates with an availability score greater than 0.7."},{"candidate_id":"W500K-12325","name":"Raj Torres","reason":"Ranked second among the candidates with an availability score greater than 0.7."},{"candidate_id":"W500K-16975","name":"Brian X. Price","reason":"Ranked third among the candidates with an availability score greater than 0.7."},{"candidate_id":"W500K-22851","name":"Fatima X. Gutierrez","reason":"Ranked fourth among the candidates with an availability score greater than 0.7."}],"turns":6,"duration_secs":22.25,"pool_size":380,"playbook_citations":[],"discovered_pattern":"Across 25 similar past playbooks (9 workers examined) · recurring certifications: Forklift (44%) · recurring skills: mill (44%) · archetype mostly: leader · reliability median 0.80 (range 0.66–0.96)"}
|
||||
@ -1,40 +0,0 @@
|
||||
# Client emails — Riverfront Steel, 2026-04-21
|
||||
|
||||
## 12:15 expansion — Forklift Operator x5
|
||||
|
||||
Subject: 5 Workers Confirmed
|
||||
|
||||
Dear Riverfront Steel Team,
|
||||
|
||||
We are pleased to confirm that we have filled all five positions for Forklift Operators at your new warehouse location opening. The workers starting at 01:00 PM today are:
|
||||
|
||||
- Jennifer K. Robinson
|
||||
- Kyle F. Brooks
|
||||
- Jacob T. Diaz
|
||||
- Jerry M. Jones
|
||||
- Jeffrey D. Taylor
|
||||
|
||||
Each meets the criteria of being a Forklift Operator in Toledo, OH.
|
||||
|
||||
Best regards,
|
||||
Dispatch Team Lakehouse
|
||||
|
||||
## 14:00 emergency — Loader x4
|
||||
|
||||
Subject: 4 Loader Workers Confirmed
|
||||
|
||||
Dear Riverfront Steel Team,
|
||||
|
||||
I am pleased to confirm that we have filled all four loader positions as requested:
|
||||
|
||||
- Mary R. Richardson
|
||||
- Raj Torres
|
||||
- Brian X. Price
|
||||
- Fatima X. Gutierrez
|
||||
|
||||
All workers will start their shift at 04:00 PM today. Please note the walkoff incident requiring a replacement crew by 16:00 sharp.
|
||||
|
||||
Thank you for your trust in Lakehouse Dispatch.
|
||||
|
||||
Best regards,
|
||||
Dispatch Team
|
||||
@ -1,85 +0,0 @@
|
||||
# Scenario retrospective — Riverfront Steel, 2026-04-21
|
||||
|
||||
Executor: `mistral:latest` Reviewer: `qwen2.5:latest` Draft: `qwen2.5:latest`
|
||||
|
||||
## Events
|
||||
|
||||
| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 20.2 | 0 | 1 |
| 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 47.4 | 0 | 1 |
| 12:15 | expansion | Forklift Operator × 5 | 687 | ✓ 5 | 7 | 28.2 | 0 | 4 |
| 14:00 | emergency | Loader × 4 | 380 | ✓ 4 | 6 | 22.3 | 0 | 4 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 52.5 | 0 | 1 |
|
||||
|
||||
## Final roster
|
||||
|
||||
| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
| undefined Jennifer K. Robinson | 12:15 | Forklift Operator | Toledo, OH | confirmed |
| undefined Kyle F. Brooks | 12:15 | Forklift Operator | Toledo, OH | confirmed |
| undefined Jacob T. Diaz | 12:15 | Forklift Operator | Toledo, OH | confirmed |
| undefined Jerry M. Jones | 12:15 | Forklift Operator | Toledo, OH | confirmed |
| undefined Jeffrey D. Taylor | 12:15 | Forklift Operator | Toledo, OH | confirmed |
| undefined Mary R. Richardson | 14:00 | Loader | Toledo, OH | confirmed |
| undefined Raj Torres | 14:00 | Loader | Toledo, OH | confirmed |
| undefined Brian X. Price | 14:00 | Loader | Toledo, OH | confirmed |
| undefined Fatima X. Gutierrez | 14:00 | Loader | Toledo, OH | confirmed |
|
||||
|
||||
## Gap signals
|
||||
|
||||
### drift_or_tool
|
||||
- **08:00** — invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {"kind":"plan","steps":["TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = \'Warehouse Associate\' AND city = \'Toledo\' AND state = \'OH\' AND CAST(availability AS DOUBLE) > 0.5','question':'reliable warehouse associate Toledo'})",
|
||||
"TOOL_CALL sql({'query':'SELECT worker_i
|
||||
- **10:30** — no consensus after 14 turns
|
||||
- **15:45** — no consensus after 14 turns
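The 08:00 failure above is consistent with JSON's escape rules: `\'` is not a legal escape sequence inside a JSON string, so a SQL filter quoted that way cannot parse, whereas plain single quotes need no escaping at all. The following is a minimal Python sketch of that behaviour, not the project's parser. The helper name `check_executor_output` and the shortened payloads are hypothetical and only illustrate the point.

```python
import json


def check_executor_output(raw: str):
    """Return (parsed, None) on success or (None, error_message) on failure.

    Hypothetical helper for illustration; not part of the code in this diff.
    """
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON from executor: {exc.msg} (char {exc.pos})"


# Backslash-escaped single quotes, as in the 08:00 raw output: rejected.
bad = r'''{"kind":"plan","steps":["sql_filter: role = \'Warehouse Associate\'"]}'''
# Plain single quotes inside a JSON string: accepted.
good = r'''{"kind":"plan","steps":["sql_filter: role = 'Warehouse Associate'"]}'''

bad_parsed, bad_error = check_executor_output(bad)
good_parsed, good_error = check_executor_output(good)
```

Running a check of this kind before dispatching a step would turn a silent drop into an explicit, retryable error.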
|
||||
|
||||
### double_book
|
||||
- **12:15** — undefined Kyle F. Brooks already booked for 12:15
|
||||
- **12:15** — undefined Jacob T. Diaz already booked for 12:15
|
||||
- **12:15** — undefined Jerry M. Jones already booked for 12:15
|
||||
- **12:15** — undefined Jeffrey D. Taylor already booked for 12:15
|
||||
- **14:00** — undefined Mary R. Richardson already booked for 12:15
|
||||
- **14:00** — undefined Raj Torres already booked for 12:15
|
||||
- **14:00** — undefined Brian X. Price already booked for 12:15
|
||||
- **14:00** — undefined Fatima X. Gutierrez already booked for 12:15
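The signals above look inconsistent with the roster: the five 12:15 fills and the four 14:00 fills are nine different people, yet eight of them are flagged, and the 14:00 workers are reported as booked for 12:15. The cause is not visible in this diff. As a sketch only, and assuming a check keyed on `candidate_id` is what is wanted, a double-book test could look like the following. The `Booking` type and the `find_double_books` function are hypothetical names, not the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Booking:
    candidate_id: str
    name: str
    booked_for: str  # event time, e.g. "12:15"


def find_double_books(roster, new_fills):
    """Flag fills whose candidate_id is already on the roster.

    The lookup is by ID, and it runs before the new fills are appended,
    so a fill can never collide with itself.
    """
    booked = {b.candidate_id: b.booked_for for b in roster}
    signals = []
    for fill in new_fills:
        if not fill.candidate_id:
            raise ValueError(f"fill for {fill.name!r} has no candidate_id")
        if fill.candidate_id in booked:
            signals.append(
                f"double_book: {fill.candidate_id} {fill.name} "
                f"already booked for {booked[fill.candidate_id]}"
            )
    return signals


roster = [
    Booking("W500K-37736", "Jennifer K. Robinson", "12:15"),
    Booking("W500K-33961", "Kyle F. Brooks", "12:15"),
]
loaders = [
    Booking("W500K-15305", "Mary R. Richardson", "14:00"),
    Booking("W500K-12325", "Raj Torres", "14:00"),
]

signals_new_event = find_double_books(roster, loaders)
signals_rebook = find_double_books(
    roster, [Booking("W500K-37736", "Jennifer K. Robinson", "14:00")]
)
```

Under these assumptions the 14:00 loaders produce no signal, and only a genuine rebooking of an already-rostered ID does.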
|
||||
|
||||
### fairness
|
||||
- _cross-event_ — Jennifer K. Robinson (undefined) booked 9 times today
|
||||
|
||||
### write_through_audit
|
||||
- _post-run_ — playbook_memory has 167 entries (ran 5 events, expected ≥ 2 new entries from this run)
|
||||
|
||||
## Workers touched across the week
|
||||
|
||||
9 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
|
||||
|
||||
| Worker ID | Name | Events | Outcome |
|---|---|---|---|
| W500K-37736 | Jennifer K. Robinson | 12:15 expansion | booked |
| W500K-33961 | Kyle F. Brooks | 12:15 expansion | booked |
| W500K-31297 | Jacob T. Diaz | 12:15 expansion | booked |
| W500K-40884 | Jerry M. Jones | 12:15 expansion | booked |
| W500K-37729 | Jeffrey D. Taylor | 12:15 expansion | booked |
| W500K-15305 | Mary R. Richardson | 14:00 emergency | booked |
| W500K-12325 | Raj Torres | 14:00 emergency | booked |
| W500K-16975 | Brian X. Price | 14:00 emergency | booked |
| W500K-22851 | Fatima X. Gutierrez | 14:00 emergency | booked |
|
||||
|
||||
## Discovered patterns (meta-index)
|
||||
|
||||
What the system identified across semantically-similar past fills as each event ran:
|
||||
|
||||
- **08:00 baseline_fill** (Warehouse Associate): —
|
||||
- **10:30 recurring** (Machine Operator): —
|
||||
- **12:15 expansion** (Forklift Operator): Across 25 similar past playbooks (10 workers examined) · recurring certifications: Forklift (40%), OSHA-10 (40%) · recurring skills: mill (40%) · archetype mostly: leader · reliability median 0.83 (range 0.66–0.96)
|
||||
- **14:00 emergency** (Loader): Across 25 similar past playbooks (9 workers examined) · recurring certifications: Forklift (44%) · recurring skills: mill (44%) · archetype mostly: leader · reliability median 0.80 (range 0.66–0.96)
|
||||
- **15:45 misplacement** (Warehouse Associate): —
|
||||
|
||||
## Narrative
|
||||
|
||||
- 2/5 events reached consensus.
|
||||
- Final roster: 9 bookings across 1 distinct workers.
|
||||
- Workers touched (booked, failed, or otherwise decided): 9.
|
||||
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
|
||||
- Dropped events: 08:00 baseline_fill, 10:30 recurring, 15:45 misplacement.
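The narrative reports 9 bookings across 1 distinct worker, while the roster lists nine different names. The roster records in this diff carry a `name` but no ID field, so one plausible (unconfirmed) explanation is that the distinct count keys on a missing field and every entry collapses to the same `undefined` key. A minimal sketch of a count that falls back to the name when no ID is present follows. The function name is hypothetical, and the name fallback is an assumption that would miscount two workers who share a name.

```python
def count_distinct_workers(roster):
    """Count distinct workers, preferring candidate_id and falling back to name."""
    keys = set()
    for entry in roster:
        key = entry.get("candidate_id") or entry.get("name")
        if key is None:
            raise ValueError(f"roster entry has neither ID nor name: {entry!r}")
        keys.add(key)
    return len(keys)


# Roster entries shaped like the records in this diff (name only, no ID).
roster_entries = [
    {"name": "Jennifer K. Robinson", "booked_for": "12:15"},
    {"name": "Kyle F. Brooks", "booked_for": "12:15"},
    {"name": "Jacob T. Diaz", "booked_for": "12:15"},
    {"name": "Jerry M. Jones", "booked_for": "12:15"},
    {"name": "Jeffrey D. Taylor", "booked_for": "12:15"},
    {"name": "Mary R. Richardson", "booked_for": "14:00"},
    {"name": "Raj Torres", "booked_for": "14:00"},
    {"name": "Brian X. Price", "booked_for": "14:00"},
    {"name": "Fatima X. Gutierrez", "booked_for": "14:00"},
]

distinct = count_distinct_workers(roster_entries)
```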
|
||||
@ -1,165 +0,0 @@
|
||||
[
|
||||
{
|
||||
"event": {
|
||||
"kind": "baseline_fill",
|
||||
"at": "08:00",
|
||||
"role": "Warehouse Associate",
|
||||
"count": 3,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "08:00 AM",
|
||||
"scenario_note": "Regular Monday morning shift, 8-hour."
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 20.215,
|
||||
"error": "invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = \\'Warehouse Associate\\' AND city = \\'Toledo\\' AND state = \\'OH\\' AND CAST(availability AS DOUBLE) > 0.5','question':'reliable warehouse associate Toledo'})\",\n\"TOOL_CALL sql({'query':'SELECT worker_i",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = \\'Warehouse Associate\\' AND city = \\'Toledo\\' AND state = \\'OH\\' AND CAST(availability AS DOUBLE) > 0.5','question':'reliable warehouse associate Toledo'})\",\n\"TOOL_CALL sql({'query':'SELECT worker_i"
|
||||
]
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "recurring",
|
||||
"at": "10:30",
|
||||
"role": "Machine Operator",
|
||||
"count": 2,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "11:00 AM",
|
||||
"scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 47.392,
|
||||
"error": "no consensus after 14 turns",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: no consensus after 14 turns"
|
||||
]
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "expansion",
|
||||
"at": "12:15",
|
||||
"role": "Forklift Operator",
|
||||
"count": 5,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "01:00 PM",
|
||||
"scenario_note": "New warehouse location opening, five-worker team needed."
|
||||
},
|
||||
"ok": true,
|
||||
"fills": [
|
||||
{
|
||||
"candidate_id": "W500K-37736",
|
||||
"name": "Jennifer K. Robinson",
|
||||
"reason": "Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."
|
||||
},
|
||||
{
|
||||
"candidate_id": "W500K-33961",
|
||||
"name": "Kyle F. Brooks",
|
||||
"reason": "Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."
|
||||
},
|
||||
{
|
||||
"candidate_id": "W500K-31297",
|
||||
"name": "Jacob T. Diaz",
|
||||
"reason": "Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."
|
||||
},
|
||||
{
|
||||
"candidate_id": "W500K-40884",
|
||||
"name": "Jerry M. Jones",
|
||||
"reason": "Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."
|
||||
},
|
||||
{
|
||||
"candidate_id": "W500K-37729",
|
||||
"name": "Jeffrey D. Taylor",
|
||||
"reason": "Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."
|
||||
}
|
||||
],
|
||||
"turns": 7,
|
||||
"duration_secs": 28.23,
|
||||
"gap_signals": [
|
||||
"double_book: undefined Kyle F. Brooks already booked for 12:15",
|
||||
"double_book: undefined Jacob T. Diaz already booked for 12:15",
|
||||
"double_book: undefined Jerry M. Jones already booked for 12:15",
|
||||
"double_book: undefined Jeffrey D. Taylor already booked for 12:15"
|
||||
],
|
||||
"sources_first_score": 0.6336688,
|
||||
"sources_last_score": 0.55183524,
|
||||
"pool_size": 687,
|
||||
"playbook_citations": [],
|
||||
"discovered_pattern": "Across 25 similar past playbooks (10 workers examined) · recurring certifications: Forklift (40%), OSHA-10 (40%) · recurring skills: mill (40%) · archetype mostly: leader · reliability median 0.83 (range 0.66–0.96)"
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "emergency",
|
||||
"at": "14:00",
|
||||
"role": "Loader",
|
||||
"count": 4,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "04:00 PM same day",
|
||||
"deadline": "16:00",
|
||||
"scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
|
||||
},
|
||||
"ok": true,
|
||||
"fills": [
|
||||
{
|
||||
"candidate_id": "W500K-15305",
|
||||
"name": "Mary R. Richardson",
|
||||
"reason": "Verified availability score of 0.988 via SQL and ranked highest among the candidates with an availability score greater than 0.7."
|
||||
},
|
||||
{
|
||||
"candidate_id": "W500K-12325",
|
||||
"name": "Raj Torres",
|
||||
"reason": "Ranked second among the candidates with an availability score greater than 0.7."
|
||||
},
|
||||
{
|
||||
"candidate_id": "W500K-16975",
|
||||
"name": "Brian X. Price",
|
||||
"reason": "Ranked third among the candidates with an availability score greater than 0.7."
|
||||
},
|
||||
{
|
||||
"candidate_id": "W500K-22851",
|
||||
"name": "Fatima X. Gutierrez",
|
||||
"reason": "Ranked fourth among the candidates with an availability score greater than 0.7."
|
||||
}
|
||||
],
|
||||
"turns": 6,
|
||||
"duration_secs": 22.25,
|
||||
"gap_signals": [
|
||||
"double_book: undefined Mary R. Richardson already booked for 12:15",
|
||||
"double_book: undefined Raj Torres already booked for 12:15",
|
||||
"double_book: undefined Brian X. Price already booked for 12:15",
|
||||
"double_book: undefined Fatima X. Gutierrez already booked for 12:15"
|
||||
],
|
||||
"sources_first_score": 0.73792297,
|
||||
"sources_last_score": 0.7001053,
|
||||
"pool_size": 380,
|
||||
"playbook_citations": [],
|
||||
"discovered_pattern": "Across 25 similar past playbooks (9 workers examined) · recurring certifications: Forklift (44%) · recurring skills: mill (44%) · archetype mostly: leader · reliability median 0.80 (range 0.66–0.96)"
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "misplacement",
|
||||
"at": "15:45",
|
||||
"role": "Warehouse Associate",
|
||||
"count": 1,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "remainder of 08:00 shift",
|
||||
"scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
|
||||
"replaces_event": "08:00"
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 52.523,
|
||||
"error": "no consensus after 14 turns",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: no consensus after 14 turns"
|
||||
]
|
||||
}
|
||||
]
|
||||
@ -1,74 +0,0 @@
|
||||
[
|
||||
{
|
||||
"name": "Jennifer K. Robinson",
|
||||
"booked_for": "12:15",
|
||||
"role": "Forklift Operator",
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"status": "confirmed"
|
||||
},
|
||||
{
|
||||
"name": "Kyle F. Brooks",
|
||||
"booked_for": "12:15",
|
||||
"role": "Forklift Operator",
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"status": "confirmed"
|
||||
},
|
||||
{
|
||||
"name": "Jacob T. Diaz",
|
||||
"booked_for": "12:15",
|
||||
"role": "Forklift Operator",
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"status": "confirmed"
|
||||
},
|
||||
{
|
||||
"name": "Jerry M. Jones",
|
||||
"booked_for": "12:15",
|
||||
"role": "Forklift Operator",
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"status": "confirmed"
|
||||
},
|
||||
{
|
||||
"name": "Jeffrey D. Taylor",
|
||||
"booked_for": "12:15",
|
||||
"role": "Forklift Operator",
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"status": "confirmed"
|
||||
},
|
||||
{
|
||||
"name": "Mary R. Richardson",
|
||||
"booked_for": "14:00",
|
||||
"role": "Loader",
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"status": "confirmed"
|
||||
},
|
||||
{
|
||||
"name": "Raj Torres",
|
||||
"booked_for": "14:00",
|
||||
"role": "Loader",
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"status": "confirmed"
|
||||
},
|
||||
{
|
||||
"name": "Brian X. Price",
|
||||
"booked_for": "14:00",
|
||||
"role": "Loader",
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"status": "confirmed"
|
||||
},
|
||||
{
|
||||
"name": "Fatima X. Gutierrez",
|
||||
"booked_for": "14:00",
|
||||
"role": "Loader",
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"status": "confirmed"
|
||||
}
|
||||
]
|
||||
@ -1,46 +0,0 @@
|
||||
# SMS drafts — Riverfront Steel, 2026-04-21
|
||||
|
||||
## 12:15 expansion — Forklift Operator x5 in Toledo, OH
|
||||
|
||||
TO: Jennifer K. Robinson
|
||||
Confirming your shift as a Forklift Operator at Riverfront Steel's new warehouse in Toledo, OH starting 1:00 PM.
|
||||
|
||||
---
|
||||
|
||||
TO: Kyle F. Brooks
|
||||
Your shift as a Forklift Operator at the new Toledo, OH warehouse starts at 1:00 PM today.
|
||||
|
||||
---
|
||||
|
||||
TO: Jacob T. Diaz
|
||||
Confirm your shift as a Forklift Operator at Riverfront Steel's new Toledo, OH location starting at 1:00 PM.
|
||||
|
||||
---
|
||||
|
||||
TO: Jerry M. Jones
|
||||
Your shift as a Forklift Operator at the new Toledo, OH warehouse starts at 1:00 PM today.
|
||||
|
||||
---
|
||||
|
||||
TO: Jeffrey D. Taylor
|
||||
Confirming your shift as a Forklift Operator at Riverfront Steel's new warehouse in Toledo, OH starting 1:00 PM.
|
||||
|
||||
## 14:00 emergency — Loader x4 in Toledo, OH
|
||||
|
||||
TO: Mary R. Richardson
|
||||
Confirming your shift start at 4 PM today as a replacement. See you at Toledo, OH.
|
||||
|
||||
---
|
||||
|
||||
TO: Raj Torres
|
||||
Replacing shift starting now at 4 PM. Toledo, OH.
|
||||
|
||||
---
|
||||
|
||||
TO: Brian X. Price
|
||||
You're on at 4 PM replacing the crew. Toledo, OH.
|
||||
|
||||
---
|
||||
|
||||
TO: Fatima X. Gutierrez
|
||||
Confirming your walkoff shift start at 4 PM today. Toledo, OH.
|
||||
@ -1,2 +0,0 @@
|
||||
{"after_event":"12:15","event_kind":"expansion","ok":true,"model":"gpt-oss:20b","duration_secs":10.228,"risk":"Forklift Operator JSON error","hint":"Ensure JSON is valid; test with a JSON validator; correct syntax before executing the tool call."}
|
||||
{"after_event":"15:45","event_kind":"misplacement","ok":false,"model":"gpt-oss:20b","duration_secs":13.935,"hint":"(T3 unavailable)","risk":"generate returned empty text from gpt-oss:20b: {\"text\":\"\",\"model\":\"gpt-oss:20b\","}
|
||||
@ -1 +0,0 @@
|
||||
{"at":"08:00","kind":"baseline_fill","operation":"fill: Warehouse Associate x3 in Toledo, OH","fills":[{"candidate_id":"W500K-49164","name":"Christopher Y. Phillips","reason":"SQL verified for high availability and semantic score of 0.63, making him the top candidate."},{"candidate_id":"W500K-34704","name":"Fatima U. Rivera","reason":"Semantic score of 0.61 and skills in cold storage make her a strong candidate."},{"candidate_id":"W500K-40928","name":"Janet E. Hill","reason":"Semantic score of 0.61, RF scanner skill, and high reliability score make her a suitable candidate."}],"turns":5,"duration_secs":19.474,"pool_size":770,"playbook_citations":[],"discovered_pattern":"Across 25 similar past playbooks (6 workers examined) · recurring certifications: Forklift (67%), OSHA-10 (50%) · recurring skills: mill (50%), 6S (50%) · archetype mostly: communicator · reliability median 0.83 (range 0.75–0.96)"}
|
||||
@ -1,18 +0,0 @@
|
||||
# Client emails — Riverfront Steel, 2026-04-21
|
||||
|
||||
## 08:00 baseline_fill — Warehouse Associate x3
|
||||
|
||||
Subject: 3 Filled
|
||||
|
||||
Dear Riverfront Steel Team,
|
||||
|
||||
I am pleased to confirm that we have filled all three positions with the following Warehouse Associates:
|
||||
|
||||
- Christopher Y. Phillips
|
||||
- Fatima U. Rivera
|
||||
- Janet E. Hill
|
||||
|
||||
Shift starts at 08:00 AM on a regular Monday morning, 8-hour shift.
|
||||
|
||||
Best regards,
|
||||
Dispatch Team Lakehouse
|
||||
@ -1,9 +0,0 @@
|
||||
# Cross-day lesson — Riverfront Steel, 2026-04-21
|
||||
|
||||
_Generated by `gpt-oss:20b` in 7.1s. Based on 5 events + 2 mid-day checkpoints._
|
||||
|
||||
**
|
||||
Validate every JSON payload with a validator before invoking a tool; a malformed payload caused the Forklift Operator expansion to fail.
|
||||
Confirm the GPT model is available and that the tool returns non‑empty text; if it returns an empty string, retry or switch to a fallback model.
|
||||
For recurring, expansion, and emergency events, pre‑fetch the candidate pool and verify it meets the required count before attempting placement.
|
||||
Log any tool failures immediately and update the risk mitigation plan for the next run.
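The retry-or-fallback advice above can be sketched as a small wrapper. This is an illustration under stated assumptions, not the project's client: `generate` is a caller-supplied function (hypothetical signature `generate(model, prompt) -> str`), and the fallback order is whatever list the caller passes in.

```python
def generate_with_fallback(prompt, models, generate, attempts_per_model=2):
    """Try each model in order; treat empty or whitespace-only text as a failure.

    Returns (model_name, text) for the first non-empty response.
    """
    for model in models:
        for _ in range(attempts_per_model):
            text = generate(model, prompt)
            if text and text.strip():
                return model, text
    raise RuntimeError("all models returned empty text")


# Stand-in backend for demonstration: the first model always returns "".
calls = []


def fake_generate(model, prompt):
    calls.append(model)
    return "" if model == "gpt-oss:20b" else "checkpoint summary"


used_model, summary = generate_with_fallback(
    "summarise the 15:45 event",
    ["gpt-oss:20b", "qwen2.5:latest"],
    fake_generate,
)
```

Here the primary model is tried twice, then the wrapper moves on to the next model in the list rather than recording `(T3 unavailable)`.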
|
||||
@ -1,71 +0,0 @@
|
||||
# Scenario retrospective — Riverfront Steel, 2026-04-21
|
||||
|
||||
Executor: `mistral:latest` Reviewer: `qwen2.5:latest` Draft: `qwen2.5:latest` Overview(T3): `gpt-oss:20b`
|
||||
|
||||
## Events
|
||||
|
||||
| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | 770 | ✓ 3 | 5 | 19.5 | 0 | 2 |
| 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 49.0 | 0 | 1 |
| 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 2.8 | 0 | 1 |
| 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 48.9 | 0 | 1 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 47.8 | 0 | 1 |
|
||||
|
||||
## Final roster
|
||||
|
||||
| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
| undefined Christopher Y. Phillips | 08:00 | Warehouse Associate | Toledo, OH | no_show |
| undefined Fatima U. Rivera | 08:00 | Warehouse Associate | Toledo, OH | confirmed |
| undefined Janet E. Hill | 08:00 | Warehouse Associate | Toledo, OH | confirmed |
|
||||
|
||||
## Gap signals
|
||||
|
||||
### double_book
|
||||
- **08:00** — undefined Fatima U. Rivera already booked for 08:00
|
||||
- **08:00** — undefined Janet E. Hill already booked for 08:00
|
||||
|
||||
### drift_or_tool
|
||||
- **10:30** — no consensus after 14 turns
|
||||
- **12:15** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search",
|
||||
"args":{"index_name":"workers_500k_v1",
|
||||
"sql_filter":"role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 AND worker_id NOT IN (42319, 68741, 34927)",
|
||||
"rationale":"Se
|
||||
- **14:00** — no consensus after 14 turns
|
||||
- **15:45** — no consensus after 14 turns
|
||||
|
||||
### fairness
|
||||
- _cross-event_ — Christopher Y. Phillips (undefined) booked 2 times today
|
||||
|
||||
### write_through_audit
|
||||
- _post-run_ — playbook_memory has 1163 entries (ran 5 events, expected ≥ 1 new entries from this run)
|
||||
|
||||
## Workers touched across the week
|
||||
|
||||
4 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
|
||||
|
||||
| Worker ID | Name | Events | Outcome |
|---|---|---|---|
| W500K-49164 | Christopher Y. Phillips | 08:00 baseline_fill | booked |
| W500K-34704 | Fatima U. Rivera | 08:00 baseline_fill | booked |
| W500K-40928 | Janet E. Hill | 08:00 baseline_fill | booked |
| undefined | Christopher Y. Phillips | 08:00 | no_show |
|
||||
|
||||
## Discovered patterns (meta-index)
|
||||
|
||||
What the system identified across semantically-similar past fills as each event ran:
|
||||
|
||||
- **08:00 baseline_fill** (Warehouse Associate): Across 25 similar past playbooks (6 workers examined) · recurring certifications: Forklift (67%), OSHA-10 (50%) · recurring skills: mill (50%), 6S (50%) · archetype mostly: communicator · reliability median 0.83 (range 0.75–0.96)
|
||||
- **10:30 recurring** (Machine Operator): —
|
||||
- **12:15 expansion** (Forklift Operator): —
|
||||
- **14:00 emergency** (Loader): —
|
||||
- **15:45 misplacement** (Warehouse Associate): —
|
||||
|
||||
## Narrative
|
||||
|
||||
- 1/5 events reached consensus.
|
||||
- Final roster: 3 bookings across 1 distinct workers.
|
||||
- Workers touched (booked, failed, or otherwise decided): 4.
|
||||
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
|
||||
- Dropped events: 10:30 recurring, 12:15 expansion, 14:00 emergency, 15:45 misplacement.
|
||||
@ -1,130 +0,0 @@
|
||||
[
|
||||
{
|
||||
"event": {
|
||||
"kind": "baseline_fill",
|
||||
"at": "08:00",
|
||||
"role": "Warehouse Associate",
|
||||
"count": 3,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "08:00 AM",
|
||||
"scenario_note": "Regular Monday morning shift, 8-hour."
|
||||
},
|
||||
"ok": true,
|
||||
"fills": [
|
||||
{
|
||||
"candidate_id": "W500K-49164",
|
||||
"name": "Christopher Y. Phillips",
|
||||
"reason": "SQL verified for high availability and semantic score of 0.63, making him the top candidate."
|
||||
},
|
||||
{
|
||||
"candidate_id": "W500K-34704",
|
||||
"name": "Fatima U. Rivera",
|
||||
"reason": "Semantic score of 0.61 and skills in cold storage make her a strong candidate."
|
||||
},
|
||||
{
|
||||
"candidate_id": "W500K-40928",
|
||||
"name": "Janet E. Hill",
|
||||
"reason": "Semantic score of 0.61, RF scanner skill, and high reliability score make her a suitable candidate."
|
||||
}
|
||||
],
|
||||
"turns": 5,
|
||||
"duration_secs": 19.474,
|
||||
"gap_signals": [
|
||||
"double_book: undefined Fatima U. Rivera already booked for 08:00",
|
||||
"double_book: undefined Janet E. Hill already booked for 08:00"
|
||||
],
|
||||
"sources_first_score": 0.6233225,
|
||||
"sources_last_score": 0.55385745,
|
||||
"pool_size": 770,
|
||||
"playbook_citations": [],
|
||||
"discovered_pattern": "Across 25 similar past playbooks (6 workers examined) · recurring certifications: Forklift (67%), OSHA-10 (50%) · recurring skills: mill (50%), 6S (50%) · archetype mostly: communicator · reliability median 0.83 (range 0.75–0.96)"
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "recurring",
|
||||
"at": "10:30",
|
||||
"role": "Machine Operator",
|
||||
"count": 2,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "11:00 AM",
|
||||
"scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 48.986,
|
||||
"error": "no consensus after 14 turns",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: no consensus after 14 turns"
|
||||
]
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "expansion",
|
||||
"at": "12:15",
|
||||
"role": "Forklift Operator",
|
||||
"count": 5,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "01:00 PM",
|
||||
"scenario_note": "New warehouse location opening, five-worker team needed."
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 2.845,
|
||||
"error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\n\"sql_filter\":\"role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 AND worker_id NOT IN (42319, 68741, 34927)\",\n\"rationale\":\"Se",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\n\"sql_filter\":\"role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 AND worker_id NOT IN (42319, 68741, 34927)\",\n\"rationale\":\"Se"
|
||||
]
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "emergency",
|
||||
"at": "14:00",
|
||||
"role": "Loader",
|
||||
"count": 4,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "04:00 PM same day",
|
||||
"deadline": "16:00",
|
||||
"scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 48.905,
|
||||
"error": "no consensus after 14 turns",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: no consensus after 14 turns"
|
||||
]
|
||||
},
|
||||
{
|
||||
"event": {
|
||||
"kind": "misplacement",
|
||||
"at": "15:45",
|
||||
"role": "Warehouse Associate",
|
||||
"count": 1,
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"shift_start": "remainder of 08:00 shift",
|
||||
"scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
|
||||
"replaces_event": "08:00",
|
||||
"exclude_worker_ids": [
|
||||
null,
|
||||
null,
|
||||
null
|
||||
]
|
||||
},
|
||||
"ok": false,
|
||||
"fills": [],
|
||||
"turns": 0,
|
||||
"duration_secs": 47.789,
|
||||
"error": "no consensus after 14 turns",
|
||||
"gap_signals": [
|
||||
"drift_or_tool: no consensus after 14 turns"
|
||||
]
|
||||
}
|
||||
]
|
||||
@ -1,26 +0,0 @@
|
||||
[
|
||||
{
|
||||
"name": "Christopher Y. Phillips",
|
||||
"booked_for": "08:00",
|
||||
"role": "Warehouse Associate",
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"status": "no_show"
|
||||
},
|
||||
{
|
||||
"name": "Fatima U. Rivera",
|
||||
"booked_for": "08:00",
|
||||
"role": "Warehouse Associate",
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"status": "confirmed"
|
||||
},
|
||||
{
|
||||
"name": "Janet E. Hill",
|
||||
"booked_for": "08:00",
|
||||
"role": "Warehouse Associate",
|
||||
"city": "Toledo",
|
||||
"state": "OH",
|
||||
"status": "confirmed"
|
||||
}
|
||||
]
|
||||
@ -1,16 +0,0 @@
|
||||
# SMS drafts — Riverfront Steel, 2026-04-21
|
||||
|
||||
## 08:00 baseline_fill — Warehouse Associate x3 in Toledo, OH
|
||||
|
||||
TO: Christopher Y. Phillips
|
||||
Confirming your shift as a Warehouse Associate at Riverfront Steel in Toledo, OH starting 08:00 AM today.
|
||||
|
||||
---
|
||||
|
||||
TO: Fatima U. Rivera
|
||||
Your shift as a Warehouse Associate at Riverfront Steel is confirmed for 08:00 AM today.
|
||||
|
||||
---
|
||||
|
||||
TO: Janet E. Hill
|
||||
Confirming your 08:00 AM shift as a Warehouse Associate at Riverfront Steel in Toledo, OH.
|
||||
@ -1,2 +0,0 @@
|
||||
{"after_event":"12:15","event_kind":"expansion","ok":true,"model":"gpt-oss:20b","duration_secs":10.901,"risk":"JSON parse error","hint":"Validate JSON structure, close braces, escape quotes, and test with a JSON linter before executing hybrid_search."}
|
||||
{"after_event":"15:45","event_kind":"misplacement","ok":true,"model":"gpt-oss:20b","duration_secs":11.83,"risk":"JSON parsing failure in tool call","hint":"Ensure JSON syntax is correct before invoking hybrid_search for Warehouse Associate in Toledo, OH. Validate tool call structure."}
|
||||
@ -1 +0,0 @@
|
||||
# Client emails — Riverfront Steel, 2026-04-21
|
||||
@ -1,6 +0,0 @@
|
||||
# Cross-day lesson — Riverfront Steel, 2026-04-21
|
||||
|
||||
_Generated by `gpt-oss:20b` in 4.0s. Based on 5 events + 2 mid-day checkpoints._
|
||||
|
||||
**
|
||||
Always validate the JSON payload before calling `hybrid_search`. Ensure all braces are closed, quotes are escaped, and the structure matches the expected schema—use a linter or schema validator in a sandbox first. Construct the JSON programmatically or via a template rather than embedding raw text in the tool call. This prevents parse errors that cause job failures.
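Constructing the payload programmatically, as recommended above, could look like this in Python. It is a sketch under assumptions: the field names (`kind`, `tool`, `args`, `index_name`, `sql_filter`, `question`) are copied from the raw executor output quoted in this diff, the real schema is not shown here, and `build_hybrid_search_call` is a hypothetical helper.

```python
import json


def sql_quote(value: str) -> str:
    """Quote a string literal for SQL by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_hybrid_search_call(role, city, state, min_availability, question):
    """Build the tool call as a dict and let json.dumps handle all escaping."""
    sql_filter = (
        f"role = {sql_quote(role)} AND city = {sql_quote(city)} "
        f"AND state = {sql_quote(state)} "
        f"AND CAST(availability AS DOUBLE) > {min_availability}"
    )
    call = {
        "kind": "tool_call",
        "tool": "hybrid_search",
        "args": {
            "index_name": "workers_500k_v1",
            "sql_filter": sql_filter,
            "question": question,
        },
    }
    return json.dumps(call)


payload = build_hybrid_search_call(
    "Warehouse Associate", "Toledo", "OH", 0.5,
    "reliable warehouse associate Toledo",
)
round_trip = json.loads(payload)  # parses cleanly: braces closed, quotes escaped
```

Because the serializer emits the string, unclosed braces and stray `\'` escapes of the kind seen in the gap signals cannot occur in the payload itself.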
|
||||
@ -1,58 +0,0 @@
|
||||
# Scenario retrospective — Riverfront Steel, 2026-04-21
|
||||
|
||||
Executor: `mistral:latest` Reviewer: `qwen2.5:latest` Draft: `qwen2.5:latest` Overview(T3): `gpt-oss:20b`
|
||||
|
||||
## Events
|
||||
|
||||
| At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
|---|---|---|---|---|---|---|---|---|
| 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 33.1 | 0 | 1 |
| 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 35.1 | 0 | 1 |
| 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 55.3 | 0 | 1 |
| 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 14.7 | 0 | 1 |
| 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 28.8 | 0 | 1 |
|
||||
|
||||
## Final roster
|
||||
|
||||
| Worker | Booked | Role | City, ST | Status |
|---|---|---|---|---|
|
||||
|
||||
## Gap signals
|
||||
|
||||
### drift_or_tool
|
||||
- **08:00** — invalid JSON from executor: JSON Parse error: Expected ']' | raw: {"kind":"plan","steps":["TOOL_CALL",{"tool":"sql","args":{"query":"SELECT worker_id, name, role, city, state FROM workers_500k WHERE worker_id = '49164'"}},"TOOL_CALL",{"tool":"hybrid_search","args":{"index_name":"workers_500k_v1","sql_filter":"CAST(availability AS DOUBLE) > 0.5 AND role = 'Warehous
|
||||
- **10:30** — invalid JSON from executor: JSON Parse error: Expected ']' | raw: {"kind":"plan","steps":["TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5'})","TOOL_CALL sql({'query':'SELECT worker_id, name, role, city, state FROM workers_500k WHERE worker_i
|
||||
- **12:15** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search",
|
||||
"args":{"index_name":"workers_500k_v1","sql_filter":"role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 AND NOT worker_id IN (W500K-22375, W500K-19588, W500K-28024,
|
||||
- **14:00** — aborted — 3 consecutive drift flags
|
||||
- **15:45** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search",
|
||||
"args":{"index_name":"workers_500k_v1","sql_filter":"role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND worker_id NOT IN (CANDIDATES SURFACED SO FAR)",
|
||||
"rationale":"Find a reliable Warehouse Associa
|
||||
|
||||
### write_through_audit
|
||||
- _post-run_ — playbook_memory has 1163 entries (ran 5 events, expected ≥ 0 new entries from this run)
|
||||
|
||||
## Workers touched across the week
|
||||
|
||||
0 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
|
||||
|
||||
| Worker ID | Name | Events | Outcome |
|---|---|---|---|
|
||||
|
||||
## Discovered patterns (meta-index)
|
||||
|
||||
What the system identified across semantically-similar past fills as each event ran:
|
||||
|
||||
- **08:00 baseline_fill** (Warehouse Associate): —
|
||||
- **10:30 recurring** (Machine Operator): —
|
||||
- **12:15 expansion** (Forklift Operator): —
|
||||
- **14:00 emergency** (Loader): —
|
||||
- **15:45 misplacement** (Warehouse Associate): —
|
||||
|
||||
## Narrative
|
||||
|
||||
- 0/5 events reached consensus.
|
||||
- Final roster: 0 bookings across 0 distinct workers.
|
||||
- Workers touched (booked, failed, or otherwise decided): 0.
|
||||
- Playbook citations across the day: 0 (proof the feedback loop fired across events).
|
||||
- Dropped events: 08:00 baseline_fill, 10:30 recurring, 12:15 expansion, 14:00 emergency, 15:45 misplacement.
|
||||
Some files were not shown because too many files have changed in this diff