| Path | Owns |
|---|---|
| crates/shared/ | Types, errors, Arrow helpers, schema fingerprints, PII detection, secrets provider. Every other crate depends on this. |
| crates/storaged/ | Raw object I/O. BucketRegistry (multi-bucket, rescue-aware), AppendLog (write-once batched append), ErrorJournal (bucket op failures). ADR-017 (federation), ADR-018 (append pattern). |
| crates/catalogd/ | Metadata authority. Dataset manifests, schema fingerprints (ADR-020), tombstones (soft delete), AI-safe views, model profiles (Phase 17). In-memory index persisted as Parquet on storage. |
| crates/queryd/ | SQL engine. DataFusion over Parquet + MemTable cache + delta merge-on-read + compaction. Registers every bucket as an object_store so SQL can join across them. |
| crates/ingestd/ | Data on-ramp. CSV / JSON / PDF (+OCR via Tesseract) / Postgres streaming / MySQL streaming / inbox watcher / cron schedules. Every ingest path auto-tags PII (emails, phones, SSNs, addresses), records lineage, and marks embeddings stale. |
| crates/vectord/ | The vector + learning surface. Embeddings stored as Parquet (ADR-008), HNSW index (Phase 15), trial system (autotune), promotion registry (Phase 16), playbook_memory (Phase 19). Core feedback loop lives here. |
| crates/vectord-lance/ | Firewall crate. Lance 4.0 + Arrow 57, isolated from the main Arrow-55 workspace. Provides secondary vector backend for large-scale, random-access, and append-heavy workloads (ADR-019). |
| crates/journald/ | Append-only mutation event log (ADR-012). Every insert/update/delete writes here — who, when, what, old/new value. Never mutated. Foundation for time-travel + compliance audit. |
| crates/truth/ | File-backed rule store. evaluate(task_class, ctx) → Vec<RuleOutcome> (ADR-021 — semantic-correctness matrix layer). Loaded from truth/*.toml at gateway boot. |
| crates/aibridge/ | Rust ↔ Python sidecar + provider adapter trait. HTTP client over FastAPI wrapper around Ollama for local; ProviderAdapter dispatch for cloud (ollama_cloud, openrouter, opencode, kimi). VRAM introspection via nvidia-smi. All LLM calls flow through here. |
| crates/gateway/ | Axum HTTP (:3100) + gRPC (:3101). OpenAI-compat /v1/* (drop-in middleware), mode runner (/v1/mode/execute), validator (/v1/validate), iterate loop (/v1/iterate), tools registry, cost telemetry, Langfuse + observer fan-out on every chat. Every external request enters here. |
| crates/validator/ | Phase 43 production validator. Schema / completeness / consistency / policy gates over LLM outputs. FillValidator, EmailValidator, ParquetWorkerLookup (loads workers_500k.parquet at boot). Fail-closed when roster absent. |
| crates/ui/ | Dioxus WASM developer UI. Internal tool. Not exposed externally. |
| mcp-server/ | Bun/TypeScript public-facing app + MCP tool surface. Serves devop.live/lakehouse. Pages: dashboard / console / profiler / contractor / proof / spec / onboard / alerts / workspaces. Routes: /search /match /log /log_failure /clients/:c/blacklist /intelligence/* /staffers /memory/query /models/matrix /system/summary. Observer sibling at observer.ts on :3800 for event ingest. |
| auditor/ | External claim-vs-diff verifier on PRs. Polls Gitea for open PRs, builds adversarial prompt from PRD invariants + staffing matrix, alternates Kimi K2.6 ↔ Haiku 4.5 by SHA, auto-promotes Claude Opus 4.7 on diffs >100k chars. Per-PR cap=3 with auto-reset on each new head SHA. Verdicts at data/_auditor/kimi_verdicts/. |
| tests/multi-agent/ | Multi-agent scenario harness + memory stack. agent.ts, scenario.ts (contracts + staffer + tool_level), kb.ts (KB indexing, competence scoring), normalize.ts, memory_query.ts, run_e2e_rated.ts. Unit tests colocated. |
| scripts/distillation/ | Distillation substrate v1.0.0 (frozen at tag distillation-v1.0.0 / commit e7636f2). 145 unit tests, 22/22 acceptance, 16/16 audit-full, bit-identical reproducibility. Multi-layer contamination firewall on SFT exports. |
| config/ | modes.toml — task_class → mode/model router (scrum_review, contract_analysis, staffing_inference, pr_audit, doc_drift_check, fact_extract). providers.toml — 5 active providers (ollama, ollama_cloud, openrouter, opencode 40-model, kimi direct). routing.toml — cost gates per task class. |
| docs/ | PRD.md, PHASES.md, DECISIONS.md (21 ADRs). Every significant architectural choice has an ADR with the alternatives that were rejected and why. |
| data/ | Default local object store. Parquet datasets, append-log batches, HNSW trial journals, promotion registries, _playbook_memory/state.json, _pathway_memory/state.json (88 traces, 11/11 successful replays, ADR-021), catalog manifests. Plus learning-loop directories: _kb/, _playbook_lessons/, _observer/ops.jsonl, _auditor/kimi_verdicts/. Rebuildable from repo + this dir alone. |
Four ingest paths: (a) POST /ingest/file, (b) the inbox watcher (drop a file in ./inbox/ → auto-ingested in under 15s), (c) the Postgres or MySQL streaming connector (POST /ingest/db with a DSN), (d) scheduled ingest via ingestd::schedule with cron.

Type inference falls back to String on ambiguity (ADR-010 — better to ingest everything than reject on a type mismatch). The JSON parser flattens nested objects. The PDF extractor tries lopdf first and falls back to Tesseract OCR for scanned/image PDFs. Output is always an Arrow RecordBatch.

shared::pii scans column values and names, identifying emails, phone numbers, SSNs, salaries, street addresses, and medical terms, and tags each column with a sensitivity level: PII | PHI | Financial | Internal | Public (Phase 10 catalog v2).

arrow_helpers::record_batch_to_parquet → storaged::ops::put → the file lands under data/datasets/<name>.parquet (or bucket-scoped via BucketRegistry). The schema fingerprint is computed at this point.

catalogd::Registry::register(name, fingerprint, objects) is idempotent on (name, fingerprint). Same name + same fingerprint = reuse the manifest, bump updated_at. Same name + different fingerprint = 409 Conflict (ADR-020 — prevents silent schema drift). New name = create a new manifest with owner, lineage, freshness SLA, column metadata, and PII tags. (A request-level sketch of this path follows the backend table below.)

Registry::mark_embeddings_stale flips a flag; POST /vectors/refresh/<dataset> then runs an incremental re-embed (only the new rows, not the whole corpus).

queryd::context picks up the new manifest on the next query. The hot cache warms on first hit. Delta merge-on-read means updates land without rewriting the base Parquet.

Two vector backends, chosen per profile (ADR-019):
| HNSW over Parquet (primary) | Lance (secondary) | |
|---|---|---|
| Storage | Embeddings as Parquet columns (doc_id, chunk_text, vector) | Lance native dataset |
| Index | HNSW in RAM, serialized sidecar | IVF_PQ on disk |
| Build time (100K × 768d) | ~230s | ~16s (14× faster) |
| Search p50 (100K) | ~873μs | ~7.4ms at recall 1.0 |
| Append | Rewrite required | Structural (0.08s for 100 rows) |
| Random fetch by doc_id | Full scan | ~311μs (112× faster) |
| RAM ceiling | ~5M vectors | Scales past RAM — disk-resident |
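A minimal sketch of the ingest-and-register path referenced above, including the ADR-020 conflict response. The `name` query parameter and the 409 body handling are assumptions for illustration, not the gateway's documented contract.

```ts
// Hypothetical sketch: push a CSV through POST /ingest/file and surface the
// ADR-020 schema conflict. The `name` query parameter is an assumption.
const GATEWAY = "http://localhost:3100";

async function ingestCsv(dataset: string, csv: string): Promise<void> {
  const res = await fetch(
    `${GATEWAY}/ingest/file?name=${encodeURIComponent(dataset)}`,
    { method: "POST", headers: { "content-type": "text/csv" }, body: csv },
  );
  if (res.status === 409) {
    // Same name, different schema fingerprint: the catalog refuses silent
    // drift. Re-ingest under a new name, or migrate via schema evolution.
    throw new Error(`schema conflict: ${await res.text()}`);
  }
  if (!res.ok) throw new Error(`ingest failed with ${res.status}`);
}
```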
The vectord::agent background task runs continuously. Per index, it proposes HNSW configurations (ef_construction × ef_search), executes a trial against a stored eval set, journals the result as JSONL, and — if recall beats the min_recall gate (0.9) and latency wins the Pareto test — promotes the new config atomically via promotion_registry. No downtime. Rollback in milliseconds.
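A minimal sketch of that promotion gate, with assumed field names (`TrialResult` is illustrative, not the crate's real type): a candidate must clear the 0.9 recall floor and win the latency side of the Pareto comparison against the active config.

```ts
// Assumed shape for one journaled trial; only the thresholds and the gate
// logic come from the text above.
interface TrialResult {
  efConstruction: number;
  efSearch: number;
  recall: number;       // measured against the stored eval set
  p50LatencyUs: number; // search latency, microseconds
}

const MIN_RECALL = 0.9;

function shouldPromote(candidate: TrialResult, active: TrialResult): boolean {
  if (candidate.recall < MIN_RECALL) return false; // recall gate
  // Pareto test: no worse on recall, strictly better on latency.
  return candidate.recall >= active.recall &&
         candidate.p50LatencyUs < active.p50LatencyUs;
}
```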
Model profiles (Phase 17) are not routing strings — they are named scopes. Each profile has bound_datasets[], hnsw_config, vector_backend, and bucket. When a staffer activates a profile:
- POST /vectors/profile/<id>/search rejects out-of-scope queries with 403 + a list of allowed bindings
- keep_alive=0; only one model in VRAM at a time

Declared in config/providers.toml + config/modes.toml. The gateway is an OpenAI-compatible drop-in middleware: any consumer that speaks POST /v1/chat/completions gets routing, audit, cost telemetry, and the full memory substrate behind every call.
| Provider | Reach | Use case |
|---|---|---|
| ollama | localhost:3200 — local sidecar over Ollama | Hot-path JSON emitters, embeddings, last-resort rescue |
| ollama_cloud | ollama.com bearer key — gpt-oss:120b, qwen3-coder:480b, deepseek-v3.1:671b, kimi-k2:1t, mistral-large-3:675b, qwen3.5:397b | Strong-model reviewer rungs, T3+ overview, scrum master pipeline |
| openrouter | openrouter.ai/api/v1 — 343 models incl. Anthropic/Google/OpenAI/MiniMax/Qwen, paid + free tiers | Paid ladder for observer escalations, free-tier rescue |
| opencode | opencode.ai/zen/v1 — 40 frontier models reachable through ONE sk-* key: Claude Opus 4.7 / Sonnet / Haiku, GPT-5.5-pro / 5.4 / codex variants, Gemini 3.1-pro, Kimi K2.6, GLM 5.1, DeepSeek, Qwen 3.6+, MiniMax, plus 4 free-tier | Cross-architecture tie-breakers, auditor cross-lineage (Haiku 4.5 + Opus 4.7), high-context reasoning (Opus on diffs >100k chars) |
| kimi | api.kimi.com/coding/v1 — direct Kimi For Coding | kimi_architect when ollama_cloud rate-limits; TOS-clean primary path |
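Because the gateway is a drop-in for the OpenAI chat-completions shape, a plain fetch works unchanged. A minimal sketch, assuming only the POST /v1/chat/completions route on :3100 stated above; the model string is illustrative.

```ts
// Any OpenAI-compatible client works; this is the raw-fetch equivalent.
const res = await fetch("http://localhost:3100/v1/chat/completions", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({
    model: "ollama_cloud/gpt-oss:120b", // illustrative routing target
    messages: [{ role: "user", content: "Summarize yesterday's fills." }],
  }),
});
const completion = await res.json();
console.log(completion.choices?.[0]?.message?.content);
```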
Defined in tests/real-world/scrum_master_pipeline.ts as const LADDER. Each attempt is evaluated by isAcceptable() = chars ≥ 3800 ∧ not malformed JSON-only. On reject, the next rung sees a learning preamble carrying the prior rejection reason.
1. ollama_cloud / kimi-k2:1t (1T params · flagship)
2. ollama_cloud / qwen3-coder:480b (coding specialist)
3. ollama_cloud / deepseek-v3.1:671b (reasoning)
4. ollama_cloud / mistral-large-3:675b (deep analysis)
5. ollama_cloud / gpt-oss:120b (reliable workhorse)
6. ollama_cloud / qwen3.5:397b (dense final thinker)
7. openrouter / openai/gpt-oss-120b:free (rescue tier)
8. openrouter / google/gemma-3-27b-it:free (fastest rescue)
9. ollama / qwen3.5:latest (last-resort local)
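A sketch of the ladder walk under assumed shapes for the rungs and the generate call (the real LADDER lives in tests/real-world/scrum_master_pipeline.ts). The acceptance check and the learning preamble mirror the description above; the JSON-only test is a rough stand-in.

```ts
type Rung = { provider: string; model: string };

// isAcceptable() = chars >= 3800 AND not malformed JSON-only.
function isAcceptable(out: string): boolean {
  const t = out.trim();
  const jsonOnly = t.startsWith("{") && t.endsWith("}");
  return t.length >= 3800 && !jsonOnly;
}

async function runLadder(
  ladder: Rung[],
  task: string,
  generate: (rung: Rung, prompt: string) => Promise<string>, // assumed signature
): Promise<string> {
  let preamble = "";
  for (const rung of ladder) {
    const out = await generate(rung, preamble + task);
    if (isAcceptable(out)) return out;
    const reason = out.trim().length < 3800 ? "too short" : "JSON-only output";
    // The next rung sees why the previous attempt was rejected.
    preamble = `A prior model's answer was rejected (${reason}). Avoid that failure.\n\n`;
  }
  throw new Error("all nine rungs exhausted");
}
```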
Every audit and every consensus-required call fires the primary reviewer N=3 times in parallel (Promise.all — wall-clock = single call). Aggregate votes per claim_idx, majority wins. On a 1-1-1 split, a tie-breaker model with different architecture (qwen3-coder:480b vs primary gpt-oss/kimi) is invoked. Every disagreement, even when majority resolves, writes to data/_kb/audit_discrepancies.jsonl. Closes the cloud-non-determinism gap: temp=0 isn't actually deterministic in practice across hours; consensus + cross-architecture tie-break stabilizes verdicts.
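The vote aggregation, sketched with assumed types; verdict labels and reviewer signatures are illustrative.

```ts
type Verdict = string; // e.g. "grounded" | "ungrounded" | "uncertain"

function majority(votes: Verdict[]): Verdict | null {
  const tally = new Map<Verdict, number>();
  for (const v of votes) tally.set(v, (tally.get(v) ?? 0) + 1);
  for (const [v, n] of tally) if (n >= 2) return v; // 2-of-3 wins
  return null; // 1-1-1 split
}

async function consensusVerdicts(
  review: () => Promise<Verdict[]>,                 // primary reviewer, one verdict per claim_idx
  tieBreak: (claimIdx: number) => Promise<Verdict>, // different-architecture model
): Promise<Verdict[]> {
  // Three parallel runs: wall-clock cost of a single call.
  const runs = await Promise.all([review(), review(), review()]);
  // Disagreements would also be journaled to audit_discrepancies.jsonl
  // in the real pipeline; omitted here for brevity.
  return Promise.all(
    runs[0].map(async (_, i) => majority(runs.map((r) => r[i])) ?? tieBreak(i)),
  );
}
```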
Every push to PR #11 triggers auditor/audit.ts within ~90s. To prevent a single model lineage's blind spots from becoming the system's blind spots, audits alternate between Kimi K2.6 (Moonshot lineage) and Haiku 4.5 (Anthropic lineage) by head SHA. Diffs over 100k chars auto-promote to Claude Opus 4.7 (Anthropic frontier). Per-PR cap of 3 audits with auto-reset on each new head SHA prevents infinite-loop spend. Latest verdict on c3c9c21: Haiku 4.5, 24.6s, 100% grounding-verified across 10 findings.
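The selection rule as a sketch: the model ids come from the text, while hashing the head SHA for parity is an assumed implementation detail.

```ts
// Pick the auditor model for a given push. Alternation by SHA keeps one
// lineage's blind spots from compounding; big diffs promote to Opus.
function pickAuditor(headSha: string, diffChars: number): string {
  if (diffChars > 100_000) return "claude-opus-4.7"; // auto-promote on large diffs
  const parity = parseInt(headSha.slice(0, 8), 16) % 2; // assumed alternation scheme
  return parity === 0 ? "kimi-k2.6" : "haiku-4.5";
}
```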
The substrate the auditor and mode runner sit on is tagged at distillation-v1.0.0 / commit e7636f2. 145 unit tests pass · 22/22 acceptance invariants · 16/16 audit-full checks · bit-identical reproducibility verified. The distillation phase exports clean SFT / RAG / preference samples with a multi-layer contamination firewall (SFT_NEVER constant + scorer category mapping + acceptance fixtures); the auditor consumes the substrate. The frozen tag means: any future "the system regressed" question has a baseline to bisect against, byte-for-byte.
generateContinuable() handles output-overflow without max_tokens tourniquets — empty response → geometric backoff retry; truncated-JSON → continue with partial as scratchpad. generateTreeSplit() handles input-overflow via map-reduce with running scratchpad. Both respect assertContextBudget() so silent truncation can't happen. Now Rust-native in crates/aibridge/src/continuation.rs (Phase 44).
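A rough TypeScript sketch of the generateContinuable() recovery strategy; the real implementation is Rust in crates/aibridge/src/continuation.rs, and every name here is illustrative.

```ts
// Empty response -> geometric backoff retry; truncated JSON -> continue
// with the partial output as a scratchpad. Bounded by maxContinuations.
async function generateContinuable(
  call: (prompt: string) => Promise<string>, // assumed LLM call signature
  prompt: string,
  maxContinuations = 3,
): Promise<string> {
  let out = "";
  let delayMs = 500;
  for (let i = 0; i <= maxContinuations; i++) {
    const next = out
      ? `${prompt}\n\nPartial output so far:\n${out}\nContinue from where it stopped:`
      : prompt;
    const chunk = await call(next);
    if (chunk === "") {
      // Empty response: back off geometrically and retry.
      await new Promise((r) => setTimeout(r, delayMs));
      delayMs *= 2;
      continue;
    }
    out += chunk;
    try { JSON.parse(out); return out; } // parses cleanly: done
    catch { /* still truncated mid-JSON: loop continues with scratchpad */ }
  }
  throw new Error("max_continuations exceeded");
}
```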
Scenarios can be scoped to a specific coordinator (staffer: {id, name, tenure_months, role, tool_level}). tool_level controls which tiers are available:
- full — T1/T2 local, T3 cloud, cloud rescue on failure
- local — T1/T2/T3 all local (gpt-oss:20b as overseer)
- basic — kimi-k2.5 cloud executor + local reviewer + local T3, no rescue
- minimal — kimi-k2.5 cloud executor, no T3, no rescue; playbook inheritance is the only signal

Measured 2026-04-21 on a 36-run demo (4 staffers × 3 contracts × 3 rounds): James Park (mid, local tools) ranked first at 92.9% fill and 36.8 cites/run; Maria Chen (senior, full tools) second at 81.0%. Cloud T3 adds latency without measurable benefit on this workload. Alex Rivera (trainee, minimal) still hit 59.5% fill purely from playbook inheritance — proof the memory carries knowledge when the model is capable.
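The tier mapping above restated as data, a sketch with assumed field names.

```ts
type ToolLevel = "full" | "local" | "basic" | "minimal";

interface TierAccess {
  executor: string;  // who runs T1/T2 work
  t3: string | null; // overseer tier, if any
  rescue: boolean;   // cloud rescue on failure
}

const TIERS: Record<ToolLevel, TierAccess> = {
  full:    { executor: "local T1/T2",     t3: "cloud T3",            rescue: true  },
  local:   { executor: "local T1/T2",     t3: "gpt-oss:20b (local)", rescue: false },
  basic:   { executor: "kimi-k2.5 cloud", t3: "local T3",            rescue: false },
  minimal: { executor: "kimi-k2.5 cloud", t3: null,                  rescue: false },
};
```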
The concrete example running on devop.live/lakehouse is Chicago Department of Buildings permit data (public Socrata API). Every permit is a signal that construction — and therefore staffing — is coming.
- /intelligence/market and /intelligence/permit_contracts hit data.cityofchicago.org/resource/ydr8-5enu.json live. No caching of permit data — every page load is fresh.
- Each contract counts down to its staffing deadline and is bucketed as overdue, urgent, soon, or scheduled.
- POST /vectors/hybrid with sql_filter on role+state+city+availability, use_playbook_memory: true, playbook_memory_k: 200 returns the top-5 candidates with boost + citations.
- POST /vectors/playbook_memory/patterns aggregates traits across similar past playbooks — recurring certs, skills, archetype, reliability distribution. Surfaces signal the operator didn't query for.
- /intelligence/staffing_forecast aggregates the last 30 days of permits into predicted role-level demand, joins against the IL bench supply, computes coverage %, and classifies each role as critical / tight / watch / ok. The dashboard's top panel renders this — staffers see supply gaps before they query.
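A minimal sketch of the hybrid call from the list above. The sql_filter, use_playbook_memory, and playbook_memory_k fields come from the text; the exact route prefix, port, and response field names are assumptions.

```ts
// Hybrid retrieval: SQL prefilter + semantic rank + playbook boost.
const res = await fetch("http://localhost:3100/vectors/hybrid", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({
    query: "certified welder, nights ok",
    sql_filter:
      "role = 'Welder' AND state = 'TN' AND city = 'Nashville' AND available = true",
    top_k: 5,
    use_playbook_memory: true,
    playbook_memory_k: 200,
  }),
});
const { candidates } = await res.json(); // top-5 with boost + citations (assumed field)
```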
| Capability | CRM | This system |
|---|---|---|
| Store candidate records | Yes | Yes (workers_500k, candidates) |
| Search by structured field | Yes | Yes (DataFusion SQL, sub-100ms on 3M rows) |
| Search by semantic meaning | No | Yes (HNSW + nomic-embed-text) |
| Combine SQL filter + semantic rank | No | Yes (/vectors/hybrid) |
| Boost workers based on past success | No | Yes (Phase 19 playbook_memory) |
| Penalize workers based on past failure | No | Yes (/log_failure + 0.5^n penalty) |
| Surface traits across past fills | No | Yes (/vectors/playbook_memory/patterns) |
| Per-staffer relevance gradient | No | Yes — same query reshapes per coordinator (staffer_id on /intelligence/chat); MARIA'S MEMORY pill labels the playbook context with the active coordinator |
| Triage in one shot — late-worker → backfills + draft SMS | No | Yes (/intelligence/chat Route 6 — pulls profile + 5 same-role same-geo backfills sorted by responsiveness + drafts client SMS in ~250ms) |
| Permit → fill plan derivation (forward demand) | No | Yes (/intelligence/permit_contracts — Chicago Socrata permit → role / headcount / deadline / fill probability / gross revenue per card) |
| Public-issuer attribution across contractor graph | No | Yes (/intelligence/profiler_index — direct + parent + co-permit associated tickers; live Stooq prices) |
| Cross-lineage AI audit on every PR | No | Yes (auditor crate — Kimi K2.6 ↔ Haiku 4.5 alternation + Opus 4.7 auto-promote on big diffs) |
| Pathway memory — system-level hot-swap by task fingerprint | No | Yes (88 traces, 11/11 successful replays, 100% reuse rate, ADR-021) |
| Predict staffing demand from external data | No | Yes (Chicago permit feed + 30-day rolling forecast) |
| Count down to staffing deadline per contract | No | Yes (permit issue_date + heuristic timeline) |
| Explain why each candidate ranked | No | Yes (boost chip + narrative citations + memory pattern) |
| Improve ranking from operator actions | No | Yes (every Call/SMS/No-show click → re-rank signal) |
Every sealed fill is seeded to playbook_memory. The boost fires inside /vectors/hybrid when use_playbook_memory: true. The math below was tightened 2026-04-21 after a diagnostic pass found that globally-ranked playbooks were missing the SQL-filtered candidate pool entirely:
per_worker = cosine(query_emb, playbook_emb) × 0.5 × e^(-age/30) × 0.5^failures / n_workers

boost[(city, state, name)] = min(Σ per_worker, 0.25)
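The same math as straight-line TypeScript. Only the arithmetic is taken from the formula; every name is illustrative.

```ts
interface Playbook {
  embedding: number[];
  ageDays: number;   // age of the playbook, days
  failures: number;  // /log_failure count
  workers: Array<{ city: string; state: string; name: string }>;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function applyBoosts(queryEmb: number[], playbooks: Playbook[]): Map<string, number> {
  const boost = new Map<string, number>();
  for (const p of playbooks) {
    // cosine x 0.5 x exponential age decay x failure penalty, split per worker
    const perWorker =
      (cosine(queryEmb, p.embedding) * 0.5 *
        Math.exp(-p.ageDays / 30) *
        Math.pow(0.5, p.failures)) / p.workers.length;
    for (const w of p.workers) {
      const key = `${w.city}|${w.state}|${w.name}`;
      // Running sum, capped at 0.25 per (city, state, name).
      boost.set(key, Math.min((boost.get(key) ?? 0) + perWorker, 0.25));
    }
  }
  return boost;
}
```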
Multi-strategy retrieval (new): before cosine, compute_boost_for_filtered_with_role(target_geo, target_role) prefilters to same-city playbooks, then gives exact (role, city, state) matches similarity=1.0 and fills up to half the top-k. Cosine fills the rest. Mirrors 2026 Mem0/Zep guidance on parallel-strategy rerank.
Measured lift: before geo-filter, Nashville Welder query returned boosts=170 matched=0 (zero intersection with candidate pool). After: boosts=36 matched=11. On the Riverfront Steel scenario, total playbook citations went from 2 → 28 per run — a 14× delta on identical inputs. The diagnostic log playbook_boost: boosts=N sources=N parsed=N matched=N target_geo=? target_role=? runs on every hybrid call so the class of silent-miss bug stays visible.
/vectors/playbook_memory/patterns goes beyond "who was endorsed" to answer "what did past similar fills have in common?" Aggregates recurring certifications, skills, archetype, reliability distribution across the top-K semantically similar playbooks. Surfaces signal the operator didn't explicitly query for.
The vectord::agent background task runs continuously. Watches the HNSW trial journal, proposes configs, executes trials, promotes Pareto winners — without human intervention. Operator sees "the index got faster overnight" and doesn't know why. The journal knows why.
Meta-layer over playbook_memory. Files under data/_kb/:
- signatures.jsonl — sig_hash + embedding of every run's event shape
- outcomes.jsonl — per-run summary (models, fill rate, turns, citations, rescue stats, staffer, elapsed)
- pathway_recommendations.jsonl — AI-synthesized advice for the next run of a similar sig
- error_corrections.jsonl — fail→succeed deltas on the same sig
- config_snapshots.jsonl — hash of active models + tool_level at each run

Cycle: scenario ends → kb.indexRun() appends the outcome → kb.recommendFor(nextSpec) finds k-NN signatures, feeds the outcome history to an overview model, and writes structured JSON advice → the next scenario reads it via kb.loadRecommendation(spec) and injects pathway_notes into the executor's context alongside prior T3 lessons.
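One pass of that cycle as a sketch, with assumed signatures for the kb module (the real exports live in tests/multi-agent/kb.ts).

```ts
interface SpecLike { sigHash: string }
interface Advice { pathway_notes: string }

async function closeLoop(
  kb: {
    indexRun(run: unknown): Promise<void>;
    findNeighbors(spec: SpecLike, k: number): Promise<unknown[]>;
    writeRecommendation(spec: SpecLike, advice: Advice): Promise<void>;
  },
  finishedRun: unknown,
  nextSpec: SpecLike,
  synthesize: (history: unknown[]) => Promise<Advice>, // overview-model call
): Promise<void> {
  await kb.indexRun(finishedRun);                        // append to outcomes.jsonl
  const neighbors = await kb.findNeighbors(nextSpec, 5); // k-NN over signatures.jsonl
  const advice = await synthesize(neighbors);            // structured JSON advice
  await kb.writeRecommendation(nextSpec, advice);        // read back via loadRecommendation
}
```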
When an event fails (drift abort, JSON parse, pool exhaustion) and cloud T3 is enabled, requestCloudRemediation() feeds the full failure trace — SQL filters attempted, row counts, reviewer drift notes, gap signals, contract terms — to gpt-oss:120b on Ollama Cloud. Cloud returns structured {retry, new_city, new_state, new_role, new_count, rationale}. Event retries once with the pivot. Verified on stress_01: Gary IN (zero workers indexed) misplacement → cloud proposed South Bend IN → retry filled 1/1.
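The pivot shape and the single bounded retry, sketched with field names taken from the text; the function signatures are assumptions.

```ts
// Structured pivot returned by the cloud remediator.
interface RemediationPivot {
  retry: boolean;
  new_city: string | null;
  new_state: string | null;
  new_role: string | null;
  new_count: number | null;
  rationale: string;
}

async function retryWithPivot(
  requestCloudRemediation: (trace: string) => Promise<RemediationPivot>,
  runEvent: (city: string, state: string, role: string, count: number) => Promise<boolean>,
  failureTrace: string, // SQL filters tried, row counts, drift notes, contract terms
  orig: { city: string; state: string; role: string; count: number },
): Promise<boolean> {
  const p = await requestCloudRemediation(failureTrace);
  if (!p.retry) return false; // cloud says don't bother
  // Event retries exactly once with the pivot applied.
  return runEvent(p.new_city ?? orig.city, p.new_state ?? orig.state,
                  p.new_role ?? orig.role, p.new_count ?? orig.count);
}
```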
Answers "who handled this" as a first-class matrix-index dimension. Each scenario carries staffer: {id, name, tenure_months, role, tool_level}. After every run, recomputeStafferStats(staffer_id) aggregates their fill_rate, turn efficiency, citation density, rescue rate into a single competence_score (0.45·fill + 0.20·turn_eff + 0.20·cites + 0.15·rescue).
findNeighbors returns weighted_score = cosine × max_staffer_competence — top-performer playbooks rank above juniors' on similar scenarios. Auto-discovery emerges: running 4 staffers × 3 contracts × 3 rounds surfaced Rachel D. Lewis (Welder Nashville) with 18 endorsements across all 4 staffers, Angela U. Ward (Machine Op Indianapolis) with 19 — reliable-performer labels the system built without human tagging.
Memory at the system layer, not the worker layer. Every accepted scrum review writes a PathwayTrace with the full backtrack: file fingerprint, model used, signal class, KB chunks consulted, observer events, semantic flags (UnitMismatch, TypeConfusion, OffByOne, StaleReference, DeadCode, BoundaryViolation, …), bug fingerprints. A new query that fingerprints to the same trace can hot-swap to the prior result without re-running the 9-rung escalation. Five-factor hot-swap gate: narrow fingerprint match AND audit consensus pass AND replay_count ≥ 3 (probation) AND success_rate ≥ 0.80 AND NOT retired AND vector cosine ≥ 0.90.
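The hot-swap gate as a predicate. Thresholds come from the text; field names are assumed.

```ts
interface PathwayTrace {
  fingerprintMatch: boolean;  // narrow task-fingerprint match
  auditConsensusPass: boolean;
  replayCount: number;
  successRate: number;
  retired: boolean;
  cosineToQuery: number;
}

function canHotSwap(t: PathwayTrace): boolean {
  return t.fingerprintMatch &&
         t.auditConsensusPass &&
         t.replayCount >= 3 &&    // probation
         t.successRate >= 0.80 &&
         !t.retired &&
         t.cosineToQuery >= 0.90;
}
```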
Live state (verified on this load): 88 traces · 11 / 11 successful replays · 100% reuse rate · probation gate crossed. Endpoints: /vectors/pathway/insert · /query · /record_replay · /stats · /bug_fingerprints. Spec: docs/DECISIONS.md ADR-021.
Memory scoped to whoever's acting. /intelligence/chat accepts staffer_id; on match, defaults state filter to staffer territory, scopes playbook-pattern geo to staffer's primary city/state, and surfaces response.staffer.name so the UI relabels MEMORY → MARIA'S MEMORY. Same query "forklift operators" returns 167 IL workers as Maria, 89 IN as Devon, 16 WI as Aisha. The corpus stays intact; the relevance gradient is per coordinator; each accumulates fills independently.
Roster: /staffers endpoint reads from STAFFERS in mcp-server/index.ts. Three personas today (Maria/Devon/Aisha); architecture generalizes — every new metro adds territories, not code paths.
Memory at the network layer. Every contractor in the corpus is also a forward indicator on the public equities they touch via three attribution flavors: direct (contractor IS the public issuer — SEC tickers index match), parent (subsidiary of a public parent — curated KNOWN_PARENT_MAP, e.g. Turner → HOC.DE via Hochtief AG), associated (co-permit network — Bob's Electric appears with TARGET CORPORATION 3+ times → inherits TGT). The associated path is the moat: a staffing-permit dataset that maps contractor-to-public-issuer is not commercially available; we synthesize it from the Socrata co-occurrence graph.
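A sketch of the associated-attribution synthesis: count contractor/issuer co-occurrences across permits and inherit the ticker at 3+. The input shape is an assumption; only the threshold and the example come from the text.

```ts
// Each permit record carries the contractors on it plus the resolved ticker
// of any public issuer also on it (null if none). Separator assumes names
// contain no '|'.
function associatedTickers(
  permits: Array<{ contractors: string[]; issuerTicker: string | null }>,
  minCoOccurrence = 3,
): Map<string, string> {
  const counts = new Map<string, number>(); // "contractor|ticker" -> co-permit count
  for (const p of permits) {
    if (!p.issuerTicker) continue;
    for (const c of p.contractors) {
      const key = `${c}|${p.issuerTicker}`;
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  const out = new Map<string, string>();
  for (const [key, n] of counts) {
    if (n >= minCoOccurrence) {
      const [contractor, ticker] = key.split("|");
      out.set(contractor, ticker); // e.g. Bob's Electric -> TGT
    }
  }
  return out;
}
```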
BAI (Building Activity Index) = attribution-weighted average day-change across surfaced issuers. Indexed build value = total $ of permits attributable to ANY public issuer in scope. Network depth = issuers / total attribution edges. Cross-metro replication explicit in the architecture — Chicago is Phase 1; NYC DOB / LA County / Houston BCD / Boston ISD / DC DCRA are all Socrata-shaped, ship as config-only adapters.
Observer runs as lakehouse-observer.service, now with an HTTP listener on :3800. Scenarios POST per-event outcomes to /event with full provenance (staffer_id, sig_hash, event_kind, role, city, state, rescue flags). Observer's ERROR_ANALYZER and PLAYBOOK_BUILDER loops consume them alongside MCP-wrapped ops. Persistence switched from the old /ingest/file REPLACE path to an append-only data/_observer/ops.jsonl journal so the trace survives across restarts.
Two surfaces added 2026-04-21 to make the memory stack respond coherently to any input shape:
- normalizeInput(raw) — accepts structured JSON, natural language, or mixed. Three-tier: structured fast-path → regex (handles "need 3 welders in Nashville, TN" in 0ms without an LLM call) → qwen3 LLM fallback for low-signal inputs.
- POST /memory/query — one endpoint that returns every memory surface in a single bundle: playbook_workers (with boost + citations), pathway_recommendation (KB), neighbor_signatures (competence-weighted), prior_lessons (T3 overseer history), top_staffers (leaderboard), discovered_patterns (auto-surfaced reliable workers for this role+city), latency_ms (per-source). A natural-language query runs end-to-end in 319ms.
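Calling the unified endpoint, as a sketch. The response fields come from the bullet above; the host/port and request body shape are assumptions.

```ts
// One natural-language query against the unified memory bundle.
const MCP = "http://localhost:3000"; // illustrative host/port

const res = await fetch(`${MCP}/memory/query`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ query: "need 3 welders in Nashville, TN" }),
});
const bundle = await res.json();
// Surfaces named in the text, plus per-source latency.
console.log(bundle.playbook_workers, bundle.pathway_recommendation,
            bundle.neighbor_signatures, bundle.top_staffers, bundle.latency_ms);
```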
Three of the five 2026-era memory findings remain unwired. Flagged for near-term implementation, not hidden:

- Validity windows: playbooks carry no valid_until or schema_fingerprint. Load-bearing: when a schema migration changes a column, stale playbooks silently keep boosting. Biggest-value remaining fix.
- Seed dedupe: /seed only ADDs. The same (operation, date) pair appends a duplicate instead of refining an existing entry. The playbook file grows faster than necessary.

Validity windows is next — it preserves the trust signal (boost only fires on playbooks that are still true given the current schema) rather than the latency signal (which the current scale doesn't need yet).
Axum is async. The gateway handles concurrent requests on Tokio with work-stealing. No per-request thread. Tested at 10 parallel queries in 82ms total on this hardware.
Per-staffer profile isolation. Each staffer activates their own profile (Phase 17) or workspace (Phase 8.5). Profile scopes their search to bound datasets. Workspace carries their in-progress contracts across sessions.
Per-client blacklists. Auto-applied when the caller passes client: "X" on /search. Staffer A filling for Acme never sees Acme's flagged workers. Staffer B filling for MidState sees them normally.
SQL on job_orders is cheap. 300 rows is nothing — a scan is microseconds.
Workspace per contract. Each contract gets its own workspace with saved searches, shortlists, activity log. Zero-copy handoff between staffers (pointer swap, not data copy).
Forecast remains coherent. /intelligence/staffing_forecast aggregates 30-day permit data regardless of contract count. The bench supply query (GROUP BY role over workers_500k) is a single sub-second SQL.
The delta arrives at 12:30. Here's what happens in the following minutes:
1. mark_embeddings_stale flips the flag.
2. POST /vectors/refresh/workers_500k reads only the new rows (diff against existing embeddings), embeds them in batches of 64 via Ollama, and writes a delta embedding Parquet. Measured on threat_intel: 34 new rows in 970ms (6× faster than a full re-embed).
3. promotion_registry atomically flips the active pointer. The next search hits the new config. Rollback stays available.
4. The vectord::agent background task picks up the DatasetAppended trigger and schedules a fresh HNSW trial cycle against the expanded index.

Headroom beyond that:

- At larger scale, a vector_backend: lance flip keeps search viable — disk-resident IVF_PQ scales past the RAM line (ADR-019).
- VRAM stays bounded (keep_alive=0); Phase 17 profile activation unloads the prior model on swap.
- Playbook hygiene: valid_until + schema_fingerprint fields + a POST /vectors/playbook_memory/retire endpoint (manual or schema-drift triggered). The active vs retired split is surfaced on GET /vectors/playbook_memory/status. Brute-force cosine is still sub-ms at current size; a Letta-style working-memory hot cache is deferred until the entry count crosses ~100K.

| Failure mode | Surface / response | Recovery |
|---|---|---|
| Ingest receives file with schema mismatch vs existing dataset | 409 Conflict with both fingerprints named (ADR-020) | Re-ingest under a new name, or migrate the existing via Phase 14 schema evolution |
| Bucket unreachable on write | Hard 503, error journaled to primary://_errors/bucket_errors/ | GET /storage/errors lists failures; GET /storage/bucket-health shows per-bucket status |
| Bucket unreachable on read | Rescue bucket fallback, X-Lakehouse-Rescue-Used: true header on response | Response still succeeds; operator sees rescue flag |
| /log receives name that doesn't exist in workers_500k | Seed is SKIPPED; response includes rejected_ghost_names: [...] and a note | Operator sees exactly which names were rejected and why |
| Dual-agent executor malforms tool call | Result appended to log with error field; counter increments | After 3 consecutive: abort with full log dump at tests/multi-agent/playbooks/<id>-FAILED.json |
| Dual-agent drifts from target | Reviewer verdict = drift, counter increments | After 3 consecutive drifts: abort with full log |
| Hybrid search finds zero candidates | Returns empty sources[] + sql_matches: 0 | Gap signal captured by scenario runner; operator prompted to broaden filter |
| Ollama sidecar down | 502 Bad Gateway from aibridge; embed calls fail fast | Restart: systemctl restart lakehouse-sidecar; vector search falls back to pre-computed embeddings |
| Gateway restart mid-operation | In-memory state (playbook_memory, HNSW) reloaded from persisted state.json / trial journals | Zero data loss; catalog, storage, journals are all source-of-truth |
| Schema fingerprint diverges across manifests | catalog::dedupe reports DedupeReport with winner selection (non-null row_count first, then newest updated_at) | POST /catalog/dedupe collapses duplicates idempotently |
| Scenario event fails on zero-supply city | Cloud rescue (Phase 22 item B) fires — gpt-oss:120b sees SQL filters attempted, row counts, reviewer drift notes, contract terms; returns structured {retry, new_city, new_state, new_role, new_count, rationale} | Retry with pivot runs same executor loop with new geography; verified Gary IN → South Bend IN filled 1/1 after original drift-abort |
| LLM response truncated mid-JSON (thinking model ate token budget) | Phase 21 generateContinuable() detects via brace-balance + JSON.parse; no silent truncation | Auto-continue with partial as scratchpad, or geometric backoff if initial call returned empty. Bounded by max_continuations. |
| Schema migration invalidates existing playbooks | Phase 25 — POST /vectors/playbook_memory/retire with current fingerprint retires all mismatched entries in scope; diagnostic log shows counts | Retired entries stay in journal for forensics but are skipped by all boost calculations. Scoped by (city, state) so unrelated geos aren't touched. |
| Observer fails to reach scenario outcome stream | Scenario postObserverEvent() uses 2s AbortSignal.timeout; silent skip if :3800 is down | Scenario log is still the source of truth; observer re-ingest on next run restores the stream. data/_observer/ops.jsonl is append-only so prior events survive. |
Maria runs Chicago. Devon runs Indianapolis. Aisha runs Wisconsin/Michigan. They share one corpus, but search results, recurring-skill patterns, and playbook context all reshape to whoever is acting. /intelligence/chat accepts staffer_id; on match, defaults state filter to the staffer's territory, scopes playbook-pattern geo to their primary city/state, and surfaces response.staffer.name so the UI relabels MEMORY → MARIA'S MEMORY.
Verified end-to-end: same query "forklift operators" returns 167 IL workers as Maria, 89 IN as Devon, 16 WI as Aisha (live numbers; refresh the profiler page to recompute). The corpus stays intact; the relevance gradient is per coordinator. As each accumulates fills, their slice of the playbook compounds independently. Roster: /staffers endpoint, declared in STAFFERS in mcp-server/index.ts. Adding a staffer is one append; the architecture is metro-agnostic by construction.
Scopes every search. A staffing-recruiter profile bound to workers_500k sees only that dataset. A security-analyst profile bound to threat_intel cannot see worker data. GET /vectors/profile/<id>/audit records every tool invocation by model identity.
Per-contract state. Each workspace has daily/weekly/monthly tiers, saved searches, shortlists, activity logs. Survives across sessions. Instant zero-copy handoff between staffers — pointer swap, not data copy. Persisted to object storage, rebuilt on startup.
Per-client worker exclusion. Populated via POST /clients/:client/blacklist. Auto-applied when the caller passes client: "X" on /search. JSON-backed; would move to catalog table under real client load.
Phase 12 tool registry logs every governed-action invocation (who called what, with what args, when, outcome). GET /tools/audit queryable. Phase 13 access control layers on top — role-based field masking, query audit log.
Workspace activity log + per-staffer filter on the event journal gives "what did Sarah do today" as a direct query. The foundation for shift-handoff reports.
Each scenario run carries an explicit staffer: {id, name, tenure_months, role, tool_level}. The KB aggregates per-staffer stats (data/_kb/staffers.jsonl) that roll up into a single competence_score:
competence_score = 0.45·fill_rate + 0.20·turn_efficiency + 0.20·citation_density + 0.15·rescue_rate
When any query runs kb.findNeighbors(spec, k), the ranking isn't just cosine similarity — it's weighted_score = cosine × max_staffer_competence over the best coordinator who ran that signature. Senior staffers' playbooks surface above juniors' on similar scenarios, even when the juniors' scenario was marginally closer in embedding space.
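Both formulas as code. The coefficients are from the text; types are illustrative.

```ts
interface StafferStats {
  fillRate: number;
  turnEfficiency: number;
  citationDensity: number;
  rescueRate: number;
}

// competence_score = 0.45*fill + 0.20*turn_eff + 0.20*cites + 0.15*rescue
const competence = (s: StafferStats): number =>
  0.45 * s.fillRate + 0.20 * s.turnEfficiency +
  0.20 * s.citationDensity + 0.15 * s.rescueRate;

// Neighbor ranking: cosine scaled by the best coordinator who ran that
// signature, so top performers' playbooks outrank near-duplicates.
const weightedScore = (cosineSim: number, staffersOnSig: StafferStats[]): number =>
  cosineSim * Math.max(...staffersOnSig.map(competence));
```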
The tool_level knob (full / local / basic / minimal) controls which tiers are available to a given staffer's runs. See Ch 3 for the mapping. Variance is real and measurable: the 36-run demo produced a 33-point fill-rate delta between James (local tools, 93%) and Alex (minimal tools, 60%) on identical contracts.
A second-order output of the competence path: when multiple staffers independently endorse the same worker on similar-signature playbooks, that worker accumulates cross-staffer endorsements. scripts/kb_staffer_report.py surfaces them — after 36 runs, Rachel D. Lewis (Welder Nashville) had 18 endorsements across 4 staffers, Angela U. Ward (Machine Op Indianapolis) 19. These are high-confidence "reliable" labels the system produced without human tagging. The UI could badge these workers on future queries; today they're visible via /memory/query's discovered_patterns bundle.
- /log fires → playbook_memory.seed → persist_sql → successful_playbooks_live grows by one. Button flashes "Logged" for 1.4s. No modal, no form, no second click.
- /log_failure on Dave records the penalty for the next similar query.
- tests/multi-agent/scenario.ts gets report.md auto-generated. Workspace activity logs aggregate per staffer. GET /vectors/playbook_memory/status shows active vs retired counts. The KB indexes the run (kb.indexRun) and the overview model synthesizes a pathway recommendation for the next matching signature. Every event outcome has already streamed to lakehouse-observer.service on :3800 for ERROR_ANALYZER + PLAYBOOK_BUILDER consumption.
- POST /memory/query on the Bun MCP. The regex normalizer extracts role / city / count / deadline / intent in 0ms (no LLM call). The unified response returns playbook workers (auto-surfaced reliable performers for Joliet Forklift with citation counts), the pathway recommendation from the KB, prior T3 lessons for Joliet, and top staffers by competence — all in ~300ms.
- detectErrorCorrections scans today's outcomes for fail→succeed deltas on the same signature; any correction is logged to data/_kb/error_corrections.jsonl with the config diff. Tomorrow morning, the system is measurably better at something it got asked about today.

After each sealed fill (via scenario.ts or the manual /log flow with downstream hooks), generateArtifacts in the scenario runner produces: (a) one SMS per worker (TO: Name, message under 180 chars), (b) one client confirmation email. Drafts are saved to sms.md and emails.md under the scenario output dir. Ollama drafts them; the staffer reviews and sends. No auto-send; human-in-the-loop.
- Audit baselines over time: the data/_kb/audit_baselines.jsonl append pattern is in place; it just hasn't run long enough.
- Rate-aware matching: add pay_rate to workers, bill_rate to contracts, and a filter + warning path. Partially addressed via ContractTerms.budget_per_hour_max passed to T3/rescue prompts, but the match-time filter isn't wired yet.
- Seed dedupe: /seed only ADDs. The same (operation, date) pair appends a duplicate instead of refining an existing entry. Cheap to add, moderate payoff.