From d571d62e9b1d554b41e815650feef322f799e7f4 Mon Sep 17 00:00:00 2001
From: root
Date: Mon, 27 Apr 2026 23:22:07 -0500
Subject: [PATCH] demo: spec — refresh repo layout + model fleet + per-staffer
 + paths 8-9
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

J: "how about devop.live/lakehouse/spec."

The spec was anchored on the 2026-04-21 state (v2 footer): mistral still in
the model matrix, 13 crates instead of 15, the validator/truth/auditor crates
missing, no mention of the OpenCode 40-model fleet, no pathway memory, no
per-staffer hot-swap, no Construction Activity Signal Engine, and an ADR count
of 20. The footer claimed Phases 19-25.

Edits, in order:

Ch1 Repository layout
  + crates/truth/ (ADR-021 rule store)
  + crates/validator/ (Phase 43 — schema/completeness/policy gates)
  + auditor/ (cross-lineage Kimi↔Haiku/Opus auto-promote)
  + scripts/distillation/ (frozen substrate v1.0.0 at e7636f2)
  Updated aibridge to mention ProviderAdapter dispatch
  Updated gateway to mention the OpenAI-compat /v1/* drop-in middleware
  Updated the mcp-server route list to include /staffers + the
  profiler/contractor pages
  Updated config/ to mention modes.toml + providers.toml + routing.toml
  Updated the docs/ ADR count from 20 → 21
  Updated data/ to mention _pathway_memory + _auditor/kimi_verdicts

Ch3 Measurement & indexing
  REPLACED the stale "Model matrix (Phase 20)" T1-T5 table that mentioned
  mistral with the current 5-provider fleet: ollama / ollama_cloud /
  openrouter / opencode (40 models; one sk-* key reaches Claude Opus 4.7,
  GPT-5.5-pro, Gemini 3.1-pro, Kimi K2.6, GLM, DeepSeek, Qwen, MiniMax, plus
  free tiers) / kimi
  ADDED the 9-rung cloud-first ladder pseudocode
  ADDED the N=3 consensus + cross-architecture tie-breaker math
  ADDED the auditor cross-lineage Kimi K2.6 ↔ Haiku 4.5 + Opus auto-promote
  ADDED the distillation v1.0.0 freeze paragraph (145 tests, 22/22, 16/16)
  Updated the Continuation primitive to mention the Phase 44 Rust port

Ch5 What a CRM can't do
  Extended the table with 6 new capabilities:
  - Per-staffer relevance gradient
  - Triage in one shot (late-worker → backfills + draft SMS)
  - Permit → fill plan derivation
  - Public-issuer attribution across the contractor graph
  - Cross-lineage AI audit on every PR
  - Pathway memory (system-level hot-swap, ADR-021)

Ch6 How it gets better over time
  Lede updated from 7 paths → 10 paths
  NEW Path 7 — Pathway memory (ADR-021)
  NEW Path 8 — Per-staffer hot-swap index
  NEW Path 9 — Construction Activity Signal Engine
  Original Path 7 (observer ingest) renumbered to Path 10

Ch9 Per-staffer context
  Lede now anchors Path 8 from Ch6
  NEW lead section: Per-staffer hot-swap index — Maria/Devon/Aisha; the same
  query reshapes per coordinator (167 IL / 89 IN / 16 WI), MARIA'S MEMORY
  pill, /staffers endpoint, metro-agnostic by construction. The original
  Phase 17 profile / Phase 23 competence sections are retained beneath as the
  deeper architecture detail.

Ch10 A day in the life
  Updated the 14:00 emergency event to use the late-worker triage handler —
  the coordinator types "Dave running late site 4422" and gets a profile +
  draft SMS + 5 backfills + a Copy SMS button in 250ms. The old No-show
  button → /log_failure flow remains valid (the penalty still records); the
  user-facing surface is the new triage card.

Ch11 Known limits
  REPLACED the Mem0/Letta/Phase-26 era list with the current honest limits:
  BAI persistence + backtesting, NYC DOB adapter, 12 awaiting public-data
  sources for the contractor profile, rate/margin awareness, Mem0-style
  UPDATE/DELETE, Letta hot cache (now 5K, not 1.9K), confidence calibration,
  SEC fuzzy precision, tighter pathway+scrum integration.

Footer
  v2 2026-04-21 → v3 2026-04-27
  Phases 19-25 → 19-45
  Lists today's phases: distillation v1.0.0 substrate, gateway as an
  OpenAI-compat drop-in, mode runner, validator + iterate, ADR-021 pathway
  memory, per-staffer hot-swap, Construction Activity Signal Engine.
Nav: + Profiler link
Date pill: v1 · 2026-04-20 → v3 · 2026-04-27

Verified end-to-end on devop.live/lakehouse/spec — 11 chapter h2s render in
order, 67KB page (was 50KB-ish), all internal links resolve.
---
 mcp-server/spec.html | 108 +++++++++++++++++++++++++++++++------------
 1 file changed, 79 insertions(+), 29 deletions(-)

diff --git a/mcp-server/spec.html b/mcp-server/spec.html
index c54646c..4b3394b 100644
--- a/mcp-server/spec.html
+++ b/mcp-server/spec.html
@@ -78,13 +78,14 @@ table.plain tr:hover td{background:#0d1117}
-v1 · 2026-04-20
+v3 · 2026-04-27
@@ -120,14 +121,18 @@ table.plain tr:hover td{background:#0d1117}
 crates/vectord/ — The vector + learning surface. Embeddings stored as Parquet (ADR-008), HNSW index (Phase 15), trial system (autotune), promotion registry (Phase 16), playbook_memory (Phase 19). Core feedback loop lives here.
 crates/vectord-lance/ — Firewall crate. Lance 4.0 + Arrow 57, isolated from the main Arrow-55 workspace. Provides a secondary vector backend for large-scale, random-access, and append-heavy workloads (ADR-019).
 crates/journald/ — Append-only mutation event log (ADR-012). Every insert/update/delete writes here — who, when, what, old/new value. Never mutated. Foundation for time-travel + compliance audit.
-crates/aibridge/ — Rust ↔ Python sidecar. HTTP client over FastAPI wrapper around Ollama. VRAM introspection via nvidia-smi. All LLM calls (embed, generate, rerank) flow through here.
-crates/gateway/ — Axum HTTP (:3100) + gRPC (:3101). Auth middleware, tools registry (Phase 12 — governed actions), CORS. Every external request enters here.
+crates/truth/ — File-backed rule store. evaluate(task_class, ctx) → Vec<RuleOutcome> (ADR-021 — semantic-correctness matrix layer). Loaded from truth/*.toml at gateway boot.
+crates/aibridge/ — Rust ↔ Python sidecar + provider adapter trait. HTTP client over FastAPI wrapper around Ollama for local; ProviderAdapter dispatch for cloud (ollama_cloud, openrouter, opencode, kimi). VRAM introspection via nvidia-smi. All LLM calls flow through here.
+crates/gateway/ — Axum HTTP (:3100) + gRPC (:3101). OpenAI-compat /v1/* (drop-in middleware), mode runner (/v1/mode/execute), validator (/v1/validate), iterate loop (/v1/iterate), tools registry, cost telemetry, Langfuse + observer fan-out on every chat. Every external request enters here.
+crates/validator/ — Phase 43 production validator. Schema / completeness / consistency / policy gates over LLM outputs. FillValidator, EmailValidator, ParquetWorkerLookup (loads workers_500k.parquet at boot). Fail-closed when the roster is absent.
 crates/ui/ — Dioxus WASM developer UI. Internal tool. Not exposed externally.
-mcp-server/ — Bun/TypeScript recruiter-facing app. Serves devop.live/lakehouse. Routes: /search /match /log /log_failure /clients/:c/blacklist /intelligence/* /memory/query /models/matrix /system/summary. Observer sibling at observer.ts with an HTTP listener on :3800 for scenario event ingest. Proxies to the Rust gateway for heavy work.
-tests/multi-agent/ — Dual-agent scenario harness + memory stack. agent.ts (prompts, continuation + tree-split primitives, cloud routing), orchestrator.ts, scenario.ts (contracts + staffer + tool_level), kb.ts (KB indexing, competence scoring, neighbor retrieval), normalize.ts (input normalizer — structured / regex / LLM), memory_query.ts (unified /memory/query), gen_scenarios.ts + gen_staffer_demo.ts (corpus generators), run_e2e_rated.ts, chain_of_custody.ts. Unit tests colocated (kb.test.ts, normalize.test.ts).
-config/ — models.json — authoritative 5-tier model matrix (T1 hot local / T2 review local / T3 overview cloud / T4 strategic / T5 gatekeeper). Per-tier context_window + context_budget + overflow_policy. Read at runtime by scenario.ts; hot-swap friendly.
-docs/ — PRD.md, PHASES.md, DECISIONS.md (20 ADRs). Every significant architectural choice has an ADR with the alternatives that were rejected and why.
-data/ — Default local object store. Parquet files per dataset, append-log batches, HNSW trial journals, promotion registries, _playbook_memory/state.json (now with retirement fields — Phase 25), catalog manifests. Plus four learning-loop directories: _kb/ (signatures, outcomes, recommendations, error_corrections, config_snapshots, staffers), _playbook_lessons/ (T3 cross-day lessons archived per run), _observer/ops.jsonl (append journal, durable scenario outcome stream), _chunk_cache/ (spec'd for the Phase 21 Rust port). Rebuildable from repo + this dir alone.
+mcp-server/ — Bun/TypeScript public-facing app + MCP tool surface. Serves devop.live/lakehouse. Pages: dashboard / console / profiler / contractor / proof / spec / onboard / alerts / workspaces. Routes: /search /match /log /log_failure /clients/:c/blacklist /intelligence/* /staffers /memory/query /models/matrix /system/summary. Observer sibling at observer.ts on :3800 for event ingest.
+auditor/ — External claim-vs-diff verifier on PRs. Polls Gitea for open PRs, builds an adversarial prompt from PRD invariants + the staffing matrix, alternates Kimi K2.6 ↔ Haiku 4.5 by SHA, auto-promotes Claude Opus 4.7 on diffs >100k chars. Per-PR cap=3 with auto-reset on each new head SHA. Verdicts at data/_auditor/kimi_verdicts/.
+tests/multi-agent/ — Multi-agent scenario harness + memory stack. agent.ts, scenario.ts (contracts + staffer + tool_level), kb.ts (KB indexing, competence scoring), normalize.ts, memory_query.ts, run_e2e_rated.ts. Unit tests colocated.
+scripts/distillation/ — Distillation substrate v1.0.0 (frozen at tag distillation-v1.0.0 / commit e7636f2). 145 unit tests, 22/22 acceptance, 16/16 audit-full, bit-identical reproducibility. Multi-layer contamination firewall on SFT exports.
+config/ — modes.toml — task_class → mode/model router (scrum_review, contract_analysis, staffing_inference, pr_audit, doc_drift_check, fact_extract). providers.toml — 5 active providers (ollama, ollama_cloud, openrouter, opencode 40-model, kimi direct). routing.toml — cost gates per task class.
+docs/ — PRD.md, PHASES.md, DECISIONS.md (21 ADRs). Every significant architectural choice has an ADR with the alternatives that were rejected and why.
+data/ — Default local object store. Parquet datasets, append-log batches, HNSW trial journals, promotion registries, _playbook_memory/state.json, _pathway_memory/state.json (88 traces, 11/11 successful replays, ADR-021), catalog manifests. Plus learning-loop directories: _kb/, _playbook_lessons/, _observer/ops.jsonl, _auditor/kimi_verdicts/. Rebuildable from repo + this dir alone.
@@ -199,20 +204,42 @@ table.plain tr:hover td{background:#0d1117}
  • Ollama swaps to the profile's model via keep_alive=0; only one model in VRAM at a time
  • -

    Model matrix (Phase 20)

    -

    Five tiers declared in config/models.json. Each call site picks the tier appropriate to its purpose — hot-path JSON emitters get fast local, overview/strategic/gatekeeper decisions get thinking models on cloud. Every tier carries context_window, context_budget, and overflow_policy.

    +

    Provider fleet — 5 active, 40+ frontier models reachable

    +

    Declared in config/providers.toml + config/modes.toml. Gateway is an OpenAI-compatible drop-in middleware: any consumer that speaks POST /v1/chat/completions gets routing, audit, cost telemetry, and the full memory substrate behind every call.
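As a sketch of what "drop-in" buys a consumer: the only thing that changes is the base URL. The :3100 port is the gateway's HTTP port from the Ch1 crate table; `buildChatRequest` and the model/message values below are illustrative, not code from the repo.

```typescript
// Any OpenAI-style client can point at the gateway unchanged; routing,
// audit, and cost telemetry happen server-side. Illustrative helper only.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildChatRequest(model: string, messages: ChatMessage[]) {
  return {
    url: "http://localhost:3100/v1/chat/completions", // standard OpenAI path
    method: "POST" as const,
    body: JSON.stringify({ model, messages }), // standard OpenAI payload shape
  };
}

const req = buildChatRequest("gpt-oss:120b", [
  { role: "user", content: "Summarize today's fills." },
]);
```

The point is that no bespoke SDK is required; the memory substrate rides behind a wire format consumers already speak.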

-Tier | Purpose | Primary model | Frequency
-T1 hot | Per tool call — SQL gen, hybrid_search, propose_done | qwen3.5:latest local, think:false | 50-200/scenario
-T2 review | Per-step consensus, drift flagging | qwen3:latest local, think:false | 5-14/event
-T3 overview | Mid-day checkpoints + cross-day lesson distill | gpt-oss:120b Ollama Cloud, thinking on | 1-3/scenario
-T4 strategic | Pattern re-ranking, weekly gap audit | qwen3.5:397b cloud | 1-10/day
-T5 gatekeeper | Schema migrations, autotune config changes | kimi-k2-thinking cloud, audit-logged | 1-5/day
+Provider | Reach | Use case
+ollama | localhost:3200 — local sidecar over Ollama | Hot-path JSON emitters, embeddings, last-resort rescue
+ollama_cloud | ollama.com bearer key — gpt-oss:120b, qwen3-coder:480b, deepseek-v3.1:671b, kimi-k2:1t, mistral-large-3:675b, qwen3.5:397b | Strong-model reviewer rungs, T3+ overview, scrum master pipeline
+openrouter | openrouter.ai/api/v1 — 343 models incl. Anthropic/Google/OpenAI/MiniMax/Qwen, paid + free tiers | Paid ladder for observer escalations, free-tier rescue
+opencode | opencode.ai/zen/v1 — 40 frontier models reachable through ONE sk-* key: Claude Opus 4.7 / Sonnet / Haiku, GPT-5.5-pro / 5.4 / codex variants, Gemini 3.1-pro, Kimi K2.6, GLM 5.1, DeepSeek, Qwen 3.6+, MiniMax, plus 4 free-tier | Cross-architecture tie-breakers, auditor cross-lineage (Haiku 4.5 + Opus 4.7), high-context reasoning (Opus on diffs >100k chars)
+kimi | api.kimi.com/coding/v1 — direct Kimi For Coding | kimi_architect when ollama_cloud rate-limits; TOS-clean primary path
-

    Key mechanical finding (2026-04-21): qwen3.5 and qwen3 are thinking models — they burn ~650 tokens of hidden reasoning before emitting the visible response. For hot-path JSON emitters this meant 400-token budgets returned empty strings. Fix: think: false plumbed through sidecar's /generate endpoint; hot path disables thinking (structure matters more than reasoning depth), overseer tiers keep it on. Mistral was dropped entirely after a 0/14 fill rate on complex scenarios (decoder-level malformed-JSON bug, not a prompt issue).

    -

    Continuation primitive (Phase 21): generateContinuable() handles output-overflow without max_tokens tourniquets — empty response → geometric backoff retry; truncated-JSON → continue with partial as scratchpad. generateTreeSplit() handles input-overflow via map-reduce with running scratchpad. Both respect assertContextBudget() so silent truncation can't happen.

    + +

    The 9-rung cloud-first ladder

    +

    Defined in tests/real-world/scrum_master_pipeline.ts as const LADDER. Each attempt is evaluated by isAcceptable() = chars ≥ 3800 ∧ not malformed JSON-only. On reject, the next rung sees a learning preamble carrying the prior rejection reason.

    +
    1  ollama_cloud / kimi-k2:1t            1T params · flagship
    +2  ollama_cloud / qwen3-coder:480b      coding specialist
    +3  ollama_cloud / deepseek-v3.1:671b    reasoning
    +4  ollama_cloud / mistral-large-3:675b  deep analysis
    +5  ollama_cloud / gpt-oss:120b          reliable workhorse
    +6  ollama_cloud / qwen3.5:397b          dense final thinker
    +7  openrouter   / openai/gpt-oss-120b:free  rescue tier
    +8  openrouter   / google/gemma-3-27b-it:free fastest rescue
    +9  ollama       / qwen3.5:latest        last-resort local
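The rung-walk above can be sketched as follows. `LADDER` here is truncated to three illustrative rungs (the real nine-entry constant lives in tests/real-world/scrum_master_pipeline.ts), and the malformed-JSON check is a simplified stand-in for the real isAcceptable().

```typescript
type Rung = { provider: string; model: string };

// Truncated illustration of the nine-rung constant.
const LADDER: Rung[] = [
  { provider: "ollama_cloud", model: "kimi-k2:1t" },
  { provider: "openrouter", model: "openai/gpt-oss-120b:free" },
  { provider: "ollama", model: "qwen3.5:latest" }, // last-resort local
];

// Acceptance gate per the spec: chars >= 3800 AND not malformed JSON-only.
function isAcceptable(out: string): boolean {
  if (out.length < 3800) return false;
  if (!out.trim().startsWith("{")) return true; // prose output: length suffices
  try {
    JSON.parse(out);
    return true;
  } catch {
    return false; // JSON-only and malformed -> reject
  }
}

// Walk the ladder; each rejection feeds a learning preamble to the next rung.
async function climb(
  run: (rung: Rung, preamble: string) => Promise<string>,
): Promise<string | null> {
  let preamble = "";
  for (const rung of LADDER) {
    const out = await run(rung, preamble);
    if (isAcceptable(out)) return out;
    preamble = `Prior attempt on ${rung.model} rejected (len=${out.length}).`;
  }
  return null; // every rung exhausted
}
```

The learning preamble is the interesting design choice: each rung is not a blind retry but a retry that knows why the previous rung failed.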
    + +

    N=3 consensus + cross-architecture tie-breaker

    +

    Every audit and every consensus-required call fires the primary reviewer N=3 times in parallel (Promise.all — wall-clock = single call). Aggregate votes per claim_idx, majority wins. On a 1-1-1 split, a tie-breaker model with different architecture (qwen3-coder:480b vs primary gpt-oss/kimi) is invoked. Every disagreement, even when majority resolves, writes to data/_kb/audit_discrepancies.jsonl. Closes the cloud-non-determinism gap: temp=0 isn't actually deterministic in practice across hours; consensus + cross-architecture tie-break stabilizes verdicts.
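The vote math reduces to a small tally. `tally` and `hadDisagreement` are hypothetical helper names; the N=3 Promise.all fan-out is summarized in comments since the provider calls themselves are not shown.

```typescript
type Verdict = string;

// Votes arrive from N=3 parallel calls to the primary reviewer
// (Promise.all, so wall-clock stays at a single call). Per claim:
// a clear majority wins; null signals a split that needs the
// cross-architecture tie-breaker (different lineage than the primary).
function tally(votes: Verdict[]): Verdict | null {
  const counts = new Map<Verdict, number>();
  for (const v of votes) counts.set(v, (counts.get(v) ?? 0) + 1);
  const ranked = [...counts.entries()].sort((a, b) => b[1] - a[1]);
  if (ranked.length === 1 || ranked[0][1] > ranked[1][1]) return ranked[0][0];
  return null; // 1-1-1 split -> invoke the tie-breaker model
}

// Every disagreement, even one the majority resolves, still gets journaled
// (the spec appends it to data/_kb/audit_discrepancies.jsonl).
function hadDisagreement(votes: Verdict[]): boolean {
  return new Set(votes).size > 1;
}
```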

    + +

    Auditor cross-lineage (Kimi ↔ Haiku ↔ Opus)

    +

    Every push to PR #11 triggers auditor/audit.ts within ~90s. To prevent a single model lineage's blind spots from becoming the system's blind spots, audits alternate between Kimi K2.6 (Moonshot lineage) and Haiku 4.5 (Anthropic lineage) by head SHA. Diffs over 100k chars auto-promote to Claude Opus 4.7 (Anthropic frontier). Per-PR cap of 3 audits with auto-reset on each new head SHA prevents infinite-loop spend. Latest verdict on c3c9c21: Haiku 4.5, 24.6s, 100% grounding-verified across 10 findings.
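The selection rule can be sketched as one function. The parity-of-hex-prefix trick is an assumption (the spec says alternation is keyed by head SHA but not which function does it); the >100k-char Opus promotion and the per-PR cap are as stated above.

```typescript
// Alternate auditor lineage per head SHA so neither lineage's blind spots
// dominate. SHA-parity is an assumed implementation of "alternates by SHA".
function pickAuditor(headSha: string, diffChars: number): string {
  if (diffChars > 100_000) return "claude-opus-4.7"; // auto-promote on big diffs
  const even = parseInt(headSha.slice(0, 8), 16) % 2 === 0;
  return even ? "kimi-k2.6" : "claude-haiku-4.5";
}

// Per-PR spend guard: cap of 3 audits, auto-reset when a new head SHA appears
// (keying the counter by SHA makes the reset automatic).
const auditsBySha = new Map<string, number>();
function mayAudit(headSha: string, cap = 3): boolean {
  const n = auditsBySha.get(headSha) ?? 0;
  if (n >= cap) return false;
  auditsBySha.set(headSha, n + 1);
  return true;
}
```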

    + +

    Distillation v1.0.0 — the frozen substrate

    +

    The substrate the auditor and mode runner sit on is tagged at distillation-v1.0.0 / commit e7636f2. 145 unit tests pass · 22/22 acceptance invariants · 16/16 audit-full checks · bit-identical reproducibility verified. The distillation phase exports clean SFT / RAG / preference samples with a multi-layer contamination firewall (SFT_NEVER constant + scorer category mapping + acceptance fixtures); the auditor consumes the substrate. The frozen tag means: any future "the system regressed" question has a baseline to bisect against, byte-for-byte.

    + +

    Continuation primitive (Phase 21)

    +

    generateContinuable() handles output-overflow without max_tokens tourniquets — empty response → geometric backoff retry; truncated-JSON → continue with partial as scratchpad. generateTreeSplit() handles input-overflow via map-reduce with running scratchpad. Both respect assertContextBudget() so silent truncation can't happen. Now Rust-native in crates/aibridge/src/continuation.rs (Phase 44).

    Per-staffer tool_level (Phase 23)

    Scenarios can be scoped to a specific coordinator (staffer: {id, name, tenure_months, role, tool_level}). tool_level controls which tiers are available:

@@ -265,6 +292,12 @@ table.plain tr:hover td{background:#0d1117}
 Boost workers based on past success | No | Yes (Phase 19 playbook_memory)
 Penalize workers based on past failure | No | Yes (/log_failure + 0.5^n penalty)
 Surface traits across past fills | No | Yes (/vectors/playbook_memory/patterns)
+Per-staffer relevance gradient | No | Yes — same query reshapes per coordinator (staffer_id on /intelligence/chat); MARIA'S MEMORY pill labels the playbook context with the active coordinator
+Triage in one shot — late-worker → backfills + draft SMS | No | Yes (/intelligence/chat Route 6 — pulls profile + 5 same-role same-geo backfills sorted by responsiveness + drafts client SMS in ~250ms)
+Permit → fill plan derivation (forward demand) | No | Yes (/intelligence/permit_contracts — Chicago Socrata permit → role / headcount / deadline / fill probability / gross revenue per card)
+Public-issuer attribution across contractor graph | No | Yes (/intelligence/profiler_index — direct + parent + co-permit associated tickers; live Stooq prices)
+Cross-lineage AI audit on every PR | No | Yes (auditor crate — Kimi K2.6 ↔ Haiku 4.5 alternation + Opus 4.7 auto-promote on big diffs)
+Pathway memory — system-level hot-swap by task fingerprint | No | Yes (88 traces, 11/11 successful replays, 100% reuse rate, ADR-021)
 Predict staffing demand from external data | No | Yes (Chicago permit feed + 30-day rolling forecast)
 Count down to staffing deadline per contract | No | Yes (permit issue_date + heuristic timeline)
 Explain why each candidate ranked | No | Yes (boost chip + narrative citations + memory pattern)
@@ -278,7 +311,7 @@ table.plain tr:hover td{background:#0d1117}
    Chapter 6

    How it gets better over time

    -
    Compounding learning across seven paths. The first three are automatic background loops. Paths 4-7 landed 2026-04-21 and turn the system into a reinforcement-learning pipeline: outcomes → knowledge base → pathway recommendations → cloud rescue → competence-weighted retrieval → observer analysis. All seven happen without operator intervention.
    +
Compounding learning across ten paths. The first three are automatic background loops. Paths 4-6 and 10 (Phases 22-24) added the reinforcement layer: outcomes → KB → recommendations → cloud rescue → competence-weighted retrieval → observer analysis. Paths 7-9 (Phases 25-43, 2026-04-26→27) added the system-level memory layers: pathway memory by task fingerprint (ADR-021), per-staffer hot-swap, and the Construction Activity Signal Engine. All ten happen without operator intervention.

    Path 1 — Playbook boost with geo + role prefilter (Phase 19 + refinement)

    Every sealed fill is seeded to playbook_memory. The boost fires inside /vectors/hybrid when use_playbook_memory: true. Math, tightened 2026-04-21 after a diagnostic pass found globally-ranked playbooks were missing the SQL-filtered candidate pool entirely:

    @@ -311,7 +344,19 @@ boost[(city, state, name)] = min(Σ per_worker, 0.25)

    Answers "who handled this" as a first-class matrix-index dimension. Each scenario carries staffer: {id, name, tenure_months, role, tool_level}. After every run, recomputeStafferStats(staffer_id) aggregates their fill_rate, turn efficiency, citation density, rescue rate into a single competence_score (0.45·fill + 0.20·turn_eff + 0.20·cites + 0.15·rescue).

    findNeighbors returns weighted_score = cosine × max_staffer_competence — top-performer playbooks rank above juniors' on similar scenarios. Auto-discovery emerges: running 4 staffers × 3 contracts × 3 rounds surfaced Rachel D. Lewis (Welder Nashville) with 18 endorsements across all 4 staffers, Angela U. Ward (Machine Op Indianapolis) with 19 — reliable-performer labels the system built without human tagging.
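The two formulas above, written out. The weights are the ones quoted in the spec; the field names of the stats record are illustrative, and all inputs are assumed normalized to [0, 1].

```typescript
// competence_score = 0.45·fill + 0.20·turn_eff + 0.20·cites + 0.15·rescue
interface StafferStats {
  fillRate: number;
  turnEfficiency: number;
  citationDensity: number;
  rescueRate: number;
}

function competenceScore(s: StafferStats): number {
  return (
    0.45 * s.fillRate +
    0.2 * s.turnEfficiency +
    0.2 * s.citationDensity +
    0.15 * s.rescueRate
  );
}

// findNeighbors weighting: cosine similarity scaled by the best competence
// among the staffers behind the playbook, so top performers outrank juniors.
function weightedScore(cosine: number, maxStafferCompetence: number): number {
  return cosine * maxStafferCompetence;
}
```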

    -

    Path 7 — Observer outcome ingest (Phase 24)

    +

    Path 7 — Pathway memory (ADR-021 — semantic-correctness matrix layer)

    +

Memory at the system layer, not the worker layer. Every accepted scrum review writes a PathwayTrace with the full backtrack: file fingerprint, model used, signal class, KB chunks consulted, observer events, semantic flags (UnitMismatch, TypeConfusion, OffByOne, StaleReference, DeadCode, BoundaryViolation, …), bug fingerprints. A new query that fingerprints to the same trace can hot-swap to the prior result without re-running the 9-rung escalation. Six-factor hot-swap gate: narrow fingerprint match AND audit consensus pass AND replay_count ≥ 3 (probation) AND success_rate ≥ 0.80 AND NOT retired AND vector cosine ≥ 0.90.

    +

    Live state (verified on this load): 88 traces · 11 / 11 successful replays · 100% reuse rate · probation gate crossed. Endpoints: /vectors/pathway/insert · /query · /record_replay · /stats · /bug_fingerprints. Spec: docs/DECISIONS.md ADR-021.
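The gate conditions read as one predicate. The field names below are an assumed flattening of PathwayTrace state; the thresholds are the ones stated in the spec.

```typescript
interface GateInput {
  fingerprintMatch: boolean; // narrow task-fingerprint match
  auditConsensusPass: boolean;
  replayCount: number; // probation: needs >= 3 successful replays
  successRate: number;
  retired: boolean;
  cosine: number; // vector similarity to the incoming query
}

// Hot-swap only when every condition holds; any miss falls back to the
// normal escalation ladder.
function canHotSwap(t: GateInput): boolean {
  return (
    t.fingerprintMatch &&
    t.auditConsensusPass &&
    t.replayCount >= 3 &&
    t.successRate >= 0.8 &&
    !t.retired &&
    t.cosine >= 0.9
  );
}
```

Note the gate is conjunctive on purpose: a stale or retired trace should never short-circuit a fresh run, however similar it looks.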

    + +

    Path 8 — Per-staffer hot-swap index

    +

    Memory scoped to whoever's acting. /intelligence/chat accepts staffer_id; on match, defaults state filter to staffer territory, scopes playbook-pattern geo to staffer's primary city/state, and surfaces response.staffer.name so the UI relabels MEMORY → MARIA'S MEMORY. Same query "forklift operators" returns 167 IL workers as Maria, 89 IN as Devon, 16 WI as Aisha. The corpus stays intact; the relevance gradient is per coordinator; each accumulates fills independently.

    +

    Roster: /staffers endpoint reads from STAFFERS in mcp-server/index.ts. Three personas today (Maria/Devon/Aisha); architecture generalizes — every new metro adds territories, not code paths.
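A sketch of the scoping rule described above. The roster entries mirror, but do not reproduce, the STAFFERS constant in mcp-server/index.ts (Aisha's primary city is a placeholder), and `scopeQuery` is a hypothetical helper name.

```typescript
// staffer_id defaults the state filter to the coordinator's territory
// (never overriding an explicit filter) and relabels the memory pill.
interface Staffer { id: string; name: string; state: string; city: string }

const ROSTER: Staffer[] = [
  { id: "maria", name: "Maria", state: "IL", city: "Chicago" },
  { id: "devon", name: "Devon", state: "IN", city: "Indianapolis" },
  { id: "aisha", name: "Aisha", state: "WI", city: "Milwaukee" }, // city assumed
];

function scopeQuery(filters: { state?: string }, stafferId?: string) {
  const staffer = ROSTER.find((s) => s.id === stafferId);
  const scoped = { ...filters };
  if (staffer && scoped.state === undefined) scoped.state = staffer.state;
  return {
    filters: scoped,
    memoryLabel: staffer ? `${staffer.name.toUpperCase()}'S MEMORY` : "MEMORY",
  };
}
```

Adding a metro really is one roster append under this shape: the query path never branches on who the staffer is, only on the territory fields they carry.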

    + +

    Path 9 — Construction Activity Signal Engine

    +

    Memory at the network layer. Every contractor in the corpus is also a forward indicator on the public equities they touch via three attribution flavors: direct (contractor IS the public issuer — SEC tickers index match), parent (subsidiary of a public parent — curated KNOWN_PARENT_MAP, e.g. Turner → HOC.DE via Hochtief AG), associated (co-permit network — Bob's Electric appears with TARGET CORPORATION 3+ times → inherits TGT). The associated path is the moat: a staffing-permit dataset that maps contractor-to-public-issuer is not commercially available; we synthesize it from the Socrata co-occurrence graph.

    +

    BAI (Building Activity Index) = attribution-weighted average day-change across surfaced issuers. Indexed build value = total $ of permits attributable to ANY public issuer in scope. Network depth = issuers / total attribution edges. Cross-metro replication explicit in the architecture — Chicago is Phase 1; NYC DOB / LA County / Houston BCD / Boston ISD / DC DCRA are all Socrata-shaped, ship as config-only adapters.
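The three aggregates can be sketched as below. The spec names the attribution flavors but not numeric weights, so the 1.0 / 0.6 / 0.3 scale is an assumption; field names are illustrative.

```typescript
type Attribution = "direct" | "parent" | "associated";

// Assumed weight scale — the spec defines the flavors, not the numbers.
const WEIGHT: Record<Attribution, number> = { direct: 1.0, parent: 0.6, associated: 0.3 };

interface IssuerEdge {
  ticker: string;
  dayChangePct: number; // the issuer's equity day change
  attribution: Attribution;
  permitValue: number; // $ of permits behind this attribution edge
}

// BAI = attribution-weighted average day-change across surfaced issuers.
function buildingActivityIndex(edges: IssuerEdge[]): number {
  const w = edges.reduce((a, e) => a + WEIGHT[e.attribution], 0);
  if (w === 0) return 0;
  return edges.reduce((a, e) => a + WEIGHT[e.attribution] * e.dayChangePct, 0) / w;
}

// Indexed build value = total $ of permits attributable to any public issuer.
function indexedBuildValue(edges: IssuerEdge[]): number {
  return edges.reduce((a, e) => a + e.permitValue, 0);
}

// Network depth = distinct issuers / total attribution edges.
function networkDepth(edges: IssuerEdge[]): number {
  if (edges.length === 0) return 0;
  return new Set(edges.map((e) => e.ticker)).size / edges.length;
}
```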

    + +

    Path 10 — Observer outcome ingest (Phase 24)

    Observer runs as lakehouse-observer.service, now with an HTTP listener on :3800. Scenarios POST per-event outcomes to /event with full provenance (staffer_id, sig_hash, event_kind, role, city, state, rescue flags). Observer's ERROR_ANALYZER and PLAYBOOK_BUILDER loops consume them alongside MCP-wrapped ops. Persistence switched from the old /ingest/file REPLACE path to an append-only data/_observer/ops.jsonl journal so the trace survives across restarts.

    Input normalizer + unified memory query

    @@ -399,7 +444,11 @@ boost[(city, state, name)] = min(Σ per_worker, 0.25)
    Chapter 9

    Per-staffer context

    -
    Twenty staffers don't see the same UI state. Each one's session is shaped by their active profile, their workspaces, their assigned contracts, and their client's blacklists.
    +
    Twenty staffers don't see the same UI state. Each one's session is shaped by their identity (the per-staffer hot-swap index — Path 8 in Ch6), their active profile, their workspaces, their assigned contracts, and their client's blacklists.
    + +

    Per-staffer hot-swap index (the recent layer)

    +

    Maria runs Chicago. Devon runs Indianapolis. Aisha runs Wisconsin/Michigan. They share one corpus, but search results, recurring-skill patterns, and playbook context all reshape to whoever is acting. /intelligence/chat accepts staffer_id; on match, defaults state filter to the staffer's territory, scopes playbook-pattern geo to their primary city/state, and surfaces response.staffer.name so the UI relabels MEMORY → MARIA'S MEMORY.

    +

    Verified end-to-end: same query "forklift operators" returns 167 IL workers as Maria, 89 IN as Devon, 16 WI as Aisha (live numbers; refresh the profiler page to recompute). The corpus stays intact; the relevance gradient is per coordinator. As each accumulates fills, their slice of the playbook compounds independently. Roster: /staffers endpoint, declared in STAFFERS in mcp-server/index.ts. Adding a staffer is one append; the architecture is metro-agnostic by construction.

    Active profile (Phase 17)

    Scopes every search. A staffing-recruiter profile bound to workers_500k sees only that dataset. A security-analyst profile bound to threat_intel cannot see worker data. GET /vectors/profile/<id>/audit records every tool invocation by model identity.

    @@ -446,7 +495,7 @@ boost[(city, state, name)] = min(Σ per_worker, 0.25)
    12:30
    Client pushes 20 new contracts + 1M ATS delta. Ch7 scale flow fires. Ingest in seconds; embedding refresh kicks off as a background job. Searches continue against old embeddings.
    -
    14:00
Emergency: worker Dave no-showed. Sarah clicks the No-show button on Dave's card → /log_failure (mark_failed) records a penalty. The next similar query dampens Dave's boost by 0.5. Sarah continues the refill, which excludes Dave and the 2 others already booked for this shift.
    +
    14:00
    Emergency: worker Dave no-showed. Sarah types "Dave running late site 4422" into the search box. ~250ms later: triage card with Dave's profile + reliability + responsiveness, draft SMS to client ("dispatching X from local bench, 96% reliability, will confirm arrival"), and 5 same-role same-geo backfills sorted by responsiveness rendered as a green list below. Sarah clicks Copy SMS, pastes to client, clicks Call on the top backfill. /log_failure on Dave records the penalty for the next similar query.
    15:00
    New embeddings live. Hot-swap promotion. Searches now see all 1M new profiles. Sarah's noon query re-run would produce different top-5.
    @@ -468,14 +517,15 @@ boost[(city, state, name)] = min(Σ per_worker, 0.25)

    Deferred — real architectural work, just not shipped yet

      +
    • BAI persistence + backtesting. Building Activity Index is computed live per page load. To validate the thesis (permit activity precedes equity moves) we need the daily series saved over months. Architectural support exists (data/_kb/audit_baselines.jsonl append pattern); just hasn't run long enough.
    • +
    • NYC DOB adapter. Architecture is metro-agnostic — Chicago is Phase 1. NYC DOB ships next as a config-only Socrata adapter; LA County, Houston BCD, Boston ISD, DC DCRA queue behind it. Each new metro multiplies network edges without multiplying the codebase.
    • +
    • 12 awaiting public-data sources for contractor profile. DOL Wage & Hour, EPA ECHO, MSHA, BBB, PACER civil suits, UCC liens, D&B credit, State licensure, Surety bonds, DOT/FMCSA, State UI claims, DOL RAPIDS apprenticeships. Listed by name on every contractor profile with a one-line "would show:" sample. Each ships as a Socrata-style adapter; engineering scope is concrete.
    • Rate / margin awareness. Worker pay expectations vs contract bill rate not modeled. Requires adding pay_rate to workers, bill_rate to contracts, and a filter + warning path. Partially addressed via ContractTerms.budget_per_hour_max passed to T3/rescue prompts, but the match-time filter isn't wired yet.
    • -
    • Mem0-style UPDATE / DELETE / NOOP operations on playbooks. Today /seed only ADDs. Same (operation, date) pair appends a duplicate instead of refining an existing entry. Phase 26 item — cheap to add, moderate payoff.
    • -
    • Letta working-memory hot cache. Every boost query scans all active playbook entries from in-memory state. 1.9K today; cheap. Will bite somewhere north of 100K. LRU for the last-N playbooks or current-sig neighborhood deferred until that ceiling approaches.
    • -
    • Chunking cache (Phase 21 Rust port). TS primitives generateContinuable + generateTreeSplit are wired, but crates/aibridge/src/{continuation.rs, tree_split.rs} + crates/storaged/src/chunk_cache.rs remain queued. Gateway-side callers currently don't have the same protection against silent truncation that the TS test harness does.
    • +
    • Mem0-style UPDATE / DELETE / NOOP operations on playbooks. Today /seed only ADDs. Same (operation, date) pair appends a duplicate instead of refining an existing entry. Cheap to add, moderate payoff.
    • +
    • Letta working-memory hot cache. Every boost query scans all active playbook entries from in-memory state. ~5K today; cheap. Will bite somewhere north of 100K. Deferred until the ceiling approaches.
    • Confidence calibration. Top-K is a rank, not a probability. No calibrated "85% likely to accept" score. Requires outcome-labeled training data.
    • -
    • Neural re-ranker. Phase 19 is statistical + semantic (now with geo + role prefilter, Phase 25 retirement). A (query, candidate, outcome)-trained re-ranker is deferred only if the statistical floor plateaus below usable recall — current 14× citation lift on identical inputs suggests it hasn't.
    • -
    • Observer → autotune feedback wire. Phase 24 streams scenario outcomes into data/_observer/ops.jsonl; autotune agent still runs on its own HNSW-trial schedule and hasn't subscribed to the outcome metric stream yet. Phase 26+ item — connects the last loop.
    • -
    • call_log cross-reference. Infrastructure present; current synthetic candidates table is too small to cross-ref. Fixes when real ATS lands.
    • +
    • SEC name-to-ticker fuzzy precision. Current matcher requires ≥2 non-stopword overlap; rare false positives still surface (saw FLG attach to a PNC-adjacent contractor once). Tightenable to require an explicit allow-list for production trading use.
    • +
    • Tighter integration of pathway memory + scrum loop. ADR-021 substrate is shipped (88 traces, 11/11 replays). The hot-swap gate fires correctly; what's deferred is automatic mode-runner short-circuit when a high-confidence pathway match is available before any cloud call burns.

    Non-goals — explicitly out of scope

    @@ -496,6 +546,6 @@ boost[(city, state, name)] = min(Σ per_worker, 0.25)