Phase Tracker

Phase 0: Bootstrap

  • Cargo workspace with all crate stubs compiling
  • shared crate: error types, ObjectRef, DatasetId
  • gateway with Axum: GET /health → 200 (sketch below)
  • tracing + tracing-subscriber wired in gateway
  • justfile with build, test, run recipes
  • docs committed to git
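
A minimal sketch of the Phase 0 gateway surface above (tracing plus the /health route), assuming an axum 0.7-style setup; the port and error handling are illustrative, not the gateway's actual wiring:

```rust
// Hypothetical gateway skeleton: tracing + a single GET /health -> 200.
use axum::{http::StatusCode, routing::get, Router};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    tracing_subscriber::fmt::init();

    let app = Router::new().route("/health", get(|| async { StatusCode::OK }));

    // Port is illustrative; later phases refer to the gateway on :3100.
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3100").await?;
    axum::serve(listener, app).await?;
    Ok(())
}
```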

Phase 1: Storage + Catalog

  • storaged: object_store backend init (LocalFileSystem)
  • storaged: Axum endpoints (PUT/GET/DELETE/LIST)
  • shared/arrow_helpers.rs: RecordBatch ↔ Parquet + schema fingerprinting (sketch below)
  • catalogd/registry.rs: in-memory index + manifest persistence
  • catalogd service: POST/GET /datasets + by-name
  • gateway routes wired
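
A rough sketch of what the arrow_helpers bullet above implies, assuming the arrow/parquet, sha2, and hex crates; the function names and fingerprint scheme are guesses, not the shared crate's real API:

```rust
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;
use sha2::{Digest, Sha256};

/// Serialize a RecordBatch into an in-memory Parquet file.
pub fn batch_to_parquet(batch: &RecordBatch) -> anyhow::Result<Vec<u8>> {
    let mut writer = ArrowWriter::try_new(Vec::new(), batch.schema(), None)?;
    writer.write(batch)?;
    Ok(writer.into_inner()?) // finalizes the footer and returns the buffer
}

/// Stable fingerprint over column names + types, used to detect schema drift.
pub fn schema_fingerprint(batch: &RecordBatch) -> String {
    let schema = batch.schema();
    let mut hasher = Sha256::new();
    for field in schema.fields() {
        hasher.update(field.name().as_bytes());
        hasher.update(format!("{:?}", field.data_type()).as_bytes());
    }
    hex::encode(hasher.finalize())
}
```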

Phase 2: Query Engine

  • queryd: SessionContext + object_store config
  • queryd: ListingTable from catalog ObjectRefs (sketch below)
  • queryd service: POST /query/sql → JSON
  • queryd → catalogd wiring
  • gateway routes /query
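
The queryd bullets above reduce to roughly the following DataFusion usage; this is an assumed sketch (paths, schema inference, and error handling simplified), not the service code:

```rust
use std::sync::Arc;

use datafusion::datasource::file_format::parquet::ParquetFormat;
use datafusion::datasource::listing::ListingOptions;
use datafusion::prelude::*;

/// Register a dataset's Parquet prefix (from a catalog ObjectRef) as a table.
pub async fn register_dataset(
    ctx: &SessionContext,
    name: &str,
    prefix: &str, // e.g. "file:///data/lakehouse/candidates/"
) -> datafusion::error::Result<()> {
    let options = ListingOptions::new(Arc::new(ParquetFormat::default()))
        .with_file_extension(".parquet");
    // Schema is inferred from the Parquet footers under the prefix.
    ctx.register_listing_table(name, prefix, options, None, None).await
}

/// POST /query/sql boils down to: run SQL, collect batches, serialize to JSON.
pub async fn run_sql(
    ctx: &SessionContext,
    sql: &str,
) -> datafusion::error::Result<Vec<datafusion::arrow::record_batch::RecordBatch>> {
    ctx.sql(sql).await?.collect().await
}
```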

Phase 3: AI Integration

  • Python sidecar: FastAPI + Ollama (embed/generate/rerank)
  • Dockerfile for sidecar
  • aibridge/client.rs: HTTP client (sketch below)
  • aibridge service: Axum proxy endpoints
  • Model config via env vars
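
A sketch of the aibridge client bullet above: a thin reqwest wrapper around the Python sidecar. The route path, payload shape, and env-var names (SIDECAR_URL, EMBED_MODEL) are assumptions for illustration:

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize)]
struct EmbedRequest<'a> {
    model: &'a str,
    input: Vec<&'a str>,
}

#[derive(Deserialize)]
struct EmbedResponse {
    embeddings: Vec<Vec<f32>>,
}

pub struct AiClient {
    http: reqwest::Client,
    base_url: String,
    embed_model: String,
}

impl AiClient {
    /// Model configuration comes from env vars, per the phase notes.
    pub fn from_env() -> Self {
        Self {
            http: reqwest::Client::new(),
            base_url: std::env::var("SIDECAR_URL")
                .unwrap_or_else(|_| "http://localhost:8000".to_string()),
            embed_model: std::env::var("EMBED_MODEL")
                .unwrap_or_else(|_| "nomic-embed-text".to_string()),
        }
    }

    pub async fn embed(&self, texts: Vec<&str>) -> reqwest::Result<Vec<Vec<f32>>> {
        let body = EmbedRequest { model: &self.embed_model, input: texts };
        let resp: EmbedResponse = self
            .http
            .post(format!("{}/embed", self.base_url))
            .json(&body)
            .send()
            .await?
            .json()
            .await?;
        Ok(resp.embeddings)
    }
}
```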

Phase 4: Frontend

  • Dioxus scaffold, WASM build
  • Ask tab: natural language → AI SQL → results
  • Explore tab: dataset browser + AI summary
  • SQL tab: raw DataFusion editor
  • System tab: health checks for all services

Phase 5: Hardening

  • Proto definitions (lakehouse.proto)
  • Internal gRPC: CatalogService on :3101
  • OpenTelemetry tracing: stdout exporter
  • Auth middleware: X-API-Key (toggleable; sketch below)
  • Config-driven startup: lakehouse.toml
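
The X-API-Key middleware bullet above, sketched as an axum layer; the env-var name and toggle mechanism are illustrative (the real toggle lives in lakehouse.toml):

```rust
use axum::{extract::Request, http::StatusCode, middleware::Next, response::Response};

/// Installed only when auth is enabled, e.g.
///   app.layer(axum::middleware::from_fn(require_api_key))
pub async fn require_api_key(req: Request, next: Next) -> Result<Response, StatusCode> {
    // Hypothetical env var; the actual key source is config-driven.
    let expected =
        std::env::var("LAKEHOUSE_API_KEY").map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
    match req.headers().get("x-api-key").and_then(|v| v.to_str().ok()) {
        Some(key) if key == expected => Ok(next.run(req).await),
        _ => Err(StatusCode::UNAUTHORIZED),
    }
}
```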

Phase 6: Ingest Pipeline

  • CSV ingest with auto schema detection
  • JSON ingest (array + newline-delimited, nested flattening)
  • PDF text extraction (lopdf)
  • Text/SMS file ingest
  • Content hash dedup (SHA-256; sketch below)
  • POST /ingest/file multipart upload
  • 12 unit tests
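
The dedup bullet above amounts to hashing the raw upload before registration; a minimal sketch (the real check consults the catalog rather than an in-memory set):

```rust
use std::collections::HashSet;

use sha2::{Digest, Sha256};

/// SHA-256 over the uploaded bytes, hex-encoded.
pub fn content_hash(bytes: &[u8]) -> String {
    hex::encode(Sha256::digest(bytes))
}

/// Returns true if this exact content was already ingested.
pub fn is_duplicate(seen: &mut HashSet<String>, bytes: &[u8]) -> bool {
    !seen.insert(content_hash(bytes))
}
```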

Phase 7: Vector Index + RAG

  • chunker: configurable size + overlap, sentence-boundary aware
  • store: embeddings as Parquet (binary f32 vectors)
  • search: brute-force cosine similarity (sketch below)
  • rag: embed → search → retrieve → LLM answer with citations
  • POST /vectors/index, /search, /rag
  • Background job system with progress tracking
  • Dual-pipeline supervisor with checkpointing + retry
  • 100K embeddings: 177/sec on A4000, zero failures
  • 6 unit tests
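
The search bullet above is the straightforward brute-force pass; a sketch of the scoring loop, where the store is the Parquet-backed f32 embeddings keyed by doc id:

```rust
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Score every stored embedding against the query and keep the top-k doc ids.
pub fn top_k(query: &[f32], store: &[(String, Vec<f32>)], k: usize) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = store
        .iter()
        .map(|(doc_id, emb)| (doc_id.clone(), cosine(query, emb)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    scored.truncate(k);
    scored
}
```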

Phase 8: Hot Cache + Incremental Updates

  • MemTable hot cache: LRU, configurable max (16GB)
  • POST /query/cache/pin, /cache/evict, GET /cache/stats
  • Delta store: append-only delta Parquet files
  • Merge-on-read: queries combine base + deltas
  • Compaction: POST /query/compact
  • Benchmarked: 9.8x speedup (1M rows: 942ms → 96ms)

Phase 8.5: Agent Workspaces

  • WorkspaceManager with daily/weekly/monthly/pinned tiers
  • Saved searches, shortlists, activity logs per workspace
  • Instant zero-copy handoff between agents
  • Persistence to object storage, rebuild on startup

Phase 9: Event Journal

  • journald crate: append-only mutation log
  • Event schema: entity, field, old/new value, actor, source, workspace (sketch below)
  • In-memory buffer with auto-flush to Parquet
  • GET /journal/history/{entity_id}, /recent, /stats
  • POST /journal/event, /update, /flush
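
One possible shape for the event schema listed above; the field names and types here are illustrative, the real schema is whatever journald flushes to Parquet:

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct JournalEvent {
    pub entity_id: String,
    pub field: String,
    pub old_value: Option<String>,
    pub new_value: Option<String>,
    pub actor: String,
    pub source: String,
    pub workspace: Option<String>,
    pub occurred_at: String, // RFC3339 timestamp
}
```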

Phase 10: Rich Catalog v2

  • DatasetManifest: description, owner, sensitivity, columns, lineage, freshness, tags
  • PII auto-detection: email, phone, SSN, salary, address, medical (sketch below)
  • Column-level metadata with sensitivity flags
  • Lineage tracking: source_system → ingest_job → dataset
  • PATCH /catalog/datasets/by-name/{name}/metadata
  • Backward compatible (serde default)
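
A sketch of the PII auto-detection bullet above: sample column values against a few value patterns plus column-name heuristics. The thresholds, patterns, and function name are illustrative:

```rust
use regex::Regex;

/// Returns a PII tag if most sampled values (or the column name) look sensitive.
pub fn detect_pii(column_name: &str, samples: &[&str]) -> Option<&'static str> {
    let email = Regex::new(r"^[^@\s]+@[^@\s]+\.[^@\s]+$").unwrap();
    let ssn = Regex::new(r"^\d{3}-\d{2}-\d{4}$").unwrap();
    let phone = Regex::new(r"^\+?[\d\s().-]{7,15}$").unwrap();

    // "Most" = more than half of the sampled values match.
    let majority =
        |re: &Regex| samples.iter().filter(|s| re.is_match(s)).count() * 2 > samples.len();

    if majority(&email) {
        return Some("email");
    }
    if majority(&ssn) {
        return Some("ssn");
    }
    if majority(&phone) {
        return Some("phone");
    }
    // Name heuristics cover fields whose values are not self-identifying.
    let name = column_name.to_lowercase();
    if name.contains("salary") {
        return Some("salary");
    }
    if name.contains("address") {
        return Some("address");
    }
    None
}
```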

Phase 11: Embedding Versioning

  • IndexRegistry: model_name, model_version, dimensions per index
  • Index metadata persisted as JSON, rebuilt on startup
  • GET /vectors/indexes — list all (filter by source/model)
  • GET /vectors/indexes/{name} — metadata
  • Background jobs auto-register metadata on completion

Phase 12: Tool Registry

  • 6 built-in staffing tools (search_candidates, get_candidate, revenue_by_client, recruiter_performance, cold_leads, open_jobs)
  • Parameter validation + SQL template substitution (sketch below)
  • Permission levels: read / write / admin
  • Full audit trail per invocation
  • GET /tools, GET /tools/{name}, POST /tools/{name}/call, GET /tools/audit
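
A sketch of the parameter-validation + template-substitution step named above; the struct shape is illustrative, and a production version would bind parameters rather than splice strings:

```rust
use std::collections::HashMap;

pub struct ToolDef {
    pub name: &'static str,
    /// e.g. "SELECT * FROM candidates WHERE city = '{city}' LIMIT {limit}"
    pub sql_template: &'static str,
    pub required_params: &'static [&'static str],
}

pub fn render_sql(tool: &ToolDef, params: &HashMap<String, String>) -> Result<String, String> {
    // Reject the call before touching SQL if a declared parameter is missing.
    for p in tool.required_params {
        if !params.contains_key(*p) {
            return Err(format!("tool {}: missing required parameter '{p}'", tool.name));
        }
    }
    let mut sql = tool.sql_template.to_string();
    for (key, value) in params {
        // Minimal escaping only; real code should use bound parameters.
        let safe = value.replace('\'', "''");
        sql = sql.replace(&format!("{{{key}}}"), &safe);
    }
    Ok(sql)
}
```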

Phase 13: Security & Access Control

  • Role-based access: admin, recruiter, analyst, agent
  • Field-level sensitivity enforcement
  • Column masking determination per agent
  • Query audit logging
  • GET/POST /access/roles, GET /access/audit, POST /access/check

Phase 14: Schema Evolution

  • Schema diff detection (added, removed, type changed, renamed; sketch below)
  • Fuzzy rename detection (shared word parts)
  • Auto-generated migration rules with confidence scores
  • AI migration prompt builder for complex cases
  • 5 unit tests
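
The diff-detection bullet above, reduced to its simplest form over Arrow schemas; rename detection and confidence scoring are omitted, and the enum shape is an assumption:

```rust
use arrow::datatypes::Schema;

#[derive(Debug)]
pub enum SchemaChange {
    Added(String),
    Removed(String),
    TypeChanged { column: String, from: String, to: String },
}

pub fn diff_schemas(old: &Schema, new: &Schema) -> Vec<SchemaChange> {
    let mut changes = Vec::new();
    for field in new.fields() {
        match old.field_with_name(field.name()) {
            Err(_) => changes.push(SchemaChange::Added(field.name().clone())),
            Ok(prev) if prev.data_type() != field.data_type() => {
                changes.push(SchemaChange::TypeChanged {
                    column: field.name().clone(),
                    from: format!("{:?}", prev.data_type()),
                    to: format!("{:?}", field.data_type()),
                })
            }
            Ok(_) => {}
        }
    }
    for field in old.fields() {
        if new.field_with_name(field.name()).is_err() {
            changes.push(SchemaChange::Removed(field.name().clone()));
        }
    }
    changes
}
```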

Phase 15+: Horizon

  • HNSW vector index with iteration-friendly trial system (2026-04-16)
    • HnswStore.build_index_with_config — parameterized ef_construction, ef_search, seed
    • EmbeddingCache — pins 100K vectors in memory, shared across trials
    • harness::EvalSet — named query sets with brute-force ground truth
    • TrialJournal — append-only JSONL at _hnsw_trials/{index}.jsonl
    • Endpoints: /vectors/hnsw/trial, /hnsw/trials/{idx}, /hnsw/trials/{idx}/best?metric={recall|latency|pareto}, /hnsw/evals, /hnsw/evals/{name}/autogen, /hnsw/cache/stats
    • Measured on 100K resumes: brute-force 44-54ms → HNSW 509us-1830us, recall 0.92-1.00 depending on ef_construction. Sweet spot: ec=80 es=30 → p50=873us recall=1.00 — locked in as HnswConfig::default()
  • Catalog manifest repair — POST /catalog/resync-missing restores row_count and columns from parquet footers (2026-04-16). All 7 staffing tables recovered to PRD-matching 2.47M rows.
  • [~] Federated multi-bucket query — foundation complete 2026-04-16, see ADR-017
    • StorageConfig.buckets + rescue_bucket + profile_root config shape
    • SecretsProvider trait + FileSecretsProvider (reads /etc/lakehouse/secrets.toml, checks 0600 perms)
    • storaged::BucketRegistry — multi-backend, rescue-aware, reachability probes
    • storaged::error_journal::ErrorJournal — append-only JSONL at primary://_errors/bucket_errors.jsonl
    • Endpoints: GET /storage/buckets, GET /storage/errors, GET /storage/bucket-health
    • Bucket-aware I/O: PUT/GET /storage/buckets/{bucket}/objects/{*key} with rescue fallback + X-Lakehouse-Rescue-Used observability headers
    • Backward compat: empty [[storage.buckets]] synthesizes a primary from legacy root
    • Three-bucket test (primary + rescue + testing) verified: normal reads, rescue fallback with headers, hard-fail missing, write to unknown bucket 503, error journal + health summary
    • X-Lakehouse-Bucket header middleware on ingest endpoints (2026-04-16)
    • Catalog migration: POST /catalog/migrate-buckets stamps bucket = "primary" on legacy refs (12 renamed, 14 total now canonical)
    • queryd registers every bucket with DataFusion for cross-bucket SQL — verified with people_test (testing) × animals (primary) CROSS JOIN
    • Profile hot-load endpoints: bucket auto-provisioning on POST /vectors/profile/{id}/activate (2026-04-17)
    • vectord bucket-scoped paths: TrialJournal + PromotionRegistry resolve per-index via IndexMeta.bucket (2026-04-17)
    • Runtime bucket lifecycle: POST /storage/buckets (provision) + DELETE /storage/buckets/{name} (unregister, refuses primary/rescue) (2026-04-17)
    • ModelProfile.bucket field — per-profile artifact isolation (2026-04-17)
  • Database connector ingest (Postgres first) — 2026-04-16
    • pg_stream::stream_table_to_parquet — ORDER BY + LIMIT/OFFSET pagination, configurable batch_size
    • parse_dsn — postgresql:// and postgres:// URL scheme, user/password/host/port/db
    • POST /ingest/db endpoint: {dsn, table, dataset_name?, batch_size?, order_by?, limit?} → streams to Parquet, registers in catalog with PII detection + redacted-password lineage
    • Existing POST /ingest/postgres/import (structured config) preserved alongside
    • 4 DSN-parser unit tests + live end-to-end test against knowledge_base.team_runs (586 rows, 13 cols, 6 batches, 196ms)
  • Phase B: Lance storage evaluation — 2026-04-16
    • crates/lance-bench standalone pilot (Lance 4.0) avoids DataFusion/Arrow version conflict with main stack
    • 8-dimension benchmark on resumes_100k_v2 — see docs/ADR-019-vector-storage.md for scorecard
    • Decision: hybrid architecture. Parquet+HNSW stays primary (2.55× faster search at 100K in-RAM). Lance added as per-profile second backend for random access (112× faster), append (0.08s vs full rewrite), hot-swap (14× faster index builds), and scale past 5M RAM ceiling.
  • Phase E.2 — Compaction integrates tombstones (physical deletion) — 2026-04-16
    • delta::compact accepts tombstones: &[Tombstone] param, filters rows at merge time via arrow filter_record_batch
    • CompactResult gains tombstones_applied + rows_dropped_by_tombstones
    • Atomic write: ArrowWriter → single Parquet file (fixes a latent bug where concatenated Parquet byte streams produced garbage: only the first segment, up to its footer, was visible), verify-parse before overwrite, temp_key staging, delete delta files AFTER base write succeeds
    • Snappy compression on output matches ingest defaults (avoids 3× size inflation on every compact)
    • TombstoneStore::clear drops all batch files for a dataset; called by queryd after successful compact
    • Query engine exposes catalog() accessor so service handler can reach the tombstone store
    • E2E verified on candidates (100K rows): tombstone 3 IDs → compact → 99,997 rows physically in parquet, tombstones empty, IDs gone from __raw__candidates too; file size 10.59 MB → 10.72 MB (proportional to data, not inflated)
  • Phase 16: Hot-swap generations + autotune agent — 2026-04-16
    • vectord::promotion::PromotionRegistry — per-index current config + history at _hnsw_promotions/{index}.json, cap 50 history entries
    • Endpoints: POST /vectors/hnsw/promote/{index}/{trial_id}, POST /vectors/hnsw/rollback/{index}, GET /vectors/hnsw/promoted/{index}
    • vectord::autotune::run_autotune — grid of trials (configurable or default 5 configs), Pareto winner selection (max recall, then min p50), min_recall safety gate (default 0.9), config bounds (ec ∈ [10,400], es ∈ [10,200])
    • POST /vectors/hnsw/autotune — runs the full loop synchronously, journals every trial, auto-promotes winner
    • activate_profile uses promotion_registry.config_or(..., profile_default) so newly-promoted configs flow automatically into next activation
    • End-to-end: autogen harness for threat_intel_v1 (10 queries), autotune ran 5 trials (all recall=1.00, p50 64-68us), promoted ec=20 es=30 at recall=1.0 p50=64us as winner. Manual promote of ec=80 es=30 pushed autotune pick onto history. Rollback restored autotune winner. Second rollback cleared to None. Re-promote + restart verified persistence. Activation after promotion logged "building HNSW ef_construction=80 ef_search=30 seed=42" — config flowed through correctly.
  • Phase 17: Model profiles + scoped search — 2026-04-16
    • shared::types::ModelProfile — { id, ollama_name, description, bound_datasets, hnsw_config, embed_model, created_at, created_by }
    • shared::types::ProfileHnswConfig — mirror of vectord's HnswConfig to avoid cross-crate dep cycle (defaults ec=80 es=30 matching Phase 15 winner)
    • Registry::{put_profile, get_profile, list_profiles, delete_profile} persisted at _catalog/profiles/{id}.json, validates bindings exist (raw dataset OR AiView)
    • Endpoints: POST/GET /catalog/profiles, GET/DELETE /catalog/profiles/{id}
    • POST /vectors/profile/{id}/activate — warms EmbeddingCache + builds HNSW with profile's config for every bound dataset's vector index; reports warmed indexes + failures + duration
    • POST /vectors/profile/{id}/search — rejects 403 if requested index's source isn't in profile.bound_datasets; falls through to HNSW if warm, brute-force otherwise
    • Fixed refresh to register new index metadata (was silently no-op for first-time indexes)
    • End-to-end: security-analyst profile bound to threat_intel → activate warms 54 vectors in 156ms → within-scope HNSW search works (0.625 score); out-of-scope search for candidates returns 403 with allowed bindings listed
  • Phase E: Soft deletes (tombstones) — 2026-04-16
    • shared::types::Tombstone — { dataset, row_key_column, row_key_value, deleted_at, actor, reason }
    • catalogd::tombstones::TombstoneStore per-dataset append-log at _catalog/tombstones/{dataset}/, flush_threshold=1 + explicit flush so every tombstone is durable on return (compliance requirement)
    • All tombstones for a dataset must share the same row_key_column (validated at write — query filter is built as a single WHERE NOT IN clause)
    • Registry::add_tombstone / list_tombstones
    • Endpoint: POST /catalog/datasets/by-name/{name}/tombstone accepting {row_key_column, row_key_values[], actor, reason}; companion GET lists active tombstones
    • queryd::context::build_context wraps tombstoned tables: raw goes to __raw__{name}, public name becomes a DataFusion view with WHERE CAST(col AS VARCHAR) NOT IN (...) filter
    • End-to-end on candidates: tombstone 3 IDs, COUNT drops 100,000 → 99,997, specific WHERE returns empty, AiView candidates_safe transitively excludes them too, restart preserves all tombstones
    • Limits / not in MVP: journal integration (tombstones don't yet emit Phase 9 mutation events — covered by audit fields on the tombstone itself). Physical compaction integration was closed by Phase E.2 below.
  • Phase D: AI-safe views — 2026-04-16
    • shared::types::AiView — name, base_dataset, columns whitelist, optional row_filter, column_redactions
    • shared::types::Redaction — Null | Hash | Mask { keep_prefix, keep_suffix }
    • Registry::put_view / get_view / list_views / delete_view persisted to _catalog/views/{name}.json
    • queryd::context registers each view as a DataFusion view with the safe projection + filter + redactions baked into the SELECT
    • Endpoints: POST/GET /catalog/views, GET/DELETE /catalog/views/{name}
    • End-to-end on candidates: candidates_safe view exposes 8 of 15 columns, masks candidate_id (CAN******01), filters out status='blocked'. SELECT * FROM candidates_safe returns whitelist only; SELECT email FROM candidates_safe fails. View survives restart.
    • Capability surface — raw candidates still accessible by name; Phase 13 access control is the layer that enforces who can query what
  • Phase C: Decoupled embedding refresh — 2026-04-16
    • DatasetManifest: last_embedded_at, embedding_stale_since, embedding_refresh_policy (Manual | OnAppend | Scheduled)
    • Registry::mark_embeddings_stale / clear_embeddings_stale / stale_datasets
    • Ingest paths (CSV pipeline + Postgres streaming) auto-mark-stale when writing to an already-embedded dataset
    • vectord::refresh::refresh_index — reads dataset, diffs doc_ids vs existing embeddings, embeds only new rows, writes combined index, clears stale
    • POST /vectors/refresh/{dataset} + GET /vectors/stale
    • Id columns accept Utf8, Int32, Int64
    • End-to-end on threat_intel: initial 20-row embed 2.1s; re-ingest to 54 rows auto-marks stale; delta refresh embeds only 34 new in 970ms (6× faster than full re-embed); stale cleared
  • Phase 16.2/16.5: Background autotune agent + ingest-triggered re-trials — 2026-04-17
    • vectord::agent — ε-greedy proposer, rate-limited, cooldown-gated, tokio background task
    • Ingest paths push DatasetAppended triggers to agent queue
    • Endpoints: GET /vectors/agent/status, POST /vectors/agent/stop, POST /vectors/agent/enqueue/{idx}
    • [agent] config section in lakehouse.toml (enabled, cycle_interval, cooldown, min_recall, max_trials/hr)
    • 3 unit tests
  • Phase 17 VRAM gate: Two-profile sequential swap — 2026-04-17
    • Sidecar: POST /admin/unload (keep_alive=0), POST /admin/preload, GET /admin/vram (nvidia-smi + Ollama /api/ps)
    • AiClient::unload_model / preload_model / vram_snapshot
    • VectorState.active_profile singleton — activate swaps models, deactivate unloads
    • Verified: staffing-recruiter (qwen2.5) ↔ docs-assistant (mistral) — only one model in VRAM at a time
  • MySQL streaming connector — 2026-04-17
    • my_stream.rs mirrors pg_stream: DSN parsing, OFFSET pagination, Arrow type mapping, Parquet streaming
    • POST /ingest/mysql with PII detection, lineage, agent trigger
    • Verified end-to-end on live MariaDB (10 rows, 9 columns, round-tripped all types)
    • 6 DSN + type-mapping unit tests
  • Phase 18 hybrid: vectord-lance production crate — 2026-04-17
    • Firewall crate (Arrow 57 / Lance 4, separate from main Arrow 55 / DF 47 stack)
    • Public API: migrate_from_parquet, build_index (IVF_PQ), search, get_by_doc_id, append, build_scalar_index, stats
    • lance_backend::LanceRegistry resolves bucket → URI per index
    • VectorBackend { Parquet | Lance } enum on ModelProfile + IndexMeta
    • 8 HTTP endpoints under /vectors/lance/* (migrate, index, search, doc, append, stats, scalar-index, recall)
    • Profile-driven routing: POST /vectors/profile/{id}/search auto-routes to Lance when profile.vector_backend=lance
    • Auto-migrate + auto-index on activation
    • Measured on real 100K × 768d: migrate 0.57s, IVF_PQ build 16.2s (14× faster than HNSW 230s), search 23ms, append 100 rows 3.3ms, doc_id fetch 3.5ms (with scalar btree)
    • IVF_PQ recall@10 = 0.805 with Lance's default nprobes=1 (the hidden cap — see 2026-04-20 tuning work below, which lifts it to 1.000). Measured via /vectors/lance/recall/{idx} harness.
  • Phase E.3: Scheduled ingest — 2026-04-17
    • ingestd::schedule module: ScheduleDef, ScheduleStore (JSON at _schedules/{id}.json), Scheduler tokio task
    • Supports MySQL + Postgres sources on interval triggers (Cron variant defined, parsing stubbed)
    • 6 CRUD endpoints under /ingest/schedules/* + run-now manual trigger
    • Full catalog integration: PII, lineage, mark-stale, agent trigger
    • 6 unit tests
  • PDF OCR via Tesseract — 2026-04-17
    • Two-tier: lopdf text extraction → Tesseract 5.5 fallback for scanned/image PDFs
    • Extracts embedded XObject /Image streams, shells out to tesseract --oem 3 --psm 6
    • Same schema (source_file, page_number, text_content) — downstream unchanged
  • Catalog hygiene — idempotent register() + dedupe + DELETE (2026-04-19, ADR-020)
    • catalogd::Registry::register now gates on (name, schema_fingerprint): same fp → reuse DatasetId and update objects in place; different fp → return error (409 Conflict on HTTP, FAILED_PRECONDITION on gRPC). First-time registration is unchanged.
    • POST /catalog/dedupe one-shot operator endpoint collapses pre-existing duplicates; winner = non-null row_count first, newest updated_at second.
    • DELETE /catalog/datasets/by-name/{name} removes the manifest from both in-memory registry and object storage (metadata-only — parquet files, vector indexes, tombstones are NOT cascade-deleted). Added to support test-harness cleanup; also plugs a real catalog hole where zombie entries from prior deletes would break DataFusion schema inference.
    • Cleanup run on live catalog: 374 → 31 datasets, 343 orphan manifests removed, 0 errors. successful_playbooks, duplicated 308×, was the worst offender.
    • Concurrency: write lock held across storage I/O in register() to close the check→insert TOCTOU window (32-worker multi-threaded stress test verifies single-manifest invariant).
    • End-to-end verification: scripts/e2e_pipeline_check.sh runs 31 assertions across 12 pipeline stages (ingest → catalog → SQL+JOIN → dedup → idempotency → metadata → PII → vector embed → semantic search → cleanup) against the live gateway. Idempotent across repeat runs.
    • Tests: 11 new in catalogd (was 0, includes 3 concurrency tests + 3 delete_dataset tests); 11 new in storaged for AppendLog + ErrorJournal (was 0). Fixed a broken doctest in append_log.rs.
  • Autotune agent: portfolio rotation + auto-bootstrap (2026-04-20)
    • pick_periodic_target now sources candidates from IndexRegistry (not just promoted indexes) and picks least-recently-tuned, so trial budget spreads across every index with ≥1000 vectors instead of fixating on one converged champion.
    • run_one_cycle bootstraps on first visit: ensure_auto_harness auto-generates {index}_auto (20 synthetic self-queries, k=10, brute-force ground truth) if missing, then seeds with HnswConfig::default() (ec=80/es=30).
    • Regression fix: harness::recall_at_k now uses set-intersection semantics. The prior impl counted duplicates in predicted — on corpora with repeated chunks (kb_response_cache_agent) this inflated recall above 1.0 and poisoned promotion decisions. +7 unit tests.
  • Scheduled ingest: real cron parsing (2026-04-20)
    • Vixie-compatible 5/6-field cron via croner crate. Day-of-week follows Unix convention (1-5 = Mon-Fri). 6-field adds seconds granularity.
    • validate_trigger in ingestd::schedule — create/patch handlers reject malformed expressions with 400 BAD_REQUEST at creation time, not silently at fire time.
    • Swapped away from the cron crate (0.16) which uses a non-Unix DOW convention (1=Sun) that would silently bite anyone writing 1-5 expecting weekdays. +9 unit tests.
  • EvalSets federation (2026-04-20)
    • harness::HarnessStore mirrors the TrialJournal / PromotionRegistry federation pattern: eval artifacts colocate with each index's recorded bucket; legacy evals in primary remain discoverable via a fallback path; cross-bucket listing dedupes.
    • Every eval callsite (service.rs × 5, agent.rs × 3, autotune.rs × 1) now routes through HarnessStore. VectorState and AgentDeps each hold a shared instance.
  • Index bucket-migrate PATCH (2026-04-20)
    • PATCH /vectors/indexes/{name}/bucket copies an index's vector parquet + trial-journal batches + promotion file + auto-harness to dest_bucket, flips IndexMeta.bucket as the commit point, and evicts the EmbeddingCache so next load reads from the new bucket. Optional delete_source: true sweeps source artifacts.
    • Lance-backed indexes refused with 400 — Lance URIs are bucket-specific and require rewriting the dataset, separate story. Round-trip verified: 390 artifacts, 0.04s.
  • IVF_PQ recall tuning (2026-04-20)
    • LanceVectorStore::search now accepts optional nprobes + refine_factor. Lance's built-in nprobes=1 default was the hidden cap on recall — on 316-partition resumes_100k_v2 it searched only 0.3% of partitions per query.
    • Server defaults (LANCE_DEFAULT_NPROBES=20, LANCE_DEFAULT_REFINE_FACTOR=5) flow through the scoped-search path and the autotune harness. Measured on resumes_100k_v2: recall 0.805 → 1.000 at p50 ≈ 7.4ms. Even nprobes=5, refine=5 saturates recall at p50 ≈ 4.7ms.
    • /vectors/lance/recall/{idx} accepts per-request nprobes / refine_factor so operators can sweep the curve.
  • Phase 19: Playbook memory (meta-index) — the feedback loop originally implied by the PRD but never built. Playbooks stop being write-only; they start shaping future rankings. (2026-04-20)
    • 19.1 — POST /vectors/playbook_memory/rebuild scans successful_playbooks via DataFusion, builds one PlaybookEntry per row (operation + approach + context embedded as one vector via nomic-embed-text)
    • 19.2 — Brute-force cosine search over in-memory embeddings (chosen over HNSW: successful_playbooks maxes around thousands of rows, overhead of a second indexed surface isn't worth it until that ceiling bites)
    • 19.3 — Endorsed names parsed out of result column, keyed by (city, state, name) tuple so shared names across cities don't cross-pollinate. Parsing via parse_names + parse_city_state helpers (7 unit tests)
    • 19.4 — /vectors/hybrid?use_playbook_memory=true: fetches top_k * 5 candidates so endorsed workers outside the vanilla top-K can still climb. Boost is additive on vector score, each hit carries playbook_boost + playbook_citations in the response for explainability
    • 19.5 — Multi-agent orchestrator (tests/multi-agent/orchestrator.ts) auto-seeds POST /vectors/playbook_memory/seed on consensus_done, so the next query sees the new endorsement without a full /rebuild. Closes the feedback loop: two agents reach consensus → playbook sealed → next query re-ranks
    • 19.6 — MAX_BOOST_PER_WORKER = 0.25 enforced in compute_boost_for; verified with unit test (100 identical playbooks → boost capped at 0.25) and live test (5 identical seeds → exactly 0.25). Time decay also wired: BOOST_HALF_LIFE_DAYS = 30.0 — 30-day-old playbooks contribute half, 60-day a quarter, via exp(-age_days / 30) in the boost loop
    • Real finding surfaced during build: the 32 bootstrap rows in successful_playbooks reference phantom worker names — 80 of 82 don't correspond to actual rows in workers_500k. /seed endpoint bypasses successful_playbooks so operators can prime memory with real fixtures; production path is the orchestrator write-through
  • Phase 19 refinement — geo + role prefilter on boost (2026-04-21)
    • Added compute_boost_for_filtered and compute_boost_for_filtered_with_role to playbook_memory.rs. SQL filter's (city, state, role) parsed in service.rs; exact role-matches in target geo skip cosine and earn similarity=1.0. Restored the feedback loop: matched=0 → matched=11 per query on the same Nashville test. Citation density on Riverfront Steel: 2 → 28 per run (14×).
    • Rust unit tests: extractor_tests::extract_target_geo_basic/_missing_state/_word_boundary, extract_target_role_basic/_none/_multi_word. 6/6 pass.
    • Diagnostic log: playbook_boost: boosts=N sources=N parsed=N matched=N target_geo=? target_role=? on every call.
  • Phase 20: Model Matrix + Overseer Tiers (2026-04-21)
    • config/models.json — 5 tiers (t1_hot / t2_review / t3_overview / t4_strategic / t5_gatekeeper), each with context_window + context_budget + overflow_policy. Ollama Cloud bearer key from /root/llm_team_config.json.
    • Hot path: qwen3.5:latest + qwen3:latest local with think:false. Mistral dropped after 0/14 fill on complex scenarios.
    • T3 cloud: gpt-oss:120b via Ollama Cloud — verified 4-8s latency, strict JSON-shape output for remediation.
  • Phase 21: Scratchpad + Tree-Split Continuation (2026-04-21)
    • tests/multi-agent/agent.ts: estimateTokens(), assertContextBudget(), generateContinuable(), generateTreeSplit(). think flag plumbed through sidecar's /generate. Empty-response backoff + truncation-continuation, no max_tokens tourniquet.
    • Rust port shipped (2026-04-21, companion to Phase 27):
      • crates/aibridge/src/context.rs — estimate_tokens (chars/4 ceil, matches TS), context_window_for, assert_context_budget returning Result<BudgetCheck, (BudgetCheck, usize over_by)> so callers get the numbers back on both success and overflow. Windows table mirrors config/models.json.
      • crates/aibridge/src/continuation.rs — generate_continuable<G: TextGenerator> handles the two failure modes from TS: (a) empty thinking-model response → geometric-backoff retry with 2× budget up to budget_cap; (b) truncated non-empty → continuation with partial-as-scratchpad. is_structurally_complete balances braces then JSON.parse-check for the JSON shape; text shape is "non-empty". Guards the degen case "all retries empty → bail, don't loop on empty partial" — the TS impl has this implicit, Rust makes it explicit.
      • crates/aibridge/src/tree_split.rs — generate_tree_split map→reduce with running scratchpad. Per-shard + reduce-prompt budget checked through assert_context_budget; loud-fails with the overflow message rather than silently truncating. Scratchpad oldest-digest-first truncation once it exceeds scratchpad_budget (default 6000 tokens, matches TS).
      • TextGenerator trait (native async-fn-in-trait, edition 2024) so ScriptedGenerator test double can inject canned sequences without a live Ollama. AiClient implements TextGenerator.
      • GenerateRequest gained think: Option<bool> field — forwards to sidecar for per-call hidden-reasoning opt-out on hot-path JSON emitters.
      • 25 aibridge unit tests (8 context + 10 continuation + 7 tree_split) — all green, no network required.
  • Phase 22: Internal Knowledge Library (2026-04-21)
    • data/_kb/ — signatures.jsonl, outcomes.jsonl, pathway_recommendations.jsonl, error_corrections.jsonl, config_snapshots.jsonl. Event-driven cycle: indexRun → recommendFor → loadRecommendation.
    • Item B cloud rescue: failed event → cloud remediation JSON → retry with pivot. Verified 1/3 rescues succeeded on stress_01 (Gary IN → South Bend IN pivot).
    • scripts/kb_measure.py aggregator. Unit tests: kb.test.ts — 4/4 pass (signature stability, role/city/count invariants, digest shape).
  • Phase 23: Staffer identity + competence-weighted retrieval (2026-04-21)
    • ScenarioSpec gained contract: ContractTerms and staffer: Staffer { id, name, tenure_months, role, tool_level }.
    • tool_level runtime overrides: full / local / basic / minimal. Basic + minimal route executor to Ollama Cloud kimi-k2.5 (kimi-k2.6 pending pro-tier upgrade).
    • data/_kb/staffers.jsonl — competence_score = 0.45·fill + 0.20·turn_eff + 0.20·cite + 0.15·rescue. Recomputed per run.
    • findNeighbors now returns weighted_score = cosine × max_staffer_competence. scripts/kb_staffer_report.py — leaderboard + cross-staffer worker overlap (Rachel D. Lewis 12× across 4 staffers → auto-discovered high-value label).
    • gen_staffer_demo.ts + run_staffer_demo.sh — 4 personas × 3 contracts = 12 runs.
  • Phase 27: Playbook versioning (2026-04-21)
    • PlaybookEntry gained version: u32 (default 1), parent_id, superseded_at, superseded_by fields. All #[serde(default)] so entries persisted before Phase 27 load as roots with version=1.
    • PlaybookMemory::revise_entry(parent_id, new_entry) appends a new version, stamps superseded_at+superseded_by on the parent, inherits parent_id and sets version = parent + 1 on the new entry. Rejects revising a retired or already-superseded parent with a clear error — the tip of the chain is the only valid revise target.
    • PlaybookMemory::history(playbook_id) returns the full chain root→tip, walking parent_id backward then superseded_by forward. Cycle-safe. Works from any node in the chain.
    • Superseded entries excluded from boost (same rule as retired): compute_boost_for_filtered_with_role, the active-entries prefilter, the geo-index rebuild, and the upsert existing-entry search all skip superseded_at.is_some().
    • Endpoints: POST /vectors/playbook_memory/revise + GET /vectors/playbook_memory/history/{id}.
    • status_counts now returns (total, retired, superseded, failures). /status JSON reports superseded as a distinct counter; active = total - retired - superseded.
    • 8 unit tests under mod version_tests covering: chain-metadata stamping, retired-parent rejection, already-superseded-parent rejection, superseded endorsement exclusion from boost, history traversal from root/tip/middle, empty-for-unknown-id, superseded-status-count, legacy-entry-default-version round-trip. 26/26 playbook_memory tests pass.
  • Phase 24: Observer / Autotune integration (2026-04-20, commit b95dd86)
    • Closed the gap where lakehouse-observer.service wrapped MCP :3700 while tests/multi-agent/scenario.ts hit gateway :3100 directly — observer sat idle at 0 ops across 3600+ cycles.
    • observer.ts gained a Bun HTTP listener on OBSERVER_PORT (default 3800) with GET /health, GET /stats (totals + by_source + recent scenario digest), and POST /event for scenario outcomes. Body shapes into ObservedOp with source="scenario" + staffer_id + sig_hash + event_kind + geo + rescue flags.
    • recordExternalOp() shared ring-buffer insert — ERROR_ANALYZER and PLAYBOOK_BUILDER loops now see both MCP-wrapped and scenario-posted ops through the same path.
    • persistOp() swap: old path wrote via /ingest/file?name=observed_operations which has REPLACE semantics (wiped prior ops); now uses append-friendly Parquet write-through.
  • Phase 25: Validity windows + playbook retirement (2026-04-21, commit e0a843d)
    • PlaybookEntry gained four optional fields (#[serde(default)] so legacy entries load as never-expiring): schema_fingerprint (SHA-256 over target dataset columns at seed time), valid_until (RFC3339 hard expiry), retired_at (set by retire calls), retirement_reason (human string).
    • compute_boost_for_filtered_with_role now skips retired + expired entries before geo/cosine — no silent boosting from stale playbooks. Unit-tested on expired valid_until + retired + schema-drift retirement.
    • Two retirement paths: retire_one(playbook_id, reason) for manual, retire_on_schema_drift(city, state, current_fingerprint, reason) for batch schema-migration sweep. Legacy entries without a fingerprint skip drift retirement (safe).
    • Endpoint: POST /vectors/playbook_memory/retire — accepts either {playbook_id, reason} or {city, state, current_schema_fingerprint, reason}.
  • Phase 26: Mem0 upsert + Letta geo hot cache (2026-04-21, commit 640db8c)
    • Mem0-style upsert: /seed with append=true (default) routes through upsert_entry, which decides ADD / UPDATE / NOOP on (operation, day, city, state). Same-day re-seed merges names (union, stable order) instead of duplicating the row. Identical re-seed is a no-op. Different-day same-operation is a fresh ADD. Playbook_id stays stable on UPDATE so prior citations remain valid.
    • Letta-style hot cache: PlaybookMemory now holds a geo_index: HashMap<(city_lower, state_upper), Vec<entry_idx>> rebuilt on every mutation. Geo-filtered boost queries skip the full scan and hit the O(1) key lookup. At 1.9K entries the full scan was sub-ms; the index scales the same path to 100K+ without code changes.
    • UpsertOutcome enum reported back to callers — {mode: added|updated|noop, playbook_id, merged_names?} + entries_after.
  • Phase 37: Hot-swap async (2026-04-22)
    • Extended JobTracker with JobType::ProfileActivation + Embed enum variants
    • Made activate_profile return immediately with a job_id; the work runs in the background via tokio::spawn
    • Background jobs tracked via POST /vectors/jobs/{id} + GET /vectors/jobs
  • Phase 38: Universal API Skeleton (2026-04-23)
    • /v1/chat — OpenAI-compatible POST, forwards to local Ollama or Ollama Cloud
    • /v1/usage — returns {requests, prompt_tokens, completion_tokens, total_tokens}
    • /v1/sessions — returns {data: [], note: "Phase 38: stateless"}
    • Langfuse trace integration (fire-and-forget, Phase 40 early)
    • 12 unit tests green, curl gates pass
  • Phase 39: Provider Adapter Refactor (2026-04-23)
    • ProviderAdapter trait with chat() + embed() + unload() + health()
    • OllamaAdapter — wraps existing AiClient
    • OpenRouterAdapter — HTTP client to openrouter.ai
    • provider_key() routing by model prefix (openrouter/* → OpenRouter)
  • Phase 40: Routing & Policy Engine (2026-04-23)
    • RoutingEngine with RouteDecision in aibridge::routing
    • config/routing.toml — rules by model_pattern, fallback chain, cost gating (see the sketch after this phase list)
    • Per-provider usage tracking: Usage.by_provider
    • 12 gateway tests green, curl gates pass
  • Fine-tuned domain models (Phase 25+)
  • Multi-node query distribution (only if ceilings bite)
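
To make the Phase 40 entry above concrete, here is a hypothetical shape of the routing pass: match the requested model against routing.toml patterns, apply the cost gate, then fall back down the chain. Only model_pattern, the fallback chain, and cost gating come from the phase notes; every field name and the matching scheme are assumptions:

```rust
// Illustrative only: not the actual RoutingEngine types or routing.toml schema.
#[derive(Debug, Clone)]
pub struct RouteRule {
    pub model_pattern: String,              // e.g. "openrouter/*"
    pub provider: String,                   // e.g. "openrouter"
    pub max_cost_per_1k_tokens: Option<f64>, // cost gate, if any
}

#[derive(Debug)]
pub struct RouteDecision {
    pub provider: String,
    pub model: String,
}

pub fn route(
    model: &str,
    est_cost_per_1k: f64,
    rules: &[RouteRule],
    fallback: &[String],
) -> Option<RouteDecision> {
    // First rule whose pattern matches and whose cost gate passes wins.
    for rule in rules {
        let matches = match rule.model_pattern.strip_suffix('*') {
            Some(prefix) => model.starts_with(prefix),
            None => rule.model_pattern == model,
        };
        let affordable = rule
            .max_cost_per_1k_tokens
            .map_or(true, |cap| est_cost_per_1k <= cap);
        if matches && affordable {
            return Some(RouteDecision {
                provider: rule.provider.clone(),
                model: model.to_string(),
            });
        }
    }
    // Otherwise walk the fallback chain of providers in order.
    fallback.first().map(|p| RouteDecision {
        provider: p.clone(),
        model: model.to_string(),
    })
}
```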

145 unit tests | 13 crates | 21 ADRs | 2.47M rows | 100K vectors | Hybrid Parquet+HNSW ⊕ Lance | Phases 0-27 shipped

Latest: 2026-04-21 — Phase 27 (playbook versioning: version + parent_id + superseded_at + superseded_by on PlaybookEntry, /revise + /history endpoints, 8 new tests). Doc-sync pass: Phase 24 observer + Phase 25 validity windows + Phase 26 Mem0/Letta now reflected in the phase tracker. Phase 19.6 time decay noted as wired (was misdocumented as deferred). Phase E.2 tombstone-at-compaction noted as closed in the Phase E MVP limits.