lakehouse/docs/PHASES.md

# Phase Tracker

## Phase 0: Bootstrap ✅
- [x] Cargo workspace with all crate stubs compiling
- [x] `shared` crate: error types, ObjectRef, DatasetId
- [x] `gateway` with Axum: GET /health → 200
- [x] tracing + tracing-subscriber wired in gateway
- [x] justfile with build, test, run recipes
- [x] docs committed to git

## Phase 1: Storage + Catalog ✅
- [x] storaged: object_store backend init (LocalFileSystem)
- [x] storaged: Axum endpoints (PUT/GET/DELETE/LIST)
- [x] shared/arrow_helpers.rs: RecordBatch ↔ Parquet + schema fingerprinting
- [x] catalogd/registry.rs: in-memory index + manifest persistence
- [x] catalogd service: POST/GET /datasets + by-name
- [x] gateway routes wired

## Phase 2: Query Engine ✅
- [x] queryd: SessionContext + object_store config
- [x] queryd: ListingTable from catalog ObjectRefs
- [x] queryd service: POST /query/sql → JSON
- [x] queryd → catalogd wiring
- [x] gateway routes /query

## Phase 3: AI Integration ✅
- [x] Python sidecar: FastAPI + Ollama (embed/generate/rerank)
- [x] Dockerfile for sidecar
- [x] aibridge/client.rs: HTTP client
- [x] aibridge service: Axum proxy endpoints
- [x] Model config via env vars

## Phase 4: Frontend ✅
- [x] Dioxus scaffold, WASM build
- [x] Ask tab: natural language → AI SQL → results
- [x] Explore tab: dataset browser + AI summary
- [x] SQL tab: raw DataFusion editor
- [x] System tab: health checks for all services

## Phase 5: Hardening ✅
- [x] Proto definitions (lakehouse.proto)
- [x] Internal gRPC: CatalogService on :3101
- [x] OpenTelemetry tracing: stdout exporter
- [x] Auth middleware: X-API-Key (toggleable)
- [x] Config-driven startup: lakehouse.toml

## Phase 6: Ingest Pipeline ✅
- [x] CSV ingest with auto schema detection
- [x] JSON ingest (array + newline-delimited, nested flattening)
- [x] PDF text extraction (lopdf)
- [x] Text/SMS file ingest
- [x] Content hash dedup (SHA-256)
- [x] POST /ingest/file multipart upload
- [x] 12 unit tests

## Phase 7: Vector Index + RAG ✅
- [x] chunker: configurable size + overlap, sentence-boundary aware
- [x] store: embeddings as Parquet (binary f32 vectors)
- [x] search: brute-force cosine similarity
- [x] rag: embed → search → retrieve → LLM answer with citations
- [x] POST /vectors/index, /search, /rag
- [x] Background job system with progress tracking
- [x] Dual-pipeline supervisor with checkpointing + retry
- [x] 100K embeddings: 177/sec on A4000, zero failures
- [x] 6 unit tests

## Phase 8: Hot Cache + Incremental Updates ✅
- [x] MemTable hot cache: LRU, configurable max (16GB)
- [x] POST /query/cache/pin, /cache/evict, GET /cache/stats
- [x] Delta store: append-only delta Parquet files
- [x] Merge-on-read: queries combine base + deltas
- [x] Compaction: POST /query/compact
- [x] Benchmarked: 9.8x speedup (1M rows: 942ms → 96ms)

## Phase 8.5: Agent Workspaces ✅
- [x] WorkspaceManager with daily/weekly/monthly/pinned tiers
- [x] Saved searches, shortlists, activity logs per workspace
- [x] Instant zero-copy handoff between agents
- [x] Persistence to object storage, rebuild on startup

## Phase 9: Event Journal ✅
- [x] journald crate: append-only mutation log
- [x] Event schema: entity, field, old/new value, actor, source, workspace
- [x] In-memory buffer with auto-flush to Parquet
- [x] GET /journal/history/{entity_id}, /recent, /stats
- [x] POST /journal/event, /update, /flush

## Phase 10: Rich Catalog v2 ✅
- [x] DatasetManifest: description, owner, sensitivity, columns, lineage, freshness, tags
- [x] PII auto-detection: email, phone, SSN, salary, address, medical
- [x] Column-level metadata with sensitivity flags
- [x] Lineage tracking: source_system → ingest_job → dataset
- [x] PATCH /catalog/datasets/by-name/{name}/metadata
- [x] Backward compatible (serde default)

## Phase 11: Embedding Versioning ✅
- [x] IndexRegistry: model_name, model_version, dimensions per index
- [x] Index metadata persisted as JSON, rebuilt on startup
- [x] GET /vectors/indexes — list all (filter by source/model)
- [x] GET /vectors/indexes/{name} — metadata
- [x] Background jobs auto-register metadata on completion

## Phase 12: Tool Registry ✅
- [x] 6 built-in staffing tools (search_candidates, get_candidate, revenue_by_client, recruiter_performance, cold_leads, open_jobs)
- [x] Parameter validation + SQL template substitution
- [x] Permission levels: read / write / admin
- [x] Full audit trail per invocation
- [x] GET /tools, GET /tools/{name}, POST /tools/{name}/call, GET /tools/audit

## Phase 13: Security & Access Control ✅
- [x] Role-based access: admin, recruiter, analyst, agent
- [x] Field-level sensitivity enforcement
- [x] Column masking determination per agent
- [x] Query audit logging
- [x] GET/POST /access/roles, GET /access/audit, POST /access/check

## Phase 14: Schema Evolution ✅
- [x] Schema diff detection (added, removed, type changed, renamed)
- [x] Fuzzy rename detection (shared word parts)
- [x] Auto-generated migration rules with confidence scores
- [x] AI migration prompt builder for complex cases
- [x] 5 unit tests

## Phase 15+: Horizon
- [x] HNSW vector index with iteration-friendly trial system (2026-04-16)
  - `HnswStore.build_index_with_config` — parameterized ef_construction, ef_search, seed
  - `EmbeddingCache` — pins 100K vectors in memory, shared across trials
  - `harness::EvalSet` — named query sets with brute-force ground truth
  - `TrialJournal` — append-only JSONL at `_hnsw_trials/{index}.jsonl`
  - Endpoints: `/vectors/hnsw/trial`, `/hnsw/trials/{idx}`, `/hnsw/trials/{idx}/best?metric={recall|latency|pareto}`, `/hnsw/evals`, `/hnsw/evals/{name}/autogen`, `/hnsw/cache/stats`
  - Measured on 100K resumes: **brute-force 44-54ms → HNSW 509us-1830us**, recall 0.92-1.00 depending on `ef_construction`. Sweet spot: ec=80 es=30 → p50=873us recall=1.00 — locked in as `HnswConfig::default()`
- [x] Catalog manifest repair — `POST /catalog/resync-missing` restores row_count and columns from parquet footers (2026-04-16). All 7 staffing tables recovered to PRD-matching 2.47M rows.
- [~] Federated multi-bucket query — **foundation complete 2026-04-16**, see ADR-017
  - [x] `StorageConfig.buckets` + `rescue_bucket` + `profile_root` config shape
  - [x] `SecretsProvider` trait + `FileSecretsProvider` (reads /etc/lakehouse/secrets.toml, checks 0600 perms)
  - [x] `storaged::BucketRegistry` — multi-backend, rescue-aware, reachability probes
  - [x] `storaged::error_journal::ErrorJournal` — append-only JSONL at `primary://_errors/bucket_errors.jsonl`
  - [x] Endpoints: `GET /storage/buckets`, `GET /storage/errors`, `GET /storage/bucket-health`
  - [x] Bucket-aware I/O: `PUT/GET /storage/buckets/{bucket}/objects/{*key}` with rescue fallback + `X-Lakehouse-Rescue-Used` observability headers
  - [x] Backward compat: empty `[[storage.buckets]]` synthesizes a `primary` from legacy `root`
  - [x] Three-bucket test (primary + rescue + testing) verified: normal reads, rescue fallback with headers, hard-fail missing, write to unknown bucket 503, error journal + health summary
  - [x] `X-Lakehouse-Bucket` header middleware on ingest endpoints (2026-04-16)
  - [x] Catalog migration: `POST /catalog/migrate-buckets` stamps `bucket = "primary"` on legacy refs (12 renamed, 14 total now canonical)
  - [x] `queryd` registers every bucket with DataFusion for cross-bucket SQL — verified with people_test (testing) × animals (primary) CROSS JOIN
  - [x] Profile hot-load endpoints: bucket auto-provisioning on `POST /vectors/profile/{id}/activate` (2026-04-17)
  - [x] `vectord` bucket-scoped paths: TrialJournal + PromotionRegistry resolve per-index via IndexMeta.bucket (2026-04-17)
  - [x] Runtime bucket lifecycle: `POST /storage/buckets` (provision) + `DELETE /storage/buckets/{name}` (unregister, refuses primary/rescue) (2026-04-17)
  - [x] ModelProfile.bucket field — per-profile artifact isolation (2026-04-17)
- [x] Database connector ingest (Postgres first) — 2026-04-16
  - `pg_stream::stream_table_to_parquet` — ORDER BY + LIMIT/OFFSET pagination, configurable batch_size
  - `parse_dsn` — postgresql:// and postgres:// URL scheme, user/password/host/port/db
  - `POST /ingest/db` endpoint: `{dsn, table, dataset_name?, batch_size?, order_by?, limit?}` → streams to Parquet, registers in catalog with PII detection + redacted-password lineage
  - Existing `POST /ingest/postgres/import` (structured config) preserved alongside
  - 4 DSN-parser unit tests + live end-to-end test against `knowledge_base.team_runs` (586 rows, 13 cols, 6 batches, 196ms)
- [x] Phase B: Lance storage evaluation — 2026-04-16
  - `crates/lance-bench` standalone pilot (Lance 4.0) avoids DataFusion/Arrow version conflict with main stack
  - 8-dimension benchmark on resumes_100k_v2 — see docs/ADR-019-vector-storage.md for scorecard
  - Decision: hybrid architecture. Parquet+HNSW stays primary (2.55× faster search at 100K in-RAM). Lance added as per-profile second backend for random access (112× faster), append (0.08s vs full rewrite), hot-swap (14× faster index builds), and scale past 5M RAM ceiling.
- [x] Phase E.2 — Compaction integrates tombstones (physical deletion) — 2026-04-16
  - `delta::compact` accepts `tombstones: &[Tombstone]` param, filters rows at merge time via arrow `filter_record_batch`
  - CompactResult gains `tombstones_applied` + `rows_dropped_by_tombstones`
  - Atomic write: ArrowWriter → single Parquet file (fixes latent bug where concatenated Parquet byte streams produced garbage — footer-only-first-segment visible), verify-parse before overwrite, temp_key staging, delete delta files AFTER base write succeeds
  - Snappy compression on output matches ingest defaults (avoids 3× size inflation on every compact)
  - `TombstoneStore::clear` drops all batch files for a dataset; called by queryd after successful compact
  - Query engine exposes `catalog()` accessor so service handler can reach the tombstone store
  - E2E verified on candidates (100K rows): tombstone 3 IDs → compact → 99,997 rows physically in parquet, tombstones empty, IDs gone from `__raw__candidates` too; file size 10.59 MB → 10.72 MB (proportional to data, not inflated)
- [x] Phase 16: Hot-swap generations + autotune agent — 2026-04-16
  - `vectord::promotion::PromotionRegistry` — per-index current config + history at `_hnsw_promotions/{index}.json`, cap 50 history entries
  - Endpoints: `POST /vectors/hnsw/promote/{index}/{trial_id}`, `POST /vectors/hnsw/rollback/{index}`, `GET /vectors/hnsw/promoted/{index}`
  - `vectord::autotune::run_autotune` — grid of trials (configurable or default 5 configs), Pareto winner selection (max recall, then min p50), min_recall safety gate (default 0.9), config bounds (ec ∈ [10,400], es ∈ [10,200])
  - `POST /vectors/hnsw/autotune` — runs the full loop synchronously, journals every trial, auto-promotes winner
  - `activate_profile` uses `promotion_registry.config_or(..., profile_default)` so newly-promoted configs flow automatically into next activation
  - End-to-end: autogen harness for threat_intel_v1 (10 queries), autotune ran 5 trials (all recall=1.00, p50 64-68us), promoted ec=20 es=30 at recall=1.0 p50=64us as winner. Manual promote of ec=80 es=30 pushed autotune pick onto history. Rollback restored autotune winner. Second rollback cleared to None. Re-promote + restart verified persistence. Activation after promotion logged "building HNSW ef_construction=80 ef_search=30 seed=42" — config flowed through correctly.
- [x] Phase 17: Model profiles + scoped search — 2026-04-16
  - `shared::types::ModelProfile` — { id, ollama_name, description, bound_datasets, hnsw_config, embed_model, created_at, created_by }
  - `shared::types::ProfileHnswConfig` — mirror of vectord's HnswConfig to avoid cross-crate dep cycle (defaults ec=80 es=30 matching Phase 15 winner)
  - `Registry::{put_profile, get_profile, list_profiles, delete_profile}` persisted at `_catalog/profiles/{id}.json`, validates bindings exist (raw dataset OR AiView)
  - Endpoints: `POST/GET /catalog/profiles`, `GET/DELETE /catalog/profiles/{id}`
  - `POST /vectors/profile/{id}/activate` — warms EmbeddingCache + builds HNSW with profile's config for every bound dataset's vector index; reports warmed indexes + failures + duration
  - `POST /vectors/profile/{id}/search` — rejects 403 if requested index's source isn't in profile.bound_datasets; falls through to HNSW if warm, brute-force otherwise
  - Fixed refresh to register new index metadata (was silently no-op for first-time indexes)
  - End-to-end: security-analyst profile bound to threat_intel → activate warms 54 vectors in 156ms → within-scope HNSW search works (0.625 score); out-of-scope search for candidates returns 403 with allowed bindings listed
- [x] Phase E: Soft deletes (tombstones) — 2026-04-16
  - `shared::types::Tombstone` — { dataset, row_key_column, row_key_value, deleted_at, actor, reason }
  - `catalogd::tombstones::TombstoneStore` per-dataset append-log at `_catalog/tombstones/{dataset}/`, flush_threshold=1 + explicit flush so every tombstone is durable on return (compliance requirement)
  - All tombstones for a dataset must share the same `row_key_column` (validated at write — query filter is built as a single WHERE NOT IN clause)
  - `Registry::add_tombstone / list_tombstones`
  - Endpoint: `POST /catalog/datasets/by-name/{name}/tombstone` accepting `{row_key_column, row_key_values[], actor, reason}`; companion `GET` lists active tombstones
  - `queryd::context::build_context` wraps tombstoned tables: raw goes to `__raw__{name}`, public name becomes a DataFusion view with `WHERE CAST(col AS VARCHAR) NOT IN (...)` filter
  - End-to-end on candidates: tombstone 3 IDs, COUNT drops 100,000 → 99,997, specific WHERE returns empty, AiView candidates_safe transitively excludes them too, restart preserves all tombstones
  - Limits / not in MVP: journal integration (tombstones don't yet emit Phase 9 mutation events — covered by audit fields on the tombstone itself). Physical compaction integration was closed by Phase E.2 below.
- [x] Phase D: AI-safe views — 2026-04-16
  - `shared::types::AiView` — name, base_dataset, columns whitelist, optional row_filter, column_redactions
  - `shared::types::Redaction` — Null | Hash | Mask { keep_prefix, keep_suffix }
  - `Registry::put_view / get_view / list_views / delete_view` persisted to `_catalog/views/{name}.json`
  - `queryd::context` registers each view as a DataFusion view with the safe projection + filter + redactions baked into the SELECT
  - Endpoints: `POST/GET /catalog/views`, `GET/DELETE /catalog/views/{name}`
  - End-to-end on candidates: `candidates_safe` view exposes 8 of 15 columns, masks `candidate_id` (CAN******01), filters out `status='blocked'`. `SELECT * FROM candidates_safe` returns whitelist only; `SELECT email FROM candidates_safe` fails. View survives restart.
  - Capability surface — raw `candidates` still accessible by name; Phase 13 access control is the layer that enforces who can query what
- [x] Phase C: Decoupled embedding refresh — 2026-04-16
  - `DatasetManifest`: `last_embedded_at`, `embedding_stale_since`, `embedding_refresh_policy` (Manual | OnAppend | Scheduled)
  - `Registry::mark_embeddings_stale` / `clear_embeddings_stale` / `stale_datasets`
  - Ingest paths (CSV pipeline + Postgres streaming) auto-mark-stale when writing to an already-embedded dataset
  - `vectord::refresh::refresh_index` — reads dataset, diffs doc_ids vs existing embeddings, embeds only new rows, writes combined index, clears stale
  - `POST /vectors/refresh/{dataset}` + `GET /vectors/stale`
  - Id columns accept `Utf8`, `Int32`, `Int64`
  - End-to-end on threat_intel: initial 20-row embed 2.1s; re-ingest to 54 rows auto-marks stale; delta refresh embeds only 34 new in 970ms (6× faster than full re-embed); stale cleared
- [x] Phase 16.2/16.5: Background autotune agent + ingest-triggered re-trials — 2026-04-17
  - `vectord::agent` — ε-greedy proposer, rate-limited, cooldown-gated, tokio background task
  - Ingest paths push `DatasetAppended` triggers to agent queue
  - Endpoints: `GET /vectors/agent/status`, `POST /vectors/agent/stop`, `POST /vectors/agent/enqueue/{idx}`
  - `[agent]` config section in lakehouse.toml (enabled, cycle_interval, cooldown, min_recall, max_trials/hr)
  - 3 unit tests
- [x] Phase 17 VRAM gate: Two-profile sequential swap — 2026-04-17
  - Sidecar: `POST /admin/unload` (keep_alive=0), `POST /admin/preload`, `GET /admin/vram` (nvidia-smi + Ollama /api/ps)
  - `AiClient::unload_model / preload_model / vram_snapshot`
  - `VectorState.active_profile` singleton — activate swaps models, deactivate unloads
  - Verified: staffing-recruiter (qwen2.5) ↔ docs-assistant (mistral) — only one model in VRAM at a time
- [x] MySQL streaming connector — 2026-04-17
  - `my_stream.rs` mirrors pg_stream: DSN parsing, OFFSET pagination, Arrow type mapping, Parquet streaming
  - `POST /ingest/mysql` with PII detection, lineage, agent trigger
  - Verified end-to-end on live MariaDB (10 rows, 9 columns, round-tripped all types)
  - 6 DSN + type-mapping unit tests
- [x] Phase 18 hybrid: vectord-lance production crate — 2026-04-17
  - Firewall crate (Arrow 57 / Lance 4, separate from main Arrow 55 / DF 47 stack)
  - Public API: migrate_from_parquet, build_index (IVF_PQ), search, get_by_doc_id, append, build_scalar_index, stats
  - `lance_backend::LanceRegistry` resolves bucket → URI per index
  - `VectorBackend { Parquet | Lance }` enum on ModelProfile + IndexMeta
  - 8 HTTP endpoints under `/vectors/lance/*` (migrate, index, search, doc, append, stats, scalar-index, recall)
  - Profile-driven routing: `POST /vectors/profile/{id}/search` auto-routes to Lance when profile.vector_backend=lance
  - Auto-migrate + auto-index on activation
  - Measured on real 100K × 768d: migrate 0.57s, IVF_PQ build 16.2s (14× faster than HNSW 230s), search 23ms, append 100 rows 3.3ms, doc_id fetch 3.5ms (with scalar btree)
  - IVF_PQ recall@10 = 0.805 with Lance's default `nprobes=1` (the hidden cap — see 2026-04-20 tuning work below, which lifts it to 1.000). Measured via `/vectors/lance/recall/{idx}` harness.
- [x] Phase E.3: Scheduled ingest — 2026-04-17
  - `ingestd::schedule` module: ScheduleDef, ScheduleStore (JSON at `_schedules/{id}.json`), Scheduler tokio task
  - Supports MySQL + Postgres sources on interval triggers (Cron variant defined, parsing stubbed)
  - 6 CRUD endpoints under `/ingest/schedules/*` + run-now manual trigger
  - Full catalog integration: PII, lineage, mark-stale, agent trigger
  - 6 unit tests
- [x] PDF OCR via Tesseract — 2026-04-17
  - Two-tier: lopdf text extraction → Tesseract 5.5 fallback for scanned/image PDFs
  - Extracts embedded XObject /Image streams, shells to tesseract --oem 3 --psm 6
  - Same schema (source_file, page_number, text_content) — downstream unchanged
- [x] Catalog hygiene — idempotent `register()` + dedupe + DELETE (2026-04-19, ADR-020)
  - `catalogd::Registry::register` now gates on `(name, schema_fingerprint)`: same fp → reuse `DatasetId` and update objects in place; different fp → return error (409 Conflict on HTTP, `FAILED_PRECONDITION` on gRPC). First-time registration is unchanged.
  - `POST /catalog/dedupe` one-shot operator endpoint collapses pre-existing duplicates; winner = non-null `row_count` first, newest `updated_at` second.
  - `DELETE /catalog/datasets/by-name/{name}` removes the manifest from both in-memory registry and object storage (metadata-only — parquet files, vector indexes, tombstones are NOT cascade-deleted). Added to support test-harness cleanup; also plugs a real catalog hole where zombie entries from prior deletes would break DataFusion schema inference.
  - Cleanup run on live catalog: 374 → 31 datasets, 343 orphan manifests removed, 0 errors. 308× `successful_playbooks` was the worst offender.
  - Concurrency: write lock held across storage I/O in `register()` to close the check→insert TOCTOU window (32-worker multi-threaded stress test verifies single-manifest invariant).
  - End-to-end verification: `scripts/e2e_pipeline_check.sh` runs 31 assertions across 12 pipeline stages (ingest → catalog → SQL+JOIN → dedup → idempotency → metadata → PII → vector embed → semantic search → cleanup) against the live gateway. Idempotent across repeat runs.
  - Tests: 11 new in `catalogd` (was 0, includes 3 concurrency tests + 3 delete_dataset tests); 11 new in `storaged` for `AppendLog` + `ErrorJournal` (was 0). Fixed a broken doctest in `append_log.rs`.
- [x] Autotune agent: portfolio rotation + auto-bootstrap (2026-04-20)
  - `pick_periodic_target` now sources candidates from `IndexRegistry` (not just promoted indexes) and picks least-recently-tuned, so trial budget spreads across every index with ≥1000 vectors instead of fixating on one converged champion.
  - `run_one_cycle` bootstraps on first visit: `ensure_auto_harness` auto-generates `{index}_auto` (20 synthetic self-queries, k=10, brute-force ground truth) if missing, then seeds with `HnswConfig::default()` (ec=80/es=30).
  - Regression fix: `harness::recall_at_k` now uses set-intersection semantics. The prior impl counted duplicates in `predicted` — on corpora with repeated chunks (`kb_response_cache_agent`) this inflated recall above 1.0 and poisoned promotion decisions. +7 unit tests.
- [x] Scheduled ingest: real cron parsing (2026-04-20)
  - Vixie-compatible 5/6-field cron via `croner` crate. Day-of-week follows Unix convention (`1-5` = Mon-Fri). 6-field adds seconds granularity.
  - `validate_trigger` in `ingestd::schedule` — create/patch handlers reject malformed expressions with `400 BAD_REQUEST` at creation time, not silently at fire time.
  - Swapped away from the `cron` crate (0.16) which uses a non-Unix DOW convention (`1=Sun`) that would silently bite anyone writing `1-5` expecting weekdays. +9 unit tests.
- [x] EvalSets federation (2026-04-20)
  - `harness::HarnessStore` mirrors the TrialJournal / PromotionRegistry federation pattern: eval artifacts colocate with each index's recorded bucket; legacy evals in primary remain discoverable via a fallback path; cross-bucket listing dedupes.
  - Every eval callsite (service.rs × 5, agent.rs × 3, autotune.rs × 1) now routes through `HarnessStore`. `VectorState` and `AgentDeps` each hold a shared instance.
- [x] Index bucket-migrate PATCH (2026-04-20)
  - `PATCH /vectors/indexes/{name}/bucket` copies an index's vector parquet + trial-journal batches + promotion file + auto-harness to `dest_bucket`, flips `IndexMeta.bucket` as the commit point, and evicts the `EmbeddingCache` so next load reads from the new bucket. Optional `delete_source: true` sweeps source artifacts.
  - Lance-backed indexes refused with 400 — Lance URIs are bucket-specific and require rewriting the dataset, separate story. Round-trip verified: 390 artifacts, 0.04s.
- [x] IVF_PQ recall tuning (2026-04-20)
  - `LanceVectorStore::search` now accepts optional `nprobes` + `refine_factor`. Lance's built-in `nprobes=1` default was the hidden cap on recall — on 316-partition `resumes_100k_v2` it searched only 0.3% of partitions per query.
  - Server defaults (`LANCE_DEFAULT_NPROBES=20`, `LANCE_DEFAULT_REFINE_FACTOR=5`) flow through the scoped-search path and the autotune harness. Measured on `resumes_100k_v2`: recall `0.805 → 1.000` at p50 ≈ 7.4ms. Even `nprobes=5, refine=5` saturates recall at p50 ≈ 4.7ms.
  - `/vectors/lance/recall/{idx}` accepts per-request `nprobes` / `refine_factor` so operators can sweep the curve.
- [x] **Phase 19: Playbook memory (meta-index)** — the feedback loop originally implied by the PRD but never built. Playbooks stop being write-only; they start shaping future rankings. (2026-04-20)
  - [x] 19.1 — `POST /vectors/playbook_memory/rebuild` scans `successful_playbooks` via DataFusion, builds one `PlaybookEntry` per row (operation + approach + context embedded as one vector via nomic-embed-text)
  - [x] 19.2 — Brute-force cosine search over in-memory embeddings (chosen over HNSW: successful_playbooks maxes around thousands of rows, overhead of a second indexed surface isn't worth it until that ceiling bites)
  - [x] 19.3 — Endorsed names parsed out of `result` column, keyed by `(city, state, name)` tuple so shared names across cities don't cross-pollinate. Parsing via `parse_names` + `parse_city_state` helpers (7 unit tests)
  - [x] 19.4 — `/vectors/hybrid?use_playbook_memory=true`: fetches `top_k * 5` candidates so endorsed workers outside the vanilla top-K can still climb. Boost is additive on vector score, each hit carries `playbook_boost` + `playbook_citations` in the response for explainability
  - [x] 19.5 — Multi-agent orchestrator (`tests/multi-agent/orchestrator.ts`) auto-seeds `POST /vectors/playbook_memory/seed` on consensus_done, so the next query sees the new endorsement without a full `/rebuild`. Closes the feedback loop: two agents reach consensus → playbook sealed → next query re-ranks
  - [x] 19.6 — `MAX_BOOST_PER_WORKER = 0.25` enforced in `compute_boost_for`; verified with unit test (100 identical playbooks → boost capped at 0.25) and live test (5 identical seeds → exactly 0.25). Time decay also wired: `BOOST_HALF_LIFE_DAYS = 30.0` — 30-day-old playbooks contribute half, 60-day a quarter, via `exp(-age_days / 30)` in the boost loop
  - Real finding surfaced during build: the 32 bootstrap rows in `successful_playbooks` reference phantom worker names — 80 of 82 don't correspond to actual rows in `workers_500k`. `/seed` endpoint bypasses `successful_playbooks` so operators can prime memory with real fixtures; production path is the orchestrator write-through
- [x] **Phase 19 refinement — geo + role prefilter on boost** (2026-04-21)
  - Added `compute_boost_for_filtered` and `compute_boost_for_filtered_with_role` to `playbook_memory.rs`. SQL filter's `(city, state, role)` parsed in `service.rs`; exact role-matches in target geo skip cosine and earn similarity=1.0. Restored the feedback loop: matched=0 → matched=11 per query on the same Nashville test. Citation density on Riverfront Steel: 2 → 28 per run (14×).
  - Rust unit tests: `extractor_tests::extract_target_geo_basic/_missing_state/_word_boundary`, `extract_target_role_basic/_none/_multi_word`. 6/6 pass.
  - Diagnostic log: `playbook_boost: boosts=N sources=N parsed=N matched=N target_geo=? target_role=?` on every call.
- [x] **Phase 20: Model Matrix + Overseer Tiers** (2026-04-21)
  - `config/models.json` — 5 tiers (t1_hot / t2_review / t3_overview / t4_strategic / t5_gatekeeper), each with context_window + context_budget + overflow_policy. Ollama Cloud bearer key from `/root/llm_team_config.json`.
  - Hot path: qwen3.5:latest + qwen3:latest local with `think:false`. Mistral dropped after 0/14 fill on complex scenarios.
  - T3 cloud: gpt-oss:120b via Ollama Cloud — verified 4-8s latency, strict JSON-shape output for remediation.
- [x] **Phase 21: Scratchpad + Tree-Split Continuation** (2026-04-21)
  - `tests/multi-agent/agent.ts`: `estimateTokens()`, `assertContextBudget()`, `generateContinuable()`, `generateTreeSplit()`. `think` flag plumbed through sidecar's `/generate`. Empty-response backoff + truncation-continuation, no max_tokens tourniquet.
  - Rust port shipped (2026-04-21, companion to Phase 27):
    - `crates/aibridge/src/context.rs` — `estimate_tokens` (chars/4 ceil, matches TS), `context_window_for`, `assert_context_budget` returning `Result<BudgetCheck, (BudgetCheck, usize over_by)>` so callers get the numbers back on both success and overflow. Windows table mirrors `config/models.json`.
    - `crates/aibridge/src/continuation.rs` — `generate_continuable<G: TextGenerator>` handles the two failure modes from TS: (a) empty thinking-model response → geometric-backoff retry with 2× budget up to `budget_cap`; (b) truncated non-empty → continuation with partial-as-scratchpad. `is_structurally_complete` balances braces then JSON.parse-check for the JSON shape; text shape is "non-empty". Guards the degen case "all retries empty → bail, don't loop on empty partial" — the TS impl has this implicit, Rust makes it explicit.
    - `crates/aibridge/src/tree_split.rs` — `generate_tree_split` map→reduce with running scratchpad. Per-shard + reduce-prompt budget checked through `assert_context_budget`; loud-fails with the overflow message rather than silently truncating. Scratchpad oldest-digest-first truncation once it exceeds `scratchpad_budget` (default 6000 tokens, matches TS).
    - `TextGenerator` trait (native async-fn-in-trait, edition 2024) so `ScriptedGenerator` test double can inject canned sequences without a live Ollama. `AiClient` implements `TextGenerator`.
    - `GenerateRequest` gained `think: Option<bool>` field — forwards to sidecar for per-call hidden-reasoning opt-out on hot-path JSON emitters.
    - 25 aibridge unit tests (8 context + 10 continuation + 7 tree_split) — all green, no network required.
- [x] **Phase 22: Internal Knowledge Library** (2026-04-21)
  - `data/_kb/` — signatures.jsonl, outcomes.jsonl, pathway_recommendations.jsonl, error_corrections.jsonl, config_snapshots.jsonl. Event-driven cycle: indexRun → recommendFor → loadRecommendation.
  - Item B cloud rescue: failed event → cloud remediation JSON → retry with pivot. Verified 1/3 rescues succeeded on stress_01 (Gary IN → South Bend IN pivot).
  - `scripts/kb_measure.py` aggregator. Unit tests: `kb.test.ts` — 4/4 pass (signature stability, role/city/count invariants, digest shape).
- [x] **Phase 23: Staffer identity + competence-weighted retrieval** (2026-04-21)
  - ScenarioSpec gained `contract: ContractTerms` and `staffer: Staffer { id, name, tenure_months, role, tool_level }`.
  - tool_level runtime overrides: full / local / basic / minimal. Basic + minimal route executor to Ollama Cloud `kimi-k2.5` (kimi-k2.6 pending pro-tier upgrade).
  - `data/_kb/staffers.jsonl` — competence_score = 0.45·fill + 0.20·turn_eff + 0.20·cite + 0.15·rescue. Recomputed per run.
  - `findNeighbors` now returns `weighted_score = cosine × max_staffer_competence`. `scripts/kb_staffer_report.py` — leaderboard + cross-staffer worker overlap (Rachel D. Lewis 12× across 4 staffers → auto-discovered high-value label).
  - `gen_staffer_demo.ts` + `run_staffer_demo.sh` — 4 personas × 3 contracts = 12 runs.
- [x] **Phase 27: Playbook versioning** (2026-04-21)
  - `PlaybookEntry` gained `version: u32` (default 1), `parent_id`, `superseded_at`, `superseded_by` fields. All `#[serde(default)]` so entries persisted before Phase 27 load as roots with version=1.
  - `PlaybookMemory::revise_entry(parent_id, new_entry)` appends a new version, stamps `superseded_at`+`superseded_by` on the parent, inherits `parent_id` and sets `version = parent + 1` on the new entry. Rejects revising a retired or already-superseded parent with a clear error — the tip of the chain is the only valid revise target.
  - `PlaybookMemory::history(playbook_id)` returns the full chain root→tip, walking `parent_id` backward then `superseded_by` forward. Cycle-safe. Works from any node in the chain.
  - Superseded entries excluded from boost (same rule as retired): `compute_boost_for_filtered_with_role`, the active-entries prefilter, the geo-index rebuild, and the upsert existing-entry search all skip `superseded_at.is_some()`.
  - Endpoints: `POST /vectors/playbook_memory/revise` + `GET /vectors/playbook_memory/history/{id}`.
  - `status_counts` now returns `(total, retired, superseded, failures)`. `/status` JSON reports `superseded` as a distinct counter; `active = total - retired - superseded`.
  - 8 unit tests under `mod version_tests` covering: chain-metadata stamping, retired-parent rejection, already-superseded-parent rejection, superseded endorsement exclusion from boost, history traversal from root/tip/middle, empty-for-unknown-id, superseded-status-count, legacy-entry-default-version round-trip. 26/26 playbook_memory tests pass.
- [x] **Phase 24: Observer / Autotune integration** (2026-04-20, commit b95dd86)
  - Closed the gap where `lakehouse-observer.service` wrapped MCP :3700 while `tests/multi-agent/scenario.ts` hit gateway :3100 directly — observer sat idle at 0 ops across 3600+ cycles.
  - `observer.ts` gained a Bun HTTP listener on `OBSERVER_PORT` (default 3800) with `GET /health`, `GET /stats` (totals + by_source + recent scenario digest), and `POST /event` for scenario outcomes. Body shapes into `ObservedOp` with `source="scenario"` + `staffer_id` + `sig_hash` + `event_kind` + geo + rescue flags.
  - `recordExternalOp()` shared ring-buffer insert — ERROR_ANALYZER and PLAYBOOK_BUILDER loops now see both MCP-wrapped and scenario-posted ops through the same path.
  - `persistOp()` swap: old path wrote via `/ingest/file?name=observed_operations` which has REPLACE semantics (wiped prior ops); now uses append-friendly Parquet write-through.
- [x] **Phase 25: Validity windows + playbook retirement** (2026-04-21, commit e0a843d)
  - `PlaybookEntry` gained four optional fields (`#[serde(default)]` so legacy entries load as never-expiring): `schema_fingerprint` (SHA-256 over target dataset columns at seed time), `valid_until` (RFC3339 hard expiry), `retired_at` (set by retire calls), `retirement_reason` (human string).
  - `compute_boost_for_filtered_with_role` now skips retired + expired entries before geo/cosine — no silent boosting from stale playbooks. Unit-tested on expired `valid_until` + retired + schema-drift retirement.
  - Two retirement paths: `retire_one(playbook_id, reason)` for manual, `retire_on_schema_drift(city, state, current_fingerprint, reason)` for batch schema-migration sweep. Legacy entries without a fingerprint skip drift retirement (safe).
  - Endpoint: `POST /vectors/playbook_memory/retire` — accepts either `{playbook_id, reason}` or `{city, state, current_schema_fingerprint, reason}`.
- [x] **Phase 26: Mem0 upsert + Letta geo hot cache** (2026-04-21, commit 640db8c)
  - Mem0-style upsert: `/seed` with `append=true` (default) routes through `upsert_entry`, which decides ADD / UPDATE / NOOP on (operation, day, city, state). Same-day re-seed merges names (union, stable order) instead of duplicating the row. Identical re-seed is a no-op. Different-day same-operation is a fresh ADD. Playbook_id stays stable on UPDATE so prior citations remain valid.
  - Letta-style hot cache: `PlaybookMemory` now holds a `geo_index: HashMap<(city_lower, state_upper), Vec<entry_idx>>` rebuilt on every mutation. Geo-filtered boost queries skip the full scan and hit the O(1) key lookup. At 1.9K entries the full scan was sub-ms; the index scales the same path to 100K+ without code changes.
  - `UpsertOutcome` enum reported back to callers — `{mode: added|updated|noop, playbook_id, merged_names?}` + `entries_after`.
- [ ] Fine-tuned domain models (Phase 25+)
- [ ] Multi-node query distribution (only if ceilings bite)

---

**145 unit tests | 13 crates | 21 ADRs | 2.47M rows | 100K vectors | Hybrid Parquet+HNSW ⊕ Lance | Phases 0-27 shipped**
**Latest: 2026-04-21 — Phase 27 (playbook versioning: `version` + `parent_id` + `superseded_at` + `superseded_by` on `PlaybookEntry`, `/revise` + `/history` endpoints, 8 new tests). Doc-sync pass: Phase 24 observer + Phase 25 validity windows + Phase 26 Mem0/Letta now reflected in phase tracker. Phase 19.6 time decay noted as wired (was misdocumented as deferred). Phase E.2 tombstone-at-compaction noted as closed in Phase 8 MVP limits.**