Today's work shipped four phase closures (Truth Layer, Validation Pipeline, Caller Migration, Doc-Drift Detection); the canonical tracker now reflects them. This lays the foundation for the production switchover, where real Chicago data will soon replace the synthetic test data.
Phase Tracker
Phase 0: Bootstrap ✅
- Cargo workspace with all crate stubs compiling
- shared crate: error types, ObjectRef, DatasetId
- gateway with Axum: GET /health → 200
- tracing + tracing-subscriber wired in gateway
- justfile with build, test, run recipes
- docs committed to git
Phase 1: Storage + Catalog ✅
- storaged: object_store backend init (LocalFileSystem)
- storaged: Axum endpoints (PUT/GET/DELETE/LIST)
- shared/arrow_helpers.rs: RecordBatch ↔ Parquet + schema fingerprinting
- catalogd/registry.rs: in-memory index + manifest persistence
- catalogd service: POST/GET /datasets + by-name
- gateway routes wired
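The schema fingerprinting in shared/arrow_helpers.rs can be sketched std-only. This is illustrative, not the real implementation: the actual code operates on Arrow `Schema` values and the tracker doesn't name the hash algorithm (`DefaultHasher` here is not stable across Rust releases, so a production fingerprint would pin a fixed hash); the point is only that the fingerprint changes iff the ordered (name, type) sequence changes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative schema fingerprint: hash the ordered (column name, type)
/// pairs so any rename, reorder, or type change yields a different value.
fn schema_fingerprint(columns: &[(&str, &str)]) -> u64 {
    let mut h = DefaultHasher::new();
    for (name, dtype) in columns {
        name.hash(&mut h);
        dtype.hash(&mut h);
    }
    h.finish()
}
```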
Phase 2: Query Engine ✅
- queryd: SessionContext + object_store config
- queryd: ListingTable from catalog ObjectRefs
- queryd service: POST /query/sql → JSON
- queryd → catalogd wiring
- gateway routes /query
Phase 3: AI Integration ✅
- Python sidecar: FastAPI + Ollama (embed/generate/rerank)
- Dockerfile for sidecar
- aibridge/client.rs: HTTP client
- aibridge service: Axum proxy endpoints
- Model config via env vars
Phase 4: Frontend ✅
- Dioxus scaffold, WASM build
- Ask tab: natural language → AI SQL → results
- Explore tab: dataset browser + AI summary
- SQL tab: raw DataFusion editor
- System tab: health checks for all services
Phase 5: Hardening ✅
- Proto definitions (lakehouse.proto)
- Internal gRPC: CatalogService on :3101
- OpenTelemetry tracing: stdout exporter
- Auth middleware: X-API-Key (toggleable)
- Config-driven startup: lakehouse.toml
Phase 6: Ingest Pipeline ✅
- CSV ingest with auto schema detection
- JSON ingest (array + newline-delimited, nested flattening)
- PDF text extraction (lopdf)
- Text/SMS file ingest
- Content hash dedup (SHA-256)
- POST /ingest/file multipart upload
- 12 unit tests
Phase 7: Vector Index + RAG ✅
- chunker: configurable size + overlap, sentence-boundary aware
- store: embeddings as Parquet (binary f32 vectors)
- search: brute-force cosine similarity
- rag: embed → search → retrieve → LLM answer with citations
- POST /vectors/index, /search, /rag
- Background job system with progress tracking
- Dual-pipeline supervisor with checkpointing + retry
- 100K embeddings: 177/sec on A4000, zero failures
- 6 unit tests
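The brute-force cosine path above is the simplest piece to sketch. A std-only, illustrative version (the real store reads binary f32 vectors from Parquet; names here are hypothetical):

```rust
/// Cosine similarity between two vectors; 0.0 for a zero-length vector.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Brute-force top-k: score every corpus vector, sort descending, truncate.
fn top_k(query: &[f32], corpus: &[(String, Vec<f32>)], k: usize) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = corpus
        .iter()
        .map(|(id, v)| (id.clone(), cosine(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}
```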
Phase 8: Hot Cache + Incremental Updates ✅
- MemTable hot cache: LRU, configurable max (16GB)
- POST /query/cache/pin, /cache/evict, GET /cache/stats
- Delta store: append-only delta Parquet files
- Merge-on-read: queries combine base + deltas
- Compaction: POST /query/compact
- Benchmarked: 9.8x speedup (1M rows: 942ms → 96ms)
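The hot cache's LRU policy can be sketched in a few lines. A minimal, entry-count-based sketch only: the real MemTable cache is sized in bytes (16GB max) and holds RecordBatches, not byte blobs.

```rust
use std::collections::HashMap;

/// Minimal LRU sketch of the hot-cache eviction policy.
struct HotCache {
    cap: usize,
    map: HashMap<String, Vec<u8>>,
    order: Vec<String>, // front = least recently used
}

impl HotCache {
    fn new(cap: usize) -> Self {
        Self { cap, map: HashMap::new(), order: Vec::new() }
    }
    /// Move `key` to the most-recently-used position.
    fn touch(&mut self, key: &str) {
        self.order.retain(|k| k != key);
        self.order.push(key.to_string());
    }
    /// Insert (or overwrite) an entry, evicting the LRU entry past capacity.
    fn pin(&mut self, key: &str, bytes: Vec<u8>) {
        self.map.insert(key.to_string(), bytes);
        self.touch(key);
        while self.map.len() > self.cap {
            let evicted = self.order.remove(0);
            self.map.remove(&evicted);
        }
    }
    /// A read also refreshes recency.
    fn get(&mut self, key: &str) -> Option<&Vec<u8>> {
        if self.map.contains_key(key) {
            self.touch(key);
        }
        self.map.get(key)
    }
}
```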
Phase 8.5: Agent Workspaces ✅
- WorkspaceManager with daily/weekly/monthly/pinned tiers
- Saved searches, shortlists, activity logs per workspace
- Instant zero-copy handoff between agents
- Persistence to object storage, rebuild on startup
Phase 9: Event Journal ✅
- journald crate: append-only mutation log
- Event schema: entity, field, old/new value, actor, source, workspace
- In-memory buffer with auto-flush to Parquet
- GET /journal/history/{entity_id}, /recent, /stats
- POST /journal/event, /update, /flush
Phase 10: Rich Catalog v2 ✅
- DatasetManifest: description, owner, sensitivity, columns, lineage, freshness, tags
- PII auto-detection: email, phone, SSN, salary, address, medical
- Column-level metadata with sensitivity flags
- Lineage tracking: source_system → ingest_job → dataset
- PATCH /catalog/datasets/by-name/{name}/metadata
- Backward compatible (serde default)
Phase 11: Embedding Versioning ✅
- IndexRegistry: model_name, model_version, dimensions per index
- Index metadata persisted as JSON, rebuilt on startup
- GET /vectors/indexes — list all (filter by source/model)
- GET /vectors/indexes/{name} — metadata
- Background jobs auto-register metadata on completion
Phase 12: Tool Registry ✅
- 6 built-in staffing tools (search_candidates, get_candidate, revenue_by_client, recruiter_performance, cold_leads, open_jobs)
- Parameter validation + SQL template substitution
- Permission levels: read / write / admin
- Full audit trail per invocation
- GET /tools, GET /tools/{name}, POST /tools/{name}/call, GET /tools/audit
Phase 13: Security & Access Control ✅
- Role-based access: admin, recruiter, analyst, agent
- Field-level sensitivity enforcement
- Column masking determination per agent
- Query audit logging
- GET/POST /access/roles, GET /access/audit, POST /access/check
Phase 14: Schema Evolution ✅
- Schema diff detection (added, removed, type changed, renamed)
- Fuzzy rename detection (shared word parts)
- Auto-generated migration rules with confidence scores
- AI migration prompt builder for complex cases
- 5 unit tests
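The diff step of Phase 14 can be sketched as plain set operations. Illustrative only — the fuzzy rename detection, confidence scoring, and AI prompt builder are omitted:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum SchemaChange {
    Added(String),
    Removed(String),
    TypeChanged { column: String, old: String, new: String },
}

/// Compare two schemas as (column name, type) pairs and report
/// added / removed / type-changed columns.
fn diff_schemas(old: &[(&str, &str)], new: &[(&str, &str)]) -> Vec<SchemaChange> {
    let new_map: HashMap<&str, &str> = new.iter().copied().collect();
    let old_map: HashMap<&str, &str> = old.iter().copied().collect();
    let mut changes = Vec::new();
    for (name, old_ty) in old {
        match new_map.get(name) {
            None => changes.push(SchemaChange::Removed(name.to_string())),
            Some(new_ty) if new_ty != old_ty => changes.push(SchemaChange::TypeChanged {
                column: name.to_string(),
                old: old_ty.to_string(),
                new: new_ty.to_string(),
            }),
            _ => {}
        }
    }
    for (name, _) in new {
        if !old_map.contains_key(name) {
            changes.push(SchemaChange::Added(name.to_string()));
        }
    }
    changes
}
```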
Phase 15+: Horizon
- HNSW vector index with iteration-friendly trial system (2026-04-16)
  - HnswStore.build_index_with_config — parameterized ef_construction, ef_search, seed
  - EmbeddingCache — pins 100K vectors in memory, shared across trials
  - harness::EvalSet — named query sets with brute-force ground truth
  - TrialJournal — append-only JSONL at _hnsw_trials/{index}.jsonl
  - Endpoints: /vectors/hnsw/trial, /hnsw/trials/{idx}, /hnsw/trials/{idx}/best?metric={recall|latency|pareto}, /hnsw/evals, /hnsw/evals/{name}/autogen, /hnsw/cache/stats
  - Measured on 100K resumes: brute-force 44-54ms → HNSW 509us-1830us, recall 0.92-1.00 depending on ef_construction. Sweet spot: ec=80 es=30 → p50=873us recall=1.00 — locked in as HnswConfig::default()
- Catalog manifest repair — POST /catalog/resync-missing restores row_count and columns from parquet footers (2026-04-16). All 7 staffing tables recovered to PRD-matching 2.47M rows.
- [~] Federated multi-bucket query — foundation complete 2026-04-16, see ADR-017
  - StorageConfig.buckets + rescue_bucket + profile_root config shape
  - SecretsProvider trait + FileSecretsProvider (reads /etc/lakehouse/secrets.toml, checks 0600 perms)
  - storaged::BucketRegistry — multi-backend, rescue-aware, reachability probes
  - storaged::error_journal::ErrorJournal — append-only JSONL at primary://_errors/bucket_errors.jsonl
  - Endpoints: GET /storage/buckets, GET /storage/errors, GET /storage/bucket-health
  - Bucket-aware I/O: PUT/GET /storage/buckets/{bucket}/objects/{*key} with rescue fallback + X-Lakehouse-Rescue-Used observability headers
  - Backward compat: an empty [[storage.buckets]] synthesizes a primary from the legacy root
  - Three-bucket test (primary + rescue + testing) verified: normal reads, rescue fallback with headers, hard-fail on missing, write to unknown bucket → 503, error journal + health summary
  - X-Lakehouse-Bucket header middleware on ingest endpoints (2026-04-16)
  - Catalog migration: POST /catalog/migrate-buckets stamps bucket = "primary" on legacy refs (12 renamed, 14 total now canonical)
  - queryd registers every bucket with DataFusion for cross-bucket SQL — verified with a people_test (testing) × animals (primary) CROSS JOIN
  - Profile hot-load endpoints: bucket auto-provisioning on POST /vectors/profile/{id}/activate (2026-04-17)
  - vectord bucket-scoped paths: TrialJournal + PromotionRegistry resolve per-index via IndexMeta.bucket (2026-04-17)
  - Runtime bucket lifecycle: POST /storage/buckets (provision) + DELETE /storage/buckets/{name} (unregister, refuses primary/rescue) (2026-04-17)
  - ModelProfile.bucket field — per-profile artifact isolation (2026-04-17)
- Database connector ingest (Postgres first) — 2026-04-16
  - pg_stream::stream_table_to_parquet — ORDER BY + LIMIT/OFFSET pagination, configurable batch_size
  - parse_dsn — postgresql:// and postgres:// URL schemes, user/password/host/port/db
  - POST /ingest/db endpoint: {dsn, table, dataset_name?, batch_size?, order_by?, limit?} → streams to Parquet, registers in catalog with PII detection + redacted-password lineage
  - Existing POST /ingest/postgres/import (structured config) preserved alongside
  - 4 DSN-parser unit tests + live end-to-end test against knowledge_base.team_runs (586 rows, 13 cols, 6 batches, 196ms)
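The parse_dsn idea can be sketched with std string ops. A hand-rolled, illustrative version — the real parser handles more edge cases (percent-encoding, optional credentials, query parameters) that this sketch ignores:

```rust
#[derive(Debug, PartialEq)]
struct Dsn {
    user: String,
    password: String,
    host: String,
    port: u16,
    db: String,
}

/// Pull user/password/host/port/db out of a postgresql:// or postgres:// URL.
fn parse_dsn(dsn: &str) -> Option<Dsn> {
    let rest = dsn
        .strip_prefix("postgresql://")
        .or_else(|| dsn.strip_prefix("postgres://"))?;
    let (creds, tail) = rest.split_once('@')?;
    let (user, password) = creds.split_once(':')?;
    let (hostport, db) = tail.split_once('/')?;
    let (host, port) = match hostport.split_once(':') {
        Some((h, p)) => (h, p.parse().ok()?),
        None => (hostport, 5432), // default Postgres port
    };
    Some(Dsn {
        user: user.to_string(),
        password: password.to_string(),
        host: host.to_string(),
        port,
        db: db.to_string(),
    })
}
```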
- Phase B: Lance storage evaluation — 2026-04-16
  - crates/lance-bench standalone pilot (Lance 4.0) avoids a DataFusion/Arrow version conflict with the main stack
  - 8-dimension benchmark on resumes_100k_v2 — see docs/ADR-019-vector-storage.md for the scorecard
  - Decision: hybrid architecture. Parquet+HNSW stays primary (2.55× faster search at 100K in-RAM). Lance added as a per-profile second backend for random access (112× faster), append (0.08s vs a full rewrite), hot-swap (14× faster index builds), and scale past the 5M RAM ceiling.
- Phase E.2 — Compaction integrates tombstones (physical deletion) — 2026-04-16
  - delta::compact accepts a tombstones: &[Tombstone] param, filters rows at merge time via arrow's filter_record_batch
  - CompactResult gains tombstones_applied + rows_dropped_by_tombstones
  - Atomic write: ArrowWriter → single Parquet file (fixes a latent bug where concatenated Parquet byte streams produced garbage — only the first segment's footer visible), verify-parse before overwrite, temp_key staging, delete delta files AFTER the base write succeeds
  - Snappy compression on output matches ingest defaults (avoids 3× size inflation on every compact)
  - TombstoneStore::clear drops all batch files for a dataset; called by queryd after a successful compact
  - Query engine exposes a catalog() accessor so the service handler can reach the tombstone store
  - E2E verified on candidates (100K rows): tombstone 3 IDs → compact → 99,997 rows physically in parquet, tombstones empty, IDs gone from __raw__candidates too; file size 10.59 MB → 10.72 MB (proportional to data, not inflated)
- Phase 16: Hot-swap generations + autotune agent — 2026-04-16
  - vectord::promotion::PromotionRegistry — per-index current config + history at _hnsw_promotions/{index}.json, capped at 50 history entries
  - Endpoints: POST /vectors/hnsw/promote/{index}/{trial_id}, POST /vectors/hnsw/rollback/{index}, GET /vectors/hnsw/promoted/{index}
  - vectord::autotune::run_autotune — grid of trials (configurable or default 5 configs), Pareto winner selection (max recall, then min p50), min_recall safety gate (default 0.9), config bounds (ec ∈ [10,400], es ∈ [10,200])
  - POST /vectors/hnsw/autotune — runs the full loop synchronously, journals every trial, auto-promotes the winner
  - activate_profile uses promotion_registry.config_or(..., profile_default) so newly-promoted configs flow automatically into the next activation
  - End-to-end: autogen harness for threat_intel_v1 (10 queries); autotune ran 5 trials (all recall=1.00, p50 64-68us) and promoted ec=20 es=30 at recall=1.0 p50=64us as the winner. A manual promote of ec=80 es=30 pushed the autotune pick onto history. Rollback restored the autotune winner; a second rollback cleared to None. Re-promote + restart verified persistence. Activation after promotion logged "building HNSW ef_construction=80 ef_search=30 seed=42" — the config flowed through correctly.
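The winner-selection rule above (gate on min_recall, maximize recall, break ties on lowest p50) is small enough to sketch. Names and the Trial shape are illustrative, not the actual vectord signatures:

```rust
#[derive(Debug, Clone, PartialEq)]
struct Trial {
    ec: u32,      // ef_construction
    es: u32,      // ef_search
    recall: f64,
    p50_us: u64,
}

/// Pareto-style pick: drop trials under the recall gate, then take max
/// recall, breaking ties on the lower p50 latency.
fn pick_winner(trials: &[Trial], min_recall: f64) -> Option<Trial> {
    trials
        .iter()
        .filter(|t| t.recall >= min_recall)
        .max_by(|a, b| {
            a.recall
                .partial_cmp(&b.recall)
                .unwrap()
                .then(b.p50_us.cmp(&a.p50_us)) // lower latency wins ties
        })
        .cloned()
}
```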
- Phase 17: Model profiles + scoped search — 2026-04-16
  - shared::types::ModelProfile — { id, ollama_name, description, bound_datasets, hnsw_config, embed_model, created_at, created_by }
  - shared::types::ProfileHnswConfig — mirror of vectord's HnswConfig to avoid a cross-crate dep cycle (defaults ec=80 es=30, matching the Phase 15 winner)
  - Registry::{put_profile, get_profile, list_profiles, delete_profile} persisted at _catalog/profiles/{id}.json; validates that bindings exist (raw dataset OR AiView)
  - Endpoints: POST/GET /catalog/profiles, GET/DELETE /catalog/profiles/{id}
  - POST /vectors/profile/{id}/activate — warms the EmbeddingCache + builds HNSW with the profile's config for every bound dataset's vector index; reports warmed indexes + failures + duration
  - POST /vectors/profile/{id}/search — rejects with 403 if the requested index's source isn't in profile.bound_datasets; falls through to HNSW if warm, brute-force otherwise
  - Fixed refresh to register new index metadata (was silently a no-op for first-time indexes)
  - End-to-end: a security-analyst profile bound to threat_intel → activate warms 54 vectors in 156ms → within-scope HNSW search works (0.625 score); an out-of-scope search for candidates returns 403 with the allowed bindings listed
- Phase E: Soft deletes (tombstones) — 2026-04-16
  - shared::types::Tombstone — { dataset, row_key_column, row_key_value, deleted_at, actor, reason }
  - catalogd::tombstones::TombstoneStore per-dataset append-log at _catalog/tombstones/{dataset}/, flush_threshold=1 + explicit flush so every tombstone is durable on return (compliance requirement)
  - All tombstones for a dataset must share the same row_key_column (validated at write — the query filter is built as a single WHERE NOT IN clause)
  - Registry::add_tombstone / list_tombstones
  - Endpoint: POST /catalog/datasets/by-name/{name}/tombstone accepting {row_key_column, row_key_values[], actor, reason}; a companion GET lists active tombstones
  - queryd::context::build_context wraps tombstoned tables: raw data goes to __raw__{name}, and the public name becomes a DataFusion view with a WHERE CAST(col AS VARCHAR) NOT IN (...) filter
  - End-to-end on candidates: tombstone 3 IDs, COUNT drops 100,000 → 99,997, a targeted WHERE returns empty, the AiView candidates_safe transitively excludes them too, restart preserves all tombstones
  - Limits / not in MVP: journal integration (tombstones don't yet emit Phase 9 mutation events — covered by audit fields on the tombstone itself). Physical compaction integration was closed by Phase E.2, above.
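The view rewrite above is just string-shaped SQL. An illustrative sketch of that shape — the real build_context registers DataFusion views and does proper identifier/value quoting, which this deliberately skips:

```rust
/// Build the filtered-view SQL for a tombstoned table: raw data stays at
/// __raw__{name}; the public name excludes the dead row keys.
fn tombstone_view_sql(name: &str, key_col: &str, dead_keys: &[&str]) -> String {
    let list = dead_keys
        .iter()
        .map(|k| format!("'{k}'"))
        .collect::<Vec<_>>()
        .join(", ");
    format!(
        "CREATE VIEW {name} AS SELECT * FROM __raw__{name} WHERE CAST({key_col} AS VARCHAR) NOT IN ({list})"
    )
}
```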
- Phase D: AI-safe views — 2026-04-16
  - shared::types::AiView — name, base_dataset, columns whitelist, optional row_filter, column_redactions
  - shared::types::Redaction — Null | Hash | Mask { keep_prefix, keep_suffix }
  - Registry::put_view / get_view / list_views / delete_view persisted to _catalog/views/{name}.json
  - queryd::context registers each view as a DataFusion view with the safe projection + filter + redactions baked into the SELECT
  - Endpoints: POST/GET /catalog/views, GET/DELETE /catalog/views/{name}
  - End-to-end on candidates: the candidates_safe view exposes 8 of 15 columns, masks candidate_id (CAN******01), filters out status='blocked'. SELECT * FROM candidates_safe returns the whitelist only; SELECT email FROM candidates_safe fails. The view survives restart.
  - Capability surface — raw candidates is still accessible by name; Phase 13 access control is the layer that enforces who can query what
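The Mask { keep_prefix, keep_suffix } redaction is simple enough to sketch (illustrative; the real redaction is compiled into the view's SELECT, not applied in Rust per value):

```rust
/// Keep the first `keep_prefix` and last `keep_suffix` chars, star the rest
/// (e.g. CAN******01). Values too short to mask safely are fully starred.
fn mask(value: &str, keep_prefix: usize, keep_suffix: usize) -> String {
    let chars: Vec<char> = value.chars().collect();
    if chars.len() <= keep_prefix + keep_suffix {
        return "*".repeat(chars.len());
    }
    let prefix: String = chars[..keep_prefix].iter().collect();
    let suffix: String = chars[chars.len() - keep_suffix..].iter().collect();
    format!("{prefix}{}{suffix}", "*".repeat(chars.len() - keep_prefix - keep_suffix))
}
```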
- Phase C: Decoupled embedding refresh — 2026-04-16
  - DatasetManifest: last_embedded_at, embedding_stale_since, embedding_refresh_policy (Manual | OnAppend | Scheduled)
  - Registry::mark_embeddings_stale / clear_embeddings_stale / stale_datasets
  - Ingest paths (CSV pipeline + Postgres streaming) auto-mark-stale when writing to an already-embedded dataset
  - vectord::refresh::refresh_index — reads the dataset, diffs doc_ids vs existing embeddings, embeds only new rows, writes a combined index, clears stale
  - POST /vectors/refresh/{dataset} + GET /vectors/stale
  - Id columns accept Utf8, Int32, Int64
  - End-to-end on threat_intel: initial 20-row embed 2.1s; re-ingest to 54 rows auto-marks stale; delta refresh embeds only the 34 new rows in 970ms (6× faster than a full re-embed); stale cleared
- Phase 16.2/16.5: Background autotune agent + ingest-triggered re-trials — 2026-04-17
  - vectord::agent — ε-greedy proposer, rate-limited, cooldown-gated, tokio background task
  - Ingest paths push DatasetAppended triggers to the agent queue
  - Endpoints: GET /vectors/agent/status, POST /vectors/agent/stop, POST /vectors/agent/enqueue/{idx}
  - [agent] config section in lakehouse.toml (enabled, cycle_interval, cooldown, min_recall, max_trials/hr)
  - 3 unit tests
- Phase 17 VRAM gate: Two-profile sequential swap — 2026-04-17
  - Sidecar: POST /admin/unload (keep_alive=0), POST /admin/preload, GET /admin/vram (nvidia-smi + Ollama /api/ps)
  - AiClient::unload_model / preload_model / vram_snapshot
  - VectorState.active_profile singleton — activate swaps models, deactivate unloads
  - Verified: staffing-recruiter (qwen2.5) ↔ docs-assistant (mistral) — only one model in VRAM at a time
- MySQL streaming connector — 2026-04-17
  - my_stream.rs mirrors pg_stream: DSN parsing, OFFSET pagination, Arrow type mapping, Parquet streaming
  - POST /ingest/mysql with PII detection, lineage, agent trigger
  - Verified end-to-end on live MariaDB (10 rows, 9 columns, round-tripped all types)
  - 6 DSN + type-mapping unit tests
- Phase 18 hybrid: vectord-lance production crate — 2026-04-17
  - Firewall crate (Arrow 57 / Lance 4, separate from the main Arrow 55 / DF 47 stack)
  - Public API: migrate_from_parquet, build_index (IVF_PQ), search, get_by_doc_id, append, build_scalar_index, stats
  - lance_backend::LanceRegistry resolves bucket → URI per index
  - VectorBackend { Parquet | Lance } enum on ModelProfile + IndexMeta
  - 8 HTTP endpoints under /vectors/lance/* (migrate, index, search, doc, append, stats, scalar-index, recall)
  - Profile-driven routing: POST /vectors/profile/{id}/search auto-routes to Lance when profile.vector_backend=lance
  - Auto-migrate + auto-index on activation
  - Measured on real 100K × 768d: migrate 0.57s, IVF_PQ build 16.2s (14× faster than HNSW 230s), search 23ms, append 100 rows 3.3ms, doc_id fetch 3.5ms (with scalar btree)
  - IVF_PQ recall@10 = 0.805 with Lance's default nprobes=1 (the hidden cap — see the 2026-04-20 tuning work below, which lifts it to 1.000). Measured via the /vectors/lance/recall/{idx} harness.
- Phase E.3: Scheduled ingest — 2026-04-17
  - ingestd::schedule module: ScheduleDef, ScheduleStore (JSON at _schedules/{id}.json), Scheduler tokio task
  - Supports MySQL + Postgres sources on interval triggers (Cron variant defined, parsing stubbed)
  - 6 CRUD endpoints under /ingest/schedules/* + a run-now manual trigger
  - Full catalog integration: PII, lineage, mark-stale, agent trigger
  - 6 unit tests
- PDF OCR via Tesseract — 2026-04-17
- Two-tier: lopdf text extraction → Tesseract 5.5 fallback for scanned/image PDFs
- Extracts embedded XObject /Image streams, shells to tesseract --oem 3 --psm 6
- Same schema (source_file, page_number, text_content) — downstream unchanged
- Catalog hygiene — idempotent register() + dedupe + DELETE (2026-04-19, ADR-020)
  - catalogd::Registry::register now gates on (name, schema_fingerprint): same fp → reuse DatasetId and update objects in place; different fp → return an error (409 Conflict on HTTP, FAILED_PRECONDITION on gRPC). First-time registration is unchanged.
  - POST /catalog/dedupe — one-shot operator endpoint that collapses pre-existing duplicates; winner = non-null row_count first, newest updated_at second.
  - DELETE /catalog/datasets/by-name/{name} removes the manifest from both the in-memory registry and object storage (metadata-only — parquet files, vector indexes, and tombstones are NOT cascade-deleted). Added to support test-harness cleanup; also plugs a real catalog hole where zombie entries from prior deletes would break DataFusion schema inference.
  - Cleanup run on the live catalog: 374 → 31 datasets, 343 orphan manifests removed, 0 errors. 308× successful_playbooks was the worst offender.
  - Concurrency: write lock held across storage I/O in register() to close the check→insert TOCTOU window (a 32-worker multi-threaded stress test verifies the single-manifest invariant).
  - End-to-end verification: scripts/e2e_pipeline_check.sh runs 31 assertions across 12 pipeline stages (ingest → catalog → SQL+JOIN → dedup → idempotency → metadata → PII → vector embed → semantic search → cleanup) against the live gateway. Idempotent across repeat runs.
  - Tests: 11 new in catalogd (was 0; includes 3 concurrency tests + 3 delete_dataset tests); 11 new in storaged for AppendLog + ErrorJournal (was 0). Fixed a broken doctest in append_log.rs.
- Autotune agent: portfolio rotation + auto-bootstrap (2026-04-20)
  - pick_periodic_target now sources candidates from IndexRegistry (not just promoted indexes) and picks the least-recently-tuned, so the trial budget spreads across every index with ≥1000 vectors instead of fixating on one converged champion.
  - run_one_cycle bootstraps on first visit: ensure_auto_harness auto-generates {index}_auto (20 synthetic self-queries, k=10, brute-force ground truth) if missing, then seeds with HnswConfig::default() (ec=80/es=30).
  - Regression fix: harness::recall_at_k now uses set-intersection semantics. The prior impl counted duplicates in predicted — on corpora with repeated chunks (kb_response_cache_agent) this inflated recall above 1.0 and poisoned promotion decisions. +7 unit tests.
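The set-intersection fix above can be shown in a few lines (a sketch of the semantics, not the actual harness code):

```rust
use std::collections::HashSet;

/// recall@k with set semantics: a duplicate id in `predicted` counts once,
/// so recall can never exceed 1.0 on corpora with repeated chunks.
fn recall_at_k(predicted: &[&str], ground_truth: &[&str], k: usize) -> f64 {
    let truth: HashSet<&str> = ground_truth.iter().copied().collect();
    let hits: HashSet<&str> = predicted
        .iter()
        .take(k)
        .copied()
        .filter(|id| truth.contains(id))
        .collect(); // collecting into a set dedupes repeated hits
    hits.len() as f64 / truth.len() as f64
}
```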
- Scheduled ingest: real cron parsing (2026-04-20)
  - Vixie-compatible 5/6-field cron via the croner crate. Day-of-week follows the Unix convention (1-5 = Mon-Fri); the 6-field form adds seconds granularity.
  - validate_trigger in ingestd::schedule — create/patch handlers reject malformed expressions with 400 BAD_REQUEST at creation time, not silently at fire time.
  - Swapped away from the cron crate (0.16), which uses a non-Unix DOW convention (1=Sun) that would silently bite anyone writing 1-5 expecting weekdays. +9 unit tests.
- EvalSets federation (2026-04-20)
  - harness::HarnessStore mirrors the TrialJournal / PromotionRegistry federation pattern: eval artifacts colocate with each index's recorded bucket; legacy evals in primary remain discoverable via a fallback path; cross-bucket listing dedupes.
  - Every eval callsite (service.rs × 5, agent.rs × 3, autotune.rs × 1) now routes through HarnessStore. VectorState and AgentDeps each hold a shared instance.
- Index bucket-migrate PATCH (2026-04-20)
  - PATCH /vectors/indexes/{name}/bucket copies an index's vector parquet + trial-journal batches + promotion file + auto-harness to dest_bucket, flips IndexMeta.bucket as the commit point, and evicts the EmbeddingCache so the next load reads from the new bucket. Optional delete_source: true sweeps source artifacts.
  - Lance-backed indexes are refused with 400 — Lance URIs are bucket-specific and require rewriting the dataset, a separate story. Round-trip verified: 390 artifacts, 0.04s.
- IVF_PQ recall tuning (2026-04-20)
  - LanceVectorStore::search now accepts optional nprobes + refine_factor. Lance's built-in nprobes=1 default was the hidden cap on recall — on the 316-partition resumes_100k_v2 it searched only 0.3% of partitions per query.
  - Server defaults (LANCE_DEFAULT_NPROBES=20, LANCE_DEFAULT_REFINE_FACTOR=5) flow through the scoped-search path and the autotune harness. Measured on resumes_100k_v2: recall 0.805 → 1.000 at p50 ≈ 7.4ms. Even nprobes=5, refine=5 saturates recall at p50 ≈ 4.7ms.
  - /vectors/lance/recall/{idx} accepts per-request nprobes/refine_factor so operators can sweep the curve.
- Phase 19: Playbook memory (meta-index) — the feedback loop originally implied by the PRD but never built. Playbooks stop being write-only; they start shaping future rankings. (2026-04-20)
  - 19.1 — POST /vectors/playbook_memory/rebuild scans successful_playbooks via DataFusion, builds one PlaybookEntry per row (operation + approach + context embedded as one vector via nomic-embed-text)
  - 19.2 — Brute-force cosine search over in-memory embeddings (chosen over HNSW: successful_playbooks maxes out around thousands of rows, so the overhead of a second indexed surface isn't worth it until that ceiling bites)
  - 19.3 — Endorsed names parsed out of the result column, keyed by (city, state, name) tuple so shared names across cities don't cross-pollinate. Parsing via parse_names + parse_city_state helpers (7 unit tests)
  - 19.4 — /vectors/hybrid?use_playbook_memory=true fetches top_k * 5 candidates so endorsed workers outside the vanilla top-K can still climb. The boost is additive on the vector score; each hit carries playbook_boost + playbook_citations in the response for explainability
  - 19.5 — Multi-agent orchestrator (tests/multi-agent/orchestrator.ts) auto-seeds POST /vectors/playbook_memory/seed on consensus_done, so the next query sees the new endorsement without a full /rebuild. Closes the feedback loop: two agents reach consensus → playbook sealed → next query re-ranks
  - 19.6 — MAX_BOOST_PER_WORKER = 0.25 enforced in compute_boost_for; verified with a unit test (100 identical playbooks → boost capped at 0.25) and a live test (5 identical seeds → exactly 0.25). Time decay also wired: BOOST_HALF_LIFE_DAYS = 30.0 — 30-day-old playbooks contribute half, 60-day a quarter, via exp(-age_days / 30) in the boost loop
  - Real finding surfaced during the build: the 32 bootstrap rows in successful_playbooks reference phantom worker names — 80 of 82 don't correspond to actual rows in workers_500k. The /seed endpoint bypasses successful_playbooks so operators can prime memory with real fixtures; the production path is the orchestrator write-through
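The 19.6 guard-rails can be sketched std-only. Two hedges: `per_hit` (the weight a single endorsing playbook contributes) is an illustrative assumption, and the tracker's literal formula exp(-age_days / 30) decays to 1/e ≈ 0.37 at 30 days, while its stated behavior (half at 30 days, a quarter at 60) corresponds to 0.5^(age/30) = exp(-ln 2 · age/30) — the sketch uses the half-life form that matches the stated behavior.

```rust
const MAX_BOOST_PER_WORKER: f64 = 0.25;
const BOOST_HALF_LIFE_DAYS: f64 = 30.0;

/// Sum decayed per-playbook contributions, then apply the hard per-worker cap.
fn compute_boost(ages_days: &[f64], per_hit: f64) -> f64 {
    let raw: f64 = ages_days
        .iter()
        // 0.5^(age/30): 30-day-old playbooks contribute half, 60-day a quarter
        .map(|age| per_hit * 0.5_f64.powf(age / BOOST_HALF_LIFE_DAYS))
        .sum();
    raw.min(MAX_BOOST_PER_WORKER)
}
```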
- Phase 19 refinement — geo + role prefilter on boost (2026-04-21)
  - Added compute_boost_for_filtered and compute_boost_for_filtered_with_role to playbook_memory.rs. The SQL filter's (city, state, role) is parsed in service.rs; exact role-matches in the target geo skip cosine and earn similarity=1.0. Restored the feedback loop: matched=0 → matched=11 per query on the same Nashville test. Citation density on Riverfront Steel: 2 → 28 per run (14×).
  - Rust unit tests: extractor_tests::extract_target_geo_basic/_missing_state/_word_boundary, extract_target_role_basic/_none/_multi_word. 6/6 pass.
  - Diagnostic log: playbook_boost: boosts=N sources=N parsed=N matched=N target_geo=? target_role=? on every call.
- Phase 20: Model Matrix + Overseer Tiers (2026-04-21)
  - config/models.json — 5 tiers (t1_hot / t2_review / t3_overview / t4_strategic / t5_gatekeeper), each with context_window + context_budget + overflow_policy. Ollama Cloud bearer key from /root/llm_team_config.json.
  - Hot path: qwen3.5:latest + qwen3:latest local with think:false. Mistral dropped after 0/14 fill on complex scenarios.
  - T3 cloud: gpt-oss:120b via Ollama Cloud — verified 4-8s latency, strict JSON-shape output for remediation.
- Phase 21: Scratchpad + Tree-Split Continuation (2026-04-21)
  - tests/multi-agent/agent.ts: estimateTokens(), assertContextBudget(), generateContinuable(), generateTreeSplit(). The think flag is plumbed through the sidecar's /generate. Empty-response backoff + truncation-continuation, no max_tokens tourniquet.
  - Rust port shipped (2026-04-21, companion to Phase 27):
    - crates/aibridge/src/context.rs — estimate_tokens (chars/4 ceil, matches TS), context_window_for, assert_context_budget returning Result<BudgetCheck, (BudgetCheck, usize over_by)> so callers get the numbers back on both success and overflow. The windows table mirrors config/models.json.
    - crates/aibridge/src/continuation.rs — generate_continuable<G: TextGenerator> handles the two failure modes from TS: (a) empty thinking-model response → geometric-backoff retry with 2× budget up to budget_cap; (b) truncated non-empty → continuation with partial-as-scratchpad. is_structurally_complete balances braces then JSON.parse-checks for the JSON shape; the text shape is "non-empty". Guards the degenerate case "all retries empty → bail, don't loop on empty partial" — the TS impl has this implicit, Rust makes it explicit.
    - crates/aibridge/src/tree_split.rs — generate_tree_split map→reduce with a running scratchpad. Per-shard + reduce-prompt budgets checked through assert_context_budget; loud-fails with the overflow message rather than silently truncating. Scratchpad oldest-digest-first truncation once it exceeds scratchpad_budget (default 6000 tokens, matches TS).
    - TextGenerator trait (native async-fn-in-trait, edition 2024) so the ScriptedGenerator test double can inject canned sequences without a live Ollama. AiClient implements TextGenerator.
    - GenerateRequest gained a think: Option<bool> field — forwards to the sidecar for per-call hidden-reasoning opt-out on hot-path JSON emitters.
    - 25 aibridge unit tests (8 context + 10 continuation + 7 tree_split) — all green, no network required.
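A std-only sketch of the two context helpers described above, matching the documented shapes (chars/4 ceiling; a Result that carries the numbers on both success and overflow) — a sketch, not the aibridge source:

```rust
/// Rough token estimate: character count / 4, rounded up.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

#[derive(Debug)]
struct BudgetCheck {
    estimated: usize,
    budget: usize,
}

/// Ok carries the numbers; Err additionally carries how far over budget.
fn assert_context_budget(text: &str, budget: usize) -> Result<BudgetCheck, (BudgetCheck, usize)> {
    let estimated = estimate_tokens(text);
    if estimated <= budget {
        Ok(BudgetCheck { estimated, budget })
    } else {
        Err((BudgetCheck { estimated, budget }, estimated - budget))
    }
}
```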
- Phase 22: Internal Knowledge Library (2026-04-21)
  - data/_kb/ — signatures.jsonl, outcomes.jsonl, pathway_recommendations.jsonl, error_corrections.jsonl, config_snapshots.jsonl. Event-driven cycle: indexRun → recommendFor → loadRecommendation.
  - Item B cloud rescue: failed event → cloud remediation JSON → retry with pivot. Verified 1/3 rescues succeeded on stress_01 (Gary IN → South Bend IN pivot).
  - scripts/kb_measure.py aggregator. Unit tests: kb.test.ts — 4/4 pass (signature stability, role/city/count invariants, digest shape).
- Phase 23: Staffer identity + competence-weighted retrieval (2026-04-21)
  - ScenarioSpec gained contract: ContractTerms and staffer: Staffer { id, name, tenure_months, role, tool_level }.
  - tool_level runtime overrides: full / local / basic / minimal. Basic + minimal route the executor to Ollama Cloud kimi-k2.5 (kimi-k2.6 pending pro-tier upgrade).
  - data/_kb/staffers.jsonl — competence_score = 0.45·fill + 0.20·turn_eff + 0.20·cite + 0.15·rescue. Recomputed per run.
  - findNeighbors now returns weighted_score = cosine × max_staffer_competence.
  - scripts/kb_staffer_report.py — leaderboard + cross-staffer worker overlap (Rachel D. Lewis 12× across 4 staffers → auto-discovered high-value label).
  - gen_staffer_demo.ts + run_staffer_demo.sh — 4 personas × 3 contracts = 12 runs.
- Phase 27: Playbook versioning (2026-04-21)
  - PlaybookEntry gained version: u32 (default 1), parent_id, superseded_at, superseded_by fields. All #[serde(default)] so entries persisted before Phase 27 load as roots with version=1.
  - PlaybookMemory::revise_entry(parent_id, new_entry) appends a new version, stamps superseded_at + superseded_by on the parent, inherits parent_id, and sets version = parent + 1 on the new entry. Rejects revising a retired or already-superseded parent with a clear error — the tip of the chain is the only valid revise target.
  - PlaybookMemory::history(playbook_id) returns the full chain root→tip, walking parent_id backward then superseded_by forward. Cycle-safe. Works from any node in the chain.
  - Superseded entries are excluded from boost (same rule as retired): compute_boost_for_filtered_with_role, the active-entries prefilter, the geo-index rebuild, and the upsert existing-entry search all skip superseded_at.is_some().
  - Endpoints: POST /vectors/playbook_memory/revise + GET /vectors/playbook_memory/history/{id}.
  - status_counts now returns (total, retired, superseded, failures). /status JSON reports superseded as a distinct counter; active = total - retired - superseded.
  - 8 unit tests under mod version_tests covering: chain-metadata stamping, retired-parent rejection, already-superseded-parent rejection, superseded endorsement exclusion from boost, history traversal from root/tip/middle, empty-for-unknown-id, superseded-status-count, legacy-entry-default-version round-trip. 26/26 playbook_memory tests pass.
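The history traversal can be sketched over a plain map (the real fields live on PlaybookEntry; this shows only the walk-back-then-forward shape and the cycle guard):

```rust
use std::collections::{HashMap, HashSet};

#[derive(Clone)]
struct Entry {
    parent_id: Option<String>,
    superseded_by: Option<String>,
}

/// From any node: walk parent_id back to the root, then superseded_by
/// forward to the tip. Unknown ids and cycles yield an empty chain.
fn history(entries: &HashMap<String, Entry>, start: &str) -> Vec<String> {
    if !entries.contains_key(start) {
        return Vec::new();
    }
    // Walk back to the root.
    let mut seen = HashSet::new();
    let mut cur = start.to_string();
    while let Some(e) = entries.get(&cur) {
        if !seen.insert(cur.clone()) {
            return Vec::new(); // cycle on the backward walk
        }
        match &e.parent_id {
            Some(p) if entries.contains_key(p) => cur = p.clone(),
            _ => break,
        }
    }
    // Walk forward to the tip, collecting the chain.
    let mut chain = Vec::new();
    let mut node = Some(cur);
    while let Some(id) = node {
        if chain.contains(&id) {
            break; // cycle guard on the forward walk
        }
        chain.push(id.clone());
        node = entries.get(&id).and_then(|e| e.superseded_by.clone());
    }
    chain
}
```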
- Phase 24: Observer / Autotune integration (2026-04-20, commit b95dd86)
  - Closed the gap where lakehouse-observer.service wrapped MCP :3700 while tests/multi-agent/scenario.ts hit gateway :3100 directly — the observer sat idle at 0 ops across 3600+ cycles.
  - observer.ts gained a Bun HTTP listener on OBSERVER_PORT (default 3800) with GET /health, GET /stats (totals + by_source + recent scenario digest), and POST /event for scenario outcomes. The body shapes into ObservedOp with source="scenario" + staffer_id + sig_hash + event_kind + geo + rescue flags.
  - recordExternalOp() shared ring-buffer insert — the ERROR_ANALYZER and PLAYBOOK_BUILDER loops now see both MCP-wrapped and scenario-posted ops through the same path.
  - persistOp() swap: the old path wrote via /ingest/file?name=observed_operations, which has REPLACE semantics (wiped prior ops); it now uses an append-friendly Parquet write-through.
- Phase 25: Validity windows + playbook retirement (2026-04-21, commit e0a843d)
  - PlaybookEntry gained four optional fields (#[serde(default)] so legacy entries load as never-expiring): schema_fingerprint (SHA-256 over target dataset columns at seed time), valid_until (RFC3339 hard expiry), retired_at (set by retire calls), retirement_reason (human string).
  - compute_boost_for_filtered_with_role now skips retired + expired entries before geo/cosine — no silent boosting from stale playbooks. Unit-tested on expired valid_until + retired + schema-drift retirement.
  - Two retirement paths: retire_one(playbook_id, reason) for manual, retire_on_schema_drift(city, state, current_fingerprint, reason) for batch schema-migration sweeps. Legacy entries without a fingerprint skip drift retirement (safe).
  - Endpoint: POST /vectors/playbook_memory/retire — accepts either {playbook_id, reason} or {city, state, current_schema_fingerprint, reason}.
- Phase 26: Mem0 upsert + Letta geo hot cache (2026-04-21, commit 640db8c)
  - Mem0-style upsert: /seed with append=true (default) routes through upsert_entry, which decides ADD / UPDATE / NOOP on (operation, day, city, state). A same-day re-seed merges names (union, stable order) instead of duplicating the row; an identical re-seed is a no-op; a different-day same-operation is a fresh ADD. playbook_id stays stable on UPDATE so prior citations remain valid.
  - Letta-style hot cache: PlaybookMemory now holds a geo_index: HashMap<(city_lower, state_upper), Vec<entry_idx>> rebuilt on every mutation. Geo-filtered boost queries skip the full scan and hit the O(1) key lookup. At 1.9K entries the full scan was sub-ms; the index scales the same path to 100K+ without code changes.
  - UpsertOutcome enum reported back to callers — {mode: added|updated|noop, playbook_id, merged_names?} + entries_after.
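The ADD / UPDATE / NOOP decision can be sketched over a plain map keyed by (operation, day, city, state) — illustrative types, not the real upsert_entry, which also carries playbook_id stability and persistence:

```rust
use std::collections::HashMap;

type Key = (String, String, String, String); // (operation, day, city, state)

#[derive(Debug, PartialEq)]
enum UpsertOutcome {
    Added,
    Updated { merged_names: Vec<String> },
    Noop,
}

/// New key → Added; same key with new names → Updated (union, stable
/// order); identical re-seed → Noop.
fn upsert(store: &mut HashMap<Key, Vec<String>>, key: Key, names: Vec<String>) -> UpsertOutcome {
    match store.get_mut(&key) {
        None => {
            store.insert(key, names);
            UpsertOutcome::Added
        }
        Some(existing) => {
            let mut merged = existing.clone();
            let mut changed = false;
            for n in names {
                if !merged.contains(&n) {
                    merged.push(n); // union, preserving first-seen order
                    changed = true;
                }
            }
            if changed {
                *existing = merged.clone();
                UpsertOutcome::Updated { merged_names: merged }
            } else {
                UpsertOutcome::Noop
            }
        }
    }
}
```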
- Phase 37: Hot-swap async (2026-04-22)
  - Extended JobTracker with JobType::ProfileActivation + Embed enum variants
  - Made activate_profile return immediately with a job_id; the work runs in the background via tokio::spawn
  - Background jobs tracked via POST /vectors/jobs/{id} + GET /vectors/jobs
- Phase 38: Universal API Skeleton (2026-04-23)
  - /v1/chat — OpenAI-compatible POST, forwards to local Ollama or Ollama Cloud
  - /v1/usage — returns {requests, prompt_tokens, completion_tokens, total_tokens}
  - /v1/sessions — returns {data: [], note: "Phase 38: stateless"}
  - Langfuse trace integration (fire-and-forget, Phase 40 early)
  - 12 unit tests green, curl gates pass
- Phase 39: Provider Adapter Refactor (2026-04-23)
  - ProviderAdapter trait with chat() + embed() + unload() + health()
  - OllamaAdapter — wraps the existing AiClient
  - OpenRouterAdapter — HTTP client to openrouter.ai
  - provider_key() routing by model prefix (openrouter/* → OpenRouter)
- Phase 40: Routing & Policy Engine (2026-04-23)
  - RoutingEngine with RouteDecision in aibridge::routing
  - config/routing.toml — rules by model_pattern, fallback chain, cost gating
  - Per-provider usage tracking: Usage.by_provider
  - 12 gateway tests green, curl gates pass
- Phase 41: Profile System Expansion (2026-04-23)
  - ProfileType enum: Execution, Retrieval, Memory, Observer
  - Per-type endpoints: /profiles/retrieval, /profiles/memory, /profiles/observer
  - profile_type field on ModelProfile
  - Guard fix: the automated scrumaudit.py now finds real issues
- Phase 42: Truth Layer (2026-04-27 closure verified)
  - crates/truth/{lib,staffing,devops,loader}.rs
  - Staffing rules populated; devops scaffold by design
  - /v1/context serves task_classes + rules; 37 tests green
- Phase 43: Validation Pipeline (2026-04-27)
  - crates/validator/ — real validators + WorkerLookup + ParquetWorkerLookup
  - 500K-row workers_500k.parquet loaded at gateway boot
  - POST /v1/validate + POST /v1/iterate (the 0→85% loop)
  - 33 validator tests green
- Phase 44: Caller Migration (2026-04-27)
  - TS callers + aibridge::AiClient::new_with_gateway opt-in
  - vectord routed through /v1/chat for autotune + RAG
  - scripts/check_phase44_callers.sh CI guard
- Phase 45: Doc-Drift Detection (2026-04-27)
  - DocRef + doc_drift module + context7 bridge
  - /doc_drift/check + /scan + /resolve endpoints
  - data/_kb/doc_drift_corrections.jsonl writes
  - Boost exclusion of unreviewed drift-flagged entries
- Fine-tuned domain models (Phase 25+)
- Multi-node query distribution (only if ceilings bite)
145 unit tests | 13 crates | 21 ADRs | 2.47M rows | 100K vectors | Hybrid Parquet+HNSW ⊕ Lance | Phases 0-27 shipped
Latest: 2026-04-21 — Phase 27 (playbook versioning: version + parent_id + superseded_at + superseded_by on PlaybookEntry, /revise + /history endpoints, 8 new tests). Doc-sync pass: Phase 24 observer + Phase 25 validity windows + Phase 26 Mem0/Letta now reflected in phase tracker. Phase 19.6 time decay noted as wired (was misdocumented as deferred). Phase E.2 tombstone-at-compaction noted as closed in Phase 8 MVP limits.