Apply the highest-confidence findings from the Phase 0→42 forensic sweep
after four scrum-master iterations under the adversarial prompt. Each fix
is independently validated by a later scrum iteration scoring the same
file higher under the same bar.
Code changes
────────────
P5-001 — crates/gateway/src/auth.rs + main.rs
api_key_auth was marked #[allow(dead_code)] and never wrapped around
the router, so `[auth] enabled=true` logged a green message and
enforced nothing. Now wired via from_fn_with_state, with constant-time
header compare and /health exempted for LB probes.
P42-001 — crates/truth/src/lib.rs
TruthStore::check() ignored RuleCondition entirely — signature looked
like enforcement, body returned every action unconditionally. Added
evaluate(task_class, ctx) that actually walks FieldEquals / FieldEmpty /
FieldGreater / Always against a serde_json::Value via dot-path lookup.
check() kept for back-compat. Tests 14 → 24 (10 new exercising real
pass/fail semantics). serde_json moved to [dependencies].
P9-001 (partial) — crates/ingestd/src/service.rs
Added Optional<Journal> to IngestState + a journal.record_ingest() call
on /ingest/file success. Gateway wires it with `journal.clone()` before
the /journal nest consumes the original. First-ever internal mutation
journal event verified live (total_events_created 0→1 after probe).
Iter-4 scrum scored these files higher under same prompt:
ingestd/src/service.rs 3 → 6 (P9-001 visible)
truth/src/lib.rs 3 → 4 (P42-001 visible)
gateway/src/auth.rs 3 → 4 (P5-001 visible)
gateway/src/execution_loop 4 → 6 (indirect)
storaged/src/federation 3 → 4 (indirect)
Infrastructure additions
────────────────────────
* tests/real-world/scrum_master_pipeline.ts
- cloud-first ladder: kimi-k2:1t → deepseek-v3.1:671b → mistral-large-3:675b
→ gpt-oss:120b → devstral-2:123b → qwen3.5:397b (deep final thinker)
- LH_SCRUM_FORENSIC env: injects SCRUM_FORENSIC_PROMPT.md as adversarial preamble
- LH_SCRUM_PROPOSAL env: per-iter fix-wave doc override
- Confidence extraction (markdown + JSON), schema v4 KB rows with:
verdict, critical_failures_count, verified_components_count,
missing_components_count, output_format, gradient_tier
- Model trust profile written per file-accept to data/_kb/model_trust.jsonl
- Fire-and-forget POST to observer /event so by_source.scrum appears in /stats
* mcp-server/observer.ts — unchanged in shape, confirmed receiving scrum events
* ui/ — new Visual Control Plane on :3950
- Bun.serve with /data/{services,reviews,metrics,trust,overrides,findings,file,refactor_signals,search,logs/:svc,scrum_log}
- Views: MAP (D3 graph, 5 overlays) / TRACE (per-file iter timeline) /
TRAJECTORY (refactor signals + reverse index search) / METRICS (explainers
with SOURCE + GOOD lines) / KB (card grid with tooltips) / CONSOLE (per-service
journalctl tail, tabs for gateway/sidecar/observer/mcp/ctx7/auditor/langfuse)
- tryFetch always attempts JSON.parse (fix for observer returning JSON without content-type)
- renderNodeContext primitive-vs-object guard (fix for gateway /health string)
* docs/SCRUM_FIX_WAVE.md — iter-specific scope directing the scrum
* docs/SCRUM_FORENSIC_PROMPT.md — adversarial audit prompt (verdict/critical/verified schema)
* docs/SCRUM_LOOP_NOTES.md — iteration observations + fix-next-loop queue
* docs/SYSTEM_EVOLUTION_LAYERS.md — Layers 1-10 roadmap (trust profiling, execution DNA, drift sentinel, etc)
Measurements across iterations
──────────────────────────────
iter 1 (soft prompt, gpt-oss:120b): mean score 5.00/10
iter 3 (forensic, kimi-k2:1t): mean score 3.56/10 (−1.44 — bar raised)
iter 4 (same bar, post fixes): mean score 4.00/10 (+0.44 — fixes landed)
Score movement iter3→iter4: ↑5 ↓1 =12
21/21 first-attempt accept by kimi-k2:1t in iter 4
20/21 emitted forensic JSON (richer signal than markdown)
16 verified_components captured (proof-of-life, new metric)
Permission Gradient distribution: 0 auto · 16 dry_run · 4 sim · 1 block
Observer loop: by_source {scrum: 21, langfuse: 1985, phase24_audit: 1}
v1/usage: 224 requests, 477K tokens, all tracked
Signal classes per file (iter 3 → iter 4):
CONVERGING: 1 (ingestd/service.rs — fix clearly landed)
LOOPING: 4 (catalogd/registry, main, queryd/service, vectord/index_registry)
ORBITING: 1 (truth — novel findings surfacing as surface ones fix)
PLATEAU: 9 (scores flat with high confidence — diminishing returns)
MIXED: 6
Loop thesis status
──────────────────
A file's score rises only when the scrum confirms a real fix landed.
No false positives yet across 3 iterations. Fixes applied to 3 files all
raised their independent scores under the same adversarial prompt. Loop
is measurable, not hand-wavy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five threads of work landing as one milestone — all individually
verified end-to-end against real data, full release build clean,
46 unit tests pass.
## Phase 16.2 / 16.5 — autotune agent + ingest triggers
`vectord::agent` is a long-running tokio task that watches the trial
journal and autonomously proposes + runs new HNSW configs. Distinct
from `autotune::run_autotune` (synchronous one-shot grid). Triggered
on POST /vectors/agent/enqueue/{idx} or by the periodic wake; ingest
paths now push DatasetAppended events when an index's source dataset
gets re-ingested. Rate-limited (max_trials_per_hour) and cooldown-
gated so it can't saturate Ollama under live load.
The proposer is ε-greedy around the current champion: with prob 0.25
sample random from full bounds, otherwise perturb champion ± small
delta on both axes. Dedup against history. Deterministic — RNG seeded
from history.len() so the same journal state proposes the same next
config (helps offline replay debugging).
`[agent]` config section in lakehouse.toml; opt-in via enabled=true.
## Federation Layer 2 — runtime bucket lifecycle + per-index scoping
`BucketRegistry.buckets` moved to `std::sync::RwLock<HashMap>` so
buckets can be added/removed after startup. POST /storage/buckets
provisions at runtime; DELETE /storage/buckets/{name} unregisters
(refuses primary/rescue with 403). Local-backend buckets get their
root directory auto-created.
`IndexMeta.bucket` (default "primary" via serde) records each index's
home bucket. `TrialJournal` and `PromotionRegistry` now hold
Arc<BucketRegistry> + IndexRegistry; they resolve target store per-
index via IndexMeta.bucket. PromotionRegistry::list_all scans every
bucket and dedups by index_name. Pre-federation indexes keep working
unchanged — they just default to primary.
`ModelProfile.bucket: Option<String>` declares per-profile artifact
home. POST /vectors/profile/{id}/activate auto-provisions the
profile's bucket under storage.profile_root if not yet registered.
EvalSets stay primary-only for now — noted gap, low-risk to extend
later with the same resolver pattern.
## Phase 17 — VRAM-aware two-profile gate
Sidecar gains POST /admin/unload (Ollama keep_alive=0 trick — forces
immediate VRAM release), POST /admin/preload (keep_alive=5m with
empty prompt, takes the slot warm), and GET /admin/vram (combines
nvidia-smi snapshot with Ollama /api/ps). Exposed via aibridge as
unload_model / preload_model / vram_snapshot.
`VectorState.active_profile` is the GPU-slot singleton —
Arc<RwLock<Option<ActiveProfileSlot>>>. activate_profile checks for
a previous profile with a different ollama_name and unloads it
before preloading the new one; same-model reactivations skip the
unload (Ollama no-ops). New routes: POST /vectors/profile/{id}/
deactivate (unload + clear slot), GET /vectors/profile/active.
Verified live: staffing-recruiter (qwen2.5) → docs-assistant
(mistral) swap freed qwen2.5 from VRAM and loaded mistral. nomic-
embed-text persists across swaps because both profiles use it —
free optimization that fell out of the design. Scoped search
correctly 403s cross-profile in both directions.
## MySQL streaming connector
`crates/ingestd/src/my_stream.rs` mirrors pg_stream.rs for MySQL.
Pure-rust `mysql_async` driver (default-features=false to avoid C
deps). Same OFFSET pagination, same Parquet-streaming write shape.
Type mapping per ADR-010: int/bigint → Int32/Int64, decimal/float
→ Float64, tinyint(1)/bool → Boolean, everything else → Utf8 with
fallback parsers for date/time/json/uuid via Display.
POST /ingest/mysql parallel to /ingest/db. Same PII auto-detection,
same lineage capture (source_system="mysql"), same agent-trigger
hook. `redact_dsn` generalized — was hardcoded to "postgresql://"
length, now works for any scheme://user:pass@host/path URL (latent
PII leak fix for MySQL DSNs).
Verified live against MariaDB on localhost: 10 rows × 9 columns of
test data round-tripped through datatypes int/varchar/decimal/
tinyint/datetime/text. PII detection auto-flagged name + email.
Aggregation queries through DataFusion match the source values
exactly.
## Phase 18 — Hybrid Parquet+HNSW ⊕ Lance backend (ADR-019)
`vectord-lance` is a new firewall crate. Lance pulls Arrow 57 and
DataFusion 52 — incompatible with the rest of the workspace's
Arrow 55 / DataFusion 47. The firewall isolates that dep tree:
public API uses only std types (Vec<f32>, Vec<String>, Hit, Row,
*Stats), so no Arrow types cross the crate boundary and nothing
propagates to vectord. The ADR-019 path that didn't ship until now.
`vectord::lance_backend::LanceRegistry` lazy-creates a
LanceVectorStore per index, resolving bucket → URI via the
conventional local-bucket layout. `IndexMeta.vector_backend` and
`ModelProfile.vector_backend` carry the choice (default Parquet so
existing indexes unchanged).
Six routes under /vectors/lance/*:
- migrate/{idx}: convert binary-blob Parquet → Lance FixedSizeList
- index/{idx}: build IVF_PQ
- search/{idx}: vector search (embed via sidecar)
- doc/{idx}/{doc_id}: random row fetch
- append/{idx}: native fragment append
- stats/{idx}: row count + index presence
Verified live on the real resumes_100k_v2 corpus (100K × 768d):
- Migrate: 0.57s
- Build IVF_PQ index: 16.2s (matches ADR-019 bench; 14× faster than
HNSW's 230s for the same data)
- Search end-to-end (Ollama embed + Lance scan): 23-53ms
- Random doc_id fetch: 5-7ms (filter scan; faster than Parquet's
~35ms full-file scan, slower than the bench's 311us positional
take — would close that gap with a scalar btree on doc_id)
- Append 100 rows: 3.3ms / +320KB on disk vs Parquet's required
full ~330MB rewrite — the structural win
- Index survives append; both backends coexist cleanly
## Known follow-ups not in this milestone
- ModelProfile.vector_backend doesn't yet auto-route /vectors/profile/
{id}/search to Lance; callers go through /vectors/lance/* directly
- Scalar btree on doc_id (closes the 5-7ms → ~300us gap)
- vectord-lance built default-features=false → no S3 yet
- IVF_PQ recall not measured (ADR-019 caveat) — needs a Lance-aware
variant of the eval harness
- Watcher-path ingest doesn't push agent triggers (HTTP paths do)
- EvalSets still primary-only (federation gap)
- No PATCH endpoint to move an existing index between buckets
- The pre-existing storaged::append_log doctest fails to compile
(malformed `{prefix}/` parses as code fence) — pre-existing bug,
left for a focused fix
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Standalone benchmark crate `crates/lance-bench` running Lance 4.0 against
our Parquet+HNSW at 100K × 768d (resumes_100k_v2) measured 8 dimensions.
Results (see docs/ADR-019-vector-storage.md for full scorecard):
Cold load: Parquet 0.17s vs Lance 0.13s (tie — not ≥2× threshold)
Disk size: 330.3 MB vs 330.4 MB (tie)
Search p50: 873us vs 2229us (Parquet 2.55× faster)
Search p95: 1413us vs 4998us (Parquet 3.54× faster)
Index build: 230s (ec=80) vs 16s (IVF_PQ) (Lance 14× faster)
Random access: 35ms (scan) vs 311us (Lance 112× faster)
Append 10K rows: full rewrite vs 0.08s/+31MB (Lance structural win)
Decision (ADR-019): hybrid, not migrate-or-reject.
- Parquet+HNSW stays primary — our HNSW at ec=80 es=30 recall=1.00 is
2.55× faster than Lance IVF_PQ at 100K in-RAM scale
- Lance joins as second backend per-profile for workloads where it wins
architecturally: random row access (RAG text fetch), append-heavy
pipelines (Phase C), hot-swap generations (Phase 16, 14× faster
builds), and indexes past the ~5M RAM ceiling
- Phase 17 ModelProfile gets vector_backend: Parquet | Lance field
- Ceiling table in PRD updated — 5M ceiling now says "switch to Lance"
instead of "migrate" since Lance runs alongside, not instead of
Isolation: lance-bench is a standalone workspace crate with its own dep
tree (Lance pulls DataFusion 52 + Arrow 57 incompatible with main stack
DataFusion 47 + Arrow 55). Kept off the critical path until API is
stable enough to promote into vectord::lance_store.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- POST /ingest/postgres/tables — list all tables in a database
- POST /ingest/postgres/import — import table → Parquet → catalog → queryable
- Auto type mapping: int2/4/8 → Int, float4/8 → Float64, bool → Boolean,
text/varchar/jsonb/timestamp → Utf8 (safe default per ADR-010)
- Auto PII detection + lineage on import
- Empty password support for trust auth
- Tested: imported lab_trials (40 rows, 10 cols) and threat_intel (20 rows, 30 cols)
from local knowledge_base Postgres database — immediately queryable
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- journald crate: immutable event log for every data mutation
- Events: entity_type, entity_id, field, action, old_value, new_value,
actor, source, workspace_id, timestamp
- In-memory buffer with configurable flush threshold (default 100 events)
- Flush writes events as Parquet to journal/ directory
- Query: GET /journal/history/{entity_id} — full history of any record
- Query: GET /journal/recent?limit=50 — latest events across all entities
- Convenience methods: record_insert, record_update, record_ingest
- Stats: GET /journal/stats — buffer size, persisted file count
- Manual flush: POST /journal/flush
- Per ADR-012: events are never modified or deleted
This is the single most important future-proofing decision.
Once history is lost, it's gone forever.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ui: Dioxus WASM app with dataset sidebar, SQL editor (Ctrl+Enter), results table
- ui: dynamic API base URL (same-origin for nginx, port-based for local dev)
- gateway: CORS enabled for cross-origin requests
- nginx: lakehouse.devop.live proxies UI (:3300) + API (:3100) on same origin
- justfile: ui-build, ui-serve, sidecar, up commands added
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- queryd: SessionContext with custom URL scheme to avoid path doubling with LocalFileSystem
- queryd: ListingTable registration from catalog ObjectRefs with schema inference
- queryd: POST /query/sql returns JSON {columns, rows, row_count}
- queryd→catalogd wiring: reads all datasets, registers as named tables
- gateway: wires QueryEngine with shared store + registry
- e2e verified: SELECT *, WHERE/ORDER BY, COUNT/AVG all correct
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>