Implements GDPR/CCPA-compatible row-level deletion without rewriting
the underlying Parquet. Tombstone markers live beside each dataset and
are applied at query time via a DataFusion view that excludes the
deleted row_key_values.
Schema (shared::types):
- Tombstone { dataset, row_key_column, row_key_value, deleted_at,
actor, reason }
- All tombstones for a dataset must share one row_key_column —
enforced at write so the query-time filter remains a single
WHERE NOT IN (...) clause
Storage (catalogd::tombstones):
- Per-dataset AppendLog at _catalog/tombstones/{dataset}/
- flush_threshold=1 + explicit flush after every append — tombstones
are high-value, low-frequency; durability on return is the contract
- Reuses storaged::append_log infra so compaction is already wired
(POST .../tombstones/compact will work once we expose it)
Catalog (catalogd::registry):
- add_tombstone validates dataset exists + key column compatibility
- list_tombstones for the GET endpoint
- TombstoneStore exposed via Registry::tombstones() for queryd
HTTP (catalogd::service):
- POST /catalog/datasets/by-name/{name}/tombstone
{ row_key_column, row_key_values[], actor, reason }
Returns rows_tombstoned count + per-value failure list (207 on
partial success).
- GET same path lists active tombstones with full audit info.
Query layer (queryd::context):
- Snapshot tombstones-by-dataset before registering tables
- Tombstoned tables: raw data registers as "__raw__{name}"; the public
  "{name}" becomes a DataFusion view (sketch below):
  SELECT * FROM "__raw__{name}" WHERE CAST(col AS VARCHAR) NOT IN (...)
- CAST AS VARCHAR handles both string and integer key columns
- Untombstoned tables register as before — zero overhead
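A minimal sketch of that view construction (function name hypothetical;
the real code lives in queryd::context):

  fn tombstone_view_sql(name: &str, key_col: &str, values: &[String]) -> String {
      // Quote each tombstoned key as a SQL string literal, escaping
      // embedded quotes; callers only build the view when values is
      // non-empty, so NOT IN () can't occur.
      let list = values
          .iter()
          .map(|v| format!("'{}'", v.replace('\'', "''")))
          .collect::<Vec<_>>()
          .join(", ");
      // CAST AS VARCHAR lets one filter serve string and integer keys.
      format!(
          "SELECT * FROM \"__raw__{name}\" \
           WHERE CAST(\"{key_col}\" AS VARCHAR) NOT IN ({list})"
      )
  }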
End-to-end on candidates (100K rows):
- Pick CAND-000001/2/3 (Linda/Charles/Kimberly)
- POST tombstone -> rows_tombstoned: 3
- COUNT(*) drops 100000 -> 99997
- WHERE candidate_id IN (those 3) -> 0 rows
- candidates_safe view transitively excludes them
(Linda+Denver: __raw__candidates=159, candidates_safe=158)
- Restart: COUNT still 99997, 3 tombstones reload from disk
Reversibility: tombstones are reversible deletes, not destruction.
Power users can still query "__raw__{name}" to see deleted rows.
Phase 13 access control is what stops a non-admin from accessing
__raw__* tables.
Limits / follow-up:
- Physical compaction not yet integrated — Phase 8's compact_files
doesn't read tombstones during merge. Tombstoned rows are still
on disk until that integration ships.
- Phase 9 journald event emission for tombstones not wired —
tombstone records carry their own actor+reason+timestamp so the
audit trail is intact, but cross-referencing with the mutation
event log would help compliance reporting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements the llms3.com "AI-safe views" pattern: a named projection
that exposes only whitelisted columns, with optional row filter and
per-column redactions. AI agents (or Phase 13 roles) bind to the view;
they can never accidentally see PII even if they write raw SQL.
Schema (shared::types):
- AiView { name, base_dataset, columns: Vec<String>, row_filter,
column_redactions: HashMap<String, Redaction>, ... }
- Redaction enum: Null | Hash | Mask { keep_prefix, keep_suffix }
Catalog (catalogd::registry):
- put_view validates base dataset exists + columns non-empty
- Persists JSON at _catalog/views/{name}.json (sanitized name)
- rebuild() loads views alongside dataset manifests on startup
Query layer (queryd::context):
- build_context registers every AiView as a DataFusion view object
- Constructed SELECT applies whitelist projection, WHERE filter, and
redaction expressions per column
- Mask: substr(prefix) + repeat('*', mid_len) + substr(suffix)
- Hash: digest(value, 'sha256')
- Null: CAST(NULL AS VARCHAR) AS col
- DataFusion handles JOINs/aggregates over the view natively — it's a
real view, not a query rewrite
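A sketch of the per-column redaction expressions, assuming the Redaction
enum from shared::types and DataFusion's substr / repeat / char_length /
digest scalar functions:

  enum Redaction { Null, Hash, Mask { keep_prefix: usize, keep_suffix: usize } }

  fn redaction_expr(col: &str, r: &Redaction) -> String {
      match r {
          // Null: column stays in the projection, value is erased.
          Redaction::Null => format!("CAST(NULL AS VARCHAR) AS \"{col}\""),
          // Hash: stable one-way fingerprint of the value.
          Redaction::Hash => format!("digest(\"{col}\", 'sha256') AS \"{col}\""),
          // Mask: prefix + stars + suffix, e.g. CAN******01.
          Redaction::Mask { keep_prefix, keep_suffix } => format!(
              "substr(\"{col}\", 1, {keep_prefix}) || \
               repeat('*', char_length(\"{col}\") - {keep_prefix} - {keep_suffix}) || \
               substr(\"{col}\", char_length(\"{col}\") - {keep_suffix} + 1) \
               AS \"{col}\""
          ),
      }
  }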
HTTP (catalogd::service):
- POST /catalog/views (create)
- GET /catalog/views (list)
- GET /catalog/views/{name} (full def)
- DELETE /catalog/views/{name}
End-to-end test on candidates (100K rows, 15 columns):
candidates_safe view:
columns: candidate_id, first_name, city, state, vertical,
skills, years_experience, status
row_filter: status != 'blocked'
redaction: candidate_id mask(prefix=3, suffix=2)
SELECT * FROM candidates_safe LIMIT 5
-> 8 columns only, candidate_id shown as "CAN******01"
(PII fields email/phone/last_name absent from result)
SELECT email FROM candidates_safe
-> fails (column not in projection)
SELECT email FROM candidates
-> succeeds (raw table still accessible by name —
Phase 13 access control is the gate, not the view itself)
Survives restart — view definitions reload from object storage.
Limits / not in MVP:
- View CANNOT shadow base table by name (DataFusion treats them as
separate identifiers; access control must restrict raw-table access)
- row_filter is treated as trusted SQL — operators must validate
before persisting; only authenticated admin path should call put_view
- Redaction expressions assume column is castable to VARCHAR; numeric
redactions could be misleading (a Hash on Int64 returns a hex string
that won't equi-join with another hash on the same value type)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three pieces of the multi-bucket federation made real:
1. Catalog migration (POST /catalog/migrate-buckets)
- One-shot normalizer for ObjectRef.bucket field
- Empty -> "primary"; legacy "data"/"local" -> "primary"
- Idempotent; re-running on canonical state is no-op
- Ran on existing catalog: 12 refs renamed from "data", 2 already
"primary", all 14 now canonical
2. X-Lakehouse-Bucket header middleware on ingest
- resolve_bucket() helper extracts header, returns
(bucket_name, store) or 404 with valid bucket list
- ingest_file and ingest_db_stream now route writes per-request
- Defaults to "primary" when header absent
- pipeline::ingest_file_to_bucket records the actual bucket on the
ObjectRef so catalog stays the source of truth for "where does this
data live"
- Verified: ingest with X-Lakehouse-Bucket: testing lands in
data/_testing/, ingest without header lands in data/, bad header
returns 404 with hint
3. queryd registers every bucket with DataFusion
- QueryEngine now holds Arc<BucketRegistry> instead of single store
- build_context iterates all buckets, registers each as a separate
ObjectStore under URL scheme "lakehouse-{bucket}://"
- ListingTable URLs include the per-object bucket scheme so
DataFusion routes scans automatically based on ObjectRef.bucket
- Profile bucket names like "profile:user" sanitized to
"lakehouse-profile-user" since URL host segments can't contain ":"
- Tolerant of duplicate manifest entries (pre-existing
pipeline::ingest_file behavior creates a fresh dataset id per
ingest); duplicates skipped with debug log
- Backward compat: legacy "lakehouse://data/" URL still registered
pointing at primary
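Registration, roughly (assuming BucketRegistry yields name/store pairs):

  use std::sync::Arc;
  use datafusion::prelude::SessionContext;
  use object_store::ObjectStore;
  use url::Url;

  fn register_buckets(ctx: &SessionContext, buckets: &[(String, Arc<dyn ObjectStore>)]) {
      for (name, store) in buckets {
          // URL hosts can't contain ':', so "profile:user" -> "profile-user".
          let safe = name.replace(':', "-");
          let url = Url::parse(&format!("lakehouse-{safe}://")).unwrap();
          ctx.register_object_store(&url, Arc::clone(store));
      }
  }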
Success gate: cross-bucket CROSS JOIN
SELECT p.name, p.role, a.species
FROM people_test p (bucket: testing)
CROSS JOIN animals a (bucket: primary)
LIMIT 5
returns rows correctly. DataFusion routed each scan to its bucket's
ObjectStore based on the URL scheme.
No regressions: SELECT COUNT(*) FROM candidates still returns 100000
from the primary bucket.
Deferred to Phase 17:
- POST /profile/{user}/activate (HNSW hot-load on profile switch)
- vectord storage paths becoming bucket-scoped (trial journals,
eval sets per-profile)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The chunker's &text[start..end] slice could land inside a multi-byte
UTF-8 character (e.g. narrow no-break space \u{202f}, em-dashes, smart
quotes — universal in pg-imported editorial data). Rust panics on
non-boundary string slicing. In the refresh path that panic is caught
by tokio's task machinery but somehow causes linear memory growth at
~540MB/sec until OOM at 120GB+.
Root cause: chunk boundaries computed by byte arithmetic without
checking is_char_boundary(). The existing "look for last sentence / \n
/ space" logic finds ASCII-safe positions, but the *primary* `end`
calculation `(start + chunk_size).min(text.len())` lands wherever.
Fix:
- ceil_char_boundary(s, idx) — forward-scan to the nearest valid
UTF-8 char boundary. Used at end, actual_end, and next_start.
- Iteration cap — break if iterations exceed text.len(). Any
non-progressing loop dies safely instead of burning memory.
- Forced forward advance — if overlap + boundary math produce a
next_start <= start, force +1 char to guarantee termination.
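The boundary snap, approximately (str::ceil_char_boundary is still
unstable on stable Rust, hence the hand-rolled helper):

  fn ceil_char_boundary(s: &str, idx: usize) -> usize {
      if idx >= s.len() {
          return s.len();
      }
      // UTF-8 chars are 1-4 bytes, so a boundary is at most 3 bytes ahead.
      let mut i = idx;
      while !s.is_char_boundary(i) {
          i += 1;
      }
      i
  }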
Reproduced on kb_team_runs (585 pg-imported prompts with editorial
unicode): previous run grew memory linearly to 124GB over 240s then
OOM-killed. Same request after fix: peaks at <100MB, completes in
~4m42s to produce 12,693 embeddings. /vectors/search returns
relevant results.
Regression tests added:
- handles_multibyte_utf8_at_chunk_boundary — exact \u{202f} repro
- no_infinite_loop_on_no_spaces — 5KB text, no whitespace
- no_infinite_loop_on_degenerate_params — chunk_size == overlap
Surfaced by Phase C, but pre-existed as a latent bug since Phase 7.
Any Ollama-targeted RAG corpus with non-ASCII content would have hit
this once it grew past ~13KB per document.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements the llms3.com-inspired pattern: embeddings refresh
asynchronously, decoupled from transactional row writes. New rows arrive;
ingest marks the vector index stale; a later refresh embeds only the
delta (doc_ids not already in the index).
Schema additions (DatasetManifest):
- last_embedded_at: Option<DateTime> - when the index was last refreshed
- embedding_stale_since: Option<DateTime> - set when data written, cleared on refresh
- embedding_refresh_policy: Option<RefreshPolicy> - Manual | OnAppend | Scheduled
Ingest paths (pipeline::ingest_file + pg_stream) call
registry.mark_embeddings_stale after writing. No-op if the dataset has
never been embedded — stale semantics only kick in once last_embedded_at
is set.
Refresh pipeline (vectord::refresh::refresh_index):
- Reads the dataset Parquet, extracts (doc_id, text) pairs
- Accepts Utf8 / Int32 / Int64 id columns (covers both CSV and pg schemas)
- Loads existing embeddings via EmbeddingCache (empty on first-time build)
- Filters to rows whose doc_id is NOT in the existing set
- Chunks (chunker::chunk_column), embeds via Ollama (batches of 32),
writes combined index, clears stale flag
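The delta step at its core (illustrative types; the real code works over
Arrow columns):

  use std::collections::HashSet;

  fn delta_docs(
      rows: Vec<(String, String)>,    // (doc_id, text) from the Parquet scan
      existing: &HashSet<String>,     // doc_ids loaded from EmbeddingCache
  ) -> Vec<(String, String)> {
      rows.into_iter()
          // First-time build: existing is empty, so everything passes.
          .filter(|(doc_id, _)| !existing.contains(doc_id))
          .collect()
  }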
Endpoints:
- POST /vectors/refresh/{dataset_name} - body {index_name, id_column,
text_column, chunk_size?, overlap?}
- GET /vectors/stale - lists datasets whose embedding_stale_since is set
End-to-end verified on threat_intel (knowledge_base.threat_intel):
- Initial refresh: 20 rows -> 20 chunks -> embedded in 2.1s,
last_embedded_at set
- Idempotent second refresh: 0 new docs -> 1.8ms (pure delta check)
- Re-ingest to 54 rows: mark_embeddings_stale fires -> stale_since set
- /vectors/stale surfaces threat_intel with timestamps + policy
- Delta refresh: 34 new docs embedded in 970ms (6x faster than full
re-embed); stale_cleared = true
Not in MVP scope:
- UPDATE semantics (same doc_id, different content) - would need
per-row content hashing
- OnAppend policy auto-trigger - just declares intent; actual scheduler
deferred
- Scheduler runtime - the Scheduled(cron) variant declares the intent so
operators can see which datasets expect what, but the cron itself is
separate
Per ADR-019: when a profile switches to vector_backend=Lance, this
refresh path benefits — Lance's native append replaces our "read all +
rewrite" Parquet rebuild pattern. Current MVP works well enough at
~500-5K rows to validate the architecture; Lance unblocks the 5M+ case.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Standalone benchmark crate `crates/lance-bench` running Lance 4.0 against
our Parquet+HNSW at 100K × 768d (resumes_100k_v2) measured 8 dimensions.
Results (see docs/ADR-019-vector-storage.md for full scorecard):
Cold load: Parquet 0.17s vs Lance 0.13s (tie — not ≥2× threshold)
Disk size: 330.3 MB vs 330.4 MB (tie)
Search p50: 873us vs 2229us (Parquet 2.55× faster)
Search p95: 1413us vs 4998us (Parquet 3.54× faster)
Index build: 230s (ec=80) vs 16s (IVF_PQ) (Lance 14× faster)
Random access: 35ms (scan) vs 311us (Lance 112× faster)
Append 10K rows: full rewrite vs 0.08s/+31MB (Lance structural win)
Decision (ADR-019): hybrid, not migrate-or-reject.
- Parquet+HNSW stays primary — our HNSW at ec=80 es=30 recall=1.00 is
2.55× faster than Lance IVF_PQ at 100K in-RAM scale
- Lance joins as second backend per-profile for workloads where it wins
architecturally: random row access (RAG text fetch), append-heavy
pipelines (Phase C), hot-swap generations (Phase 16, 14× faster
builds), and indexes past the ~5M RAM ceiling
- Phase 17 ModelProfile gets vector_backend: Parquet | Lance field
- Ceiling table in PRD updated — 5M ceiling now says "switch to Lance"
instead of "migrate" since Lance runs alongside, not instead of
Isolation: lance-bench is a standalone workspace crate with its own dep
tree (Lance pulls DataFusion 52 + Arrow 57 incompatible with main stack
DataFusion 47 + Arrow 55). Kept off the critical path until API is
stable enough to promote into vectord::lance_store.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: Dashboard auto-fired 6+ API calls on load, then Ask tab fired
7 DESCRIBE queries per question — 15+ concurrent requests from WASM.
Fixes:
- Schema context cached after first build (7 DESCRIBE → 0 on subsequent questions)
- Dashboard lazy-loads only when tab clicked (not on app mount)
- Default tab changed back to Ask (no background API storm)
- std::sync::Mutex for WASM compat (no tokio in browser)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ResultStore: execute query, store batches server-side, serve pages on demand
- POST /query/paged → returns query_id + total_rows + page count (no rows)
- GET /query/page/{id}/{page}?size=100 → returns one page of rows
- RecordBatch slicing for efficient page extraction from Arrow batches
- LRU eviction: keeps 50 most recent query results in memory
- Tested: 100K rows → 1,000 pages of 100, any page fetchable by number
- Supervisor pattern: chunk results, serve on demand, retry-safe (idempotent GET)
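Page extraction in sketch form — RecordBatch::slice is zero-copy, so a
page is just offsets into the stored batches:

  use datafusion::arrow::record_batch::RecordBatch;

  fn page(batches: &[RecordBatch], page_no: usize, size: usize) -> Vec<RecordBatch> {
      let mut skip = page_no * size;
      let mut need = size;
      let mut out = Vec::new();
      for b in batches {
          if skip >= b.num_rows() {
              skip -= b.num_rows();
              continue;
          }
          let take = need.min(b.num_rows() - skip);
          out.push(b.slice(skip, take));
          need -= take;
          skip = 0;
          if need == 0 {
              break;
          }
      }
      out
  }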
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Schema context limited to 7 core staffing tables (was all 12+)
- Results table capped at 200 rows to prevent DOM explosion
- Shows "first 200 of N rows" when truncated
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
clean_sql now uses 3 strategies in priority order:
1. Extract from ```sql...``` markdown blocks
2. Find first SELECT/WITH/INSERT statement in text
3. Strip leading "sql" keyword fallback
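A compact sketch of the three strategies:

  fn clean_sql(raw: &str) -> String {
      // 1. Prefer a fenced ```sql block when the model produced one.
      if let Some(start) = raw.find("```sql") {
          let body = &raw[start + 6..];
          if let Some(end) = body.find("```") {
              return body[..end].trim().to_string();
          }
      }
      // 2. Else take everything from the earliest statement keyword.
      //    to_ascii_uppercase preserves byte offsets, unlike to_uppercase.
      let upper = raw.to_ascii_uppercase();
      if let Some(pos) = ["SELECT", "WITH", "INSERT"]
          .into_iter()
          .filter_map(|kw| upper.find(kw))
          .min()
      {
          return raw[pos..].trim().to_string();
      }
      // 3. Last resort: strip a bare leading "sql" tag.
      raw.trim().trim_start_matches("sql").trim().to_string()
  }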
Tested against 5 real model output patterns:
- Clean SQL ✓
- "sql" prefixed ✓
- Markdown fenced ✓
- Explanation before ```sql block ✓
- Explanation with SELECT buried in text ✓
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previous: only retried on "Schema error" or "No field named"
Now: retries on any error (type mismatches, execution errors, etc.)
Model gets full error message + schema to write corrected SQL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- clean_sql() strips markdown fences, leading "sql" keyword, trailing explanations
- Schema context now includes table relationships (JOIN paths)
- Explicit note: "vertical only in candidates/clients/job_orders, JOIN for others"
- Full column paths (table.column) in schema to reduce ambiguity
- Auto-retry on schema errors feeds error + schema back to model
- TESTED: 4 questions all return correct results:
"highest avg salary" → IT $2,213 ✓
"top 5 earning over $50/hr" → correct candidates ✓
"most placements by vertical" → Industrial 10,096 ✓
"revenue by client" → 1,996 clients ✓
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Prompt now says "CRITICAL: ONLY use columns from schema, do NOT invent"
- Strips markdown backticks from model output
- Auto-retry: if SQL fails with "Schema error" or "No field named",
feeds the error + schema back to the model for a corrected query
- Both button click and Enter key paths have retry logic
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- POST /ingest/postgres/tables — list all tables in a database
- POST /ingest/postgres/import — import table → Parquet → catalog → queryable
- Auto type mapping: int2/4/8 → Int, float4/8 → Float64, bool → Boolean,
text/varchar/jsonb/timestamp → Utf8 (safe default per ADR-010)
- Auto PII detection + lineage on import
- Empty password support for trust auth
- Tested: imported lab_trials (40 rows, 10 cols) and threat_intel (20 rows, 30 cols)
from local knowledge_base Postgres database — immediately queryable
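One plausible shape for the mapping (exact Arrow integer widths are an
assumption — the commit only pins int2/4/8 to integer types):

  use datafusion::arrow::datatypes::DataType;

  fn map_pg_type(pg_type: &str) -> DataType {
      match pg_type {
          "int2" => DataType::Int16,
          "int4" => DataType::Int32,
          "int8" => DataType::Int64,
          "float4" | "float8" => DataType::Float64,
          "bool" => DataType::Boolean,
          // text / varchar / jsonb / timestamp and anything unknown:
          // Utf8 as the safe default per ADR-010.
          _ => DataType::Utf8,
      }
  }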
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Drop CSV/JSON/PDF/text into ./inbox → auto-detected → Parquet → queryable
- Polls every 10 seconds (configurable)
- Processed files moved to ./inbox/processed/
- Failed files moved to ./inbox/failed/
- Dedup: same file dropped twice = no-op
- Watcher starts automatically on gateway boot
- Tested: CSV dropped → queryable in <15s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Schema diff detection: compare old vs new schema, identify changes
(added, removed, type changed, renamed columns)
- Fuzzy rename detection: "first_name" → "full_name" detected by shared word parts
- Auto-generated migration rules: direct map, cast, concat, split, drop
Each rule has confidence score (0.0-1.0)
- AI migration prompt builder: generates LLM prompt for complex schema changes
LLM suggests JSON migration rules when heuristics aren't enough
- 5 new unit tests (detect added, removed, type change, rename, rule generation)
- 30 total unit tests passing
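The shared-word-parts heuristic, roughly (Jaccard overlap on snake_case
segments; the scoring is illustrative):

  use std::collections::HashSet;

  fn rename_confidence(old: &str, new: &str) -> f64 {
      // "first_name" vs "full_name" share {"name"} -> 1/3 ≈ 0.33
      let a: HashSet<&str> = old.split('_').collect();
      let b: HashSet<&str> = new.split('_').collect();
      let shared = a.intersection(&b).count() as f64;
      let total = a.union(&b).count() as f64;
      if total == 0.0 { 0.0 } else { shared / total }
  }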
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- AccessControl: agent roles with allowed sensitivity levels
- 4 default roles: admin (all), recruiter (PII ok), analyst (financial ok), agent (internal only)
- Field-level masking: determines which columns to mask per agent based on sensitivity
- Query audit log: tracks every query with agent, datasets, PII fields accessed
- Endpoints: GET/POST /access/roles, GET /access/audit, POST /access/check
- Toggleable via config (auth.enabled)
- 100K embedding: supervisor now sustained 125/sec (2.9x vs single pipeline)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ToolRegistry: named tools with parameter validation and audit logging
- 6 built-in staffing tools:
search_candidates (skills, city, state, experience, availability)
get_candidate (by ID)
revenue_by_client (top N by billed revenue)
recruiter_performance (placements, revenue per recruiter)
cold_leads (called N+ times, never placed)
open_jobs (by vertical, city)
- Each tool: name, description, params, permission level (read/write/admin)
- SQL template with validated parameter substitution
- Full audit trail: every invocation logged with agent, params, result
- Endpoints: GET /tools (list), GET /tools/{name} (schema),
POST /tools/{name}/call (execute), GET /tools/audit (log)
- Per ADR-015: governed interface before raw SQL for agents
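Parameter binding in sketch form — values are checked against the tool's
declared params first, then substituted as escaped literals (hypothetical
helper; real write-path tools need stricter type validation):

  fn bind_template(template: &str, params: &[(&str, &str)]) -> String {
      let mut sql = template.to_string();
      for (name, value) in params {
          // Escape embedded quotes so a param can't break out of its literal.
          let literal = format!("'{}'", value.replace('\'', "''"));
          sql = sql.replace(&format!("{{{name}}}"), &literal);
      }
      sql
  }

  // bind_template("SELECT * FROM candidates WHERE city = {city}",
  //               &[("city", "Denver")])
  // -> SELECT * FROM candidates WHERE city = 'Denver'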
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- IndexRegistry: tracks all vector indexes with model metadata
(model_name, model_version, dimensions, build stats)
- Index metadata persisted as JSON in vectors/meta/
- Rebuilt on startup for crash recovery
- GET /vectors/indexes — list all indexes (filter by source/model)
- GET /vectors/indexes/{name} — get index metadata
- Background jobs auto-register metadata on completion
- Multi-version support: same data, different models, coexist
- Per ADR-014: enables incremental re-embed on model upgrade
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- DatasetManifest expanded: description, owner, sensitivity, columns,
lineage, freshness contract, tags, row_count
- All new fields use #[serde(default)] for backward compatibility
- PII auto-detection: scans column names for email, phone, SSN, salary,
address, DOB, medical terms — flags as PII/PHI/Financial
- Column-level metadata: name, type, sensitivity, is_pii flag
- Lineage tracking: source_system, source_file, ingest_job, timestamp
- Ingest pipeline auto-populates: PII scan, column meta, lineage, row count
- PATCH /catalog/datasets/by-name/{name}/metadata — update metadata
- Catalog responses now include all rich fields
- 25 unit tests passing (5 new PII detection tests)
Per ADR-013: datasets without metadata become mystery files.
This makes every ingested file self-describing from day one.
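The scan itself is substring matching on lowercased column names — a
sketch (the real keyword list is longer and also maps medical terms to
PHI):

  fn detect_sensitivity(column: &str) -> Option<&'static str> {
      let c = column.to_lowercase();
      const PII: &[&str] = &["email", "phone", "ssn", "address", "dob"];
      const FINANCIAL: &[&str] = &["salary", "pay_rate", "billing"];
      if PII.iter().any(|k| c.contains(k)) {
          Some("PII")
      } else if FINANCIAL.iter().any(|k| c.contains(k)) {
          Some("Financial")
      } else {
          None
      }
  }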
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- journald crate: immutable event log for every data mutation
- Events: entity_type, entity_id, field, action, old_value, new_value,
actor, source, workspace_id, timestamp
- In-memory buffer with configurable flush threshold (default 100 events)
- Flush writes events as Parquet to journal/ directory
- Query: GET /journal/history/{entity_id} — full history of any record
- Query: GET /journal/recent?limit=50 — latest events across all entities
- Convenience methods: record_insert, record_update, record_ingest
- Stats: GET /journal/stats — buffer size, persisted file count
- Manual flush: POST /journal/flush
- Per ADR-012: events are never modified or deleted
This is the single most important future-proofing decision.
Once history is lost, it's gone forever.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- JobTracker: create/update/complete/fail jobs with progress tracking
- POST /vectors/index now returns immediately with job_id (HTTP 202)
- Embedding runs in tokio::spawn background task
- GET /vectors/jobs/{id} returns live progress (chunks embedded, rate, ETA)
- GET /vectors/jobs lists all jobs
- Progress logged every 100 batches with chunks/sec and ETA
- 100K embedding job running successfully at 44 chunks/sec
- System stays responsive during embedding (queries in 23ms)
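The 202 shape, assuming a JobTracker with create/complete (the real one
also records chunk counts, rate, and ETA; stub shown for completeness):

  use std::sync::Arc;
  use axum::{http::StatusCode, Json};
  use serde_json::{json, Value};

  struct JobTracker;
  impl JobTracker {
      fn create(&self, _kind: &str) -> u64 { 1 }
      fn complete(&self, _id: u64) {}
  }

  async fn start_index_job(tracker: Arc<JobTracker>) -> (StatusCode, Json<Value>) {
      let job_id = tracker.create("embed");
      let t = Arc::clone(&tracker);
      // Detached task: the HTTP response returns immediately while the
      // embedding loop runs; GET /vectors/jobs/{id} polls the tracker.
      tokio::spawn(async move {
          // ... chunk + embed here, updating progress per batch ...
          t.complete(job_id);
      });
      (StatusCode::ACCEPTED, Json(json!({ "job_id": job_id })))
  }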
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- WorkspaceManager: create/get/list workspaces with daily/weekly/monthly/pinned tiers
- Saved searches: agent stores SQL queries in workspace context
- Shortlist: tag candidates/records to a workspace with notes
- Activity log: track calls, emails, updates per workspace per agent
- Instant handoff: transfer workspace ownership with full history
Zero data copy — just a pointer swap; the receiving agent sees everything
- Persistence: workspaces stored as JSON in object storage, rebuilt on startup
- Endpoints: /workspaces/create, /{id}, /{id}/handoff, /{id}/search,
/{id}/shortlist, /{id}/activity
- Tested: Sarah creates workspace, saves searches, shortlists 3 candidates,
logs activity, hands off to Mike who continues seamlessly
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- MemCache: LRU in-memory cache for hot datasets (configurable max, default 16GB)
Pin/evict/stats endpoints: POST /query/cache/pin, /cache/evict, GET /cache/stats
- Delta store: append-only delta Parquet files for row-level updates
Write deltas without rewriting base files, merge at query time
- Compaction: POST /query/compact merges deltas into base Parquet
- Query engine: checks cache first, falls back to Parquet, merges deltas
- Benchmarked on 2.47M rows:
1M row JOIN: 854ms cold → 96ms hot (8.9x speedup)
100K filter: 62ms cold → 21ms hot (3x speedup)
1.1M rows cached in 408MB RAM
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previous regex routes for /catalog, /storage, /health intercepted main site.
Now all lakehouse API calls go through /lakehouse/api/ prefix, stripped by nginx rewrite.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ui: Dioxus WASM app with dataset sidebar, SQL editor (Ctrl+Enter), results table
- ui: dynamic API base URL (same-origin for nginx, port-based for local dev)
- gateway: CORS enabled for cross-origin requests
- nginx: lakehouse.devop.live proxies UI (:3300) + API (:3100) on same origin
- justfile: ui-build, ui-serve, sidecar, up commands added
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- queryd: SessionContext with custom URL scheme to avoid path doubling with LocalFileSystem
- queryd: ListingTable registration from catalog ObjectRefs with schema inference
- queryd: POST /query/sql returns JSON {columns, rows, row_count}
- queryd→catalogd wiring: reads all datasets, registers as named tables
- gateway: wires QueryEngine with shared store + registry
- e2e verified: SELECT *, WHERE/ORDER BY, COUNT/AVG all correct
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>