9 Commits

root
98b6647f2a gateway: IterateResponse echoes trace_id + enables session_log_path
Some checks failed
lakehouse/auditor 14 blocking issues: cloud: claim not backed — "Verified end-to-end against persistent Go stack on :4110:"
Closes the 2026-05-02 cross-runtime parity gap: Go's
validator.IterateResponse carried trace_id back to callers; Rust's
didn't. A caller pivoting from response → Langfuse → session log
worked on Go but failed on Rust because the join key wasn't visible
in the response body.

## Changes

crates/gateway/src/v1/iterate.rs:
  - IterateResponse + IterateFailure gain `trace_id: Option<String>`
    (skip-serializing-if-none preserves backward compat for any
    consumer parsing the response without the field; sketched below)
  - Both return sites populated with the resolved trace_id
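
A sketch of the shape change (other fields elided; the serde
attribute is the skip-serializing-if-none behavior named above):

  use serde::Serialize;

  #[derive(Serialize)]
  pub struct IterateResponse {
      // ...existing fields unchanged...
      /// Join key for the response → Langfuse → session log pivot.
      /// Skipped when None, so consumers that predate the field see
      /// a byte-identical body. IterateFailure gains the same field.
      #[serde(skip_serializing_if = "Option::is_none")]
      pub trace_id: Option<String>,
  }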

lakehouse.toml:
  - [gateway].session_log_path set to /tmp/lakehouse-validator/sessions.jsonl
    — same path Go validatord writes to. The two daemons now co-write
    one unified longitudinal log; rows tag daemon="gateway" vs
    daemon="validatord" so producers stay distinguishable in DuckDB
    queries. Append-write is atomic at the row sizes both runtimes
    produce, so concurrent writes from both daemons are safe.

## Verification

Post-restart of lakehouse.service:
  POST /v1/iterate with X-Lakehouse-Trace-Id: rust-fix1-test
    → response.trace_id = "rust-fix1-test" ✓ (was: field absent)
    → sessions.jsonl latest row daemon=gateway, session_id=rust-fix1-test ✓ (was: no row)

Cross-runtime drive — same prompt to Rust :3100 and Go :4110:
  Rust:  trace_id=unified-rust-001, daemon=gateway, accepted
  Go:    trace_id=unified-go-001,   daemon=validatord, accepted
  Same file, distinct daemons, one query covers both:
    SELECT daemon, COUNT(*) FROM read_json_auto('sessions.jsonl', format='newline_delimited') GROUP BY daemon
    → gateway: 2, validatord: 19

All 4 parity probes still 6/6 + 12/12 + 4/4 + 2/2 against live
:3100 + :4110 stacks. Cargo test 4/4 PASS for v1::iterate module.

## Architecture invariant

The "unified longitudinal log" thesis is now demonstrated. Operators
running both runtimes in production point both daemons at the same
session_log_path and DuckDB queries naturally span both producers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 06:24:41 -05:00
root
57bde63a06 gateway: trace-id propagation + coordinator session JSONL (Rust parity)
Some checks failed
lakehouse/auditor 10 blocking issues: cloud: claim not backed — "Verified end-to-end against persistent Go stack on :4110:"
Cross-runtime parity with the Go-side observability wave (commits
d6d2fdf + 1a3a82a in golangLAKEHOUSE). This ships the two layers J
flagged: the LIVE per-call view (Langfuse) and the LONGITUDINAL
forensic view (JSONL queryable via DuckDB). The hard correctness
gate (FillValidator phantom-rejection) was already in place; this is
the observability layered on top.

## Trace-id propagation

X-Lakehouse-Trace-Id header constant declared in
crates/gateway/src/v1/iterate.rs (matches Go's shared.TraceIDHeader
byte-for-byte). When set on an inbound /v1/iterate request, the
handler reuses it; the chat + validate self-loopback hops forward
the same header so chatd's trace emit nests under the parent rather
than minting a fresh top-level trace per call.
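
A sketch of the reuse-or-mint step (the header constant matches the
commit; the handler shape and uuid fallback are illustrative):

  use http::HeaderMap;

  pub const TRACE_ID_HEADER: &str = "X-Lakehouse-Trace-Id";

  fn resolve_trace_id(headers: &HeaderMap) -> String {
      headers
          .get(TRACE_ID_HEADER)
          .and_then(|v| v.to_str().ok())
          .map(str::to_owned)
          // No inbound id: mint a fresh top-level trace id (assumed uuid).
          .unwrap_or_else(|| uuid::Uuid::new_v4().to_string())
  }

  // Loopback hops forward the same header so chatd's emit nests under
  // the parent trace:
  //   client.post(chat_url).header(TRACE_ID_HEADER, &trace_id) ...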

ChatTrace gains a parent_trace_id field. emit_chat_inner skips the
trace-create event when parent is set, only emits the
generation-create which attaches to the existing trace tree. Result:
an iterate session with N retries shows in Langfuse as ONE tree, not
N+1 disconnected traces.

emit_attempt_span (new) writes one Langfuse span per iteration
attempt with input={iteration, model, provider, prompt} and
output={verdict, raw, error}. WARNING level on non-accepted
verdicts. The returned span id is stamped on the corresponding
SessionRecord attempt for cross-log correlation.

## Coordinator session JSONL

crates/gateway/src/v1/session_log.rs — new writer matching Go's
internal/validator/session_log.go schema byte-for-byte:
  - SessionRecord with schema=session.iterate.v1
  - SessionAttemptRecord per retry
  - SessionLogger.append: tokio Mutex serialized append-only
  - Best-effort posture (warn-level log on error, matching Go's
    slog.Warn; never blocks the request) — sketched below
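
A compressed sketch of that writer (SessionRecord fields elided; the
locking and best-effort shape are the point):

  use tokio::{fs::OpenOptions, io::AsyncWriteExt, sync::Mutex};

  #[derive(serde::Serialize)]
  pub struct SessionRecord { /* schema=session.iterate.v1 fields */ }

  pub struct SessionLogger {
      path: std::path::PathBuf,
      lock: Mutex<()>, // serializes appends within this daemon
  }

  impl SessionLogger {
      pub async fn append(&self, record: &SessionRecord) {
          let Ok(mut line) = serde_json::to_vec(record) else { return };
          line.push(b'\n');
          let _guard = self.lock.lock().await;
          let result = async {
              let mut f = OpenOptions::new()
                  .create(true)
                  .append(true)
                  .open(&self.path)
                  .await?;
              f.write_all(&line).await
          }
          .await;
          // Best-effort: warn and move on, never fail the request.
          if let Err(e) = result {
              tracing::warn!(error = %e, "session log append failed");
          }
      }
  }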

iterate.rs builds + appends a row on EVERY code path:
  - accepted: write_session_accepted with grounded_in_roster bool
    derived from validate_workers WorkerLookup (matches Go's
    handlers.rosterCheckFor("fill") semantics)
  - max-iter-exhausted: write_session_failure
  - infra-error: write_infra_error (so a missing /v1/iterate event
    never silently disappears from the longitudinal log)

[gateway].session_log_path config field (empty = disabled).
Production: /var/lib/lakehouse/gateway/sessions.jsonl. Operators who
want a unified longitudinal stream can point both Rust and Go
loggers at the same path — write-append is safe at the row sizes we
produce.
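
Assuming the serde-based TOML loading shared::config already uses,
the field amounts to (struct name illustrative):

  #[derive(serde::Deserialize, Default)]
  pub struct GatewaySection {
      /// None or "" = session logging disabled.
      #[serde(default)]
      pub session_log_path: Option<std::path::PathBuf>,
  }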

## Cross-runtime parity probe

crates/gateway/src/bin/parity_session_log: tiny stdin/stdout helper
that round-trips a fixture through SessionRecord serde.
golangLAKEHOUSE/scripts/cutover/parity/session_log_parity.sh feeds
4 fixtures through both helpers and diffs the rows after stripping
timestamp + daemon (the two fields that legitimately differ between
producers).
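
The helper's job, in sketch form (module path assumed; the real
binary lives at crates/gateway/src/bin/parity_session_log):

  use std::io::{self, Read};

  fn main() -> io::Result<()> {
      let mut input = String::new();
      io::stdin().read_to_string(&mut input)?;
      // Round-trip through the typed record so any schema drift
      // (renamed field, changed type) shows up in the probe's diff.
      let record: gateway::v1::session_log::SessionRecord =
          serde_json::from_str(&input).expect("fixture must parse");
      println!("{}", serde_json::to_string(&record).expect("serialize"));
      Ok(())
  }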

Result: **4/4 byte-equal** including the unicode-prompt fixture
("Café résumé  你好"). Schema parity holds. The non-trivial-equal
guard in the probe rejects the case where both sides fail
identically — protecting against a regression where one side
silently stops producing valid JSON.

## Verification

- cargo test -p gateway --lib: 90/90 PASS (3 new session_log tests
  including concurrent-append safety)
- cargo check --workspace: clean
- session_log_parity.sh: 4/4 fixtures byte-equal
- Both runtimes can append to the same path; DuckDB sees one stream
- The Go-side validatord smoke remains 5/5 (unchanged)

## Architecture invariant

Don't propose to "wire trace-id propagation in Rust" or "add Rust
session log" — both are now shipped on the demo/post-pr11-polish
branch. The longitudinal log + Langfuse tree together cover the
multi-call observability concern J flagged 2026-05-02.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 05:39:29 -05:00
root
ba928b1d64 aibridge: drop Python sidecar from hot path; AiClient → direct Ollama
Some checks failed
lakehouse/auditor 11 blocking issues: cloud: claim not backed — "Verified end-to-end against persistent Go stack on :4110:"
The "drop Python sidecar from Rust aibridge" item from the
architecture_comparison decisions tracker. Universal-win cleanup —
removes 1 process + 1 runtime + 1 hop from every embed/generate
request, with no behavior change.

## What was on the hot path before

  gateway → AiClient → http://:3200 (FastAPI sidecar)
                          ├── embed.py    → http://:11434 (Ollama)
                          ├── generate.py → http://:11434
                          ├── rerank.py   → http://:11434 (loops generate)
                          └── admin.py    → http://:11434 (/api/ps + nvidia-smi)

The sidecar's hot-path code (~120 LOC across embed.py / generate.py /
rerank.py / admin.py) was pure pass-through: each route translated
its request body to Ollama's wire format and returned Ollama's
response in a sidecar envelope. Zero logic, one full HTTP hop of
overhead.

## What's on the hot path now

  gateway → AiClient → http://:11434 (Ollama directly)

Inline rewrites in crates/aibridge/src/client.rs:
- embed_uncached: per-text loop to /api/embed; computes the dimension
  from the first returned vector's length (matches the sidecar's
  prior shape; see the sketch after this list)
- generate (direct path): translates GenerateRequest → /api/generate
  (model, prompt, stream:false, options:{temperature, num_predict},
  system, think); maps response → GenerateResponse using Ollama's
  field names (response, prompt_eval_count, eval_count)
- rerank: per-doc loop with the same score-prompt the sidecar used;
  parses leading number, clamps 0-10, sorts desc
- unload_model: /api/generate with prompt:"", keep_alive:0
- preload_model: /api/generate with prompt:" ", keep_alive:"5m",
  num_predict:1
- vram_snapshot: GET /api/ps + std::process::Command nvidia-smi;
  same envelope shape as the sidecar's /admin/vram so callers keep
  parsing
- health: GET /api/version, wrapped in a sidecar-shaped envelope
  ({status, ollama_url, ollama_version})
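
A sketch of the embed rewrite (request/response fields follow
Ollama's /api/embed wire format; the reqwest usage and signature are
illustrative):

  use serde::Deserialize;

  #[derive(Deserialize)]
  struct EmbedResponse {
      embeddings: Vec<Vec<f32>>,
  }

  async fn embed_uncached(
      http: &reqwest::Client,
      base: &str, // e.g. "http://localhost:11434"
      model: &str,
      texts: &[String],
  ) -> anyhow::Result<(Vec<Vec<f32>>, usize)> {
      let mut vectors = Vec::with_capacity(texts.len());
      // Per-text loop, mirroring the sidecar's prior behavior.
      for text in texts {
          let resp: EmbedResponse = http
              .post(format!("{base}/api/embed"))
              .json(&serde_json::json!({ "model": model, "input": text }))
              .send()
              .await?
              .error_for_status()?
              .json()
              .await?;
          let v = resp
              .embeddings
              .into_iter()
              .next()
              .ok_or_else(|| anyhow::anyhow!("empty embeddings"))?;
          vectors.push(v);
      }
      // Dimension derived from the first vector, as the sidecar did.
      let dim = vectors.first().map(Vec::len).unwrap_or(0);
      Ok((vectors, dim))
  }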

Public AiClient API is unchanged — Request/Response types untouched.
Callers (gateway routes, vectord, etc.) require zero updates.

## Config changes

- crates/shared/src/config.rs: default_sidecar_url() bumps to
  :11434. The TOML field stays `[sidecar].url` for migration compat
  (operators with existing configs don't need to rename anything).
- lakehouse.toml + config/providers.toml: bumped to localhost:11434
  with comments explaining the 2026-05-02 transition.

## What stays Python

sidecar/sidecar/lab_ui.py (385 LOC) + pipeline_lab.py (503 LOC) are
dev-mode Streamlit-style UIs for prompt experimentation. They are not
on the runtime hot path and continue running for ad-hoc work. The
embed/generate/rerank/admin routes inside the sidecar can be retired,
but operators who keep the sidecar process running for the lab UI
face no breakage — those routes still call Ollama and work.

## Verification

- cargo check --workspace: clean
- cargo test -p aibridge --lib: 32/32 PASS
- Live smoke against test gateway on :3199 with new config:
    /ai/embed     → 768-dim vector for "forklift operator" ✓
    /v1/chat      → provider=ollama, model=qwen2.5:latest, content=OK ✓
- nvidia-smi parsing tested via std::process::Command path
- Live `lakehouse.service` (port :3100) NOT yet restarted — deploy
  step is operator-driven (sudo systemctl restart lakehouse.service)

## Architecture comparison update

(Captured separately in golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md
decisions tracker.) The "drop Python sidecar" line moves from _open_
to DONE. The Rust process model now has 1 mega-binary instead of
1 mega-binary + 1 sidecar process — a small but real reduction in
ops surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:59:47 -05:00
root
8de94eba08 cleanup: bump qwen2.5 → qwen3.5:latest in active defaults
Some checks failed
lakehouse/auditor 16 blocking issues: cloud: claim not backed — "Verified end-to-end via playwright on devop.live/lakehouse:"
The stronger local rung is now the small-model-pipeline tier-1
default across both the Rust legacy and the Go rewrite (cf.
golangLAKEHOUSE phase 1). Same JSON-clean property as qwen2.5, more
capacity. Ollama still serves both side-by-side; rollback is a
4-line revert if a workload regresses.

active-default sites:
- lakehouse.toml [ai] gen_model + rerank_model → qwen3.5:latest
- mcp-server/observer.ts diagnose call (Phase 44 /v1/chat path) → qwen3.5:latest
- mcp-server/index.ts model roster doc → qwen3.5:latest first
- crates/vectord/src/rag.rs ContinuableOpts + RagResponse.model → qwen3.5:latest

Skipped: execution_loop/mod.rs comments describing historic qwen2.5
tool_call quirks — those document past behavior, not active defaults.
data/_catalog/profiles/*.json are runtime-generated (gitignored), so
not in scope for tracked changes.

cargo check -p vectord: clean. No behavioral change in the audit
pipeline — same JSON-clean local model, same think=Some(false)
posture, just a stronger upstream model.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:10:57 -05:00
root
1bee0e4969 Qwen 3 integration + agent plan + playbook loop
Pulled qwen3 (8.2B, 40K context, thinking, tool-calling). Created
agent-qwen3 profile. Ran structured plan: 5 contracts (16/16 filled
via hybrid), 5 intelligence questions (2/5 — same RAG counting gap).

Key playbook entry generated: "count/aggregation questions must use
/sql not /search. RAG returns 5 chunks from 10K — cannot count the
full dataset." This routing rule is now in the playbooks database
for future agent runs to learn from.

Pattern confirmed across qwen2.5, mistral, AND qwen3: the structured
matching path (hybrid SQL+vector) is production-ready across all
models. The RAG counting gap is a routing problem, not a model
problem — the fix is query classification, not a better model.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 00:08:48 -05:00
root
9e6002c4d4 S3 backend for Lance — hybrid operates on real MinIO object storage
Enabled the Lance "aws" feature for S3-compatible storage via its
internal object_store.
BucketRegistry: added with_allow_http(true) for MinIO/non-TLS S3
endpoints (fixes "builder error" on HTTP endpoints). lakehouse.toml
gains [[storage.buckets]] name="s3:lakehouse" with S3 backend config.

lance_backend.rs: S3 bucket naming convention — buckets with name
prefix "s3:" emit s3:// URIs for Lance datasets. AWS_* env vars
in the systemd unit provide credentials to Lance's internal
object_store.
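
The convention in sketch form (the exact key layout under the bucket
is illustrative):

  /// Buckets named "s3:<bucket>" emit s3:// URIs for Lance datasets;
  /// anything else resolves through the local-bucket layout.
  fn lance_dataset_uri(
      bucket: &str,
      local_root: &std::path::Path,
      index: &str,
  ) -> String {
      match bucket.strip_prefix("s3:") {
          Some(s3) => format!("s3://{s3}/{index}.lance"),
          None => local_root.join(format!("{index}.lance")).display().to_string(),
      }
  }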

Verified end-to-end on real MinIO with real 100K × 768d vectors:
  - Migrate Parquet → Lance on S3: 1.7s (vs 0.57s local)
  - Build IVF_PQ: 16.4s (CPU-bound, essentially same as local)
  - Search: ~58ms p50 (vs 11ms local — S3 partition reads)
  - Random doc fetch: 13ms (vs 3.5ms local)
  - Recall@10: 0.835 (randomized IVF_PQ, consistent with local 0.805)
  - Total S3 footprint: 637 MiB (vectors + index + lance metadata)

The "public storage" claim from the PRD is now proven: the hybrid
Parquet+HNSW ⊕ Lance architecture works on S3-compatible object
storage, not just local filesystem.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 21:09:42 -05:00
root
0d037cfac1 Phases 16.2 + L2 + 17 VRAM gate + MySQL + 18 Lance hybrid milestone
Five threads of work landing as one milestone — all individually
verified end-to-end against real data, full release build clean,
46 unit tests pass.

## Phase 16.2 / 16.5 — autotune agent + ingest triggers

`vectord::agent` is a long-running tokio task that watches the trial
journal and autonomously proposes + runs new HNSW configs. Distinct
from `autotune::run_autotune` (synchronous one-shot grid). Triggered
on POST /vectors/agent/enqueue/{idx} or by the periodic wake; ingest
paths now push DatasetAppended events when an index's source dataset
gets re-ingested. Rate-limited (max_trials_per_hour) and cooldown-
gated so it can't saturate Ollama under live load.

The proposer is ε-greedy around the current champion: with prob 0.25
sample random from full bounds, otherwise perturb champion ± small
delta on both axes. Dedup against history. Deterministic — RNG seeded
from history.len() so the same journal state proposes the same next
config (helps offline replay debugging).
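
A sketch of that proposer (the 0.25 explore probability and the
history-length seed are from the design; bounds and deltas here are
placeholders):

  use rand::{rngs::StdRng, Rng, SeedableRng};

  struct HnswConfig { ec: u32, es: u32 }

  fn propose(history_len: u64, champion: &HnswConfig) -> HnswConfig {
      // Seeding from journal length makes the proposal a pure function
      // of journal state, so offline replay reproduces the sequence.
      // (Dedup against prior trials elided.)
      let mut rng = StdRng::seed_from_u64(history_len);
      if rng.gen_bool(0.25) {
          // Explore: uniform sample from the full bounds.
          HnswConfig { ec: rng.gen_range(10..=200), es: rng.gen_range(10..=100) }
      } else {
          // Exploit: perturb the champion ± a small delta on both axes.
          HnswConfig {
              ec: (champion.ec as i64 + rng.gen_range(-10..=10)).clamp(10, 200) as u32,
              es: (champion.es as i64 + rng.gen_range(-5..=5)).clamp(10, 100) as u32,
          }
      }
  }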

`[agent]` config section in lakehouse.toml; opt-in via enabled=true.

## Federation Layer 2 — runtime bucket lifecycle + per-index scoping

`BucketRegistry.buckets` moved to `std::sync::RwLock<HashMap>` so
buckets can be added/removed after startup. POST /storage/buckets
provisions at runtime; DELETE /storage/buckets/{name} unregisters
(refuses primary/rescue with 403). Local-backend buckets get their
root directory auto-created.

`IndexMeta.bucket` (default "primary" via serde) records each index's
home bucket. `TrialJournal` and `PromotionRegistry` now hold
Arc<BucketRegistry> + IndexRegistry; they resolve target store per-
index via IndexMeta.bucket. PromotionRegistry::list_all scans every
bucket and dedups by index_name. Pre-federation indexes keep working
unchanged — they just default to primary.

`ModelProfile.bucket: Option<String>` declares per-profile artifact
home. POST /vectors/profile/{id}/activate auto-provisions the
profile's bucket under storage.profile_root if not yet registered.

EvalSets stay primary-only for now — noted gap, low-risk to extend
later with the same resolver pattern.

## Phase 17 — VRAM-aware two-profile gate

Sidecar gains POST /admin/unload (Ollama keep_alive=0 trick — forces
immediate VRAM release), POST /admin/preload (keep_alive=5m with
empty prompt, takes the slot warm), and GET /admin/vram (combines
nvidia-smi snapshot with Ollama /api/ps). Exposed via aibridge as
unload_model / preload_model / vram_snapshot.

`VectorState.active_profile` is the GPU-slot singleton —
Arc<RwLock<Option<ActiveProfileSlot>>>. activate_profile checks for
a previous profile with a different ollama_name and unloads it
before preloading the new one; same-model reactivations skip the
unload (Ollama no-ops). New routes:
POST /vectors/profile/{id}/deactivate (unload + clear slot),
GET /vectors/profile/active.
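
The slot-swap logic in sketch form (unload_model/preload_model are
the aibridge surface named above; signatures and the tokio RwLock
choice are assumptions):

  use aibridge::AiClient;

  struct ActiveProfileSlot { ollama_name: String /* + metadata */ }

  struct VectorState {
      active_profile: tokio::sync::RwLock<Option<ActiveProfileSlot>>,
  }

  async fn activate_profile(
      state: &VectorState,
      ai: &AiClient,
      new: ActiveProfileSlot,
  ) -> anyhow::Result<()> {
      let mut slot = state.active_profile.write().await;
      match slot.as_ref() {
          // Same model re-activated: skip the unload, Ollama no-ops.
          Some(prev) if prev.ollama_name == new.ollama_name => {}
          // Different model holds the slot: force VRAM release first
          // (the keep_alive=0 trick).
          Some(prev) => ai.unload_model(&prev.ollama_name).await?,
          None => {}
      }
      ai.preload_model(&new.ollama_name).await?; // keep_alive=5m, warm
      *slot = Some(new);
      Ok(())
  }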

Verified live: staffing-recruiter (qwen2.5) → docs-assistant
(mistral) swap freed qwen2.5 from VRAM and loaded mistral. nomic-
embed-text persists across swaps because both profiles use it —
free optimization that fell out of the design. Scoped search
correctly 403s cross-profile in both directions.

## MySQL streaming connector

`crates/ingestd/src/my_stream.rs` mirrors pg_stream.rs for MySQL.
Pure-rust `mysql_async` driver (default-features=false to avoid C
deps). Same OFFSET pagination, same Parquet-streaming write shape.
Type mapping per ADR-010: int/bigint → Int32/Int64, decimal/float
→ Float64, tinyint(1)/bool → Boolean, everything else → Utf8 with
fallback parsers for date/time/json/uuid via Display.

POST /ingest/mysql parallel to /ingest/db. Same PII auto-detection,
same lineage capture (source_system="mysql"), same agent-trigger
hook. `redact_dsn` generalized — was hardcoded to "postgresql://"
length, now works for any scheme://user:pass@host/path URL (latent
PII leak fix for MySQL DSNs).
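
The generalized redaction in sketch form:

  /// Redact the password in any scheme://user:pass@host/... DSN.
  /// (The old version assumed the literal "postgresql://" prefix.)
  fn redact_dsn(dsn: &str) -> String {
      let Some(scheme_end) = dsn.find("://") else { return dsn.to_string() };
      let rest = &dsn[scheme_end + 3..];
      // Only look before the authority's '@'; a ':' after it is a port.
      let Some(at) = rest.find('@') else { return dsn.to_string() };
      match rest[..at].find(':') {
          Some(colon) => format!(
              "{}{}:****{}",
              &dsn[..scheme_end + 3],
              &rest[..colon],
              &rest[at..],
          ),
          None => dsn.to_string(),
      }
  }

  // redact_dsn("mysql://app:hunter2@db.local/hr")
  //   == "mysql://app:****@db.local/hr"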

Verified live against MariaDB on localhost: 10 rows × 9 columns of
test data round-tripped through datatypes int/varchar/decimal/
tinyint/datetime/text. PII detection auto-flagged name + email.
Aggregation queries through DataFusion match the source values
exactly.

## Phase 18 — Hybrid Parquet+HNSW ⊕ Lance backend (ADR-019)

`vectord-lance` is a new firewall crate. Lance pulls Arrow 57 and
DataFusion 52 — incompatible with the rest of the workspace's
Arrow 55 / DataFusion 47. The firewall isolates that dep tree:
public API uses only std types (Vec<f32>, Vec<String>, Hit, Row,
*Stats), so no Arrow types cross the crate boundary and nothing
propagates to vectord. This is the ADR-019 path that didn't ship
until now.

`vectord::lance_backend::LanceRegistry` lazy-creates a
LanceVectorStore per index, resolving bucket → URI via the
conventional local-bucket layout. `IndexMeta.vector_backend` and
`ModelProfile.vector_backend` carry the choice (default Parquet so
existing indexes unchanged).

Six routes under /vectors/lance/*:
- migrate/{idx}: convert binary-blob Parquet → Lance FixedSizeList
- index/{idx}: build IVF_PQ
- search/{idx}: vector search (embed via sidecar)
- doc/{idx}/{doc_id}: random row fetch
- append/{idx}: native fragment append
- stats/{idx}: row count + index presence

Verified live on the real resumes_100k_v2 corpus (100K × 768d):
- Migrate: 0.57s
- Build IVF_PQ index: 16.2s (matches ADR-019 bench; 14× faster than
  HNSW's 230s for the same data)
- Search end-to-end (Ollama embed + Lance scan): 23-53ms
- Random doc_id fetch: 5-7ms (filter scan; faster than Parquet's
  ~35ms full-file scan, slower than the bench's 311us positional
  take — would close that gap with a scalar btree on doc_id)
- Append 100 rows: 3.3ms / +320KB on disk vs Parquet's required
  full ~330MB rewrite — the structural win
- Index survives append; both backends coexist cleanly

## Known follow-ups not in this milestone

- ModelProfile.vector_backend doesn't yet auto-route
  /vectors/profile/{id}/search to Lance; callers go through
  /vectors/lance/* directly
- Scalar btree on doc_id (closes the 5-7ms → ~300us gap)
- vectord-lance built default-features=false → no S3 yet
- IVF_PQ recall not measured (ADR-019 caveat) — needs a Lance-aware
  variant of the eval harness
- Watcher-path ingest doesn't push agent triggers (HTTP paths do)
- EvalSets still primary-only (federation gap)
- No PATCH endpoint to move an existing index between buckets
- The pre-existing storaged::append_log doctest fails to compile
  (malformed `{prefix}/` parses as code fence) — pre-existing bug,
  left for a focused fix

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 20:24:46 -05:00
root
dbe00d018f Federation foundation + HNSW trial system + Postgres streaming + PRD reframe
Four shipped features and a PRD realignment, all measured end-to-end:

HNSW trial system (Phase 15 horizon item → complete)
- vectord: EmbeddingCache, harness (eval sets + brute-force ground truth),
  TrialJournal, parameterized HnswConfig on build_index_with_config
- /vectors/hnsw/trial, /hnsw/trials/{idx}, /hnsw/trials/{idx}/best,
  /hnsw/evals/{name}/autogen, /hnsw/cache/stats
- Measured on resumes_100k_v2 (100K × 768d): brute-force 44ms → HNSW 873us
  at 100% recall@10. ec=80 es=30 locked as HnswConfig::default()
  (sketched after this list)
- Lower ec values trade recall for build time: 20/30 = 0.96 recall in 8s,
  80/30 = 1.00 recall in 230s
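
The locked default as a sketch (the ec/es field names follow the
commit's shorthand; if the usual HNSW naming applies they are the
ef-construction / ef-search axes, which is an assumption):

  pub struct HnswConfig {
      pub ec: usize, // build-side effort: higher = better recall, slower build
      pub es: usize, // search-side effort
  }

  impl Default for HnswConfig {
      // ec=80 es=30 measured at 100% recall@10 on resumes_100k_v2;
      // ec=20 drops to 0.96 recall but builds in 8s instead of 230s.
      fn default() -> Self {
          Self { ec: 80, es: 30 }
      }
  }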

Catalog manifest repair
- catalogd: resync_from_parquet reads parquet footers to restore row_count
  and columns on drifted manifests
- POST /catalog/datasets/{name}/resync + POST /catalog/resync-missing
- All 7 staffing tables recovered to PRD-matching 2,469,278 rows

Federation foundation (ADR-017)
- shared::secrets: SecretsProvider trait + FileSecretsProvider (reads
  /etc/lakehouse/secrets.toml, enforces 0600 perms)
- storaged::registry::BucketRegistry — multi-bucket resolution with
  rescue_bucket read fallback and reachability probing
- storaged::error_journal — bucket op failures visible in one HTTP call
- storaged::append_log — write-once batched append pattern (fixes the RMW
  anti-pattern llms3.com calls out; errors and trial journals both use it)
- /storage/buckets, /storage/errors, /storage/bucket-health,
  /storage/errors/{flush,compact}
- Bucket-aware I/O at /storage/buckets/{bucket}/objects/{*key} with
  X-Lakehouse-Rescue-Used observability headers on fallback

Postgres streaming ingest
- ingestd::pg_stream: DSN parser, batched ORDER BY + LIMIT/OFFSET pagination
  into ArrowWriter, lineage redacts password
- POST /ingest/db — verified against live knowledge_base.team_runs
  (586 rows × 13 cols, 6 batches, 196ms end-to-end)

PRD realignment (2026-04-16)
- Dual use case: staffing analytics + local LLM knowledge substrate
- Removed "multi-tenancy (single-owner system)" from non-goals
- Added invariants 8-11: indexes hot-swappable, per-reader profiles,
  trials-as-data, operational failures findable in one HTTP call
- New phases 16 (hot-swap generations), 17 (model profiles + dataset
  bindings), 18 (Lance vs Parquet+sidecar evaluation)
- Known ceilings table documents the 5M vector wall and escape hatches
- ADR-017 (federation), ADR-018 (append-log pattern) added
- EXECUTION_PLAN.md sequences phases B-E with success gates and
  decision rules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 01:50:05 -05:00
root
01373c0e45 Phase 5: hardening — gRPC, observability, auth, config
- proto: lakehouse.proto with CatalogService, QueryService, StorageService, AiService
- proto crate: tonic-build codegen from proto definitions
- catalogd: gRPC CatalogService implementation
- gateway: dual HTTP (:3100) + gRPC (:3101) servers
- gateway: OpenTelemetry tracing with stdout exporter
- gateway: API key auth middleware (toggleable)
- shared: TOML config system with typed structs and defaults
- lakehouse.toml config file
- ADR-006 and ADR-007 documented

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 06:37:07 -05:00