30 Commits

Author SHA1 Message Date
root
2a4b316a15 subjects: 2nd scrum fix wave (token min, chain_tip, tampering, rebuild collision warn)
Second cross-lineage scrum on Steps 5+6 returned 13 distinct findings, 0 convergent.
Three BLOCK-class claims verified as false positives (cache IS written, per-subject
Mutex IS in place, spawn IS safe under writer's lock). Five real fixes shipped:

1. audit_endpoint: legal token min length 16->32 (HMAC-SHA256 best practice, kimi)
2. subject_audit: new chain_tip() returns last hash from full log; audit_endpoint
   now reports chain_root from full chain instead of windowed slice (opus)
3. registry: rebuild loader now warns on sanitize collision (symmetric with
   put_subject's collision guard - opus)
4. audit_endpoint: tampering detection - if manifest expects non-empty chain_root
   but log returns 0 rows, flag chain_verified=false with explicit message (opus)
5. execution_loop::audit_result_state: tightened heuristic - error/denied/not_found
   only classify when no rows/data/results sibling (opus INFO)

Tests: 17 catalogd subject + 6 gateway audit_result_state, all green.
New: audit_result_state_does_not_classify_error_when_data_sibling_present,
     audit_result_state_status_is_authoritative_even_with_data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:00:42 -05:00
root
150cc3b681 aibridge: LRU embed cache - 236x RPS gain on warm workloads. Per architecture_comparison.md universal-win for Rust side. Cache key (model,text), default 4096 entries, in-process inside gateway. Load test: 128 RPS -> 30k+ RPS, p50 78ms -> 129us.
Some checks failed
lakehouse/auditor 20 blocking issues: cloud: claim not backed — "Verified end-to-end against persistent Go stack on :4110:"
2026-05-01 04:45:20 -05:00
root
86123fce4c gateway: /v1/validate endpoint — Phase 43 v3 part 2
Closes the Phase 43 PRD's "any caller can validate" surface. The
validator crate (FillValidator + EmailValidator + PlaybookValidator
+ WorkerLookup) is now reachable over HTTP at /v1/validate.

Request/response:
  POST /v1/validate
    {"kind":"fill"|"email"|"playbook", "artifact":{...}, "context":{...}?}
  → 200 + Report on success
  → 422 + ValidationError on validation failure
  → 400 on bad kind

Boot-time wiring (main.rs):
- Load workers_500k.parquet into a shared Arc<dyn WorkerLookup>
- Path overridable via LH_WORKERS_PARQUET env
- Missing file: warn + fall back to empty InMemoryWorkerLookup so the
  endpoint stays live (validators just fail Consistency on every
  worker-existence check, which is the correct behavior when the
  roster isn't configured)
- Boot log line: "workers parquet loaded from <path>" or
  "workers parquet at <path> not found"
- Live boot timing: 500K rows loaded in ~1.4s

V1State gains `validate_workers: Arc<dyn validator::WorkerLookup>`.
The `_context` JSON key is auto-injected from `request.context` so
callers can either embed `_context` directly in `artifact` or split
it cleanly via the `context` field.

Verified live (gateway + 500K worker snapshot):
  POST {kind:"fill", phantom W-FAKE-99999}    → 422 Consistency
                                                 ("does not exist in
                                                  worker roster")
  POST {kind:"fill", real W-1, "Anyone"}      → 200 OK + Warning
                                                 ("differs from
                                                  roster name 'Donald
                                                  Green'")
  POST {kind:"email", body has 123-45-6789}   → 422 Policy ("SSN-
                                                shaped sequence")
  POST {kind:"nonsense"}                       → 400 Bad Request

The "0→85% with iteration" thesis can now run end-to-end on real
staffing data: an executor emits a fill_proposal, posts to
/v1/validate, gets a structured ValidationError on phantom IDs or
inactive workers, observer-corrects, retries. Closure of that loop
in a scrum harness is the next commit (separate scope).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 07:40:27 -05:00
root
7c47734287 v1/mode: parameterized runner + 5 enrichment-experiment modes
Some checks failed
lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts
J's directive (2026-04-26): "Create different modes so we can really
dial in the architecture before it gets further along — pinpoint the
failures and strengths equally so I know what direction to go in.
Loop theater happens when we don't pinpoint the most accurate path."

Refactored execute() to switch on mode name → EnrichmentFlags preset.
Five native modes designed as deliberate experiments — each isolates
one architectural axis so the comparison matrix reads off what's
doing work vs what's adding latency for nothing:

  codereview_lakehouse     — all enrichment on (ceiling)
  codereview_null          — raw file + generic prompt (baseline)
  codereview_isolation     — file + pathway only (no matrix)
  codereview_matrix_only   — file + matrix only (no pathway)
  codereview_playbook_only — pathway only, NO file content (lossy ceiling)

Each call appends a row to data/_kb/mode_experiments.jsonl with full
sources + response. LH_MODE_LOG_OFF=1 to suppress.

scripts/mode_experiment.ts — sweeps files × modes serially, prints
live progress with per-call enrichment stats. Defaults to OpenRouter
free model so cloud quota doesn't gate experiments.

scripts/mode_compare.ts — reads the JSONL, outputs per-file matrix
+ per-mode aggregate + mode-vs-baseline win/loss with avg finding
delta. Heuristic finding-count from markdown table rows; pathway
citation count from preamble references.

scrum_master_pipeline.ts gets a mode-runner fast path gated by
LH_USE_MODE_RUNNER=1: try /v1/mode/execute first, fall through to
the existing ladder if response < LH_MODE_MIN_CHARS (default 2000)
or anything errors. Off by default until A/B-validated.

First experiment results (2 files × 5 modes via gpt-oss-120b:free):
  - codereview_null produces 12.6KB response with ZERO findings
    (proves adversarial framing is load-bearing)
  - codereview_playbook_only produces MORE findings than lakehouse
    on average (12 vs 9) at 73% the latency — pathway memory is
    the dominant signal driver
  - codereview_matrix_only underperforms isolation by ~0.5 findings
    while costing the same latency — matrix corpus likely
    underperforming for scrum_review task class

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 01:36:42 -05:00
root
a6c83b03e5 chore: sync Cargo.lock — toml dep for phase-42 rule loader
Pairs retroactively with de8fb10 (truth/ TOML rule loader).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 17:07:16 -05:00
root
0b3bd28cf8 phase-40: Gemini + Claude provider adapters
Phase 40 PRD (docs/CONTROL_PLANE_PRD.md:82-83) listed:
  - crates/aibridge/src/providers/gemini.rs
  - crates/aibridge/src/providers/claude.rs

Neither existed. Landing both now, in gateway/src/v1/ (matches the
existing ollama.rs + openrouter.rs sibling pattern — aibridge's
providers/ is for the adapter *trait* abstractions, v1/ holds the
concrete /v1/chat dispatchers that know the wire format).

gemini.rs:
  - POST https://generativelanguage.googleapis.com/v1beta/models/
    {model}:generateContent?key=<API_KEY>
  - Auth: query-string key (not bearer)
  - Maps messages → contents+parts (Gemini's wire shape),
    extracts from candidates[0].content.parts[0].text
  - 3 tests: key resolution, body serialization (camelCase
    generationConfig + maxOutputTokens), prefix-strip

claude.rs:
  - POST https://api.anthropic.com/v1/messages
  - Auth: x-api-key header + anthropic-version: 2023-06-01
  - Carries system prompt in top-level `system` field (not
    messages[]). Extracts from content[0].text where type=="text"
  - 4 tests: key resolution, body serialization with/without
    system field, prefix-strip

v1/mod.rs:
  + V1State.gemini_key + claude_key Option<String>
  + resolve_provider() strips "gemini/" and "claude/" prefixes
  + /v1/chat dispatcher handles "gemini" + "claude"/"anthropic"
  + 2 new resolve_provider tests (prefix + strip per adapter)

main.rs:
  + Construct both keys at startup via resolve_*_key() helpers.
    Missing keys log at debug (not warn) since these are optional
    providers — unlike OpenRouter which is the rescue rung.

Every /v1/chat error path mirrors the existing pattern:
  - 503 SERVICE_UNAVAILABLE when key isn't configured
  - 502 BAD_GATEWAY with the provider's error text when the
    upstream call fails
  - Response shape always the OpenAI-compatible ChatResponse

Workspace warnings still at 0. 9 new tests pass.

Pre-existing test failure `executor_prompt_includes_surfaced_
candidates` at execution_loop/mod.rs:1550 is unrelated (fails on
pristine HEAD too — PR fixture divergence).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:41:31 -05:00
root
2f8b347f37 pathway_memory: consensus-designed sidecar + hot-swap learning loop
Some checks failed
lakehouse/auditor 11 warnings — see review
10-probe N=3 consensus (kimi-k2:1t / gpt-oss:120b / qwen3.5:latest /
deepseek-v3.1:671b / qwen3-coder:480b / mistral-large-3:675b /
qwen3.5:397b + 2 stability re-probes; 2 openrouter probes 429'd) locked
the design across three rounds. Full JSON responses in
data/_kb/consensus_reducer_design_{mocq3akn,mocq6pi1,mocqatik}.json.

What it does

Preserves FULL backtrack context per reviewed file (ladder attempts +
latencies + reject reasons, KB chunks with provenance + cosine + rank,
observer signals, context7 bridge hits, sub-pipeline calls, audit
consensus) and indexes them by narrow fingerprint for hot-swap of
proven review pathways.

When scrum reviews a file:
  1. narrow fingerprint = task_class + file_prefix + signal_class
  2. query_hot_swap checks pathway memory for a match that passes
     probation (≥3 replays @ ≥80% success) + audit gate + similarity
     (≥0.90 cosine on normalized-metadata-token embedding)
  3. if hot-swap eligible, recommended model tried first in the ladder
  4. replay outcome reported back, updating the pathway's success_rate
  5. pathways below 0.80 after ≥3 replays retire permanently (sticky)
  6. full PathwayTrace always inserted at end of review — hot-swap
     grows with use, it doesn't bootstrap from nothing

Gate design is load-bearing:
  - narrow fingerprint (6 of 8 consensus models converged on the same
    3-field composition; lock) — enables generalization within crate
  - probation ≥3 replays — binomial tail at 80% is ~5%, below is noise
  - success rate ≥0.80 — mistral + qwen3-coder independently proposed
    this exact threshold across two rounds
  - similarity ≥0.90 — middle of the 0.85/0.95 consensus spread
  - bootstrap: null audit_consensus ALLOWED (auditor → pathway update
    not wired yet; probation + success_rate gates alone enforce safety
    during bootstrap; explicit audit FAIL still blocks)
  - retirement is sticky — prevents oscillation on noise

Files

  + crates/vectord/src/pathway_memory.rs  (new, 600 lines + 18 tests)
    PathwayTrace, LadderAttempt, KbChunkRef, ObserverSignal, BridgeHit,
    SubPipelineCall, AuditConsensus, HotSwapCandidate, PathwayMemory,
    PathwayMemoryStats. 18/18 tests green.
    Cosine + 32-bucket L2-normalized embedding; mirror of TS impl.
  M crates/vectord/src/lib.rs
    pub mod pathway_memory;
  M crates/vectord/src/service.rs
    VectorState grows pathway_memory field;
    4 HTTP handlers (/pathway/insert, /pathway/query,
    /pathway/record_replay, /pathway/stats).
  M crates/gateway/src/main.rs
    Construct PathwayMemory + load from storage on boot,
    wire into VectorState.
  M tests/real-world/scrum_master_pipeline.ts
    Byte-matching TS bucket-hash (verified same bucket indices as
    Rust); pre-ladder hot-swap query; ladder reorder on hit;
    per-attempt latency capture; post-accept trace insert
    (fire-and-forget); replay outcome recording;
    observer /event emits pathway_hot_swap_hit, pathway_similarity,
    rungs_saved per review for the VCP UI.
  M ui/server.ts
    /data/pathway_stats aggregates /vectors/pathway/stats +
    scrum_reviews.jsonl window for the value metric.
  M ui/ui.js
    Three new metric cards:
      · pathway reuse rate (activity: is it firing?)
      · avg rungs saved (value: is it earning its keep?)
      · pathways tracked (stability: retirement = learning)

What's not in this commit (queued)

  - auditor → pathway audit_consensus update wire (explicit audit-fail
    block activates when this lands)
  - bridge_hits + sub_pipeline_calls population from context7 / LLM
    Team extract results (fields wired, callers not yet)
  - replay log (PathwayReplayOutcome {matched_id, succeeded, ts}) as
    a separate jsonl for forensic audit of why specific replays failed

Why > summarization

Summaries discard the causal chain. With this, auditor can verify
citation provenance, applier can distinguish lucky from learned paths,
and the matrix indexing actually stores end-to-end pathways instead of
just RAG chunks — which is what J meant by "why aren't we using it
for everything."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 05:15:32 -05:00
root
21fd3b9c61 Scrum-driven fixes: P5-001 auth wired, P42-001 truth evaluator, P9-001 journal on ingest
Some checks failed
lakehouse/auditor 2 blocking issues: cloud: claim not backed — "| **P9-001** (partial) | `crates/ingestd/src/service.rs` | **3 → 6** ↑↑↑ | `journal.record_ing
Apply the highest-confidence findings from the Phase 0→42 forensic sweep
after four scrum-master iterations under the adversarial prompt. Each fix
is independently validated by a later scrum iteration scoring the same
file higher under the same bar.

Code changes
────────────
P5-001 — crates/gateway/src/auth.rs + main.rs
  api_key_auth was marked #[allow(dead_code)] and never wrapped around
  the router, so `[auth] enabled=true` logged a green message and
  enforced nothing. Now wired via from_fn_with_state, with constant-time
  header compare and /health exempted for LB probes.

P42-001 — crates/truth/src/lib.rs
  TruthStore::check() ignored RuleCondition entirely — signature looked
  like enforcement, body returned every action unconditionally. Added
  evaluate(task_class, ctx) that actually walks FieldEquals / FieldEmpty /
  FieldGreater / Always against a serde_json::Value via dot-path lookup.
  check() kept for back-compat. Tests 14 → 24 (10 new exercising real
  pass/fail semantics). serde_json moved to [dependencies].

P9-001 (partial) — crates/ingestd/src/service.rs
  Added Optional<Journal> to IngestState + a journal.record_ingest() call
  on /ingest/file success. Gateway wires it with `journal.clone()` before
  the /journal nest consumes the original. First-ever internal mutation
  journal event verified live (total_events_created 0→1 after probe).

Iter-4 scrum scored these files higher under same prompt:
  ingestd/src/service.rs      3 → 6  (P9-001 visible)
  truth/src/lib.rs            3 → 4  (P42-001 visible)
  gateway/src/auth.rs         3 → 4  (P5-001 visible)
  gateway/src/execution_loop  4 → 6  (indirect)
  storaged/src/federation     3 → 4  (indirect)

Infrastructure additions
────────────────────────
 * tests/real-world/scrum_master_pipeline.ts
   - cloud-first ladder: kimi-k2:1t → deepseek-v3.1:671b → mistral-large-3:675b
     → gpt-oss:120b → devstral-2:123b → qwen3.5:397b (deep final thinker)
   - LH_SCRUM_FORENSIC env: injects SCRUM_FORENSIC_PROMPT.md as adversarial preamble
   - LH_SCRUM_PROPOSAL env: per-iter fix-wave doc override
   - Confidence extraction (markdown + JSON), schema v4 KB rows with:
     verdict, critical_failures_count, verified_components_count,
     missing_components_count, output_format, gradient_tier
   - Model trust profile written per file-accept to data/_kb/model_trust.jsonl
   - Fire-and-forget POST to observer /event so by_source.scrum appears in /stats

 * mcp-server/observer.ts — unchanged in shape, confirmed receiving scrum events

 * ui/ — new Visual Control Plane on :3950
   - Bun.serve with /data/{services,reviews,metrics,trust,overrides,findings,file,refactor_signals,search,logs/:svc,scrum_log}
   - Views: MAP (D3 graph, 5 overlays) / TRACE (per-file iter timeline) /
     TRAJECTORY (refactor signals + reverse index search) / METRICS (explainers
     with SOURCE + GOOD lines) / KB (card grid with tooltips) / CONSOLE (per-service
     journalctl tail, tabs for gateway/sidecar/observer/mcp/ctx7/auditor/langfuse)
   - tryFetch always attempts JSON.parse (fix for observer returning JSON without content-type)
   - renderNodeContext primitive-vs-object guard (fix for gateway /health string)

 * docs/SCRUM_FIX_WAVE.md     — iter-specific scope directing the scrum
 * docs/SCRUM_FORENSIC_PROMPT.md — adversarial audit prompt (verdict/critical/verified schema)
 * docs/SCRUM_LOOP_NOTES.md   — iteration observations + fix-next-loop queue
 * docs/SYSTEM_EVOLUTION_LAYERS.md — Layers 1-10 roadmap (trust profiling, execution DNA, drift sentinel, etc)

Measurements across iterations
──────────────────────────────
 iter 1 (soft prompt, gpt-oss:120b):   mean score 5.00/10
 iter 3 (forensic, kimi-k2:1t):        mean score 3.56/10 (−1.44 — bar raised)
 iter 4 (same bar, post fixes):        mean score 4.00/10 (+0.44 — fixes landed)

 Score movement iter3→iter4: ↑5 ↓1 =12
 21/21 first-attempt accept by kimi-k2:1t in iter 4
 20/21 emitted forensic JSON (richer signal than markdown)
 16 verified_components captured (proof-of-life, new metric)
 Permission Gradient distribution: 0 auto · 16 dry_run · 4 sim · 1 block

 Observer loop: by_source {scrum: 21, langfuse: 1985, phase24_audit: 1}
 v1/usage: 224 requests, 477K tokens, all tracked

Signal classes per file (iter 3 → iter 4):
 CONVERGING:  1 (ingestd/service.rs — fix clearly landed)
 LOOPING:     4 (catalogd/registry, main, queryd/service, vectord/index_registry)
 ORBITING:    1 (truth — novel findings surfacing as surface ones fix)
 PLATEAU:     9 (scores flat with high confidence — diminishing returns)
 MIXED:       6

Loop thesis status
──────────────────
A file's score rises only when the scrum confirms a real fix landed.
No false positives yet across 3 iterations. Fixes applied to 3 files all
raised their independent scores under the same adversarial prompt. Loop
is measurable, not hand-wavy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 02:25:43 -05:00
root
e442d401d2 Update Cargo.lock 2026-04-23 03:02:12 -05:00
root
5ff3213a37 Update Cargo.lock 2026-04-23 01:57:37 -05:00
profit
42a11d35cd Phase 39 (first slice): Ollama Cloud adapter on /v1/chat
Second provider wired. /v1/chat now routes by optional `provider`
field: default "ollama" hits local via sidecar, "ollama_cloud"
(or "cloud") hits ollama.com/api/generate directly with Bearer auth.
Key sourced at gateway startup from OLLAMA_CLOUD_KEY env, then
/root/llm_team_config.json (providers.ollama_cloud.api_key), then
OLLAMA_CLOUD_API_KEY env. Config source matches LLM Team convention.

Shape-identical to scenario.ts::generateCloud — same endpoint, same
body, same Bearer auth. Cloud path bypasses sidecar entirely (sidecar
is local-only by design, mirrors TS agent.ts).

Changes:
- crates/gateway/src/v1/ollama_cloud.rs (new, 130 LOC) — reqwest
  client, resolve_cloud_key(), chat() adapter, CloudGenerateBody /
  CloudGenerateResponse wire shapes
- crates/gateway/src/v1/ollama.rs — flatten_messages_public()
  re-export so sibling adapters reuse the shape collapse
- crates/gateway/src/v1/mod.rs — provider field on ChatRequest,
  dispatch match in chat() handler, ollama_cloud_key on V1State
- crates/gateway/src/main.rs — resolves cloud key at startup,
  logs which source provided it
- crates/gateway/Cargo.toml — reqwest 0.12 with rustls-tls

Verified end-to-end after restart:
- provider=ollama → qwen3.5:latest local (~400ms, Phase 38 unchanged)
- provider=ollama_cloud + model=gpt-oss:120b → real 225-word
  technical response in 5.4s, 313 tokens

Tests: 9/9 green (7 from Phase 38 + 2 new for cloud body serialization
and key resolver shape).

Not in this slice: trait extraction (full Phase 39 scope adds
ProviderAdapter trait + OpenRouter adapter + fallback chain logic).
These land next with Phase 40 routing engine on top.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 02:57:42 -05:00
profit
5b1fcf6d27 Phase 28-36 body of work
Accumulated since a6f12e2 (Phase 21 Rust port + Phase 27 versioning):

- Phase 36: embed_semaphore on VectorState (permits=1) serializes
  seed embed calls — prevents sidecar socket collisions under
  concurrent /seed stress load
- Phase 31+: run_stress.ts 6-task diverse stress scaffolding;
  run_e2e_rated.ts + orchestrator.ts tightening
- Catalog dedupe cleanup: 16 duplicate manifests removed; canonical
  candidates.parquet (10.5MB -> 76KB) + placements.parquet (1.2MB ->
  11KB) regenerated post-dedupe; fresh manifests for active datasets
- vectord: harness EvalSet refinements (+181), agent portfolio
  rotation + ingest triggers (+158), autotune + rag adjustments
- catalogd/storaged/ingestd/mcp-server: misc tightening
- docs: Phase 28-36 PRD entries + DECISIONS ADR additions;
  control-plane pivot banner added to top of docs/PRD.md (pointing
  at docs/CONTROL_PLANE_PRD.md which lands in next commit)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 02:41:15 -05:00
root
bd8c30c7bd Public URL: devop.live/lakehouse/proof — SSL, no IP needed
Added nginx proxy: /lakehouse/* → localhost:3700 (agent gateway).
Separate include file so the main llms3 config stays clean.

https://devop.live/lakehouse/proof  — styled proof page
https://devop.live/lakehouse/proof.json — raw verification data
https://devop.live/lakehouse/ — dashboard

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 12:41:53 -05:00
root
0d037cfac1 Phases 16.2 + L2 + 17 VRAM gate + MySQL + 18 Lance hybrid milestone
Five threads of work landing as one milestone — all individually
verified end-to-end against real data, full release build clean,
46 unit tests pass.

## Phase 16.2 / 16.5 — autotune agent + ingest triggers

`vectord::agent` is a long-running tokio task that watches the trial
journal and autonomously proposes + runs new HNSW configs. Distinct
from `autotune::run_autotune` (synchronous one-shot grid). Triggered
on POST /vectors/agent/enqueue/{idx} or by the periodic wake; ingest
paths now push DatasetAppended events when an index's source dataset
gets re-ingested. Rate-limited (max_trials_per_hour) and cooldown-
gated so it can't saturate Ollama under live load.

The proposer is ε-greedy around the current champion: with prob 0.25
sample random from full bounds, otherwise perturb champion ± small
delta on both axes. Dedup against history. Deterministic — RNG seeded
from history.len() so the same journal state proposes the same next
config (helps offline replay debugging).

`[agent]` config section in lakehouse.toml; opt-in via enabled=true.

## Federation Layer 2 — runtime bucket lifecycle + per-index scoping

`BucketRegistry.buckets` moved to `std::sync::RwLock<HashMap>` so
buckets can be added/removed after startup. POST /storage/buckets
provisions at runtime; DELETE /storage/buckets/{name} unregisters
(refuses primary/rescue with 403). Local-backend buckets get their
root directory auto-created.

`IndexMeta.bucket` (default "primary" via serde) records each index's
home bucket. `TrialJournal` and `PromotionRegistry` now hold
Arc<BucketRegistry> + IndexRegistry; they resolve target store per-
index via IndexMeta.bucket. PromotionRegistry::list_all scans every
bucket and dedups by index_name. Pre-federation indexes keep working
unchanged — they just default to primary.

`ModelProfile.bucket: Option<String>` declares per-profile artifact
home. POST /vectors/profile/{id}/activate auto-provisions the
profile's bucket under storage.profile_root if not yet registered.

EvalSets stay primary-only for now — noted gap, low-risk to extend
later with the same resolver pattern.

## Phase 17 — VRAM-aware two-profile gate

Sidecar gains POST /admin/unload (Ollama keep_alive=0 trick — forces
immediate VRAM release), POST /admin/preload (keep_alive=5m with
empty prompt, takes the slot warm), and GET /admin/vram (combines
nvidia-smi snapshot with Ollama /api/ps). Exposed via aibridge as
unload_model / preload_model / vram_snapshot.

`VectorState.active_profile` is the GPU-slot singleton —
Arc<RwLock<Option<ActiveProfileSlot>>>. activate_profile checks for
a previous profile with a different ollama_name and unloads it
before preloading the new one; same-model reactivations skip the
unload (Ollama no-ops). New routes: POST /vectors/profile/{id}/
deactivate (unload + clear slot), GET /vectors/profile/active.

Verified live: staffing-recruiter (qwen2.5) → docs-assistant
(mistral) swap freed qwen2.5 from VRAM and loaded mistral. nomic-
embed-text persists across swaps because both profiles use it —
free optimization that fell out of the design. Scoped search
correctly 403s cross-profile in both directions.

## MySQL streaming connector

`crates/ingestd/src/my_stream.rs` mirrors pg_stream.rs for MySQL.
Pure-rust `mysql_async` driver (default-features=false to avoid C
deps). Same OFFSET pagination, same Parquet-streaming write shape.
Type mapping per ADR-010: int/bigint → Int32/Int64, decimal/float
→ Float64, tinyint(1)/bool → Boolean, everything else → Utf8 with
fallback parsers for date/time/json/uuid via Display.

POST /ingest/mysql parallel to /ingest/db. Same PII auto-detection,
same lineage capture (source_system="mysql"), same agent-trigger
hook. `redact_dsn` generalized — was hardcoded to "postgresql://"
length, now works for any scheme://user:pass@host/path URL (latent
PII leak fix for MySQL DSNs).

Verified live against MariaDB on localhost: 10 rows × 9 columns of
test data round-tripped through datatypes int/varchar/decimal/
tinyint/datetime/text. PII detection auto-flagged name + email.
Aggregation queries through DataFusion match the source values
exactly.

## Phase 18 — Hybrid Parquet+HNSW ⊕ Lance backend (ADR-019)

`vectord-lance` is a new firewall crate. Lance pulls Arrow 57 and
DataFusion 52 — incompatible with the rest of the workspace's
Arrow 55 / DataFusion 47. The firewall isolates that dep tree:
public API uses only std types (Vec<f32>, Vec<String>, Hit, Row,
*Stats), so no Arrow types cross the crate boundary and nothing
propagates to vectord. The ADR-019 path that didn't ship until now.

`vectord::lance_backend::LanceRegistry` lazy-creates a
LanceVectorStore per index, resolving bucket → URI via the
conventional local-bucket layout. `IndexMeta.vector_backend` and
`ModelProfile.vector_backend` carry the choice (default Parquet so
existing indexes unchanged).

Six routes under /vectors/lance/*:
- migrate/{idx}: convert binary-blob Parquet → Lance FixedSizeList
- index/{idx}: build IVF_PQ
- search/{idx}: vector search (embed via sidecar)
- doc/{idx}/{doc_id}: random row fetch
- append/{idx}: native fragment append
- stats/{idx}: row count + index presence

Verified live on the real resumes_100k_v2 corpus (100K × 768d):
- Migrate: 0.57s
- Build IVF_PQ index: 16.2s (matches ADR-019 bench; 14× faster than
  HNSW's 230s for the same data)
- Search end-to-end (Ollama embed + Lance scan): 23-53ms
- Random doc_id fetch: 5-7ms (filter scan; faster than Parquet's
  ~35ms full-file scan, slower than the bench's 311us positional
  take — would close that gap with a scalar btree on doc_id)
- Append 100 rows: 3.3ms / +320KB on disk vs Parquet's required
  full ~330MB rewrite — the structural win
- Index survives append; both backends coexist cleanly

## Known follow-ups not in this milestone

- ModelProfile.vector_backend doesn't yet auto-route /vectors/profile/
  {id}/search to Lance; callers go through /vectors/lance/* directly
- Scalar btree on doc_id (closes the 5-7ms → ~300us gap)
- vectord-lance built default-features=false → no S3 yet
- IVF_PQ recall not measured (ADR-019 caveat) — needs a Lance-aware
  variant of the eval harness
- Watcher-path ingest doesn't push agent triggers (HTTP paths do)
- EvalSets still primary-only (federation gap)
- No PATCH endpoint to move an existing index between buckets
- The pre-existing storaged::append_log doctest fails to compile
  (malformed `{prefix}/` parses as code fence) — pre-existing bug,
  left for a focused fix

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 20:24:46 -05:00
root
76f6fba5de Phase B: Lance pilot — hybrid decision with measured benchmark
Standalone benchmark crate `crates/lance-bench` running Lance 4.0 against
our Parquet+HNSW at 100K × 768d (resumes_100k_v2) measured 8 dimensions.

Results (see docs/ADR-019-vector-storage.md for full scorecard):

  Cold load:        Parquet 0.17s   vs Lance 0.13s   (tie — not ≥2× threshold)
  Disk size:        330.3 MB        vs 330.4 MB      (tie)
  Search p50:       873us           vs 2229us        (Parquet 2.55× faster)
  Search p95:       1413us          vs 4998us        (Parquet 3.54× faster)
  Index build:      230s (ec=80)    vs 16s (IVF_PQ)  (Lance 14× faster)
  Random access:    35ms (scan)     vs 311us         (Lance 112× faster)
  Append 10K rows:  full rewrite    vs 0.08s/+31MB   (Lance structural win)

Decision (ADR-019): hybrid, not migrate-or-reject.

- Parquet+HNSW stays primary — our HNSW at ec=80 es=30 recall=1.00 is
  2.55× faster than Lance IVF_PQ at 100K in-RAM scale
- Lance joins as second backend per-profile for workloads where it wins
  architecturally: random row access (RAG text fetch), append-heavy
  pipelines (Phase C), hot-swap generations (Phase 16, 14× faster
  builds), and indexes past the ~5M RAM ceiling
- Phase 17 ModelProfile gets vector_backend: Parquet | Lance field
- Ceiling table in PRD updated — 5M ceiling now says "switch to Lance"
  instead of "migrate" since Lance runs alongside, not instead of

Isolation: lance-bench is a standalone workspace crate with its own dep
tree (Lance pulls DataFusion 52 + Arrow 57 incompatible with main stack
DataFusion 47 + Arrow 55). Kept off the critical path until API is
stable enough to promote into vectord::lance_store.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 02:37:11 -05:00
root
dbe00d018f Federation foundation + HNSW trial system + Postgres streaming + PRD reframe
Four shipped features and a PRD realignment, all measured end-to-end:

HNSW trial system (Phase 15 horizon item → complete)
- vectord: EmbeddingCache, harness (eval sets + brute-force ground truth),
  TrialJournal, parameterized HnswConfig on build_index_with_config
- /vectors/hnsw/trial, /hnsw/trials/{idx}, /hnsw/trials/{idx}/best,
  /hnsw/evals/{name}/autogen, /hnsw/cache/stats
- Measured on resumes_100k_v2 (100K × 768d): brute-force 44ms -> HNSW 873us
  at 100% recall@10. ec=80 es=30 locked as HnswConfig::default()
- Lower ec values trade recall for build time: 20/30 = 0.96 recall in 8s,
  80/30 = 1.00 recall in 230s

Catalog manifest repair
- catalogd: resync_from_parquet reads parquet footers to restore row_count
  and columns on drifted manifests
- POST /catalog/datasets/{name}/resync + POST /catalog/resync-missing
- All 7 staffing tables recovered to PRD-matching 2,469,278 rows

Federation foundation (ADR-017)
- shared::secrets: SecretsProvider trait + FileSecretsProvider (reads
  /etc/lakehouse/secrets.toml, enforces 0600 perms)
- storaged::registry::BucketRegistry — multi-bucket resolution with
  rescue_bucket read fallback and reachability probing
- storaged::error_journal — bucket op failures visible in one HTTP call
- storaged::append_log — write-once batched append pattern (fixes the RMW
  anti-pattern llms3.com calls out; errors and trial journals both use it)
- /storage/buckets, /storage/errors, /storage/bucket-health,
  /storage/errors/{flush,compact}
- Bucket-aware I/O at /storage/buckets/{bucket}/objects/{*key} with
  X-Lakehouse-Rescue-Used observability headers on fallback

Postgres streaming ingest
- ingestd::pg_stream: DSN parser, batched ORDER BY + LIMIT/OFFSET pagination
  into ArrowWriter, lineage redacts password
- POST /ingest/db — verified against live knowledge_base.team_runs
  (586 rows × 13 cols, 6 batches, 196ms end-to-end)

PRD realignment (2026-04-16)
- Dual use case: staffing analytics + local LLM knowledge substrate
- Removed "multi-tenancy (single-owner system)" from non-goals
- Added invariants 8-11: indexes hot-swappable, per-reader profiles,
  trials-as-data, operational failures findable in one HTTP call
- New phases 16 (hot-swap generations), 17 (model profiles + dataset
  bindings), 18 (Lance vs Parquet+sidecar evaluation)
- Known ceilings table documents the 5M vector wall and escape hatches
- ADR-017 (federation), ADR-018 (append-log pattern) added
- EXECUTION_PLAN.md sequences phases B-E with success gates and
  decision rules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 01:50:05 -05:00
root
9992b5f135 Database connector: PostgreSQL → Parquet import
- POST /ingest/postgres/tables — list all tables in a database
- POST /ingest/postgres/import — import table → Parquet → catalog → queryable
- Auto type mapping: int2/4/8 → Int, float4/8 → Float64, bool → Boolean,
  text/varchar/jsonb/timestamp → Utf8 (safe default per ADR-010)
- Auto PII detection + lineage on import
- Empty password support for trust auth
- Tested: imported lab_trials (40 rows, 10 cols) and threat_intel (20 rows, 30 cols)
  from local knowledge_base Postgres database — immediately queryable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 20:14:16 -05:00
root
04770c97eb HNSW vector index: 100K search in 27ms (58x faster than brute-force)
- instant-distance HNSW implementation for approximate nearest neighbors
- HnswStore: build from stored embeddings, in-memory index, thread-safe
- POST /vectors/hnsw/build — build index from Parquet (100K in 35s release)
- POST /vectors/hnsw/search — fast ANN search
- GET /vectors/hnsw/list — list loaded indexes

Benchmark (100K × 768d, release build):
  Brute-force: 1,567ms
  HNSW:           31ms (50x)
  HNSW warm:      27ms (58x)

Build cost: 35s one-time for 100K vectors (release mode)
ef_construction=40, ef_search=50 — good recall/speed balance

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 20:00:50 -05:00
root
6f0f92a9e4 Phase 12: Tool registry — governed business actions for AI agents
- ToolRegistry: named tools with parameter validation and audit logging
- 6 built-in staffing tools:
  search_candidates (skills, city, state, experience, availability)
  get_candidate (by ID)
  revenue_by_client (top N by billed revenue)
  recruiter_performance (placements, revenue per recruiter)
  cold_leads (called N+ times, never placed)
  open_jobs (by vertical, city)
- Each tool: name, description, params, permission level (read/write/admin)
- SQL template with validated parameter substitution
- Full audit trail: every invocation logged with agent, params, result
- Endpoints: GET /tools (list), GET /tools/{name} (schema),
  POST /tools/{name}/call (execute), GET /tools/audit (log)
- Per ADR-015: governed interface before raw SQL for agents

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 09:31:42 -05:00
root
bf7cf96911 Phase 9: Event journal — append-only mutation history
- journald crate: immutable event log for every data mutation
- Events: entity_type, entity_id, field, action, old_value, new_value,
  actor, source, workspace_id, timestamp
- In-memory buffer with configurable flush threshold (default 100 events)
- Flush writes events as Parquet to journal/ directory
- Query: GET /journal/history/{entity_id} — full history of any record
- Query: GET /journal/recent?limit=50 — latest events across all entities
- Convenience methods: record_insert, record_update, record_ingest
- Stats: GET /journal/stats — buffer size, persisted file count
- Manual flush: POST /journal/flush
- Per ADR-012: events are never modified or deleted

This is the single most important future-proofing decision.
Once history is lost, it's gone forever.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 09:09:33 -05:00
root
6a532cb248 Background job system for embedding — fixes 100K timeout
- JobTracker: create/update/complete/fail jobs with progress tracking
- POST /vectors/index now returns immediately with job_id (HTTP 202)
- Embedding runs in tokio::spawn background task
- GET /vectors/jobs/{id} returns live progress (chunks embedded, rate, ETA)
- GET /vectors/jobs lists all jobs
- Progress logged every 100 batches with chunks/sec and ETA
- 100K embedding job running successfully at 44 chunks/sec
- System stays responsive during embedding (queries in 23ms)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 09:03:07 -05:00
root
6df904a03c Phase 8: Hot cache + incremental delta updates
- MemCache: LRU in-memory cache for hot datasets (configurable max, default 16GB)
  Pin/evict/stats endpoints: POST /query/cache/pin, /cache/evict, GET /cache/stats
- Delta store: append-only delta Parquet files for row-level updates
  Write deltas without rewriting base files, merge at query time
- Compaction: POST /query/compact merges deltas into base Parquet
- Query engine: checks cache first, falls back to Parquet, merges deltas
- Benchmarked on 2.47M rows:
  1M row JOIN: 854ms cold → 96ms hot (8.9x speedup)
  100K filter: 62ms cold → 21ms hot (3x speedup)
  1.1M rows cached in 408MB RAM

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 08:37:28 -05:00
root
26fc98c885 Phase 7: Vector index + RAG pipeline
- vectord crate: chunk → embed → store → search → RAG
- chunker: configurable chunk size + overlap, sentence-boundary aware splitting
- store: embeddings as Parquet (binary blob f32 vectors), portable format
- search: brute-force cosine similarity (works up to ~100K vectors)
- rag: full pipeline — embed question → search index → retrieve context → LLM answer
- Endpoints: POST /vectors/index, /vectors/search, /vectors/rag
- Gateway wired with vectord service
- Tested: 200 candidate resumes indexed in 5.4s, semantic search + RAG working
- 20 unit tests passing (chunker, search, ingestd, shared)
- AI gives honest "no match found" when context doesn't support an answer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 08:12:28 -05:00
root
bb05c4412e Phase 6: Ingest pipeline — CSV, JSON, PDF, text file support
- ingestd crate: detect file type → parse → schema detection → Parquet → catalog
- CSV: auto-detect column types (int, float, bool, string), handles $, %, commas
  Strips dollar signs from amounts, flexible row parsing, sanitized column names
- JSON: array or newline-delimited, nested object flattening (a.b.c → a_b_c)
- PDF: text extraction via lopdf, one row per page (source_file, page_number, text)
- Text/SMS: line-based ingestion with line numbers
- Dedup: SHA-256 content hash, re-ingest same file = no-op
- Gateway: POST /ingest/file multipart upload, 256MB body limit
- Schema detection per ADR-010: ambiguous types default to String
- 12 unit tests passing (CSV parsing, JSON flattening, type inference, dedup)
- Tested: messy CSV with missing data, dollar amounts, N/A values → queryable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 08:07:31 -05:00
root
01373c0e45 Phase 5: hardening — gRPC, observability, auth, config
- proto: lakehouse.proto with CatalogService, QueryService, StorageService, AiService
- proto crate: tonic-build codegen from proto definitions
- catalogd: gRPC CatalogService implementation
- gateway: dual HTTP (:3100) + gRPC (:3101) servers
- gateway: OpenTelemetry tracing with stdout exporter
- gateway: API key auth middleware (toggleable)
- shared: TOML config system with typed structs and defaults
- lakehouse.toml config file
- ADR-006 and ADR-007 documented

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 06:37:07 -05:00
root
50a8c8013f Phase 4: Dioxus frontend with dataset browser and SQL query editor
- ui: Dioxus WASM app with dataset sidebar, SQL editor (Ctrl+Enter), results table
- ui: dynamic API base URL (same-origin for nginx, port-based for local dev)
- gateway: CORS enabled for cross-origin requests
- nginx: lakehouse.devop.live proxies UI (:3300) + API (:3100) on same origin
- justfile: ui-build, ui-serve, sidecar, up commands added

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 06:24:15 -05:00
root
239e471223 Phase 3: AI integration with Ollama via Python sidecar
- sidecar: FastAPI app with /embed, /generate, /rerank hitting Ollama
- sidecar: Dockerfile, env var config (EMBED_MODEL, GEN_MODEL, RERANK_MODEL)
- aibridge: reqwest HTTP client with typed request/response structs
- aibridge: Axum proxy endpoints (POST /ai/embed, /ai/generate, /ai/rerank)
- gateway: wires AiClient with SIDECAR_URL env var
- e2e verified: nomic-embed-text returns 768d vectors, qwen2.5 generates text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 05:53:56 -05:00
root
19bdfab227 Phase 2: DataFusion query engine over Parquet
- queryd: SessionContext with custom URL scheme to avoid path doubling with LocalFileSystem
- queryd: ListingTable registration from catalog ObjectRefs with schema inference
- queryd: POST /query/sql returns JSON {columns, rows, row_count}
- queryd→catalogd wiring: reads all datasets, registers as named tables
- gateway: wires QueryEngine with shared store + registry
- e2e verified: SELECT *, WHERE/ORDER BY, COUNT/AVG all correct

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 05:48:20 -05:00
root
655b6c0b37 Phase 1: storage + catalog layer
- storaged: object_store backend (LocalFileSystem), PUT/GET/DELETE/LIST endpoints
- shared: arrow_helpers with Parquet roundtrip + schema fingerprinting (2 tests)
- catalogd: in-memory registry with write-ahead manifest persistence to object storage
- catalogd: POST/GET /datasets, GET /datasets/by-name/{name}
- gateway: wires storaged + catalogd with shared object_store state
- Phase tracker updated: Phase 0 + Phase 1 gates passed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 05:15:27 -05:00
root
a52ca841c6 Phase 0: bootstrap Rust workspace
- Cargo workspace with 6 crates: shared, storaged, catalogd, queryd, aibridge, gateway
- shared: types (DatasetId, ObjectRef, SchemaFingerprint, DatasetManifest) + error enum
- gateway: Axum HTTP entrypoint with nested service routers + tracing
- All services expose /health stubs
- justfile with build/test/run recipes
- PRD, phase tracker, and ADR docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 04:59:05 -05:00