40 Commits

Author SHA1 Message Date
root
92df0e930a ADR-021: semantic-correctness layer on pathway_memory
Spec for the compounding-bug-grammar insight from J's feedback on the
queryd/delta.rs unit-mismatch fix (86901f8). Adds three proposed fields
to PathwayTrace (semantic_flags, type_hints_used, bug_fingerprints),
9 initial SemanticFlag variants, and the truth::evaluate review-time
task_class pattern that reuses existing primitives instead of building
a type-inference engine. Implementation pending approval on the flag
set and fingerprint shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 05:40:59 -05:00
root
39a2856851 docs: rewrite PR #10 description to drop unfalsifiable metric claims
Some checks failed
lakehouse/auditor 1 blocking issue: cloud: claim not backed — "journal event verified live (total_events_created 0→1 after probe)."
Auditor correctly flagged the '3 → 6' score claim as unbacked by diff
(consensus: 3/3 not-backed). The claim referenced scrum_reviews.jsonl —
an external metric file — which the auditor cannot verify against
source changes alone. Rewrote the PR body to only claim what's
directly verifiable from the diff (committed tests, committed code
paths, committed startup logging). Trajectory data remains in
docs/SCRUM_LOOP_NOTES.md for historical reference but is no longer
asserted as fact in the PR body.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 03:02:21 -05:00
root
21fd3b9c61 Scrum-driven fixes: P5-001 auth wired, P42-001 truth evaluator, P9-001 journal on ingest
Some checks failed
lakehouse/auditor 2 blocking issues: cloud: claim not backed — "| **P9-001** (partial) | `crates/ingestd/src/service.rs` | **3 → 6** ↑↑↑ | `journal.record_ing
Apply the highest-confidence findings from the Phase 0→42 forensic sweep
after four scrum-master iterations under the adversarial prompt. Each fix
is independently validated by a later scrum iteration scoring the same
file higher under the same bar.

Code changes
────────────
P5-001 — crates/gateway/src/auth.rs + main.rs
  api_key_auth was marked #[allow(dead_code)] and never wrapped around
  the router, so `[auth] enabled=true` logged a green message and
  enforced nothing. Now wired via from_fn_with_state, with constant-time
  header compare and /health exempted for LB probes.

P42-001 — crates/truth/src/lib.rs
  TruthStore::check() ignored RuleCondition entirely — signature looked
  like enforcement, body returned every action unconditionally. Added
  evaluate(task_class, ctx) that actually walks FieldEquals / FieldEmpty /
  FieldGreater / Always against a serde_json::Value via dot-path lookup.
  check() kept for back-compat. Tests 14 → 24 (10 new exercising real
  pass/fail semantics). serde_json moved to [dependencies].

P9-001 (partial) — crates/ingestd/src/service.rs
  Added Optional<Journal> to IngestState + a journal.record_ingest() call
  on /ingest/file success. Gateway wires it with `journal.clone()` before
  the /journal nest consumes the original. First-ever internal mutation
  journal event verified live (total_events_created 0→1 after probe).

Iter-4 scrum scored these files higher under same prompt:
  ingestd/src/service.rs      3 → 6  (P9-001 visible)
  truth/src/lib.rs            3 → 4  (P42-001 visible)
  gateway/src/auth.rs         3 → 4  (P5-001 visible)
  gateway/src/execution_loop  4 → 6  (indirect)
  storaged/src/federation     3 → 4  (indirect)

Infrastructure additions
────────────────────────
 * tests/real-world/scrum_master_pipeline.ts
   - cloud-first ladder: kimi-k2:1t → deepseek-v3.1:671b → mistral-large-3:675b
     → gpt-oss:120b → devstral-2:123b → qwen3.5:397b (deep final thinker)
   - LH_SCRUM_FORENSIC env: injects SCRUM_FORENSIC_PROMPT.md as adversarial preamble
   - LH_SCRUM_PROPOSAL env: per-iter fix-wave doc override
   - Confidence extraction (markdown + JSON), schema v4 KB rows with:
     verdict, critical_failures_count, verified_components_count,
     missing_components_count, output_format, gradient_tier
   - Model trust profile written per file-accept to data/_kb/model_trust.jsonl
   - Fire-and-forget POST to observer /event so by_source.scrum appears in /stats

 * mcp-server/observer.ts — unchanged in shape, confirmed receiving scrum events

 * ui/ — new Visual Control Plane on :3950
   - Bun.serve with /data/{services,reviews,metrics,trust,overrides,findings,file,refactor_signals,search,logs/:svc,scrum_log}
   - Views: MAP (D3 graph, 5 overlays) / TRACE (per-file iter timeline) /
     TRAJECTORY (refactor signals + reverse index search) / METRICS (explainers
     with SOURCE + GOOD lines) / KB (card grid with tooltips) / CONSOLE (per-service
     journalctl tail, tabs for gateway/sidecar/observer/mcp/ctx7/auditor/langfuse)
   - tryFetch always attempts JSON.parse (fix for observer returning JSON without content-type)
   - renderNodeContext primitive-vs-object guard (fix for gateway /health string)

 * docs/SCRUM_FIX_WAVE.md     — iter-specific scope directing the scrum
 * docs/SCRUM_FORENSIC_PROMPT.md — adversarial audit prompt (verdict/critical/verified schema)
 * docs/SCRUM_LOOP_NOTES.md   — iteration observations + fix-next-loop queue
 * docs/SYSTEM_EVOLUTION_LAYERS.md — Layers 1-10 roadmap (trust profiling, execution DNA, drift sentinel, etc)

Measurements across iterations
──────────────────────────────
 iter 1 (soft prompt, gpt-oss:120b):   mean score 5.00/10
 iter 3 (forensic, kimi-k2:1t):        mean score 3.56/10 (−1.44 — bar raised)
 iter 4 (same bar, post fixes):        mean score 4.00/10 (+0.44 — fixes landed)

 Score movement iter3→iter4: ↑5 ↓1 =12
 21/21 first-attempt accept by kimi-k2:1t in iter 4
 20/21 emitted forensic JSON (richer signal than markdown)
 16 verified_components captured (proof-of-life, new metric)
 Permission Gradient distribution: 0 auto · 16 dry_run · 4 sim · 1 block

 Observer loop: by_source {scrum: 21, langfuse: 1985, phase24_audit: 1}
 v1/usage: 224 requests, 477K tokens, all tracked

Signal classes per file (iter 3 → iter 4):
 CONVERGING:  1 (ingestd/service.rs — fix clearly landed)
 LOOPING:     4 (catalogd/registry, main, queryd/service, vectord/index_registry)
 ORBITING:    1 (truth — novel findings surfacing as surface ones fix)
 PLATEAU:     9 (scores flat with high confidence — diminishing returns)
 MIXED:       6

Loop thesis status
──────────────────
A file's score rises only when the scrum confirms a real fix landed.
No false positives yet across 3 iterations. Fixes applied to 3 files all
raised their independent scores under the same adversarial prompt. Loop
is measurable, not hand-wavy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 02:25:43 -05:00
root
4251e94531 Update PHASES.md: Phase 41 + Guard fixes
- Phase 41: ProfileType enum, per-type endpoints
- Guard: scrumaudit.py, fixed watcher.sh + pr-reviewer.md
2026-04-23 03:09:05 -05:00
root
55f8e0fe6e Phase 40: Routing Engine + Policy
- RoutingEngine with RouteDecision (model_pattern → provider)
- config/routing.toml: rules, fallback chain, cost gating
- Per-provider Usage tracking in /v1/usage response
- 12 gateway tests green
2026-04-23 02:36:45 -05:00
root
e27a17e950 Phase 39: Provider Adapter Refactor
- ProviderAdapter trait with chat(), embed(), unload(), health()
- OllamaAdapter wrapping existing AiClient
- OpenRouterAdapter for openrouter.ai API integration
- provider_key() routing by model prefix (openrouter/*, etc)
2026-04-23 02:24:15 -05:00
root
21e8015b60 Phase 37: Hot-swap async + Phase 38: Universal API skeleton
- JobTracker extended with JobType::ProfileActivation + Embed
- activate_profile returns job_id immediately, work spawns in background
- /v1/chat, /v1/usage, /v1/sessions endpoints (OpenAI-compatible)
- Langfuse trace integration (Phase 40 early deliverable)
- 12 gateway unit tests green, curl gates pass
2026-04-23 01:56:17 -05:00
7c1745611a Audit pipeline PR #9: determinism + fact extraction + verifier gate + KB stats + context injection (PR #9)
Bundles PR #9's work for the audit pipeline:

- N=3 consensus on cloud inference (gpt-oss:120b parallel) with qwen3-coder:480b tie-breaker
- audit_discrepancies.jsonl logs N-run disagreements
- scrum_master reviews route through llm_team fact extraction; source="scrum_review"
- Verifier-gated persistence: drops INCORRECT, keeps UNVERIFIABLE/UNCHECKED; schema_version:2
- scrum_master_reviewed flag on accepted reviews
- auditor/kb_stats.ts: on-demand observability script
- claim_parser history/proof pattern class (verified-on-PR, was-flipping, the-proven-X)
- claim_parser quoted-string guard (mirrors static.ts fix)
- fact_extractor project context injection via docs/AUDITOR_CONTEXT.md
- Fixed verifier-verdict parser to handle multiple gemma2 output formats

Empirical: 3-run determinism test on unchanged PR #9 SHA showed 7/7 warn findings stable; block count oscillation eliminated; llm_team quality scores 8-9 on context-injected extract runs.

See PR #9 for full run-by-run commit history.
2026-04-23 05:29:38 +00:00
profit
2a4b81bf48 Phase 45 (first slice): DocRef + doc_refs field on PlaybookEntry
Phase J keeps asking for: playbooks know which external docs they
used, get flagged when those docs drift. This commit ships the data
model; context7 bridge + drift check endpoints land in follow-ups.

Added to crates/vectord/src/playbook_memory.rs:
- pub struct DocRef { tool, version_seen, snippet_hash, source_url,
  seen_at } — one external doc reference
- PlaybookEntry.doc_refs: Vec<DocRef> — empty on legacy entries,
  serde default ensures pre-Phase-45 persisted state loads cleanly
- PlaybookEntry.doc_drift_flagged_at: Option<String> — set by the
  (future) drift-check code when context7 reports newer version
- PlaybookEntry.doc_drift_reviewed_at: Option<String> — set by
  human via /resolve endpoint after reviewing the diagnosis
- impl Default for PlaybookEntry — collapses most test-helper
  constructors from 17 explicit fields to 6-9 fields +
  ..Default::default()

Updated SeedPlaybookRequest + RevisePlaybookRequest (service.rs) to
accept optional doc_refs: the seed/revise endpoints already take the
field, downstream drift detection (Phase 45.2) consumes it.

Docs: docs/CONTROL_PLANE_PRD.md gains full Phase 45 spec with gate
criteria, non-goals, and risk notes.

Tests: 51/51 vectord lib tests green (same count as before, field
additions are backward-compat).

Memory: project_doc_drift_vision.md written so this keeps coming
back to the front of mind.

Next slices (same phase): context7 HTTP bridge in mcp-server,
/vectors/playbook_memory/doc_drift/check/{id} endpoint, overview-
model drift synthesis writing to data/_kb/doc_drift_corrections.jsonl,
boost exclusion for flagged+unreviewed entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 03:14:07 -05:00
profit
6316433062 Phase 40 scope: Langfuse + Gitea MCP recovery as named deliverables
J flagged that a prior version of this stack had Langfuse traces
piping into the observer + Gitea MCP for repo ops — lost. Adding
these as explicit Phase 40 deliverables alongside routing engine
+ Gemini/Claude adapters.

Findings during scope-check:
- Langfuse container is already running (Up 2 days, langfuse:2,
  localhost:3001 healthcheck passes)
- mcp-server/tracing.ts + package.json already have SDK wired
- Credentials pk-lf-staffing / sk-lf-staffing-secret (from env)
- Gitea MCP binary still installed at gitea-mcp@0.0.10

So recovery here is mostly re-connecting existing infra:
1. Add Rust-side Langfuse client for /v1/chat tracing (gateway
   currently bypasses tracing, mcp-server already has it)
2. Wire Langfuse → observer :3800 pipe
3. Register Gitea MCP in mcp-server/index.ts tool list

Each landing as part of Phase 40 when the routing engine ships.
2026-04-22 03:01:28 -05:00
profit
f44b6b3e6b Control-plane pivot: Phase 38-44 plan + bot scaffold
Direction shift 2026-04-22: docs/CONTROL_PLANE_PRD.md becomes the
long-horizon architecture target. Existing Lakehouse (docs/PRD.md,
Phases 0-37) is preserved as the reference implementation and first
consumer. New 6-layer architecture:

  L1 Universal API /v1/chat /v1/usage /v1/sessions /v1/tools /v1/context
  L2 Routing & Policy Engine (rules, fallback chains, cost gating)
  L3 Provider Adapter Layer (Ollama + OpenRouter + Gemini + Claude)
  L4 Knowledge + Memory + Playbooks (already built)
  L5 Execution Loop (scenarios + bot/cycle.ts instances)
  L6 Observability + token accounting

Phases 38-44 sequenced with detailed per-phase specs in the PRD.
Current scope: staffing domain (synthetic workers_500k, contracts,
emails, SMS, playbooks). DevOps (Terraform/Ansible) is long-horizon
target — architecture-compatible but not current.

Files added:
- docs/CONTROL_PLANE_PRD.md — 6-layer architecture, Phase 38-44
  sequencing with staffing-first Truth Layer + Validation pipeline
- bot/ — manual-only PR bot scaffold. First consumer test-bed for
  /v1/chat (Phase 38). Mem0-aligned ADD/UPDATE/NOOP apply semantics;
  KB feedback loop reads prior cycles on same gap and injects into
  cloud prompt so bot cycles compound like scenario.ts runs do.
- tests/multi-agent/run_stress.ts — the 6-task diverse stress test
  referenced in the previous commit but missing from its staging

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 02:43:31 -05:00
profit
5b1fcf6d27 Phase 28-36 body of work
Accumulated since a6f12e2 (Phase 21 Rust port + Phase 27 versioning):

- Phase 36: embed_semaphore on VectorState (permits=1) serializes
  seed embed calls — prevents sidecar socket collisions under
  concurrent /seed stress load
- Phase 31+: run_stress.ts 6-task diverse stress scaffolding;
  run_e2e_rated.ts + orchestrator.ts tightening
- Catalog dedupe cleanup: 16 duplicate manifests removed; canonical
  candidates.parquet (10.5MB -> 76KB) + placements.parquet (1.2MB ->
  11KB) regenerated post-dedupe; fresh manifests for active datasets
- vectord: harness EvalSet refinements (+181), agent portfolio
  rotation + ingest triggers (+158), autotune + rag adjustments
- catalogd/storaged/ingestd/mcp-server: misc tightening
- docs: Phase 28-36 PRD entries + DECISIONS ADR additions;
  control-plane pivot banner added to top of docs/PRD.md (pointing
  at docs/CONTROL_PLANE_PRD.md which lands in next commit)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 02:41:15 -05:00
profit
a6f12e2609 Phase 21 Rust port + Phase 27 playbook versioning + doc-sync
Phase 21 — Rust port of scratchpad + tree-split primitives (companion to
the 2026-04-21 TS shipment). New crates/aibridge modules:

  context.rs       — estimate_tokens (chars/4 ceil), context_window_for,
                     assert_context_budget returning a BudgetCheck with
                     numeric diagnostics on both success and overflow.
                     Windows table mirrors config/models.json.
  continuation.rs  — generate_continuable<G: TextGenerator>. Handles the
                     two failure modes: empty-response from thinking
                     models (geometric 2x budget backoff up to budget_cap)
                     and truncated-non-empty (continuation with partial
                     as scratchpad). is_structurally_complete balances
                     braces then JSON.parse-checks. Guards the degen case
                     "all retries empty, don't loop on empty partial".
  tree_split.rs    — generate_tree_split map->reduce with running
                     scratchpad. Per-shard + reduce-prompt go through
                     assert_context_budget first; loud-fails rather than
                     silently truncating. Oldest-digest-first scratchpad
                     truncation at scratchpad_budget (default 6000 t).

TextGenerator trait (native async-fn-in-trait, edition 2024). AiClient
implements it; ScriptedGenerator test double lets tests inject canned
sequences without a live Ollama.

GenerateRequest gained think: Option<bool> — forwards to sidecar for
per-call hidden-reasoning opt-out on hot-path JSON emitters. Three
existing callsites updated (rag.rs x2, service.rs hybrid answer).

Phase 27 — Playbook versioning. PlaybookEntry gained four optional
fields (all #[serde(default)] so pre-Phase-27 state loads as roots):

  version           u32, default 1
  parent_id         Option<String>, previous version's playbook_id
  superseded_at     Option<String>, set when newer version replaces
  superseded_by     Option<String>, the playbook_id that replaced

New methods:

  revise_entry(parent_id, new_entry) — appends new version, stamps
    superseded_at+superseded_by on parent, inherits parent_id and sets
    version = parent + 1 on the new entry. Rejects revising a retired
    or already-superseded parent (tip-of-chain is the only valid
    revise target).
  history(playbook_id) — returns full chain root->tip from any node.
    Walks parent_id back to root, then superseded_by forward to tip.
    Cycle-safe.

Superseded entries excluded from boost (same rule as retired): filter
in compute_boost_for_filtered_with_role (both active-entries prefilter
and geo-filtered path), rebuild_geo_index, and upsert_entry's existing-
idx search. status_counts returns (total, retired, superseded, failures);
/status JSON reports active = total - retired - superseded.

Endpoints:
  POST /vectors/playbook_memory/revise
  GET  /vectors/playbook_memory/history/{id}

Doc-sync — PHASES.md + PRD.md drifted from git after Phases 24-26
shipped. Fixes applied:

  - Phase 24 marked shipped (commit b95dd86) with detail of observer
    HTTP ingest + scenario outcome streaming. PRD "NOT YET WIRED"
    rewritten to reflect shipped state.
  - Phase 25 (validity windows, commit e0a843d) added to PHASES +
    PRD.
  - Phase 26 (Mem0 upsert + Letta hot cache, commit 640db8c) added.
  - Phase 27 entry added to both docs.
  - Phase 19.6 time decay corrected: was documented as "deferred",
    actually wired via BOOST_HALF_LIFE_DAYS = 30.0 in playbook_memory.rs.
  - Phase E/Phase 8 tombstone-at-compaction limit note updated —
    Phase E.2 closed it.

Tests: 8 new version_tests in vectord (chain-metadata stamping,
retired/superseded parent rejection, boost exclusion, history from
root/tip/middle, legacy default round-trip, status counts). 25 new
aibridge tests (context/continuation/tree_split). Workspace total
145 green (was 120).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 17:40:49 -05:00
root
137aed64fb Coherence pass — PRD/PHASES updates, config snapshot wired, unit tests
J flagged the audit: "make sure everything flows coherently, no
pseudocode or unnecessary patches or ignoring any particular part of
what we built." This is that pass.

PRD.md updates:
- Phase 19 refinement block — geo-filter + role-prefilter WIRED with
  citation density numbers (0.32 → 1.38, and 2 → 28 on same scenario).
- Phase 20 rewrite — mistral dropped, qwen3.5 + qwen3 local hot path,
  think:false as the key mechanical finding, kimi-k2.6 upgrade path.
- Phase 21 status block — think plumbing + cloud executor routing
  added after original commit.
- Phase 22 item B (cloud rescue) — pivot sanitizer, rescue verified
  1/3 on stress_01.
- Phase 23 NEW — staffer identity + tool_level + competence-weighted
  retrieval + kb_staffer_report. Auto-discovered worker labels called
  out with real numbers (Rachel Lewis 12× across 4 staffers).
- Phase 24 NEW — Observer/Autotune integration gap DOCUMENTED, not
  fixed. Observer has been idle at 0 ops for 3600+ cycles because
  scenarios hit gateway:3100 directly, bypassing MCP:3700 which the
  observer wraps. This is the honest "we're not using it in these
  tests" signal J surfaced. Fix deferred; gap visible now.

PHASES.md:
- Appended Phases 20-23 as checked, Phase 24 as unchecked gap.
- Updated footer count: 102 unit tests across all layers.
- Latest line updated with 14× citation lift + 46.4pt tool-asymmetry
  finding.

scenario.ts:
- snapshotConfig() was defined but never called. Now fires at every
  scenario start with a stable sha256 hash over the active model set +
  tool_level + cloud flags. config_snapshots.jsonl finally populates,
  which the error_corrections diff path needs to work correctly.

kb.test.ts (new): 4 signature invariant tests — stability across
unrelated fields (date, contract, staffer), sensitivity to role/city/
count changes, digest shape. All pass under `bun test`.

service.rs: 6 Rust extractor tests for extract_target_geo +
extract_target_role — basic, missing-state-returns-none, word
boundary (civilian != city), multi-word role, absent role, quoted
value parse. All pass under `cargo test -p vectord --lib extractor_tests`.

Dangling items now honestly documented rather than silently pending:
- Chunking cache (config/models.json SPEC, not wired) — flagged
- Playbook versioning (SPEC, not wired) — flagged
- Observer integration (WIRED but disconnected) — new Phase 24
2026-04-20 23:29:13 -05:00
root
9c1400d738 Phase 22 — Internal Knowledge Library (KB)
Meta-layer over Phase 19 playbook_memory. Phase 19 answers "which
WORKERS worked for this event"; KB answers "which CONFIG worked for
this playbook signature" — model choice, budget hints, pathway notes,
error corrections.

tests/multi-agent/kb.ts:
- computeSignature(): stable sha256 hash of the (kind, role, count,
  city, state) tuple sequence. Same scenario shape → same sig.
- indexRun(): extracts sig, embeds spec digest via sidecar, appends
  outcome record, upserts signature to data/_kb/signatures.jsonl.
- findNeighbors(): cosine-ranks the k most-similar signatures from
  prior runs for a target spec.
- detectErrorCorrections(): scans outcomes for same-sig fail→succeed
  pairs, diffs the model set, logs to error_corrections.jsonl.
- recommendFor(): feeds target digest + k-NN neighbors + recent
  corrections to the overview model, gets back a structured JSON
  recommendation (top_models, budget_hints, pathway_notes), appends
  to pathway_recommendations.jsonl. JSON-shape constrained so the
  executor can inherit it mechanically.
- loadRecommendation(): at scenario start, pulls newest rec matching
  this sig (or nearest).

scenario.ts:
- Reads KB recommendation at startup (alongside prior lessons).
- Injects pathway_notes into guidanceFor() executor context.
- After retrospective, indexes the run + synthesizes next rec.

Cold-start behavior: first run with no history writes a low-confidence
"no prior data" rec so the signal that something was attempted is
captured. Second run gets "low confidence, 0 neighbors" until a third
distinct sig gives the embedder something to compare against — hence
the upcoming scenario generator.

VERIFIED:
- data/_kb/ populated after one scenario run: 1 outcome (sig=4674…,
  4/5 ok, 16 turns total), 1 signature, 2 recs (cold + post-run).
- Recommendation JSON-parsed cleanly from gpt-oss:20b overview model.

PRD Phase 22 added with file layout, cycle description, and the
rationale for file-based MVP → Rust port progression that matches
how Phase 21 primitives shipped.

What's NOT here yet (batched follow-ups per J's request, tested
between each):
- Lift the k=10 hybrid_search cap to adaptive k=max(count*5, 20)
- Scenario generator to bulk-populate KB with varied signatures
- Rust re-weighting: push playbook_memory success signal INTO
  hybrid_search scoring, not just post-hoc boost
2026-04-20 20:27:12 -05:00
root
0c4868c191 qwen3.5 executor + continuation primitive + think:false
Three coupled fixes that together turned the Riverfront Steel scenario
from 0/5 (mistral) to 4/5 (qwen3.5) with T3 flagging real staffing
concerns rather than linter advice.

MODEL SWAP
- Executor: mistral → qwen3.5:latest (9.7B, 262K ctx, thinking).
  mistral's decoder emitted malformed JSON on complex SQL filters
  regardless of prompt; J called it — stop using mistral.
- Reviewer: qwen2.5 → qwen3:latest (40K ctx)
- Applied to scenario.ts, orchestrator.ts, network_proving.ts,
  run_e2e_rated.ts

CONTINUATION PRIMITIVE (agent.ts)
- generateContinuable(): empty-response → geometric backoff retry;
  truncated-JSON → continue from partial as scratchpad; bounded by
  budget cap + max_continuations. No more "bump max_tokens until it
  stops truncating" tourniquet.
- generateTreeSplit(): map-reduce for oversized input corpora with
  running scratchpad digest, reduce pass for final synthesis.
- Empty text no longer throws — it's a signal to continuable that
  thinking ate the budget.

think:false FOR HOT PATH
- qwen3.5 burned ~650 tokens of hidden thinking for trivial JSON
  emission. For executor/reviewer/draft: think:false. For T3/T4/T5
  overseers: thinking stays on (that's the point).
- Sidecar generate endpoint accepts `think` bool, passes through to
  Ollama's /api/generate.

VERIFIED OUTCOMES
Riverfront Steel 2026-04-21, qwen3.5+continuable+think:false:
  08:00 baseline_fill  3/3  4 turns
  10:30 recurring      2/2  3 turns (1 playbook citation)
  12:15 expansion      0/5  drift-aborted (5-fill orchestration
                            problem, separate work)
  14:00 emergency      4/4  3 turns (1 citation)
  15:45 misplacement   1/1  3 turns
  → T3 caught Patrick Ross double-booking across events
  → T3 flagged forklift cert drift on the event that failed
  → Cross-day lesson proposed "maintain buffer of ≥3 emergency
    candidates, pre-fetch certs for expansion, booking system
    cross-check" — real staffing advice, not generic linter output

PRD PHASE 21 rewritten to reflect the actual primitive shape (two-
call map-reduce with scratchpad glue) instead of the tourniquet
approach originally documented. Rust port queued for next sprint.

scripts/ab_t3_test.sh: A/B harness that chains B→C→D runs and emits
tests/multi-agent/playbooks/ab_scorecard.json.
2026-04-20 20:19:02 -05:00
root
6e7ca1830e Phase 21 foundation — context stability + chunking pipeline
PRD: add Phase 20 (model matrix, wired) and Phase 21 (context stability,
partial). Phase 21 exists because LLM Team hit this exact wall — running
multi-model ranking on large context silently truncated, rankings
degraded, no pipeline caught it. The stable answer: every agent call
goes through a budget check against the model's declared context_window
minus safety_margin, with a declared overflow_policy when the check
fails.

config/models.json:
- context_window + context_budget per tier
- overflow_policies block: summarize_oldest_tool_results_via_t3,
  chunk_lessons_via_cosine_topk, two_pass_map_reduce,
  escalate_to_kimi_k2_1t_or_split_decision
- chunking_cache spec (data/_chunk_cache/, corpus-hash keyed)

agent.ts:
- estimateTokens() chars/4 biased safe ~15%
- CONTEXT_WINDOWS table (fallback; prod reads models.json)
- assertContextBudget() — throws on overflow with exact numbers, can
  bypass with bypass_budget:true for callers with their own policy
- Wired into generate() and generateCloud() so EVERY call is checked

scenario.ts:
- T3 lesson archive to data/_playbook_lessons/*.json (the old
  /vectors/playbook_memory/seed path was silently failing with HTTP 400
  because it requires 'fill: Role xN in City, ST' operation shape)
- loadPriorLessons() at scenario start — filters by city/state match,
  date-sorted, takes top-3
- prior_lessons.json archived per-run (honest signal for A/B)
- guidanceFor() injects up to 2 prior lessons (≤500 chars each) into
  the executor's per-event context
- Retrospective shows explicit "Prior lessons loaded: N" line

Verified: mistral correctly rejects a 150K-char prompt (7532 tokens
over), gpt-oss:120b accepts it with 90K headroom. The enforcement is
in-band on every call now, not an afterthought.

Full chunking service (Rust) remains deferred to the sprint this feeds:
crates/aibridge/src/budget.rs + chunk.rs + storaged/chunk_cache.rs
2026-04-20 19:34:44 -05:00
root
13b01fee9f ADR-021: Sparse data trust path — start with nothing, earn everything
The staffing company said: 'we don't have any of that data.'
They're right. We showed a demo with 18-field profiles and they
have a name and a phone number.

This ADR documents the trust path:
  Phase 1 (Day 1): Work with name + phone + role. That's enough.
  Phase 2 (Week 1-4): Timesheets → reliability. Calls → history.
  Phase 3 (Month 2+): AI starts helping with real earned data.

Key principles:
- Never show empty fields or 0% bars
- Show what's THERE, not what's missing
- Trust indicators: 'based on 3 placements' not just 'Reliability: 87%'
- The system earns data by being useful, not by demanding it upfront

Also created sparse_workers dataset (200 workers, 74% have role,
34% have notes, 5 have ONLY name+phone) for realistic testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 15:32:06 -05:00
root
937569d188 ADR-020: Universal ID mapping — fix the flat embedding identity problem
THE REAL PROBLEM: Every new data source produces different doc_id
prefixes in vector indexes (W-, W500K-, W5K-, CAND-). Hybrid search
had to hardcode strip_prefix for each one. New datasets broke hybrid
until someone added another prefix. This violates "any data source
without pre-defined schemas."

THE FIX: IndexMeta.id_prefix — the catalog records what prefix each
index uses. Hybrid search reads it and strips automatically. Legacy
indexes fall back to heuristic stripping. New indexes can set
id_prefix=None to use raw IDs (no prefix, no stripping needed).

This means: ingest a new dataset, embed it, hybrid search works
immediately without code changes. The system is truly source-agnostic.

Also: full ADR document at docs/ADR-020-universal-id-mapping.md
with the three options considered and rationale for the chosen approach.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 11:58:18 -05:00
root
296bdaa746 PRD: hybrid search is operational, Ethereal data integrated
Status updated to reflect hybrid SQL+vector search, IVF_PQ 0.97
recall, 10K Ethereal worker profiles, autonomous agent validation.
Query Paths section updated with the shipped hybrid endpoint and
its verified zero-hallucination results from the staffing simulation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 23:10:56 -05:00
root
3bc82833ac Update PRD + PHASES.md — reflect 8-commit 2026-04-17 push
PRD status line: "Phases 0-18 shipped; hybrid operational; scheduled
ingest live; PDF OCR live; entering horizon items."

PHASES.md: federation L2 items marked complete, Phase 16.2 (autotune
agent), Phase 17 VRAM gate, MySQL connector, Phase 18 (hybrid Lance),
scheduled ingest, PDF OCR all documented with dates and measurements.

Stats updated: 52+ unit tests, 13 crates, 19 ADRs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 20:54:05 -05:00
root
4e1c400f5d Phase E.2: Compaction integrates tombstones — physical deletion closes GDPR loop
Phase E gave us soft-delete at query time (tombstones hide rows via a
DataFusion filter view). This completes the invariant: after compact,
tombstoned rows are PHYSICALLY absent from the parquet on disk.

delta::compact changes:
- Signature adds tombstones: &[Tombstone]
- After merging base + deltas, apply_tombstone_filter builds a
  BooleanArray keep-mask per batch (True where row_key_value is NOT
  in the tombstone set) and applies arrow::compute::filter_record_batch
- Supports Utf8, Int32, Int64 key columns (matches refresh.rs coverage
  for pg- and csv-derived schemas)
- CompactResult gains tombstones_applied + rows_dropped_by_tombstones
- Caller clears tombstone store on success

Critical correctness fix surfaced during E2E testing:
The original Phase 8 compact concatenated N independent Parquet byte
streams from record_batch_to_parquet() — each with its own footer.
Parquet readers only see the FIRST footer's data; the rest is invisible.
Latent since Phase 8 shipped; triggered by tombstone-filtering produc-
ing multiple batches. Corrupted candidates.parquet on first test run
(restored from UI fixture copy — good argument for test data in repo).

Fix:
- Single ArrowWriter per compaction, writes every batch into one
  properly-footered Parquet
- Snappy compression to match ingest defaults (otherwise rewrite
  inflated file 3× — 10.5MB → 34MB — because no compression was set)
- Verify-before-swap: parse written buf back to confirm row count
  matches expected; refuses to overwrite base_key if verification fails
- Write to {base_key}.compact-{ts}.tmp first, then to base_key; delete
  temp; only then delete delta files. Any error along the way leaves
  the original base intact.

TombstoneStore::clear(dataset) drops all tombstone batch files and
evicts the per-dataset AppendLog from cache. Called after successful
compact.

QueryEngine::catalog() accessor exposes the Registry so queryd
handlers can reach the tombstone store without routing through gateway
state.

E2E on candidates (100K rows, 15 cols):
- Baseline: 10.59 MB, 100000 rows
- Tombstone CAND-000001/2/3 (soft-delete): 99997 visible, 100000 raw
- Compact: tombstones_applied=3, rows_dropped=3, final_rows=99997
- Post: 10.72 MB (Snappy), valid parquet (1 row_group), 99997 rows
- Restart: persists, tombstones list empty, __raw__candidates also
  99997 (the 3 IDs are physically gone from disk)

PRD invariant close: deletion is now actually deletion, not just
masking. GDPR erasure request → tombstone + schedule compact → data
gone.

Deferred:
- Compact-all-datasets cron (currently manual per-dataset via
  POST /query/compact)
- Compaction of tombstone batch files themselves (they grow at
  flush_threshold=1 per tombstone; TombstoneStore::compact exists
  but not auto-called)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:38:30 -05:00
root
4d5c49090c Phase 16: Hot-swap generations + autotune agent loop
Closes the self-iteration loop from the PRD reframe: an agent can
tune HNSW configs autonomously and the winner flows through to the
next profile activation without human intervention.

Three primitives:

1. PromotionRegistry (vectord::promotion)
   - Per-index current + history at _hnsw_promotions/{index}.json
   - promote(index, entry) atomically swaps current, pushes prior
     onto history (capped at 50)
   - rollback() pops history back onto current; clears current if
     history exhausted
   - config_or(index, default) — the read side used at build time,
     returns promoted config if set else caller's default
   - Full cache + persistence; writes are durable on return

2. Autotune (vectord::autotune)
   - run_autotune(request, ...) — synchronous agent loop
   - Default grid: 5 configs covering the practical range
     (ec=20/40/80/80/160, es=30/30/30/60/30) with seed=42 for
     reproducibility
   - Every trial goes through the existing trial-journal pipeline
     so autotune runs land alongside manual trials in the
     "trials are data" log
   - Winner: max recall first, then min p50 latency; must clear
     min_recall gate (default 0.9) or no promotion happens
   - Config bounds (ec ∈ [10,400], es ∈ [10,200]) reject absurd
     values from the request's optional custom grid
   - On winner: promote with note "autotune winner: recall=X p50=Y"

3. Wiring
   - VectorState gains promotion_registry
   - activate_profile now calls promotion_registry.config_or(...)
     so newly-promoted configs are picked up on next activation —
     the "hot-swap" is: autotune promotes -> profile activates ->
     HNSW rebuilt with new config
   - New endpoints:
       POST /vectors/hnsw/promote/{index}/{trial_id}
             ?promoted_by=...&note=...
       POST /vectors/hnsw/rollback/{index}
       GET  /vectors/hnsw/promoted/{index}
       POST /vectors/hnsw/autotune  { index_name, harness,
                                      min_recall?, grid? }

End-to-end verified on threat_intel_v1 (54 vectors):
- autogen harness 'threat_intel_smoke' (10 queries)
- POST /autotune -> 5 trials in 620ms, winner ec=20 es=30
  recall=1.00 p50=64us auto-promoted
- Manual promote of ec=80 es=30 -> history depth 1
- Rollback -> back to ec=20 es=30 autotune winner
- Second rollback -> current cleared
- Re-promote + restart -> persistence verified
- Profile activation after promotion logged:
  "building HNSW ef_construction=80 ef_search=30 seed=Some(42)"
  proving the hot-swap loop is closed.

Deferred:
- Bayesian optimization (random-grid is fine at this config-space size)
- Append-triggered autotune (Phase 17.5 — refresh OnAppend policy
  can schedule autotune after appending sufficient new rows)
- Concurrent autotune per index guard (JobTracker integration)

PRD invariants satisfied: invariant 8 (hot-swappable indexes) is now
real code — promote is atomic, rollback is always available, the
active generation is a persistent pointer not a runtime convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:26:21 -05:00
root
a293502265 Phase 17: Model profiles + scoped search — the LLM-brain keystone
Implements PRD invariant 9 ("every reader gets its own profile") and
completes the multi-model substrate vision. Local models (or agents)
bind to a named set of datasets; activation pre-loads their vector
indexes into memory; search enforces scope.

Schema (shared::types):
- ModelProfile { id, ollama_name, description, bound_datasets,
                 hnsw_config, embed_model, created_at, created_by }
- ProfileHnswConfig mirrors vectord::trial::HnswConfig to avoid a
  cross-crate dep cycle. Default (ec=80, es=30) matches the Phase 15
  trial winner.
- bound_datasets can reference raw dataset names OR AiView names
  (both register as DataFusion tables with the same name, so mixing
  raw tables and PII-redacted views composes naturally)

Catalog (catalogd::registry):
- put_profile validates id is a slug (alphanumeric + -_ only) and
  every binding resolves to an existing dataset or view
- Persistence at _catalog/profiles/{id}.json, loaded on rebuild
- get_profile / list_profiles / delete_profile

HTTP endpoints:
- POST /catalog/profiles  (create/update)
- GET  /catalog/profiles  (list)
- GET/DELETE /catalog/profiles/{id}
- POST /vectors/profile/{id}/activate  (HNSW hot-load)
- POST /vectors/profile/{id}/search    (scope-enforced)

Activation (vectord::service::activate_profile):
- For each bound dataset, find vector indexes with matching source
- Pre-load embeddings into EmbeddingCache
- Build HNSW with profile's config
- Report warmed indexes + per-binding failures + duration
- Failures on individual bindings don't abort — "substrate keeps
  working" per ADR-017

Scoped search (vectord::service::profile_scoped_search):
- Look up profile, verify index.source ∈ profile.bound_datasets
- Returns 403 with allowed bindings list if out-of-scope
- Uses HNSW if index is warm, brute-force cosine otherwise (graceful
  degradation — no "must activate first" friction)

Bug fix surfaced during testing: vectord::refresh::try_update_index_meta
was a no-op for first-time indexes, so threat_intel_v1 and
kb_team_runs_v1 (both built via refresh after Phase C shipped) didn't
show up in the index registry. Now it auto-infers the source from the
index name convention (`{source}_vN`) and registers new metadata with
reasonable defaults.

End-to-end verified:
- Created security-analyst profile bound to [threat_intel]
- POST /vectors/profile/security-analyst/activate → warmed
  threat_intel_v1 (54 vectors) in 156ms, HNSW built
- Within-scope search: method=hnsw, returned relevant IP indicators
- Out-of-scope: tried to search resumes_100k_v2 (source=candidates)
  → 403 "profile 'security-analyst' is not bound to 'candidates' —
    allowed bindings: [\"threat_intel\"]"
- staffing-recruiter profile created bound to candidates + placements;
  search without activation fell through to brute_force (graceful)

Deferred (Phase 17 followups):
- VRAM-aware activation (unload-then-load via Ollama keep_alive=0)
  — Ollama already handles this; we don't need to reinvent
- Model-identity in audit trail — Phase 13 has role-based audit;
  adding model_id is ~20 LOC when we want it
- Profile bucket pre-load (profile:user bucket mount) — Phase 17.5

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:09:43 -05:00
root
d87f2ccac6 Phase E: Soft deletes (tombstones) for compliance-grade row deletion
Implements GDPR/CCPA-compatible row-level deletion without rewriting
the underlying Parquet. Tombstone markers live beside each dataset and
are applied at query time via a DataFusion view that excludes the
deleted row_key_values.

Schema (shared::types):
- Tombstone { dataset, row_key_column, row_key_value, deleted_at,
              actor, reason }
- All tombstones for a dataset must share one row_key_column —
  enforced at write so the query-time filter remains a single
  WHERE NOT IN (...) clause

Storage (catalogd::tombstones):
- Per-dataset AppendLog at _catalog/tombstones/{dataset}/
- flush_threshold=1 + explicit flush after every append — tombstones
  are high-value, low-frequency; durability on return is the contract
- Reuses storaged::append_log infra so compaction is already wired
  (POST .../tombstones/compact will work once we expose it)

Catalog (catalogd::registry):
- add_tombstone validates dataset exists + key column compatibility
- list_tombstones for the GET endpoint
- TombstoneStore exposed via Registry::tombstones() for queryd

HTTP (catalogd::service):
- POST /catalog/datasets/by-name/{name}/tombstone
    { row_key_column, row_key_values[], actor, reason }
  Returns rows_tombstoned count + per-value failure list (207 on
  partial success).
- GET same path lists active tombstones with full audit info.

Query layer (queryd::context):
- Snapshot tombstones-by-dataset before registering tables
- Tombstoned tables: raw goes to "__raw__{name}", public "{name}"
  becomes DataFusion view with
  SELECT * FROM "__raw__{name}" WHERE CAST(col AS VARCHAR) NOT IN (...)
- CAST AS VARCHAR handles both string and integer key columns
- Untombstoned tables register as before — zero overhead

End-to-end on candidates (100K rows):
- Pick CAND-000001/2/3 (Linda/Charles/Kimberly)
- POST tombstone -> rows_tombstoned: 3
- COUNT(*) drops 100000 -> 99997
- WHERE candidate_id IN (those 3) -> 0 rows
- candidates_safe view transitively excludes them
  (Linda+Denver: __raw__candidates=159, candidates_safe=158)
- Restart: COUNT still 99997, 3 tombstones reload from disk

Reversibility: tombstones are reversible deletes, not destruction.
Power users can still query "__raw__{name}" to see deleted rows.
Phase 13 access control is what stops a non-admin from accessing
__raw__* tables.

Limits / follow-up:
- Physical compaction not yet integrated — Phase 8's compact_files
  doesn't read tombstones during merge. Tombstoned rows are still
  on disk until that integration ships.
- Phase 9 journald event emission for tombstones not wired —
  tombstone records carry their own actor+reason+timestamp so the
  audit trail is intact, but cross-referencing with the mutation
  event log would help compliance reporting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 09:40:48 -05:00
root
09fd446c8d Phase D: AI-safe views — capability-surface projections over base data
Implements the llms3.com "AI-safe views" pattern: a named projection
that exposes only whitelisted columns, with optional row filter and
per-column redactions. AI agents (or Phase 13 roles) bind to the view;
they can never accidentally see PII even if they write raw SQL.

Schema (shared::types):
- AiView { name, base_dataset, columns: Vec<String>, row_filter,
           column_redactions: HashMap<String, Redaction>, ... }
- Redaction enum: Null | Hash | Mask { keep_prefix, keep_suffix }

Catalog (catalogd::registry):
- put_view validates base dataset exists + columns non-empty
- Persists JSON at _catalog/views/{name}.json (sanitized name)
- rebuild() loads views alongside dataset manifests on startup

Query layer (queryd::context):
- build_context registers every AiView as a DataFusion view object
- Constructed SELECT applies whitelist projection, WHERE filter, and
  redaction expressions per column
  - Mask: substr(prefix) + repeat('*', mid_len) + substr(suffix)
  - Hash: digest(value, 'sha256')
  - Null: CAST(NULL AS VARCHAR) AS col
- DataFusion handles JOINs/aggregates over the view natively — it's a
  real view, not a query rewrite

HTTP (catalogd::service):
- POST /catalog/views (create)
- GET  /catalog/views (list)
- GET  /catalog/views/{name} (full def)
- DELETE /catalog/views/{name}

End-to-end test on candidates (100K rows, 15 columns):

  candidates_safe view:
    columns: candidate_id, first_name, city, state, vertical,
             skills, years_experience, status
    row_filter: status != 'blocked'
    redaction: candidate_id mask(prefix=3, suffix=2)

  SELECT * FROM candidates_safe LIMIT 5
    -> 8 columns only, candidate_id shown as "CAN******01"
       (PII fields email/phone/last_name absent from result)

  SELECT email FROM candidates_safe
    -> fails (column not in projection)

  SELECT email FROM candidates
    -> succeeds (raw table still accessible by name —
       Phase 13 access control is the gate, not the view itself)

Survives restart — view definitions reload from object storage.

Limits / not in MVP:
- View CANNOT shadow base table by name (DataFusion treats them as
  separate identifiers; access control must restrict raw-table access)
- row_filter is treated as trusted SQL — operators must validate
  before persisting; only authenticated admin path should call put_view
- Redaction expressions assume column is castable to VARCHAR; numeric
  redactions could be misleading (a Hash on Int64 returns a hex string
  that won't equi-join with another hash on the same value type)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 09:16:44 -05:00
root
24f1249a62 Federation layer 2: header routing + cross-bucket SQL
Three pieces of the multi-bucket federation made real:

1. Catalog migration (POST /catalog/migrate-buckets)
   - One-shot normalizer for ObjectRef.bucket field
   - Empty -> "primary"; legacy "data"/"local" -> "primary"
   - Idempotent; re-running on canonical state is no-op
   - Ran on existing catalog: 12 refs renamed from "data", 2 already
     "primary", all 14 now canonical

2. X-Lakehouse-Bucket header middleware on ingest
   - resolve_bucket() helper extracts header, returns
     (bucket_name, store) or 404 with valid bucket list
   - ingest_file and ingest_db_stream now route writes per-request
   - Defaults to "primary" when header absent
   - pipeline::ingest_file_to_bucket records the actual bucket on the
     ObjectRef so catalog stays the source of truth for "where does this
     data live"
   - Verified: ingest with X-Lakehouse-Bucket: testing lands in
     data/_testing/, ingest without header lands in data/, bad header
     returns 404 with hint

3. queryd registers every bucket with DataFusion
   - QueryEngine now holds Arc<BucketRegistry> instead of single store
   - build_context iterates all buckets, registers each as a separate
     ObjectStore under URL scheme "lakehouse-{bucket}://"
   - ListingTable URLs include the per-object bucket scheme so
     DataFusion routes scans automatically based on ObjectRef.bucket
   - Profile bucket names like "profile:user" sanitized to
     "lakehouse-profile-user" since URL host segments can't contain ":"
   - Tolerant of duplicate manifest entries (pre-existing
     pipeline::ingest_file behavior creates a fresh dataset id per
     ingest); duplicates skipped with debug log
   - Backward compat: legacy "lakehouse://data/" URL still registered
     pointing at primary

Success gate: cross-bucket CROSS JOIN
  SELECT p.name, p.role, a.species
  FROM people_test p          (bucket: testing)
  CROSS JOIN animals a        (bucket: primary)
  LIMIT 5
returns rows correctly. DataFusion routed each scan to its bucket's
ObjectStore based on the URL scheme.

No regressions: SELECT COUNT(*) FROM candidates still returns 100000
from the primary bucket.

Deferred to Phase 17:
- POST /profile/{user}/activate (HNSW hot-load on profile switch)
- vectord storage paths becoming bucket-scoped (trial journals,
  eval sets per-profile)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:52:32 -05:00
root
97a376482c Phase C: Decoupled embedding refresh
Implements the llms3.com-inspired pattern: embeddings refresh
asynchronously, decoupled from transactional row writes. New rows arrive,
ingest marks the vector index stale, a later refresh embeds only the
delta (doc_ids not already in the index).

Schema additions (DatasetManifest):
- last_embedded_at: Option<DateTime> - when the index was last refreshed
- embedding_stale_since: Option<DateTime> - set when data written, cleared on refresh
- embedding_refresh_policy: Option<RefreshPolicy> - Manual | OnAppend | Scheduled

Ingest paths (pipeline::ingest_file + pg_stream) call
registry.mark_embeddings_stale after writing. No-op if the dataset has
never been embedded — stale semantics only kick in once last_embedded_at
is set.

Refresh pipeline (vectord::refresh::refresh_index):
- Reads the dataset Parquet, extracts (doc_id, text) pairs
- Accepts Utf8 / Int32 / Int64 id columns (covers both CSV and pg schemas)
- Loads existing embeddings via EmbeddingCache (empty on first-time build)
- Filters to rows whose doc_id is NOT in the existing set
- Chunks (chunker::chunk_column), embeds via Ollama (batches of 32),
  writes combined index, clears stale flag

Endpoints:
- POST /vectors/refresh/{dataset_name} - body {index_name, id_column,
  text_column, chunk_size?, overlap?}
- GET /vectors/stale - lists datasets whose embedding_stale_since is set

End-to-end verified on threat_intel (knowledge_base.threat_intel):
- Initial refresh: 20 rows -> 20 chunks -> embedded in 2.1s,
  last_embedded_at set
- Idempotent second refresh: 0 new docs -> 1.8ms (pure delta check)
- Re-ingest to 54 rows: mark_embeddings_stale fires -> stale_since set
- /vectors/stale surfaces threat_intel with timestamps + policy
- Delta refresh: 34 new docs embedded in 970ms (6x faster than full
  re-embed); stale_cleared = true

Not in MVP scope:
- UPDATE semantics (same doc_id, different content) - would need
  per-row content hashing
- OnAppend policy auto-trigger - just declares intent; actual scheduler
  deferred
- Scheduler runtime - the Scheduled(cron) variant declares the intent so
  operators can see which datasets expect what, but the cron itself is
  separate

Per ADR-019: when a profile switches to vector_backend=Lance, this
refresh path benefits — Lance's native append replaces our "read all +
rewrite" Parquet rebuild pattern. Current MVP works well enough at
~500-5K rows to validate the architecture; Lance unblocks the 5M+ case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 03:00:43 -05:00
root
76f6fba5de Phase B: Lance pilot — hybrid decision with measured benchmark
Standalone benchmark crate `crates/lance-bench` running Lance 4.0 against
our Parquet+HNSW at 100K × 768d (resumes_100k_v2) measured 8 dimensions.

Results (see docs/ADR-019-vector-storage.md for full scorecard):

  Cold load:        Parquet 0.17s   vs Lance 0.13s   (tie — not ≥2× threshold)
  Disk size:        330.3 MB        vs 330.4 MB      (tie)
  Search p50:       873us           vs 2229us        (Parquet 2.55× faster)
  Search p95:       1413us          vs 4998us        (Parquet 3.54× faster)
  Index build:      230s (ec=80)    vs 16s (IVF_PQ)  (Lance 14× faster)
  Random access:    35ms (scan)     vs 311us         (Lance 112× faster)
  Append 10K rows:  full rewrite    vs 0.08s/+31MB   (Lance structural win)

Decision (ADR-019): hybrid, not migrate-or-reject.

- Parquet+HNSW stays primary — our HNSW at ec=80 es=30 recall=1.00 is
  2.55× faster than Lance IVF_PQ at 100K in-RAM scale
- Lance joins as second backend per-profile for workloads where it wins
  architecturally: random row access (RAG text fetch), append-heavy
  pipelines (Phase C), hot-swap generations (Phase 16, 14× faster
  builds), and indexes past the ~5M RAM ceiling
- Phase 17 ModelProfile gets vector_backend: Parquet | Lance field
- Ceiling table in PRD updated — 5M ceiling now says "switch to Lance"
  instead of "migrate" since Lance runs alongside, not instead of

Isolation: lance-bench is a standalone workspace crate with its own dep
tree (Lance pulls DataFusion 52 + Arrow 57 incompatible with main stack
DataFusion 47 + Arrow 55). Kept off the critical path until API is
stable enough to promote into vectord::lance_store.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 02:37:11 -05:00
root
dbe00d018f Federation foundation + HNSW trial system + Postgres streaming + PRD reframe
Four shipped features and a PRD realignment, all measured end-to-end:

HNSW trial system (Phase 15 horizon item → complete)
- vectord: EmbeddingCache, harness (eval sets + brute-force ground truth),
  TrialJournal, parameterized HnswConfig on build_index_with_config
- /vectors/hnsw/trial, /hnsw/trials/{idx}, /hnsw/trials/{idx}/best,
  /hnsw/evals/{name}/autogen, /hnsw/cache/stats
- Measured on resumes_100k_v2 (100K × 768d): brute-force 44ms -> HNSW 873us
  at 100% recall@10. ec=80 es=30 locked as HnswConfig::default()
- Lower ec values trade recall for build time: 20/30 = 0.96 recall in 8s,
  80/30 = 1.00 recall in 230s

Catalog manifest repair
- catalogd: resync_from_parquet reads parquet footers to restore row_count
  and columns on drifted manifests
- POST /catalog/datasets/{name}/resync + POST /catalog/resync-missing
- All 7 staffing tables recovered to PRD-matching 2,469,278 rows

Federation foundation (ADR-017)
- shared::secrets: SecretsProvider trait + FileSecretsProvider (reads
  /etc/lakehouse/secrets.toml, enforces 0600 perms)
- storaged::registry::BucketRegistry — multi-bucket resolution with
  rescue_bucket read fallback and reachability probing
- storaged::error_journal — bucket op failures visible in one HTTP call
- storaged::append_log — write-once batched append pattern (fixes the RMW
  anti-pattern llms3.com calls out; errors and trial journals both use it)
- /storage/buckets, /storage/errors, /storage/bucket-health,
  /storage/errors/{flush,compact}
- Bucket-aware I/O at /storage/buckets/{bucket}/objects/{*key} with
  X-Lakehouse-Rescue-Used observability headers on fallback

Postgres streaming ingest
- ingestd::pg_stream: DSN parser, batched ORDER BY + LIMIT/OFFSET pagination
  into ArrowWriter, lineage redacts password
- POST /ingest/db — verified against live knowledge_base.team_runs
  (586 rows × 13 cols, 6 batches, 196ms end-to-end)

PRD realignment (2026-04-16)
- Dual use case: staffing analytics + local LLM knowledge substrate
- Removed "multi-tenancy (single-owner system)" from non-goals
- Added invariants 8-11: indexes hot-swappable, per-reader profiles,
  trials-as-data, operational failures findable in one HTTP call
- New phases 16 (hot-swap generations), 17 (model profiles + dataset
  bindings), 18 (Lance vs Parquet+sidecar evaluation)
- Known ceilings table documents the 5M vector wall and escape hatches
- ADR-017 (federation), ADR-018 (append-log pattern) added
- EXECUTION_PLAN.md sequences phases B-E with success gates and
  decision rules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 01:50:05 -05:00
root
8282842eaf Sync memory + phases: all 15 phases marked complete
PHASES.md and project memory updated to reflect actual build state.
Phases 11-14 were built but trackers weren't updated.

Final stats: 11 crates, 30 tests, 16 ADRs, 2.47M rows, 100K vectors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 19:34:15 -05:00
root
6d49f81ebf Add read-mem skill + comprehensive project memory
- /read-mem skill: reads PRD, phases, decisions, checks live services
- Updated PHASES.md with all 15 phases tracked
- Updated project_lakehouse.md memory with full context
- Updated CLAUDE.md with project reference
- Skill at ~/.claude/skills/read-mem/ and project level
- Triggers on: "read mem", "project status", "where were we", "catch me up"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 09:23:01 -05:00
root
354c9c4a04 PRD v3: future-proofing roadmap — event journal, rich catalog, tool registry
Phases 9-15 designed based on "future regret" analysis:
- Phase 9: Event journal (append-only mutation history — can't retrofit)
- Phase 10: Rich catalog v2 (ownership, sensitivity, lineage, freshness)
- Phase 11: Embedding versioning (model-proof vector layer)
- Phase 12: Tool registry (governed agent actions via MCP)
- Phase 13: Security & access control (field-level, row-level, audit)
- Phase 14: Schema evolution with AI migration rules
- Phase 15+: Federated query, DB connectors, OCR, fine-tuned models

8 design principles: store truth openly, describe richly, never destroy
evidence, secure centrally, expose through tools, version everything,
unstructured first-class, separate storage/compute/intelligence.

ADR-012 through ADR-016 documenting key future-proofing decisions.
Updated benchmarks: 2.47M rows, hot cache 9.8x speedup.
Updated operating rules: cheap-now/expensive-later built first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 08:57:29 -05:00
root
6740a017c7 PRD v2: production roadmap with ingest, vector search, hot cache phases
- Phase 6: Ingest pipeline (CSV/JSON → schema detect → Parquet → catalog)
- Phase 7: Vector index + RAG (embed → HNSW → semantic search → LLM answer)
- Phase 8: Hot cache + incremental updates (MemTable, delta files, merge-on-read)
- ADR-008 through ADR-011: embeddings as Parquet, delta files not Delta Lake,
  schema defaults to string, not a CRM replacement
- Staffing company reference dataset (286K rows, 7 tables)
- Honest risk assessment: vector search at scale and incremental updates are hard

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 07:54:24 -05:00
root
01373c0e45 Phase 5: hardening — gRPC, observability, auth, config
- proto: lakehouse.proto with CatalogService, QueryService, StorageService, AiService
- proto crate: tonic-build codegen from proto definitions
- catalogd: gRPC CatalogService implementation
- gateway: dual HTTP (:3100) + gRPC (:3101) servers
- gateway: OpenTelemetry tracing with stdout exporter
- gateway: API key auth middleware (toggleable)
- shared: TOML config system with typed structs and defaults
- lakehouse.toml config file
- ADR-006 and ADR-007 documented

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 06:37:07 -05:00
root
50a8c8013f Phase 4: Dioxus frontend with dataset browser and SQL query editor
- ui: Dioxus WASM app with dataset sidebar, SQL editor (Ctrl+Enter), results table
- ui: dynamic API base URL (same-origin for nginx, port-based for local dev)
- gateway: CORS enabled for cross-origin requests
- nginx: lakehouse.devop.live proxies UI (:3300) + API (:3100) on same origin
- justfile: ui-build, ui-serve, sidecar, up commands added

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 06:24:15 -05:00
root
239e471223 Phase 3: AI integration with Ollama via Python sidecar
- sidecar: FastAPI app with /embed, /generate, /rerank hitting Ollama
- sidecar: Dockerfile, env var config (EMBED_MODEL, GEN_MODEL, RERANK_MODEL)
- aibridge: reqwest HTTP client with typed request/response structs
- aibridge: Axum proxy endpoints (POST /ai/embed, /ai/generate, /ai/rerank)
- gateway: wires AiClient with SIDECAR_URL env var
- e2e verified: nomic-embed-text returns 768d vectors, qwen2.5 generates text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 05:53:56 -05:00
root
19bdfab227 Phase 2: DataFusion query engine over Parquet
- queryd: SessionContext with custom URL scheme to avoid path doubling with LocalFileSystem
- queryd: ListingTable registration from catalog ObjectRefs with schema inference
- queryd: POST /query/sql returns JSON {columns, rows, row_count}
- queryd→catalogd wiring: reads all datasets, registers as named tables
- gateway: wires QueryEngine with shared store + registry
- e2e verified: SELECT *, WHERE/ORDER BY, COUNT/AVG all correct

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 05:48:20 -05:00
root
655b6c0b37 Phase 1: storage + catalog layer
- storaged: object_store backend (LocalFileSystem), PUT/GET/DELETE/LIST endpoints
- shared: arrow_helpers with Parquet roundtrip + schema fingerprinting (2 tests)
- catalogd: in-memory registry with write-ahead manifest persistence to object storage
- catalogd: POST/GET /datasets, GET /datasets/by-name/{name}
- gateway: wires storaged + catalogd with shared object_store state
- Phase tracker updated: Phase 0 + Phase 1 gates passed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 05:15:27 -05:00
root
a52ca841c6 Phase 0: bootstrap Rust workspace
- Cargo workspace with 6 crates: shared, storaged, catalogd, queryd, aibridge, gateway
- shared: types (DatasetId, ObjectRef, SchemaFingerprint, DatasetManifest) + error enum
- gateway: Axum HTTP entrypoint with nested service routers + tracing
- All services expose /health stubs
- justfile with build/test/run recipes
- PRD, phase tracker, and ADR docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 04:59:05 -05:00