52 Commits

7bb66f08c3
lance: scrum-driven sanitizer + smoke-gate fixes (opus 2026-05-02 BLOCK)
Some checks failed
lakehouse/auditor 9 blocking issues: cloud: claim not backed — "Verified live (post-restart): scale_test_10m doc-fetch 4-15ms across"
Cross-lineage scrum on the lance wave (4 bundles, 33 distinct findings)
surfaced 1 real BLOCK and 2 real WARNs from opus that the kimi/qwen
lineages missed. Per feedback_cross_lineage_review.md, opus is the
load-bearing reviewer; cross-lineage convergence is noise unless verified.
BLOCK fix — sanitize_lance_err path-stripping was unsound:
err.split("/home/").next().unwrap_or(&err)
yields "" when err STARTS with "/home/" (the first split element is empty), erasing the entire
message. Replaced truncation with redact_paths() — a hand-rolled scanner
that walks the input once, replacing path-shaped substrings with
[REDACTED] while preserving surrounding error context. Catches:
- absolute paths under /root/.cargo, /home, /var, /tmp, /etc, /usr, /opt
- relative variants (Lance occasionally strips leading slash —
observed live "Dataset at path home/profit/lakehouse/data/lance/x
was not found")
- multiple occurrences in one error
- preserves quote/comma/whitespace terminators
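The scanner shape, as a minimal sketch (reconstructed from this description — the root list, terminator set, and matching rules are assumptions, not the shipped crates/vectord code):

```rust
/// Sketch of a single-pass path-redacting scanner. Replaces path-shaped
/// substrings rooted at known prefixes (absolute or slash-stripped
/// relative) with [REDACTED], preserving surrounding error context.
fn redact_paths(err: &str) -> String {
    const ROOTS: [&str; 7] = ["/root/", "/home/", "/var/", "/tmp/", "/etc/", "/usr/", "/opt/"];
    let mut out = String::with_capacity(err.len());
    let mut rest = err;
    'outer: while !rest.is_empty() {
        for root in ROOTS {
            let bare = &root[1..]; // relative variant, e.g. "home/"
            if rest.starts_with(root) || rest.starts_with(bare) {
                out.push_str("[REDACTED]");
                // Consume up to a terminator so quotes/commas/whitespace survive.
                let end = rest
                    .find(|c: char| c.is_whitespace() || c == ',' || c == '"' || c == '\'')
                    .unwrap_or(rest.len());
                rest = &rest[end..];
                continue 'outer;
            }
        }
        // No root at this position: copy one char and advance.
        let ch = rest.chars().next().unwrap();
        out.push(ch);
        rest = &rest[ch.len_utf8()..];
    }
    out
}
```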
WARN fix #1 — is_not_found heuristic was too broad:
lower.contains("not found")
caught real 500s like "column not found", "field not found in schema".
Narrowed to require dataset-shape phrasing AND exclude the
column/field/schema patterns explicitly.
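A sketch of the narrowed check (the exact phrase list is an assumption inferred from the examples above):

```rust
/// Only dataset-shaped "not found" errors map to 404; column/field/schema
/// misses stay 500.
fn is_not_found(err: &str) -> bool {
    let lower = err.to_lowercase();
    let dataset_shaped = (lower.contains("dataset at path") && lower.contains("was not found"))
        || lower.contains("no such file");
    let schema_shaped = lower.contains("column not found")
        || lower.contains("field not found")
        || lower.contains("not found in schema");
    dataset_shaped && !schema_shaped
}
```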
WARN fix #2 — lance_smoke.sh `grep -qvE` was an unsound regression gate.
bash -c "echo '$BODY' | grep -qvE 'pat'"
With -v -q, exits 0 if ANY line lacks the pattern — so a multi-line
body with one leak line + any clean line FALSE-PASSES. Replaced with
the correct "pattern absent" form: `! grep -qE 'pat'`. Also expanded
the pattern set (added /var/, /tmp/) since the scrum surfaced these
as additional leak vectors.
Also unblocks pre-existing pathway_memory test compile error (stale
PathwayTrace init missing 6 Mem0-versioning fields added in 6ac7f61).
Tests filled in with sensible defaults — needed to run sanitize_tests.
10/10 new sanitize tests pass. Smoke 9/9 PASS against rebuilt+restarted
gateway. Live missing-index probe now returns:
"lance dataset not found: no-such-11205" + HTTP 404
(was: leaked absolute paths + HTTP 500 → leaked absolute and relative
paths post-first-fix → clean message + 404 now.)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

5d30b3da89
lance: auto-build doc_id btree in migrate handler (root-cause for 10M doc-fetch slowness)
scale_test_10m doc-fetch p50 was ~100ms — full table scan over 35GB. Root cause: the auto-build at service.rs:1492-1503 only fires for IndexMeta-registered indexes during set_active_profile warming. lance-bench writes datasets through /vectors/lance/migrate/* directly, bypassing IndexMeta, so its datasets never get the doc_id btree that ADR-019 depends on.
Fix: build the btree inline at the end of lance_migrate. Costs ~1.2s on 10M rows (+269MB on disk), drops doc-fetch from ~100ms to ~5ms (20x). Failure is non-fatal — logs a warning and the dataset stays queryable.
Verified live (post-restart): scale_test_10m doc-fetch 4-15ms across 5 calls, smoke 9/9 PASS, vectord-lance 7/7 unit tests PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
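A sketch of that warn-and-continue posture (the store type and method here are hypothetical stand-ins, not the vectord-lance API):

```rust
/// Hypothetical stand-in for the vectord-lance store; only the
/// non-fatal posture is the point of this sketch.
struct LanceVectorStore;

impl LanceVectorStore {
    fn build_doc_id_btree(&self, _dataset: &str) -> Result<(), String> {
        Ok(()) // real impl builds the scalar btree on doc_id (~1.2s at 10M rows)
    }
}

fn finish_migrate(store: &LanceVectorStore, dataset: &str) {
    // Index-build failure logs a warning and leaves the dataset queryable
    // (full-scan doc-fetch still works, just slower) instead of failing
    // the migrate.
    if let Err(e) = store.build_doc_id_btree(dataset) {
        eprintln!("warn: doc_id btree build failed for {dataset}: {e} (dataset stays queryable)");
    }
}
```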

7594725c25
lance backend: 4-pack — bug fix + smoke + tests + 10M re-bench
Some checks failed
lakehouse/auditor 12 blocking issues: cloud: claim not backed — "Verified end-to-end against persistent Go stack on :4110:"
Surfaced by the 2026-05-02 audit (vectord-lance + lance-bench + glue
existed and worked but had no tests, no smoke, leaked server paths
on missing-index search, and the ADR-019 10M re-bench was deferred).
## 1. Fix: missing-index search returned 500 + leaked filesystem path
Pre-fix:
$ POST /vectors/lance/search/no-such-index
HTTP 500
Dataset at path home/profit/lakehouse/data/lance/no-such-index was
not found: Not found: home/profit/lakehouse/data/lance/no-such-index/
_versions, /root/.cargo/registry/src/index.crates.io-...-1949cf8c.../
lance-table-4.0.0/src/io/commit.rs:364:26, ...
Post-fix:
HTTP 404
lance dataset not found: no-such-index
Added `sanitize_lance_err()` in crates/vectord/src/service.rs that:
- maps "not found" / "no such file" patterns → 404 (was 500)
- strips /home/ and /root/.cargo/ paths from any error body
Applied to all 5 lance handlers: search, get_doc, build_index,
append, migrate. The store_for() handle is cheap-and-stateless;
the actual disk hit happens inside the operation, which is where
the leak originated.
## 2. scripts/lance_smoke.sh — first regression gate
9-probe smoke against the live HTTP surface. Exercises only read
paths (no state mutation in CI). Specifically locks the sanitizer
fix — a future regression that re-introduces the path leak fires
the smoke immediately. 9/9 PASS against the live :3100 today.
## 3. Unit tests on vectord-lance/src/lib.rs (was: zero tests)
7 tests covering the public LanceVectorStore API:
- fresh_store_reports_no_state — handle is lazy
- migrate_then_count_and_fetch — Parquet → Lance round-trip
- get_by_doc_id_missing_returns_none — Ok(None) vs Err contract
that lets the HTTP handler return 404 cleanly
- append_grows_count_and_new_rows_fetchable — ADR-019's
structural-difference claim verified at the unit level
- append_dim_mismatch_errors — guards against silently breaking
search by accepting inconsistent-dim rows
- search_returns_nearest — exact-vector match → top-1
- stats_reports_post_migrate_state — locks the field shape
7/7 PASS. cargo test -p vectord-lance --lib green.
## 4. 10M re-bench (deferred from ADR-019)
reports/lance_10m_rebench_2026-05-02.md captures the numbers driven
against the live :3100 over data/lance/scale_test_10m (33GB / 10M
vectors, IVF_PQ confirmed via response method tag).
Headline:
Search cold (10 diverse queries): median ~32ms, mean ~46ms
Search warm (5x same query): ~20ms p50
Doc fetch (5x same id): ~100ms p50
Search latency at 10M is acceptable for batch / async workloads,
too slow for sub-10ms voice/recommendation paths. ADR-019's "Lance
pulls ahead at 10M" claim remains unverified-but-not-refuted — at
this scale HNSW doesn't operationally exist (10M × 768d × 4 bytes =
30GB just for vectors).
Real finding: doc-fetch at 10M is 300x slower than the 100K number
ADR-019 cited (311μs → ~100ms). Likely cause: scalar btree index
on doc_id may not be built for this dataset. Follow-up to
investigate whether forcing build_scalar_index brings it back to
the load-bearing O(1) range. Captured in the report.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

8de94eba08
cleanup: bump qwen2.5 → qwen3.5:latest in active defaults
Some checks failed
lakehouse/auditor 16 blocking issues: cloud: claim not backed — "Verified end-to-end via playwright on devop.live/lakehouse:"
stronger local rung is now the small-model-pipeline tier-1 default across both Rust legacy + Go rewrite (cf. golangLAKEHOUSE phase 1). same JSON-clean property as qwen2.5, more capacity. ollama still serves both side-by-side; rollback is a 4-line revert if a workload regresses.
active-default sites:
- lakehouse.toml [ai] gen_model + rerank_model → qwen3.5:latest
- mcp-server/observer.ts diagnose call (Phase 44 /v1/chat path) → qwen3.5:latest
- mcp-server/index.ts model roster doc → qwen3.5:latest first
- crates/vectord/src/rag.rs ContinuableOpts + RagResponse.model → qwen3.5:latest
skipped: execution_loop/mod.rs comments describing historic qwen2.5 tool_call quirks — those are documentation of past behavior, not active defaults. data/_catalog/profiles/*.json are runtime-generated (gitignored), not in scope for tracked changes.
cargo check -p vectord: clean. no behavioral change in the audit pipeline — same JSON-clean local model, same think=Some(false) posture, just stronger upstream.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

6cafa7ec0e
vectord: Phase 45 closure — /doc_drift/scan + doc_drift_corrections.jsonl writes
Phase 45 (doc-drift detection + context7 integration) was mostly
already shipped in prior sessions: DocRef struct, doc_drift module,
/doc_drift/check + /doc_drift/resolve endpoints, mcp-server's
context7_bridge.ts, boost exclusion in compute_boost_for_filtered_with_role.
The two missing pieces this commit lands:
1. POST /vectors/playbook_memory/doc_drift/scan — batch scan across
ALL active playbooks. Iterates the snapshot, filters out retired
+ already-flagged + no-doc_refs, runs check_all_refs on the rest,
flags drifted entries via PlaybookMemory::flag_doc_drift.
2. Per-detection write to data/_kb/doc_drift_corrections.jsonl. One
row per drifted playbook with playbook_id + scanned_at +
drifted_tools[] + per_tool[] + recommended_action. Downstream
consumers (overview model, operator dashboard, scrum_master
prompt enrichment) read this file to surface "this playbook
compounded the wrong way" signals to humans.
Idempotent by design:
- Already-flagged entries with no resolved_at are counted as
`already_flagged` and skipped (no double-flag, no duplicate row).
- After resolve_doc_drift() unflags an entry, the next scan brings it
  back into the eligible set.
Aggregate response shape:
{
"scanned": N, // playbooks with doc_refs we checked
"newly_flagged": N, // drift detected this scan
"already_flagged": N, // skipped (still under review)
"skipped_retired": N,
"skipped_no_refs": N, // pre-Phase-45 playbooks
"drifted_by_tool": {tool: count},
"corrections_written": N,
}
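The eligibility split behind those counters, as a minimal sketch (field names assumed from the description, not the shipped PlaybookMemory code):

```rust
/// Trimmed entry shape; the real one lives in playbook_memory.
struct Entry {
    retired_at: Option<String>,
    doc_drift_flagged_at: Option<String>,
    doc_drift_reviewed_at: Option<String>,
    doc_refs_len: usize,
}

#[derive(Default)]
struct ScanCounts {
    scanned: usize,
    already_flagged: usize,
    skipped_retired: usize,
    skipped_no_refs: usize,
}

fn partition(entries: &[Entry]) -> (Vec<&Entry>, ScanCounts) {
    let mut counts = ScanCounts::default();
    let mut eligible = Vec::new();
    for e in entries {
        if e.retired_at.is_some() {
            counts.skipped_retired += 1;
        } else if e.doc_drift_flagged_at.is_some() && e.doc_drift_reviewed_at.is_none() {
            // Still under review: no double-flag, no duplicate jsonl row.
            counts.already_flagged += 1;
        } else if e.doc_refs_len == 0 {
            counts.skipped_no_refs += 1; // pre-Phase-45 playbooks
        } else {
            // Resolved entries fall through here, back into eligibility.
            counts.scanned += 1;
            eligible.push(e); // check_all_refs runs on these
        }
    }
    (eligible, counts)
}
```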
Verified live:
POST /doc_drift/scan
→ scanned=4, newly_flagged=4, drifted_by_tool={docker:4, terraform:1},
corrections_written=4
POST /doc_drift/scan (re-run)
→ scanned=0, newly_flagged=0, already_flagged=6 (idempotent)
data/_kb/doc_drift_corrections.jsonl
→ 5 rows total (existing seed + this scan)
Phase 45 closure status:
DocRef + PlaybookEntry.doc_refs ✅ prior session
doc_drift module + check_all_refs ✅ prior session
/doc_drift/check + /resolve ✅ prior session
mcp-server/context7_bridge.ts ✅ prior session
boost exclusion in compute_boost_* ✅ prior session
/doc_drift/scan + corrections.jsonl ✅ THIS COMMIT
The 0→85% thesis stays valid against external doc drift. Popular
playbooks can no longer compound the wrong way as Docker / Terraform
/ React / etc. patch their docs — the scan flags drift, the boost
filter excludes the playbook, the operator reviews the corrections
.jsonl, and a revise call (Phase 27) supersedes the stale entry
with corrected operation/approach.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

626f18d491
pathway_memory: audit-consensus → retire wire
Some checks failed
lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts
When observer's hand-review explicitly rejects the output of a
hot-swap-recommended model, the matrix's recommendation was wrong
for this context. Auto-retire the trace so future agents don't
get the same poisoned recommendation in their preamble.
crates/vectord/src/pathway_memory.rs — add `trace_uid` to
HotSwapCandidate response and populate from the matched trace.
This gives consumers single-trace precision for /pathway/retire.
tests/real-world/scrum_master_pipeline.ts:
- HotSwapCandidate interface gains trace_uid
- new retirePathwayTrace() helper (fire-and-forget, fall-open)
- in the obsVerdict reject branch: if hotSwap was active AND
the rejected model is the hot-swap-recommended one AND
observer confidence ≥0.7, fire retire and null hotSwap so
post-loop replay bookkeeping doesn't double-process.
- hotSwap declared `let` (was const) so it can be nulled
Cycle verdicts ("needs different angle") don't trigger retire —
only outright rejects do. Confidence gate avoids retiring on
heuristic-fallback verdicts that come back without a confidence
number. Closes the "audit-consensus → retire" item from
HANDOVER.md.
Live-tested: insert synthetic trace → /pathway/retire by trace_uid
→ retired counter 1 → 2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

6ac7f61819
pathway_memory: Mem0 versioning + deletion (upsert/revise/retire/history)
Per J 2026-04-25: pathway_memory was append-only — every agent run added
a new trace, bad/failed runs polluted the matrix forever, no notion of
"this is the canonical evolved playbook." Ported playbook_memory's
Phase 25/27 patterns into pathway_memory so the agent loop's matrix
converges on best-known approaches per task class instead of bloating.
Fields added to PathwayTrace (all #[serde(default)] for back-compat):
- trace_uid: stable UUID per individual trace within a bucket
- version: u32 default 1
- parent_trace_uid, superseded_at, superseded_by_trace_uid
- retirement_reason (paired with existing retired:bool)
Methods added to PathwayMemory:
- upsert(trace) → PathwayUpsertOutcome {Added|Updated|Noop}
Workflow-fingerprint dedup: ladder_attempts + final_verdict hash.
Identical workflow → bumps existing replay_count instead of duplicating.
- revise(parent_uid, new_trace) → PathwayReviseOutcome
Chains versions; rejects retired or already-superseded parents.
- retire(trace_uid, reason) → bool
Marks specific trace retired with reason. Idempotent.
- history(trace_uid) → Vec<PathwayTrace>
Walks parent_trace_uid back to root, then superseded_by forward to tip.
Cycle-safe via visited set.
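The history walk, sketched under assumed trace shapes (not the shipped pathway_memory code):

```rust
use std::collections::{HashMap, HashSet};

#[derive(Clone)]
struct Trace {
    trace_uid: String,
    parent_trace_uid: Option<String>,
    superseded_by_trace_uid: Option<String>,
}

/// Walk parent links back to root, then superseded_by forward to tip.
/// A visited set guards both directions against cycles.
fn history(by_uid: &HashMap<String, Trace>, start: &str) -> Vec<Trace> {
    let mut seen = HashSet::new();
    // 1. Back to root.
    let mut root = start.to_string();
    while let Some(parent) = by_uid.get(&root).and_then(|t| t.parent_trace_uid.clone()) {
        if !seen.insert(parent.clone()) {
            break; // cycle guard
        }
        root = parent;
    }
    // 2. Forward to tip.
    seen.clear();
    let mut chain = Vec::new();
    let mut cur = Some(root);
    while let Some(uid) = cur {
        if !seen.insert(uid.clone()) {
            break; // cycle guard
        }
        match by_uid.get(&uid) {
            Some(t) => {
                chain.push(t.clone());
                cur = t.superseded_by_trace_uid.clone();
            }
            None => break,
        }
    }
    chain
}
```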
Retrieval gates updated:
- query_hot_swap skips superseded_at.is_some()
- bug_fingerprints_for skips both retired AND superseded
HTTP endpoints in service.rs:
- POST /vectors/pathway/upsert
- POST /vectors/pathway/retire
- POST /vectors/pathway/revise
- GET /vectors/pathway/history/{trace_uid}
scripts/seal_agent_playbook.ts switched insert→upsert + accepts SESSION_DIR
arg so it can seal any archived session, not just iter4.
Verified live (4/4 ops):
- UPSERT first run: Added trace_uid 542ae53f
- UPSERT identical: Updated, replay_count bumped 0→1 (no duplicate)
- REVISE 542ae53f→87a70a61: parent stamped superseded_at, v2 created
- HISTORY of v2: chain_len=2, v1 superseded, v2 tip
- RETIRE iter-6 broken trace: retired=true, retirement_reason preserved
- pathway_memory.stats: total=79, retired=1, reuse_rate=0.0127
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

91a38dc20b
vectord/index_registry: add last_used + build_signature (scrum iter 11)
Scrum iter 11 on crates/vectord/src/index_registry.rs flagged two
concrete field gaps (90% confidence). Both were tagged UnitMismatch
/ missing-invariant.
IndexMeta gains two Optional fields:
last_used: Option<DateTime<Utc>>
PRD 11.3 — when this index was last searched against. Callers
were reading created_at as a liveness proxy, which conflated
"built" with "used." IndexRegistry::touch_used(name) stamps the
field on every hit; incremental re-embed can now skip cold
indexes without misattributing "fresh build" to "recent use."
build_signature: Option<String>
PRD 11.3 — stable SHA-256 of (sorted source files + chunk_size
+ overlap + model_version). compute_build_signature() in the
same module is deterministic: file-order-invariant, changes on
chunk param, changes on model version. Lets incremental re-embed
answer "has anything changed since last build?" without scanning
the source Parquet.
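A sketch of that signature computation (assumes the sha2 crate; the exact byte-level composition is an assumption):

```rust
use sha2::{Digest, Sha256};

/// Deterministic build signature: sorting the file list makes it
/// file-order-invariant; chunk params and model version fold in so any
/// change produces a new hash.
fn compute_build_signature(
    source_files: &[String],
    chunk_size: usize,
    overlap: usize,
    model_version: &str,
) -> String {
    let mut files = source_files.to_vec();
    files.sort(); // order-invariance
    let mut hasher = Sha256::new();
    for f in &files {
        hasher.update(f.as_bytes());
        hasher.update([0]); // separator so ["ab","c"] != ["a","bc"]
    }
    hasher.update(chunk_size.to_le_bytes());
    hasher.update(overlap.to_le_bytes());
    hasher.update(model_version.as_bytes());
    hasher
        .finalize()
        .iter()
        .map(|b| format!("{b:02x}"))
        .collect()
}
```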
Both fields are #[serde(default)] — the ~40 existing .json meta
files under vectors/meta/ load unchanged. Backward-compat verified
by the explicit `index_meta_deserializes_without_new_fields_backcompat`
test.
7 new tests:
- build_signature_is_deterministic
- build_signature_order_invariant (sorted internally)
- build_signature_changes_on_chunk_param
- build_signature_changes_on_model_version
- touch_used_updates_last_used
- touch_used_is_noop_on_missing_index
- index_meta_deserializes_without_new_fields_backcompat
Call-site fixes: crates/vectord/src/refresh.rs:294 and
crates/vectord/src/service.rs:244 both construct IndexMeta with
fully-literal init, default the new fields to None. One
indentation cleanup on service.rs (a pre-existing visual issue on
id_prefix: None).
Workspace warnings still at 0. touch_used() isn't wired into search
hot-path yet — follow-up commit when the search handlers can
adopt it without a broader refactor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2f1b9c9768
phase-39+41: land promised artifacts — providers.toml, activation.rs, profiles/
Three PRD gaps closed in one coherent batch — all were cosmetic or
scaffold-shaped, now real files:
Phase 39 (PRD:57):
+ config/providers.toml — provider registry (name/base_url/auth/
default_model) for ollama, ollama_cloud, openrouter. Commented
stubs for gemini + claude pending adapter work. Secrets stay in
/etc/lakehouse/secrets.toml or env, NEVER inline.
Phase 41 (PRD:115):
+ crates/vectord/src/activation.rs — ActivationTracker with the
PRD-named single-flight guard ("refuse new activation if one is
pending/running"). Per-profile granularity — activating A doesn't
block B. 5 tests cover the full state machine. Handler body stays
in service.rs for now; tracker usage integration is a follow-up.
Phase 41 (PRD:113):
+ crates/shared/src/profiles/ with 4 submodules:
* execution.rs — `pub use crate::types::ModelProfile as
ExecutionProfile` (backward-compat rename per PRD)
* retrieval.rs — top_k, rerank_top_k, freshness cutoff,
playbook boost, sensitivity-gate enforcement
* memory.rs — playbook boost ceiling, history cap, doc
staleness, auto-retire-on-failure
* observer.rs — failure cluster size, alert cooldown, ring
size, langfuse forwarding
All fields `#[serde(default)]` so existing ModelProfile files
load unchanged.
Still open from the same phases:
- Gemini + Claude provider adapters (Phase 40 — 100-200 LOC each)
- Full activate_profile handler extraction into activation.rs
(Phase 41 — module-structure refactor)
- Catalogd CRUD endpoints for retrieval/memory/observer profiles
(Phase 41 — exists at list level, no create/update/delete yet)
- truth/ repo-root directory for file-backed rules (Phase 42 —
TOML loader + schema)
- crates/validator crate (Phase 43 — full greenfield)
Workspace warnings still at 0. 5 new tests, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

08cc960115
vectord: Phase 41 gate fixes — 202 ACCEPTED + /profile/jobs/{id} alias
Phase 41 PRD (docs/CONTROL_PLANE_PRD.md:121) gate:
"Activate a profile → returns 202 in <100ms → job completes in
background → /vectors/profile/jobs/{id} shows progress"
Two concrete mismatches to PRD:
1. activate_profile returned HTTP 200, not 202. Fix: wrap the Json
return in (StatusCode::ACCEPTED, Json(...)) so the async semantics
are visible at the status-code level.
2. The PRD quotes GET /vectors/profile/jobs/{id} but code only exposed
/vectors/jobs/{id}. Fix: add an alias route — same get_job handler,
second URL matches what the PRD's polling example documents.
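Both fixes in one sketch (handler bodies and route names assumed; axum 0.7-style paths):

```rust
use axum::{http::StatusCode, response::IntoResponse, routing::get, Json, Router};
use serde_json::json;

// 202 ACCEPTED makes the async semantics visible at the status-code level.
async fn activate_profile() -> impl IntoResponse {
    let job_id = "job-...";
    (StatusCode::ACCEPTED, Json(json!({ "job_id": job_id })))
}

async fn get_job() -> impl IntoResponse {
    Json(json!({ "state": "running" }))
}

fn router() -> Router {
    Router::new()
        .route("/vectors/profile/activate", axum::routing::post(activate_profile))
        .route("/vectors/jobs/:id", get(get_job))
        // Alias: same handler, second URL matching the PRD's polling example.
        .route("/vectors/profile/jobs/:id", get(get_job))
}
```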
Still open from Phase 41 (flagged for follow-up, bigger scope):
- crates/shared/src/profiles/ module with ExecutionProfile,
RetrievalProfile, MemoryProfile, ObserverProfile types — PRD
claims them, file doesn't exist; ModelProfile still does all
four roles today. This is a real schema-refactor, not 6-line work.
- crates/vectord/src/activation.rs with ActivationTracker — the
activation logic lives inline in service.rs; extracting it is
a module-structure change.
- Phase 37 hot-swap stress test in tests/multi-agent/run_stress.ts
Phase 3 — PRD says it must pass, current state unknown.
Workspace warnings still at 0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

12e615bb5d
ingestd/vectord: remove two fragile unwraps on Option paths
Some checks failed
lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts
Both were technically safe — guarded above by map_or(true, ...) and
Some(entry) assignment respectively — but relied on multi-line
invariants that a future refactor could easily break.
- ingestd/watcher.rs:80: path.file_name().unwrap() on a path that
was already checked via map_or(true, ...) two lines up. Fix:
let-else binds filename once, no double lookup, no unwrap.
- vectord/promotion.rs:145: file.current.as_ref().unwrap() called
TWICE on the same line to log config + trial_id. Guard via
`if let Some(cur) = &file.current` so the log gracefully skips
if the invariant ever breaks instead of panicking at runtime.
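Both replacement patterns, sketched on simplified types (not the shipped ingestd/vectord code):

```rust
use std::path::Path;

fn handle(path: &Path) {
    // let-else: bind once, bail gracefully instead of unwrap().
    let Some(filename) = path.file_name() else {
        return; // previously guaranteed by the map_or(true, ...) above
    };
    println!("ingesting {filename:?}");
}

struct Promotion {
    current: Option<String>,
}

fn log_promotion(file: &Promotion) {
    // if-let: the log gracefully skips if the invariant ever breaks.
    if let Some(cur) = &file.current {
        println!("promoting config={cur}");
    }
}
```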
Both are drop-in semantically: happy path identical, error path now
graceful-skip instead of panic. Workspace warnings still at 0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cdc24d8bd0
shared: build ModelMatrix — migrate 5 call sites off deprecated estimate_tokens
Some checks failed
lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts
The `aibridge::context::estimate_tokens` deprecation has been pointing
at `shared::model_matrix::ModelMatrix::estimate_tokens` for a while,
but that module didn't exist — so the deprecation was aspirational
noise, not actionable guidance.
Built the minimal target: `shared::model_matrix::ModelMatrix` with
an associated `estimate_tokens(text: &str) -> usize` method. Same
chars/4 ceiling heuristic as the deprecated helper. 6 tests cover
empty/3/4/5-char cases, multi-byte UTF-8 (emoji count as 1 char each),
and linear scaling to 400-char inputs.
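The heuristic, as a sketch (assumed to match the shipped method's behavior on the listed test cases):

```rust
pub struct ModelMatrix;

impl ModelMatrix {
    /// chars/4 ceiling: counts Unicode scalar values, so a multi-byte
    /// emoji is 1 char. "abcd" -> 1 token, "abcde" -> 2, "" -> 0.
    pub fn estimate_tokens(text: &str) -> usize {
        let chars = text.chars().count();
        (chars + 3) / 4 // ceiling division
    }
}
```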
Migrated 5 call sites:
- aibridge/context.rs:88 — opts.system token count
- aibridge/context.rs:89 — prompt token count
- aibridge/tree_split.rs:22 — import (now uses ModelMatrix)
- aibridge/tree_split.rs:84, 89 — truncate_scratchpad budget loop
- aibridge/tree_split.rs:282 — scratchpad post-truncation assertion
- aibridge/context.rs:183 — system-prompt budget test
Also cleaned up two parallel test warnings:
- aibridge/context.rs legacy estimate_tokens_ceiling_divides_by_four
test deleted (ModelMatrix's tests cover the same behavior now).
- vectord/playbook_memory.rs:1650 unused_mut on e_alive.
Net workspace warning count: 11 → 0 (including --tests build).
The deprecated `estimate_tokens` wrapper stays in aibridge/context.rs
for external callers. Future commits can remove it entirely once no
public API surface still references it.
The applier's warning-count gate now has a floor of 0 — any future
patch that introduces a single warning trips the gate automatically.
Previously a floor of 11 tolerated noise.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fdc5123f6d
cleanup: drop workspace warnings from 11 to 6
Some checks failed
lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts
Three trivial cleanups that pull the workspace baseline down by five:
- vectord/trial.rs: removed unused ObjectStore import (not referenced
anywhere in the file; cargo's unused_imports lint was flagging it
on every check). Net: -2 warnings (cascade effect from one import).
- ui/main.rs:1241: `Err(e)` with unused binding → `Err(_)`.
- ui/main.rs:1247: `let mut import_table` never mutated → `let`.
Matters because the scrum_applier's hardened warning-count gate uses
this baseline as its reject threshold. Lower baseline = lower floor
= any future patch that adds a warning trips the gate earlier.
Remaining 6 warnings are all aibridge context::estimate_tokens
deprecation notices pointing at a planned-but-unbuilt
shared::model_matrix::ModelMatrix::estimate_tokens. Fix requires
creating that type (next commit).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

d122703e9a
vectord: delete _run_embedding_job_legacy — 44 lines of explicit dead code
Some checks failed
lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts
Function was labeled "Legacy single-pipeline embedding (replaced by supervisor)" with a #[allow(dead_code)] attribute. Zero callers across the workspace. This is exactly what `#[allow(dead_code)]` is supposed to silently flag as "I know this is dead but I'm not committing to removing it" — so let's commit to removing it.
Iter memory grep for this pattern showed 5 remaining #[allow(dead_code)] attributes in the workspace (1 here, 4 in gateway/access.rs). The four in access.rs are waiting on P13-001 (queryd → AccessControl wiring) before removing — that's cross-crate work. This one was self-contained.
Net: -44 lines of dead code + comment. Workspace warnings unchanged at 11.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

0a0843b605
ADR-021: semantic-correctness layer lands in pathway_memory (A+B+C)
Some checks failed
lakehouse/auditor 4 blocking issues: todo!() macro call in tests/real-world/scrum_master_pipeline.ts
Phase A — data model (vectord/src/pathway_memory.rs):
+ SemanticFlag enum (9 variants: UnitMismatch, TypeConfusion,
NullableConfusion, OffByOne, StaleReference, PseudoImpl, DeadCode,
WarningNoise, BoundaryViolation) as #[serde(tag = "kind")]
+ TypeHint { source, symbol, type_repr }
+ BugFingerprint { flag, pattern_key, example, occurrences }
+ PathwayTrace gains semantic_flags, type_hints_used, bug_fingerprints
all #[serde(default)] for back-compat deserialization of pre-ADR-021
traces on disk
+ build_pathway_vec now tokenizes flag:{variant} + bug:{flag}:{key}
so traces with different bug histories cluster separately in the
similarity gate (proven by pathway_vec_differs_when_bug_fingerprint_added
test)
Phase B — producer (scrum_master_pipeline.ts):
+ Prompt addendum: each finding must carry `**Flag: <CATEGORY>**` tag
alongside the existing Confidence: NN% tag. 9 category choices plus
`None` for improvements that aren't bug-shaped.
+ Parser extracts tagged flags from reviewer markdown; falls back to
bare-word match if reviewer omits the label. Deduplicated per trace.
+ PathwayTracePayload gains semantic_flags / type_hints_used /
bug_fingerprints fields. Wire format matches Rust serde tagged enum
so TS and Rust interop directly.
Phase C — pre-review enrichment:
+ new `/vectors/pathway/bug_fingerprints` endpoint aggregates
occurrences by (flag, pattern_key) across traces sharing a narrow
fingerprint, sorts by frequency, returns top-K.
+ scrum calls it before the ladder and prepends a PATHWAY MEMORY
preamble to the reviewer prompt ("these patterns appeared N times
on this file area before — check for recurrences"). Empty on
fresh install; grows as the matrix index learns.
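The aggregation, sketched on trimmed shapes (not the shipped endpoint code):

```rust
use std::collections::HashMap;

#[derive(Clone)]
struct BugFingerprint {
    flag: String, // serialized SemanticFlag kind, e.g. "UnitMismatch"
    pattern_key: String,
    occurrences: u32,
}

/// Sum occurrences by (flag, pattern_key) across traces, sort by
/// frequency descending, return top-K.
fn top_fingerprints(all: &[BugFingerprint], limit: usize) -> Vec<((String, String), u32)> {
    let mut counts: HashMap<(String, String), u32> = HashMap::new();
    for fp in all {
        *counts
            .entry((fp.flag.clone(), fp.pattern_key.clone()))
            .or_insert(0) += fp.occurrences;
    }
    let mut ranked: Vec<_> = counts.into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1)); // frequency desc
    ranked.truncate(limit);
    ranked
}
```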
Tests: 27 pathway_memory tests green (was 18). New tests:
- pathway_trace_deserializes_without_new_fields_backcompat
- semantic_flag_serializes_as_tagged_enum
- bug_fingerprint_roundtrips_through_serde
- pathway_vec_differs_when_bug_fingerprint_added
- semantic_flag_discriminates_by_variant
- bug_fingerprints_aggregate_by_pattern_key (sums occurrences, sorts desc)
- bug_fingerprints_empty_for_unseen_fingerprint
- bug_fingerprints_respects_limit
- insert_preserves_semantic_fields (roundtrip via persist + reload)
Workspace warnings unchanged at 11.
What's still queued (not this commit):
- type_hints_used population from catalogd column types + Arrow schema
- bug_fingerprint extraction from reviewer output (Phase D — for now
semantic_flags populate but the fingerprint key requires parsing
code-shape from the finding; next iteration's work)
- auditor → pathway audit_consensus update wire (explicit-fail gate)
Why this commit matters: the mechanical applier's gates are syntactic
(warning count, patch size, rationale-token alignment). The
queryd/delta.rs base_rows bug (86901f8) was found by human reading —
unit mismatch between row counts and file counts. At 100 bugs this
deep, humans can't catch them all; the matrix index has to learn the
shapes. This commit gives it the fields to learn into and the surface
to read from.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2f8b347f37
pathway_memory: consensus-designed sidecar + hot-swap learning loop
Some checks failed
lakehouse/auditor 11 warnings — see review
10-probe N=3 consensus (kimi-k2:1t / gpt-oss:120b / qwen3.5:latest /
deepseek-v3.1:671b / qwen3-coder:480b / mistral-large-3:675b /
qwen3.5:397b + 2 stability re-probes; 2 openrouter probes 429'd) locked
the design across three rounds. Full JSON responses in
data/_kb/consensus_reducer_design_{mocq3akn,mocq6pi1,mocqatik}.json.
What it does
Preserves FULL backtrack context per reviewed file (ladder attempts +
latencies + reject reasons, KB chunks with provenance + cosine + rank,
observer signals, context7 bridge hits, sub-pipeline calls, audit
consensus) and indexes them by narrow fingerprint for hot-swap of
proven review pathways.
When scrum reviews a file:
1. narrow fingerprint = task_class + file_prefix + signal_class
2. query_hot_swap checks pathway memory for a match that passes
probation (≥3 replays @ ≥80% success) + audit gate + similarity
(≥0.90 cosine on normalized-metadata-token embedding)
3. if hot-swap eligible, recommended model tried first in the ladder
4. replay outcome reported back, updating the pathway's success_rate
5. pathways below 0.80 after ≥3 replays retire permanently (sticky)
6. full PathwayTrace always inserted at end of review — hot-swap
grows with use, it doesn't bootstrap from nothing
Gate design is load-bearing:
- narrow fingerprint (6 of 8 consensus models converged on the same
3-field composition; lock) — enables generalization within crate
- probation ≥3 replays — binomial tail at 80% is ~5%, below is noise
- success rate ≥0.80 — mistral + qwen3-coder independently proposed
this exact threshold across two rounds
- similarity ≥0.90 — middle of the 0.85/0.95 consensus spread
- bootstrap: null audit_consensus ALLOWED (auditor → pathway update
not wired yet; probation + success_rate gates alone enforce safety
during bootstrap; explicit audit FAIL still blocks)
- retirement is sticky — prevents oscillation on noise
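The combined gate, as a minimal sketch (field names assumed; thresholds as consensus-locked above):

```rust
struct Pathway {
    replay_count: u32,
    success_rate: f32,
    retired: bool,      // sticky — never un-retires
    audit_failed: bool, // explicit audit FAIL; null consensus is allowed
}

/// All gates must pass before the recommended model jumps the ladder.
fn hot_swap_eligible(p: &Pathway, similarity: f32) -> bool {
    !p.retired
        && !p.audit_failed      // bootstrap: null audit_consensus allowed
        && p.replay_count >= 3  // probation
        && p.success_rate >= 0.80
        && similarity >= 0.90   // cosine on normalized-metadata-token embedding
}
```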
Files
+ crates/vectord/src/pathway_memory.rs (new, 600 lines + 18 tests)
PathwayTrace, LadderAttempt, KbChunkRef, ObserverSignal, BridgeHit,
SubPipelineCall, AuditConsensus, HotSwapCandidate, PathwayMemory,
PathwayMemoryStats. 18/18 tests green.
Cosine + 32-bucket L2-normalized embedding; mirror of TS impl.
M crates/vectord/src/lib.rs
pub mod pathway_memory;
M crates/vectord/src/service.rs
VectorState grows pathway_memory field;
4 HTTP handlers (/pathway/insert, /pathway/query,
/pathway/record_replay, /pathway/stats).
M crates/gateway/src/main.rs
Construct PathwayMemory + load from storage on boot,
wire into VectorState.
M tests/real-world/scrum_master_pipeline.ts
Byte-matching TS bucket-hash (verified same bucket indices as
Rust); pre-ladder hot-swap query; ladder reorder on hit;
per-attempt latency capture; post-accept trace insert
(fire-and-forget); replay outcome recording;
observer /event emits pathway_hot_swap_hit, pathway_similarity,
rungs_saved per review for the VCP UI.
M ui/server.ts
/data/pathway_stats aggregates /vectors/pathway/stats +
scrum_reviews.jsonl window for the value metric.
M ui/ui.js
Three new metric cards:
· pathway reuse rate (activity: is it firing?)
· avg rungs saved (value: is it earning its keep?)
· pathways tracked (stability: retirement = learning)
What's not in this commit (queued)
- auditor → pathway audit_consensus update wire (explicit audit-fail
block activates when this lands)
- bridge_hits + sub_pipeline_calls population from context7 / LLM
Team extract results (fields wired, callers not yet)
- replay log (PathwayReplayOutcome {matched_id, succeeded, ts}) as
a separate jsonl for forensic audit of why specific replays failed
Why > summarization
Summaries discard the causal chain. With this, auditor can verify
citation provenance, applier can distinguish lucky from learned paths,
and the matrix indexing actually stores end-to-end pathways instead of
just RAG chunks — which is what J meant by "why aren't we using it
for everything."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

21fd3b9c61
Scrum-driven fixes: P5-001 auth wired, P42-001 truth evaluator, P9-001 journal on ingest
Some checks failed
lakehouse/auditor 2 blocking issues: cloud: claim not backed — "| **P9-001** (partial) | `crates/ingestd/src/service.rs` | **3 → 6** ↑↑↑ | `journal.record_ing
Apply the highest-confidence findings from the Phase 0→42 forensic sweep
after four scrum-master iterations under the adversarial prompt. Each fix
is independently validated by a later scrum iteration scoring the same
file higher under the same bar.
Code changes
────────────
P5-001 — crates/gateway/src/auth.rs + main.rs
api_key_auth was marked #[allow(dead_code)] and never wrapped around
the router, so `[auth] enabled=true` logged a green message and
enforced nothing. Now wired via from_fn_with_state, with constant-time
header compare and /health exempted for LB probes.
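A sketch of that wiring (assuming axum 0.7-style middleware signatures; the header name is an assumption):

```rust
use axum::{
    body::Body,
    extract::State,
    http::{Request, StatusCode},
    middleware::{from_fn_with_state, Next},
    response::Response,
    routing::get,
    Router,
};

#[derive(Clone)]
struct AppState {
    api_key: String,
}

async fn api_key_auth(
    State(state): State<AppState>,
    req: Request<Body>,
    next: Next,
) -> Result<Response, StatusCode> {
    // /health stays open for LB probes.
    if req.uri().path() == "/health" {
        return Ok(next.run(req).await);
    }
    let provided = req
        .headers()
        .get("x-api-key")
        .and_then(|v| v.to_str().ok())
        .unwrap_or("");
    if constant_time_eq(provided.as_bytes(), state.api_key.as_bytes()) {
        Ok(next.run(req).await)
    } else {
        Err(StatusCode::UNAUTHORIZED)
    }
}

// Constant-time compare: XOR-accumulate over all bytes, no early exit.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn router(state: AppState) -> Router {
    Router::new()
        .route("/health", get(|| async { "ok" }))
        .layer(from_fn_with_state(state.clone(), api_key_auth))
        .with_state(state)
}
```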
P42-001 — crates/truth/src/lib.rs
TruthStore::check() ignored RuleCondition entirely — signature looked
like enforcement, body returned every action unconditionally. Added
evaluate(task_class, ctx) that actually walks FieldEquals / FieldEmpty /
FieldGreater / Always against a serde_json::Value via dot-path lookup.
check() kept for back-compat. Tests 14 → 24 (10 new exercising real
pass/fail semantics). serde_json moved to [dependencies].
P9-001 (partial) — crates/ingestd/src/service.rs
Added Option<Journal> to IngestState + a journal.record_ingest() call
on /ingest/file success. Gateway wires it with `journal.clone()` before
the /journal nest consumes the original. First-ever internal mutation
journal event verified live (total_events_created 0→1 after probe).
Iter-4 scrum scored these files higher under same prompt:
ingestd/src/service.rs 3 → 6 (P9-001 visible)
truth/src/lib.rs 3 → 4 (P42-001 visible)
gateway/src/auth.rs 3 → 4 (P5-001 visible)
gateway/src/execution_loop 4 → 6 (indirect)
storaged/src/federation 3 → 4 (indirect)
Infrastructure additions
────────────────────────
* tests/real-world/scrum_master_pipeline.ts
- cloud-first ladder: kimi-k2:1t → deepseek-v3.1:671b → mistral-large-3:675b
→ gpt-oss:120b → devstral-2:123b → qwen3.5:397b (deep final thinker)
- LH_SCRUM_FORENSIC env: injects SCRUM_FORENSIC_PROMPT.md as adversarial preamble
- LH_SCRUM_PROPOSAL env: per-iter fix-wave doc override
- Confidence extraction (markdown + JSON), schema v4 KB rows with:
verdict, critical_failures_count, verified_components_count,
missing_components_count, output_format, gradient_tier
- Model trust profile written per file-accept to data/_kb/model_trust.jsonl
- Fire-and-forget POST to observer /event so by_source.scrum appears in /stats
* mcp-server/observer.ts — unchanged in shape, confirmed receiving scrum events
* ui/ — new Visual Control Plane on :3950
- Bun.serve with /data/{services,reviews,metrics,trust,overrides,findings,file,refactor_signals,search,logs/:svc,scrum_log}
- Views: MAP (D3 graph, 5 overlays) / TRACE (per-file iter timeline) /
TRAJECTORY (refactor signals + reverse index search) / METRICS (explainers
with SOURCE + GOOD lines) / KB (card grid with tooltips) / CONSOLE (per-service
journalctl tail, tabs for gateway/sidecar/observer/mcp/ctx7/auditor/langfuse)
- tryFetch always attempts JSON.parse (fix for observer returning JSON without content-type)
- renderNodeContext primitive-vs-object guard (fix for gateway /health string)
* docs/SCRUM_FIX_WAVE.md — iter-specific scope directing the scrum
* docs/SCRUM_FORENSIC_PROMPT.md — adversarial audit prompt (verdict/critical/verified schema)
* docs/SCRUM_LOOP_NOTES.md — iteration observations + fix-next-loop queue
* docs/SYSTEM_EVOLUTION_LAYERS.md — Layers 1-10 roadmap (trust profiling, execution DNA, drift sentinel, etc)
Measurements across iterations
──────────────────────────────
iter 1 (soft prompt, gpt-oss:120b): mean score 5.00/10
iter 3 (forensic, kimi-k2:1t): mean score 3.56/10 (−1.44 — bar raised)
iter 4 (same bar, post fixes): mean score 4.00/10 (+0.44 — fixes landed)
Score movement iter3→iter4: ↑5 ↓1 =12
21/21 first-attempt accept by kimi-k2:1t in iter 4
20/21 emitted forensic JSON (richer signal than markdown)
16 verified_components captured (proof-of-life, new metric)
Permission Gradient distribution: 0 auto · 16 dry_run · 4 sim · 1 block
Observer loop: by_source {scrum: 21, langfuse: 1985, phase24_audit: 1}
v1/usage: 224 requests, 477K tokens, all tracked
Signal classes per file (iter 3 → iter 4):
CONVERGING: 1 (ingestd/service.rs — fix clearly landed)
LOOPING: 4 (catalogd/registry, main, queryd/service, vectord/index_registry)
ORBITING: 1 (truth — novel findings surfacing as surface ones fix)
PLATEAU: 9 (scores flat with high confidence — diminishing returns)
MIXED: 6
Loop thesis status
──────────────────
A file's score rises only when the scrum confirms a real fix landed.
No false positives yet across 3 iterations. Fixes applied to 3 files all
raised their independent scores under the same adversarial prompt. Loop
is measurable, not hand-wavy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

21e8015b60
Phase 37: Hot-swap async + Phase 38: Universal API skeleton
- JobTracker extended with JobType::ProfileActivation + Embed
- activate_profile returns job_id immediately, work spawns in background
- /v1/chat, /v1/usage, /v1/sessions endpoints (OpenAI-compatible)
- Langfuse trace integration (Phase 40 early deliverable)
- 12 gateway unit tests green, curl gates pass

8bacd43465
Phase 45 slice 3: doc_drift check + resolve endpoints
Some checks failed
lakehouse/auditor cloud: claim not backed — "Previously the hybrid fixture honestly reported layer 5 as 404/unimplemented. With this PR it flips "
Closes the last open loop of Phase 45. Previously, playbooks could
carry doc_refs (slice 1) and the context7 bridge could report drift
(slice 2) — but nothing tied them together. An operator had no way
to say "check this playbook against its doc sources and flag it if
the docs moved." This slice wires that.
Ships:
- crates/vectord/src/doc_drift.rs — thin context7 bridge client.
No cache (bridge has its own 5-min TTL). No retry (transient
failure = Unknown outcome, caller decides).
- PlaybookMemory::flag_doc_drift(id) — stamps doc_drift_flagged_at
idempotently. Once flagged, compute_boost_for_filtered_with_role
excludes the entry from both the non-geo and geo-indexed boost
paths until resolved.
- PlaybookMemory::resolve_doc_drift(id) — human re-admission.
Stamps doc_drift_reviewed_at which clears the boost exclusion.
- PlaybookMemory::get_entry(id) — new read-only accessor the
handler uses to read doc_refs without exposing the state lock.
- POST /vectors/playbook_memory/doc_drift/check/{id}
- POST /vectors/playbook_memory/doc_drift/resolve/{id}
Design call: Unknown outcomes from the bridge (bridge down, tool
not in context7, no snippet_hash recorded) are NEVER enough to
flag. Only a positive drifted=true from the bridge flips the flag.
A down bridge doesn't silently drift-flag every playbook.
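That design call, as a sketch (the outcome enum shape is assumed from the description):

```rust
enum DriftOutcome {
    Drifted, // positive drifted=true from the bridge
    Current,
    Unknown, // bridge down, tool not in context7, no snippet_hash recorded
}

/// Fail-open: Unknown is NEVER enough to flag, so a down bridge can't
/// silently drift-flag every playbook.
fn should_flag(outcome: &DriftOutcome) -> bool {
    matches!(outcome, DriftOutcome::Drifted)
}
```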
Tests (5 new, in upsert_tests mod):
- flag_doc_drift_stamps_timestamp_and_persists
- flag_doc_drift_is_idempotent_on_already_flagged
- resolve_doc_drift_clears_flag_admission_gate
- boost_excludes_flagged_unreviewed_entries
- boost_re_admits_resolved_entries
14/14 upsert tests pass (9 pre-existing + 5 new).
Live end-to-end — hybrid fixture on auditor/scaffold (merged to
main at b6d69b2) now shows:
overall: PASS
shipped: [38, 40, 45.1, 45.2, 45.3]
placeholder: [—]
✓ Phase 38 /v1/chat 4039ms
✓ Phase 40 Langfuse trace 11ms
✓ Phase 45.1 seed + doc_refs 748ms
✓ Phase 45.2 bridge diff 563ms
✓ Phase 45.3 drift-check endpoint 116ms ← was a 404 before this
First time the fixture reports overall=PASS with zero placeholder
layers. The honest "not built" signal on layer 5 is now honestly
"built and working."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1270e167fe
Post-merge: update test pattern matches for struct-like UpsertOutcome
After merging main (with the UpsertOutcome struct-like enum shape from PR #2), the 4 new upsert tests needed pattern-match updates:
UpsertOutcome::Added(_) → UpsertOutcome::Added { .. }
9/9 upsert tests pass.

4dca2a6705
Merge branch 'main' of https://git.agentview.dev/profit/lakehouse into fix/upsert-outcome-update-merge

320009ddf4
Fix: UPDATE branch of upsert_entry dropped doc_refs + valid_until
All checks were successful
lakehouse/auditor all checks passed (3 findings, all info)
The auditor's hybrid fixture (branch auditor/scaffold) surfaced this
on 2026-04-22. A re-seed of the same (operation, day) pair with new
endorsed_names merged the names but silently discarded the incoming
doc_refs and valid_until fields. schema_fingerprint was partially
handled (set-if-Some) but doc_refs and valid_until weren't touched.
Root cause: the UPDATE arm of upsert_entry at playbook_memory.rs:609
only covered:
- endorsed_names (union-merge)
- timestamp
- embedding (if Some)
- schema_fingerprint (if Some)
Fix:
- valid_until — refresh if caller provides one
- doc_refs — merge by tool (case-insensitive). Same-tool new entry
supersedes older one; different-tool refs are appended. Empty
incoming doc_refs preserves existing (don't wipe on partial seed).
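The merge rule, sketched on a trimmed DocRef (not the shipped UPDATE arm):

```rust
#[derive(Clone)]
struct DocRef {
    tool: String,
    version_seen: String,
}

/// Merge by tool, case-insensitive. Same-tool incoming ref supersedes
/// the older one; different-tool refs append. Empty incoming preserves
/// existing — no wipe on partial seed.
fn merge_doc_refs(existing: &mut Vec<DocRef>, incoming: Vec<DocRef>) {
    for new_ref in incoming {
        match existing
            .iter_mut()
            .find(|r| r.tool.eq_ignore_ascii_case(&new_ref.tool))
        {
            Some(slot) => *slot = new_ref, // supersede same tool
            None => existing.push(new_ref), // append new tool
        }
    }
}
```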
4 new regression tests under upsert_tests:
- update_merges_doc_refs_with_existing_ones
- update_same_tool_supersedes_older_version
- update_preserves_existing_doc_refs_when_new_entry_has_none
- update_refreshes_valid_until_when_caller_provides_one
Test result: 9/9 upsert tests pass (4 new + 5 pre-existing).
Branch basis note: this branch is off main, so the UpsertOutcome enum
here still has the newtype variants Added(String) / Noop(String). PR
#2 (fix/upsert-outcome-serde) changes that enum to struct-like. When
PR #2 merges first this branch needs a trivial rebase; the UPDATE
arm logic is untouched by that change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

f0a3ed6832
Fix: UpsertOutcome newtype variants panicked serde from Phase 26
Some checks failed
lakehouse/auditor 1 blocking issue: cloud: claim not backed — "Verified live after gateway restart:"
playbook_memory.rs:257 — UpsertOutcome had two newtype variants
carrying a bare String:
Added(String)
Noop(String)
under #[serde(tag = "mode")]. serde cannot tag newtype variants of
primitive types, so every serialization threw:
"cannot serialize tagged newtype variant UpsertOutcome::Added
containing a string"
This caused gateway /vectors/playbook_memory/seed to panic the
tokio worker on EVERY call that reached Added or Noop, returning
an empty socket close to the client. The bug was silent from commit
640db8c (Phase 26, 2026-04-21) until 2026-04-22 when the auditor's
hybrid fixture (auditor/fixtures/hybrid_38_40_45.ts on the
auditor/scaffold branch) exercised the endpoint live and gateway
logs showed the panic.
Fix — convert both newtype variants to struct-like:
Added { playbook_id: String }
Noop { playbook_id: String }
Updated all 7 construction + pattern-match sites. Updated rustdoc
on the enum explaining why the shape is what it is.
JSON wire format is now uniform across all three variants:
{"mode":"added","playbook_id":"pb-..."}
{"mode":"updated","playbook_id":"pb-...","merged_names":[...]}
{"mode":"noop","playbook_id":"pb-..."}
Verified live after gateway restart:
curl /seed new payload → mode=added, playbook 860231f5
curl /seed new payload + doc_refs → mode=added, playbook 11d348d9
curl /seed identical re-submit → mode=noop, same id 860231f5, entries_after unchanged (Mem0 contract intact)
Tests: 51/51 vectord lib tests green. Release build clean.
This is a follow-up bug fix landed in its own branch
(fix/upsert-outcome-serde) rather than commingled with other work.
The auditor's hybrid fixture on the auditor/scaffold branch will
now light up layer 3 (phase45_seed_with_doc_refs) as a pass once
this merges — previously it failed here with an empty socket close.
|

2a4b81bf48
Phase 45 (first slice): DocRef + doc_refs field on PlaybookEntry
What J keeps asking for: playbooks know which external docs they
used, get flagged when those docs drift. This commit ships the data
model; context7 bridge + drift check endpoints land in follow-ups.
Added to crates/vectord/src/playbook_memory.rs:
- pub struct DocRef { tool, version_seen, snippet_hash, source_url,
seen_at } — one external doc reference
- PlaybookEntry.doc_refs: Vec<DocRef> — empty on legacy entries,
serde default ensures pre-Phase-45 persisted state loads cleanly
- PlaybookEntry.doc_drift_flagged_at: Option<String> — set by the
(future) drift-check code when context7 reports newer version
- PlaybookEntry.doc_drift_reviewed_at: Option<String> — set by
human via /resolve endpoint after reviewing the diagnosis
- impl Default for PlaybookEntry — collapses most test-helper
constructors from 17 explicit fields to 6-9 fields +
..Default::default()
Updated SeedPlaybookRequest + RevisePlaybookRequest (service.rs) to
accept optional doc_refs: the seed/revise endpoints already take the
field, downstream drift detection (Phase 45.2) consumes it.
Docs: docs/CONTROL_PLANE_PRD.md gains full Phase 45 spec with gate
criteria, non-goals, and risk notes.
Tests: 51/51 vectord lib tests green (same count as before, field
additions are backward-compat).
Memory: project_doc_drift_vision.md written so this keeps coming
back to the front of mind.
Next slices (same phase): context7 HTTP bridge in mcp-server,
/vectors/playbook_memory/doc_drift/check/{id} endpoint, overview-
model drift synthesis writing to data/_kb/doc_drift_corrections.jsonl,
boost exclusion for flagged+unreviewed entries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

5b1fcf6d27
Phase 28-36 body of work
Accumulated since a6f12e2 (Phase 21 Rust port + Phase 27 versioning):
- Phase 36: embed_semaphore on VectorState (permits=1) serializes seed embed calls — prevents sidecar socket collisions under concurrent /seed stress load
- Phase 31+: run_stress.ts 6-task diverse stress scaffolding; run_e2e_rated.ts + orchestrator.ts tightening
- Catalog dedupe cleanup: 16 duplicate manifests removed; canonical candidates.parquet (10.5MB -> 76KB) + placements.parquet (1.2MB -> 11KB) regenerated post-dedupe; fresh manifests for active datasets
- vectord: harness EvalSet refinements (+181), agent portfolio rotation + ingest triggers (+158), autotune + rag adjustments
- catalogd/storaged/ingestd/mcp-server: misc tightening
- docs: Phase 28-36 PRD entries + DECISIONS ADR additions; control-plane pivot banner added to top of docs/PRD.md (pointing at docs/CONTROL_PLANE_PRD.md which lands in next commit)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

a6f12e2609
Phase 21 Rust port + Phase 27 playbook versioning + doc-sync
Phase 21 — Rust port of scratchpad + tree-split primitives (companion to
the 2026-04-21 TS shipment). New crates/aibridge modules:
context.rs — estimate_tokens (chars/4 ceil), context_window_for,
assert_context_budget returning a BudgetCheck with
numeric diagnostics on both success and overflow.
Windows table mirrors config/models.json.
continuation.rs — generate_continuable<G: TextGenerator>. Handles the
two failure modes: empty-response from thinking
models (geometric 2x budget backoff up to budget_cap)
and truncated-non-empty (continuation with partial
as scratchpad). is_structurally_complete balances
braces then JSON.parse-checks. Guards the degen case
"all retries empty, don't loop on empty partial".
tree_split.rs — generate_tree_split map->reduce with running
scratchpad. Per-shard + reduce-prompt go through
assert_context_budget first; loud-fails rather than
silently truncating. Oldest-digest-first scratchpad
truncation at scratchpad_budget (default 6000 t).
TextGenerator trait (native async-fn-in-trait, edition 2024). AiClient
implements it; ScriptedGenerator test double lets tests inject canned
sequences without a live Ollama.
GenerateRequest gained think: Option<bool> — forwards to sidecar for
per-call hidden-reasoning opt-out on hot-path JSON emitters. Three
existing callsites updated (rag.rs x2, service.rs hybrid answer).
Phase 27 — Playbook versioning. PlaybookEntry gained four optional
fields (all #[serde(default)] so pre-Phase-27 state loads as roots):
version u32, default 1
parent_id Option<String>, previous version's playbook_id
superseded_at Option<String>, set when newer version replaces
superseded_by Option<String>, the playbook_id that replaced
New methods:
revise_entry(parent_id, new_entry) — appends new version, stamps
superseded_at+superseded_by on parent, inherits parent_id and sets
version = parent + 1 on the new entry. Rejects revising a retired
or already-superseded parent (tip-of-chain is the only valid
revise target).
history(playbook_id) — returns full chain root->tip from any node.
Walks parent_id back to root, then superseded_by forward to tip.
Cycle-safe.
Superseded entries excluded from boost (same rule as retired): filter
in compute_boost_for_filtered_with_role (both active-entries prefilter
and geo-filtered path), rebuild_geo_index, and upsert_entry's existing-
idx search. status_counts returns (total, retired, superseded, failures);
/status JSON reports active = total - retired - superseded.
Endpoints:
POST /vectors/playbook_memory/revise
GET /vectors/playbook_memory/history/{id}
Doc-sync — PHASES.md + PRD.md drifted from git after Phases 24-26
shipped. Fixes applied:
- Phase 24 marked shipped (commit b95dd86) with detail of observer
HTTP ingest + scenario outcome streaming. PRD "NOT YET WIRED"
rewritten to reflect shipped state.
- Phase 25 (validity windows, commit e0a843d) added to PHASES +
PRD.
- Phase 26 (Mem0 upsert + Letta hot cache, commit 640db8c) added.
- Phase 27 entry added to both docs.
- Phase 19.6 time decay corrected: was documented as "deferred",
actually wired via BOOST_HALF_LIFE_DAYS = 30.0 in playbook_memory.rs.
- Phase E/Phase 8 tombstone-at-compaction limit note updated —
Phase E.2 closed it.
Tests: 8 new version_tests in vectord (chain-metadata stamping,
retired/superseded parent rejection, boost exclusion, history from
root/tip/middle, legacy default round-trip, status counts). 25 new
aibridge tests (context/continuation/tree_split). Workspace total
145 green (was 120).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

640db8c63c
Phase 26 — Mem0 upsert + Letta geo hot cache
Closes the two remaining 2026-era memory findings. Both are
optimizations per J's framing — not load-bearing, but good data
hygiene + future-proofing at scale.
MEM0 UPSERT (data hygiene):
Before: /seed always appended. A scenario re-running the same
operation on the same day wrote duplicate entries, inflating the
playbook corpus with near-identical rows.
Now: upsert_entry(new) inspects existing non-retired entries and
decides ADD / UPDATE / NOOP:
ADD → no matching (operation, day, city, state) tuple, append
UPDATE → match exists with different names → merge (union, stable
order), refresh timestamp, keep original playbook_id so
citations stay valid
NOOP → match exists with identical names → skip, return id
Day-granularity keying on timestamp YYYY-MM-DD means intraday
re-seeds dedup but tomorrow's same-operation is a fresh ADD. Retired
entries don't block new seeds — they're out of scope anyway.
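The decision logic, as a sketch on trimmed shapes (not the shipped upsert_entry):

```rust
use std::collections::BTreeSet;

struct Entry {
    operation: String,
    day: String, // timestamp truncated to YYYY-MM-DD
    city: String,
    state: String,
    endorsed_names: BTreeSet<String>, // stable order for the union-merge
    retired: bool,
}

enum Outcome {
    Add,
    Update,
    Noop,
}

fn upsert(entries: &mut Vec<Entry>, new: Entry) -> Outcome {
    let hit = entries.iter_mut().find(|e| {
        !e.retired // retired entries don't block new seeds
            && e.operation == new.operation
            && e.day == new.day
            && e.city == new.city
            && e.state == new.state
    });
    match hit {
        None => {
            entries.push(new); // ADD: fresh (operation, day, city, state)
            Outcome::Add
        }
        Some(e) if e.endorsed_names == new.endorsed_names => Outcome::Noop,
        Some(e) => {
            // UPDATE: union-merge names, keep the original playbook_id
            // so citations stay valid.
            e.endorsed_names.extend(new.endorsed_names);
            Outcome::Update
        }
    }
}
```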
Seed endpoint returns {outcome: {mode, playbook_id, merged_names?},
entries_after}. Append=false retains old replace-all semantics.
5 unit tests pass: first_seed_is_add, identical_reseed_is_noop,
same_day_different_names_updates_and_merges, different_day_same_op_is_add,
retired_entry_doesnt_block_new_seed.
Live verified: three successive seeds with (Alice), (Alice),
(Alice, Bob) left entry count unchanged at 1936 with merged names
{Alejandro, Lauren, Alice, Bob}. Previously would have been 3
appends.
LETTA GEO HOT CACHE (scale primitive):
Added geo_index: HashMap<(city_lower, state_upper), Vec<usize>>
alongside PlaybookMemoryState. Rebuilt on every mutation: set_entries,
retire_one, retire_on_schema_drift, upsert_entry, load_from_storage.
compute_boost_for_filtered_with_role now uses the index for O(1) geo
lookup instead of scanning all entries. At current scale (1.9K) the
scan was sub-ms; at 100K+ the scan becomes the dominant cost. The
hot cache future-proofs without adding an LRU abstraction.
Retired entries excluded from index; valid_until still checked on the
hot path since it can elapse between rebuilds.
Owns cloned PlaybookEntries in the geo_filtered vector so the state
read-lock is released before cosine scoring — avoids lock contention
on the scoring path.
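The index shape, as a minimal sketch (entry representation trimmed; not the shipped state struct):

```rust
use std::collections::HashMap;

/// Rebuilt on every mutation; keyed by normalized (city, state).
struct GeoIndex {
    by_geo: HashMap<(String, String), Vec<usize>>, // -> entry indices
}

impl GeoIndex {
    /// entries: (city, state, retired) — retired entries excluded.
    fn rebuild(entries: &[(String, String, bool)]) -> Self {
        let mut by_geo: HashMap<(String, String), Vec<usize>> = HashMap::new();
        for (i, (city, state, retired)) in entries.iter().enumerate() {
            if *retired {
                continue;
            }
            by_geo
                .entry((city.to_lowercase(), state.to_uppercase()))
                .or_default()
                .push(i);
        }
        Self { by_geo }
    }

    /// O(1) lookup instead of scanning all entries.
    fn candidates(&self, city: &str, state: &str) -> &[usize] {
        self.by_geo
            .get(&(city.to_lowercase(), state.to_uppercase()))
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```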
Memory-findings progress: 5 of 5 shipped.
✓ Multi-strategy parallel retrieval (Phase 19 refinement)
✓ Input normalization + unified /memory/query (Phase 24 TS)
✓ Zep validity windows (Phase 25)
✓ Mem0 UPSERT (Phase 26 today)
✓ Letta geo hot cache (Phase 26 today)
All 18 playbook_memory tests pass.

e0a843d1a5
Phase 25 — validity windows + playbook retirement
Addresses the load-bearing memory gap J flagged: playbook entries
had timestamps but no retirement semantic. When a schema migration
changed a column or a seasonal contract ended, stale playbooks kept
boosting candidates silently. Zep 2026-era finding — temporal
validity is the single highest-value memory-hygiene primitive.
SCHEMA (PlaybookEntry gains four optional fields, serde default):
schema_fingerprint — SHA-256 over dataset (column, type) tuples at
seed time. Missing = legacy entry, never
auto-retired on drift.
valid_until — RFC3339 hard expiry. compute_boost skips
entries past this moment.
retired_at — Set by retire_one or retire_on_schema_drift.
Retired entries excluded from all boost
calculations but kept in journal.
retirement_reason — Human-readable: "schema_drift: ...",
"expired: ...", "manual: ..."
RETRIEVAL PATH (compute_boost_for_filtered_with_role):
Before geo+cosine, active_entries filter removes anything retired
OR past valid_until. Uses chrono::Utc::now() once per call, no per-
entry clock queries.
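That prefilter, as a sketch (assumes valid_until is already parsed from its RFC3339 string; field shapes trimmed):

```rust
use chrono::{DateTime, Utc};

struct Entry {
    retired_at: Option<String>,
    valid_until: Option<DateTime<Utc>>, // parsed from RFC3339 at load
}

/// Retired and expired entries never boost; one clock read per call.
fn active(entries: &[Entry]) -> Vec<&Entry> {
    let now = Utc::now(); // once per call, no per-entry clock queries
    entries
        .iter()
        .filter(|e| e.retired_at.is_none())
        .filter(|e| e.valid_until.map_or(true, |t| t > now))
        .collect()
}
```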
NEW METHODS on PlaybookMemory:
retire_one(playbook_id, reason)
retire_on_schema_drift(city, state, current_fp, reason) — idempotent,
scopes by (city, state) so a Nashville migration doesn't touch
Chicago. Skips legacy entries with no fingerprint.
status_counts() -> (total, retired, failures)
HTTP ENDPOINTS:
POST /vectors/playbook_memory/retire
{playbook_id, reason} → retire by id
{city, state, current_schema_fingerprint, reason} → schema drift
GET /vectors/playbook_memory/status
{total, active, retired, failures}
SEED REQUEST extended with optional schema_fingerprint + valid_until
so the orchestrator (scenario.ts) can pass the current schema hash
when seeding, without a round trip through catalogd.
UNIT TESTS (5/5 pass): retire_one_marks_entry_and_persists,
retired_entries_do_not_boost, expired_valid_until_is_skipped,
schema_drift_retires_mismatched_fingerprints_only,
schema_drift_skips_other_cities.
LIVE VERIFIED: /status on current state = 1936 entries, 43 failures.
POST /retire with a sample playbook_id → "retired":1, /status now
reports active=1935, retired=1.
Memory-findings progress: 3 of 5 shipped.
✓ Multi-strategy parallel retrieval (Phase 19 refinement)
✓ Input normalization + unified /memory/query (Phase 24 TS)
✓ Zep-style validity windows (Phase 25, tonight)
⏳ Mem0 UPDATE / DELETE / NOOP ops (dedup same-(op,date) seeds)
⏳ Letta working-memory hot cache (not biting at 1.5K entries)

137aed64fb
Coherence pass — PRD/PHASES updates, config snapshot wired, unit tests
J flagged the audit: "make sure everything flows coherently, no pseudocode or unnecessary patches or ignoring any particular part of what we built." This is that pass.
PRD.md updates:
- Phase 19 refinement block — geo-filter + role-prefilter WIRED with citation density numbers (0.32 → 1.38, and 2 → 28 on same scenario).
- Phase 20 rewrite — mistral dropped, qwen3.5 + qwen3 local hot path, think:false as the key mechanical finding, kimi-k2.6 upgrade path.
- Phase 21 status block — think plumbing + cloud executor routing added after original commit.
- Phase 22 item B (cloud rescue) — pivot sanitizer, rescue verified 1/3 on stress_01.
- Phase 23 NEW — staffer identity + tool_level + competence-weighted retrieval + kb_staffer_report. Auto-discovered worker labels called out with real numbers (Rachel Lewis 12× across 4 staffers).
- Phase 24 NEW — Observer/Autotune integration gap DOCUMENTED, not fixed. Observer has been idle at 0 ops for 3600+ cycles because scenarios hit gateway:3100 directly, bypassing MCP:3700 which the observer wraps. This is the honest "we're not using it in these tests" signal J surfaced. Fix deferred; gap visible now.
PHASES.md:
- Appended Phases 20-23 as checked, Phase 24 as unchecked gap.
- Updated footer count: 102 unit tests across all layers.
- Latest line updated with 14× citation lift + 46.4pt tool-asymmetry finding.
scenario.ts:
- snapshotConfig() was defined but never called. Now fires at every scenario start with a stable sha256 hash over the active model set + tool_level + cloud flags. config_snapshots.jsonl finally populates, which the error_corrections diff path needs to work correctly.
kb.test.ts (new): 4 signature invariant tests — stability across unrelated fields (date, contract, staffer), sensitivity to role/city/count changes, digest shape. All pass under `bun test`.
service.rs: 6 Rust extractor tests for extract_target_geo + extract_target_role — basic, missing-state-returns-none, word boundary (civilian != city), multi-word role, absent role, quoted value parse. All pass under `cargo test -p vectord --lib extractor_tests`.
Dangling items now honestly documented rather than silently pending:
- Chunking cache (config/models.json SPEC, not wired) — flagged
- Playbook versioning (SPEC, not wired) — flagged
- Observer integration (WIRED but disconnected) — new Phase 24
||
|
|
ad0edbe29c |
Cloud kimi-k2.5 executor for weak tiers + multi-strategy playbook retrieval
Two coupled changes from the 2026 agent-memory research + tool
asymmetry findings.
SCENARIO (weak-tier cloud substitute):
qwen2.5 collapsed to 0/14 across the basic/minimal tool_levels.
Replace with cloud kimi-k2.5 on Ollama Cloud — same family as k2.6
(pro-tier locked today, on J's upgrade path). Plumb cloud flag
through ACTIVE_EXECUTOR_CLOUD / ACTIVE_REVIEWER_CLOUD into
generateContinuable so executor/reviewer can route to cloud when
tool_level requires. think:false supported by Kimi family.
Tool level mapping (revised):
full — qwen3.5 local + qwen3 local + cloud gpt-oss:120b T3 + rescue
local — qwen3.5 local + qwen3 local + local gpt-oss:20b T3 + rescue
basic — kimi-k2.5 cloud + qwen3 local + local T3, no rescue
minimal — kimi-k2.5 cloud + qwen3 local, no T3, no rescue.
Playbook inheritance alone on the decision path.
This is the honest version of J's "minimal tools still works via
inheritance" hypothesis — with the executor no longer broken at the
tokenizer level, we can actually measure whether playbook retrieval
substitutes for missing overseers.
PLAYBOOK_MEMORY (multi-strategy retrieval):
Zep / Mem0 research shows multi-strategy rerank (semantic + keyword +
graph + temporal) outperforms single-strategy cosine. Lakehouse now
has a two-tier:
1. Exact (role, city, state) match: skip cosine, assign similarity=1.0,
take up to top_k/2+1 slots. These are identity-class neighbors —
the strongest possible signal.
2. Cosine fallback within the same (city, state) but different role:
fills remaining slots.
Exposed as compute_boost_for_filtered_with_role(target_geo, target_role).
Backwards-compatible: compute_boost_for_filtered forwards with role=None
so existing callers keep their current behavior.
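A sketch of the two-tier fill under the stated rules (similarity pinned to 1.0 for exact matches, capped at top_k/2+1 slots, same-geo cosine fallback); the Playbook type and cosine helper are illustrative stand-ins, not the crate's actual API:
```rust
struct Playbook {
    role: String,
    city: String,
    state: String,
    embedding: Vec<f32>,
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Tier 1: exact (role, city, state) matches are identity-class
/// neighbors: skip cosine, pin similarity to 1.0, cap at top_k/2 + 1.
/// Tier 2: cosine-ranked same-geo playbooks with a different role
/// fill whatever slots remain.
fn two_tier_fill(
    playbooks: &[Playbook],
    query_vec: &[f32],
    role: &str,
    city: &str,
    state: &str,
    top_k: usize,
) -> Vec<(usize, f32)> {
    let mut hits: Vec<(usize, f32)> = Vec::with_capacity(top_k);
    let exact_cap = top_k / 2 + 1;
    for (i, p) in playbooks.iter().enumerate() {
        if hits.len() >= exact_cap {
            break;
        }
        if p.role == role && p.city == city && p.state == state {
            hits.push((i, 1.0));
        }
    }
    let mut fallback: Vec<(usize, f32)> = playbooks
        .iter()
        .enumerate()
        .filter(|(_, p)| p.city == city && p.state == state && p.role != role)
        .map(|(i, p)| (i, cosine(query_vec, &p.embedding)))
        .collect();
    fallback.sort_by(|a, b| b.1.total_cmp(&a.1));
    hits.extend(fallback.into_iter().take(top_k - hits.len()));
    hits
}
```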
Service.rs wires both: extract_target_geo and extract_target_role pull
from the executor's SQL filter. grab_eq_value is factored out of
extract_target_geo so both lookups share one parser. Diagnostic log
now prints target_role alongside target_geo for every hybrid_search:
playbook_boost: boosts=88 sources=39 parsed=39 matched=5
target_geo=Some(("Nashville", "TN")) target_role=Some("Welder")
Verified: Nashville Welder query returns 5/10 boosted workers in
top_k with clean role+geo provenance.
Research sources: atlan.com Agent Memory Frameworks 2026, Mem0 paper
(arxiv 2504.19413), Zep/Graphiti LongMemEval comparison, ossinsight
Agent Memory Race 2026.
kimi-k2.6 on current key returns 403 — pro-tier upgrade required.
kimi-k2.5 is the substitute today; swap to k2.6 by renaming one line
in applyToolLevel once the subscription lands.
|
||
|
|
a663698571 |
Item 3 — geo-filtered playbook boost; diagnostic logging
ROOT CAUSE (found via instrumentation, not hunch): After a 20-scenario corpus batch, only 6/40 successful (role, city) combos ever triggered playbook_memory citations on subsequent runs. Added `playbook_boost:` tracing::info! line in vectord::service to log boost map size vs candidate pool vs match count. One query revealed:
boosts=170 sources=50 parsed=50 matched=0
170 endorsed workers came back from compute_boost_for — but zero were in the 50-candidate Toledo pool. The boost map was pulling globally-ranked semantic neighbors (top-100 playbooks across ALL cities), dominated by Kansas City / Chicago / Detroit forklift playbooks the Toledo SQL filter would never admit. The mechanism was correct at the per-playbook level; the problem was pool intersection.
FIX (surgical, not cap-tuning):
- playbook_memory::compute_boost_for_filtered(): accepts optional (city, state) filter. When set, skips playbooks from other geos BEFORE cosine-ranking, so top-k is within the target city.
- Backwards-compatible: compute_boost_for() calls the filtered variant with None — existing callers unchanged.
- service::hybrid_search(): extracts target (city, state) from the executor's SQL filter via a small parser (extract_target_geo), passes to compute_boost_for_filtered.
VERIFIED:
Before fix: boosts=170 sources=50 parsed=50 matched=0 (0% hit)
After fix: boosts=36 sources=50 parsed=50 matched=11 (22% hit)
Top-k=10 now has 7/10 boosted workers with 2-3 citations each. Boost values 0.075-0.113 on cosine scores 0.67-0.74 — meaningful reorder without saturation.
scripts/kb_measure.py: Aggregator that reads data/_kb/*.jsonl and playbooks/*/results.json, reports fill rate, citation density, recommender confidence trend, and zero-citation-ok combos (item 3 target signal). Used to measure before/after on bigger batches.
Diagnostic logging stays — the class of "boosts computed but not matched" bug can recur if the SQL filter format ever drifts, and without the counter it's invisible. Every hybrid_search with use_playbook_memory=true now logs its boost stats.
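A sketch of the small parser named above (grab_eq_value is the helper factored out in the commit above); illustrative, not the crate's exact code:
```rust
/// Pull `col = 'value'` out of an executor SQL filter, e.g.
/// "role = 'Welder' AND city = 'Toledo' AND state = 'OH'".
/// Assumes ASCII filters (so lowercased byte offsets line up) and
/// the narrow shapes the executor emits, not general SQL.
fn grab_eq_value(filter: &str, col: &str) -> Option<String> {
    let lower = filter.to_lowercase();
    let pos = lower
        .find(&format!("{col} ="))
        .or_else(|| lower.find(&format!("{col}=")))?;
    let rest = &filter[pos..];
    let start = rest.find('\'')? + 1;
    let end = start + rest[start..].find('\'')?;
    Some(rest[start..end].to_string())
}

/// Both city and state must parse, or no geo filter is applied.
fn extract_target_geo(filter: &str) -> Option<(String, String)> {
    Some((
        grab_eq_value(filter, "city")?,
        grab_eq_value(filter, "state")?,
    ))
}
```
|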
||
|
|
95c26f04f8 |
Path 1 negative signal + Path 2 pattern discovery + name validation
New:
- /vectors/playbook_memory/patterns: meta-index pattern discovery. Given a query, finds top-K similar playbooks, pulls each endorsed worker's full workers_500k profile, aggregates shared traits (cert frequencies, skill frequencies, modal archetype, reliability distribution), returns a human-readable discovered_pattern. Surfaces signals operators didn't explicitly query — the original PRD's "identify things we didn't know" dimension.
- /vectors/playbook_memory/mark_failed: records worker failures per (city, state, name). compute_boost_for applies a 0.5^n penalty per recorded failure, so 3 failures cut a worker's positive boost to an eighth and 5 effectively zero it. Path 1 negative signal — recruiter trust depends on the system NOT recommending people who no-showed.
- Bun /log_failure: validates failed_names against workers_500k (same ghost-guard as /log), forwards to /mark_failed.
Improved:
- /log now validates endorsed_names against workers_500k for the contract's city+state before seeding. Ghost names (names that don't correspond to real workers) are rejected in the response and excluded from the seed, preventing silent boost failures.
- Bun /search auto-appends `CAST(availability AS DOUBLE) > 0.5` to sql_filter when the caller didn't constrain availability. Opt out with `include_unavailable: true`. Recruiter trust bug: surfacing already-placed workers breaks the first call.
- DEFAULT_TOP_K_PLAYBOOKS 25 → 100. Direct cosine measurement showed similarities cluster 0.55-0.67 across all playbooks regardless of geo, so k=25 missed relevant geo-matched playbooks. Brute-force is still sub-ms at this size.
Verified end-to-end on live data:
- Ghost names rejected on /log + /log_failure
- Availability filter drops unavailable workers from candidate pool
- Pattern discovery on unseen Cleveland OH Welder query returned recurring skills (first aid 43%, grinder 43%, blueprint 43%) and modal archetype (specialist) across 20 semantically similar past playbooks in 0.24s
- Negative signal: Helen Sanchez boost dropped +0.250 → +0.163 after 3 failures recorded via /log_failure (34% reduction)
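A minimal sketch of the stated penalty rule, assuming it scales a single boost contribution (the aggregate effect on a worker's summed boost can be gentler, as the Helen Sanchez numbers above show):
```rust
/// 0.5^n per recorded failure: 1 failure halves a contribution,
/// 3 failures cut it to an eighth, 5 leave ~3% of it.
fn failure_penalty(n_failures: u32) -> f32 {
    0.5f32.powi(n_failures as i32)
}

fn penalized(contribution: f32, n_failures: u32) -> f32 {
    contribution * failure_penalty(n_failures)
}
```
|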
||
|
|
25b7e6c3a7 |
Phase 19 wiring + Path 1/2 work + chain integrity fixes
Backend:
- crates/vectord/src/playbook_memory.rs (new): Phase 19 in-memory boost
store with seed/rebuild/snapshot, plus temporal decay (e^-age/30 per
playbook), persist_to_sql endpoint backing successful_playbooks_live,
and discover_patterns endpoint for meta-index pattern aggregation
(recurring certs/skills/archetype/reliability across similar past fills).
- DEFAULT_TOP_K_PLAYBOOKS bumped 5 → 25; old default silently missed
most boosts when memory had > 25 entries.
- service.rs: new routes /vectors/playbook_memory/{seed,rebuild,stats,
persist_sql,patterns}.
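The decay weight referenced in the first item, sketched (assuming age is measured in days and 30 is the e-folding constant):
```rust
/// e^(-age/30): a fresh playbook counts fully, a 30-day-old one ~37%,
/// a 90-day-old one ~5%. Applied per playbook.
fn temporal_decay(age_days: f32) -> f32 {
    (-age_days / 30.0).exp()
}
```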
Bun staffing co-pilot (mcp-server/):
- /search, /match, /verify, /proof, /simulation/run, MCP tools all
forward use_playbook_memory:true and playbook_memory_k:25 to the
hybrid endpoint. Boost was previously dark across the entire app.
- /log no longer POSTs to /ingest/file — that endpoint REPLACES the
dataset's object list, so single-row CSV writes were wiping all prior
rows in successful_playbooks (sp_rows went 33→1 in one /log call).
/log now seeds playbook_memory with canonical short text and calls
/persist_sql to keep successful_playbooks_live in sync.
- /simulation/run cumulative end-of-week CSV write removed for the same
reason. Per-day per-contract /seed (added in this session) is the
accumulating feedback path now.
- search.html addWorkerInsight renders a green "Endorsed · N playbooks"
chip with playbook citations when boost > 0.
Internal Dioxus UI (crates/ui/):
- Dashboard phase list rewritten through Phase 19 (was stuck at "Phase
16: File Watcher" / "Phase 17: DB Connector" — both wrong).
- Removed fabricated "27ms" stat label.
- Ask tab examples + SQL default replaced with real staffing prompts
against candidates/clients/job_orders (was referencing nonexistent
employees/products/events).
- New Playbook tab exposes /vectors/playbook_memory/{stats,rebuild} and
side-by-side hybrid search (boost OFF vs ON) with citations.
Tests (tests/multi-agent/):
- run_e2e_rated.ts: parallel two-agent (mistral + qwen2.5) build phase
+ verifier rating (geo, auth, persist, boost, speed → /10).
- network_proving.ts: continuous build → verify → repeat with
staffing-recruiter profile hot-swap; geo-discrimination check.
- chain_of_custody.ts: single recruiter operation traced through every
layer (Bun /search, direct /vectors/hybrid parity, /log, SQL,
playbook_memory growth, profile activation, post-op boost lift).
|
||
|
|
937569d188 |
ADR-020: Universal ID mapping — fix the flat embedding identity problem
THE REAL PROBLEM: Every new data source produces different doc_id prefixes in vector indexes (W-, W500K-, W5K-, CAND-). Hybrid search had to hardcode strip_prefix for each one. New datasets broke hybrid until someone added another prefix. This violates "any data source without pre-defined schemas."
THE FIX: IndexMeta.id_prefix — the catalog records what prefix each index uses. Hybrid search reads it and strips automatically. Legacy indexes fall back to heuristic stripping. New indexes can set id_prefix=None to use raw IDs (no prefix, no stripping needed).
This means: ingest a new dataset, embed it, hybrid search works immediately without code changes. The system is truly source-agnostic.
Also: full ADR document at docs/ADR-020-universal-id-mapping.md with the three options considered and rationale for the chosen approach.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
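A sketch of the read-side stripping under these rules; the heuristic list is just the prefixes quoted above, and the real fallback may differ:
```rust
/// id_prefix as recorded on IndexMeta: Some(prefix) means strip it,
/// None means raw IDs (no prefix, nothing to strip).
fn strip_recorded(doc_id: &str, id_prefix: &Option<String>) -> String {
    match id_prefix {
        Some(p) => doc_id.strip_prefix(p.as_str()).unwrap_or(doc_id).to_string(),
        None => doc_id.to_string(),
    }
}

/// Pre-ADR-020 fallback, kept only for legacy indexes that predate
/// id_prefix. Longest prefixes first so "W-" can't shadow the rest.
fn strip_heuristic(doc_id: &str) -> &str {
    for p in ["W500K-", "W5K-", "CAND-", "W-"] {
        if let Some(rest) = doc_id.strip_prefix(p) {
            return rest;
        }
    }
    doc_id
}
```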
||
|
|
1565f536eb |
Fix: job tracker field name mismatch — the overnight killer
ROOT CAUSE: Python scripts polled status.get("processed", 0) but the
Rust Job struct serialized as "embedded_chunks". Scripts always saw 0,
looped forever printing "unknown: 0/50000" for 8+ hours.
Fix (both sides):
- Rust: added "processed" alias field + "total" field to Job struct,
kept in sync on every update_progress() and complete() call
- Python: fixed autonomous_agent.py and overnight_proof.sh to read
"embedded_chunks" as primary key
The actual embedding pipeline was working the whole time — 673K real
chunks embedded overnight. Only the monitoring was blind.
One-word bug, 8 hours of zombie output. This is why you test the
monitoring, not just the pipeline.
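A minimal sketch of the both-names fix on the Rust side (assuming serde; the real Job struct carries more state):
```rust
use serde::Serialize;

/// Both names serialize, so old pollers reading "processed" and the
/// canonical "embedded_chunks" consumers see the same number.
#[derive(Serialize)]
struct Job {
    embedded_chunks: u64,
    processed: u64, // alias of embedded_chunks, synced on every update
    total: u64,
}

impl Job {
    fn update_progress(&mut self, chunks: u64) {
        self.embedded_chunks = chunks;
        self.processed = chunks; // keep the alias in lockstep
    }
}
```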
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
352f99de0f |
Hybrid SQL+Vector search — the gap is closed
POST /vectors/hybrid takes a question + SQL WHERE clause. Pipeline:
1. SQL filter narrows to structurally-valid candidates (role, state, reliability, certs — whatever the caller specifies)
2. Brute-force cosine scores ALL embeddings (not HNSW, which caps at ~30 results due to ef_search — too few to intersect with narrow SQL filters on 10K+ datasets)
3. Filter vector results to only SQL-verified IDs
4. LLM generates answer from verified-correct records
Tested on the exact query that failed the staffing simulation: "forklift operators in IL with reliability > 0.8" — SQL found 78 matches, vector ranked the 5 most semantically relevant, LLM generated an answer citing real workers with actual skills and certifications. Every source marked sql_verified=true.
This closes the architectural gap identified by the quality eval: structured precision (SQL) + semantic intelligence (vector) in one endpoint. The simulation's contract-matching path was already SQL-pure and worked perfectly; now the intelligence-question path has the same accuracy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
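Step 3 as a sketch: the intersection is a set-membership filter over the SQL-verified IDs (names are illustrative):
```rust
use std::collections::HashSet;

/// Keep only vector hits whose doc_id survived the SQL filter, then
/// truncate to k for the LLM context. `hits` arrive sorted by cosine
/// score, descending.
fn intersect_sql_verified(
    hits: Vec<(String, f32)>,
    sql_ids: &HashSet<String>,
    k: usize,
) -> Vec<(String, f32)> {
    hits.into_iter()
        .filter(|(id, _)| sql_ids.contains(id))
        .take(k)
        .collect()
}
```
|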
||
|
|
f9f92706f3 |
RAG reranker + manifest bucket fix — quality improvements from eval
RAG pipeline now includes a cross-encoder rerank step between retrieval
and generation. The LLM re-sorts top-K results by relevance before
they become context. Falls back to original order if model output is
unparseable (~5% with 7B models). Also improved the generation prompt
to be domain-aware ("staffing database") and request specific citations.
Fixed 4 catalog manifests with bucket="data" (pre-federation leftover)
that poisoned the entire DataFusion query context on startup. The
"users", "lab_trials", "meta_runs", and "new_candidates" datasets
now correctly reference bucket="primary". This bug was surfaced by
the quality evaluation pipeline — wouldn't have been found by
structural tests alone.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
9e6002c4d4 |
S3 backend for Lance — hybrid operates on real MinIO object storage
Enabled lance feature "aws" for S3-compatible storage via opendal.
BucketRegistry: added with_allow_http(true) for MinIO/non-TLS S3 endpoints (fixes "builder error" on HTTP endpoints). lakehouse.toml gains [[storage.buckets]] name="s3:lakehouse" with S3 backend config.
lance_backend.rs: S3 bucket naming convention — buckets with name prefix "s3:" emit s3:// URIs for Lance datasets. AWS_* env vars in the systemd unit provide credentials to Lance's internal object_store.
Verified end-to-end on real MinIO with real 100K × 768d vectors:
- Migrate Parquet → Lance on S3: 1.7s (vs 0.57s local)
- Build IVF_PQ: 16.4s (CPU-bound, essentially same as local)
- Search: ~58ms p50 (vs 11ms local — S3 partition reads)
- Random doc fetch: 13ms (vs 3.5ms local)
- Recall@10: 0.835 (randomized IVF_PQ, consistent with local 0.805)
- Total S3 footprint: 637 MiB (vectors + index + lance metadata)
The "public storage" claim from the PRD is now proven: the hybrid Parquet+HNSW ⊕ Lance architecture works on S3-compatible object storage, not just local filesystem.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
fd4b6836ae |
IVF_PQ recall harness — closes ADR-019's explicit measurement gap
POST /vectors/lance/recall/{index} runs an existing harness through
Lance IVF_PQ search and measures recall@k against brute-force ground
truth. Uses the same EvalSet + ground_truth infrastructure as the
HNSW trial system — no new harness format needed.
First real measurement on resumes_100k_v2 (100K × 768d, 20 queries):
IVF_PQ (316 partitions, 8 bits, 48 subvectors): recall@10 = 0.805
For comparison — HNSW ec=80 es=30: recall@10 = 1.000
ADR-019 predicted "likely 0.85-0.95" — actual is 0.805. Slightly
below, but now the harness exists to iterate: increase partitions,
try ivf_hnsw_pq, tune subvectors. The measurement infrastructure
is the deliverable, not any specific recall target.
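For reference, recall@k as a harness like this computes it, sketched (EvalSet's actual field names are not shown):
```rust
use std::collections::HashSet;

/// Fraction of brute-force ground-truth neighbors the ANN index also
/// returned, averaged over the harness queries.
fn recall_at_k(approx: &[Vec<String>], truth: &[Vec<String>], k: usize) -> f64 {
    let mut total = 0.0;
    for (a, t) in approx.iter().zip(truth) {
        let t_set: HashSet<&String> = t.iter().take(k).collect();
        let hit = a.iter().take(k).filter(|id| t_set.contains(id)).count();
        total += hit as f64 / k.min(t.len()).max(1) as f64;
    }
    total / approx.len().max(1) as f64
}
```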
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
59e72fa566 |
Scalar btree index on doc_id + auto-build during Lance activation
LanceVectorStore gains build_scalar_index(column) and
has_scalar_index(column). Exposed as POST /vectors/lance/scalar-index/
{index}/{column}. activate_profile auto-builds the doc_id btree
alongside the IVF_PQ vector index when activating a Lance-backed
profile — operators get both indexes without extra API calls.
stats() now reports has_doc_id_index alongside has_vector_index.
Measured on resumes_100k_v2 (100K × 768d): random doc_id fetch
improved from ~5.4ms to ~3.5ms (35% faster). Btree build: 19ms,
+2.7 MB on disk. The remaining ~3ms is vector column materialization,
not index lookup — to close further would need a projection-only
fetch that skips the 768-float vector for text-only RAG retrieval.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
17a0259cd0 |
Profile-driven Lance routing — vector_backend auto-routes search + activate
activate_profile: when profile.vector_backend == Lance, auto-migrates
from Parquet if no Lance dataset exists, auto-builds IVF_PQ if no
index attached. Reuses existing Lance dataset on subsequent activations.
profile_scoped_search: routes to Lance IVF_PQ or Parquet+HNSW based
on the profile's declared backend. Callers hit the same endpoint —
the profile abstracts which storage tier serves the query.
Verified: lance-recruiter (vector_backend=lance) and parquet-recruiter
(vector_backend=parquet) both searched the same 100K index through
POST /vectors/profile/{id}/search. Lance returned lance_ivf_pq at
25ms; Parquet returned hnsw at <1ms. Same API surface, different
backends, transparent routing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
0d037cfac1 |
Phases 16.2 + L2 + 17 VRAM gate + MySQL + 18 Lance hybrid milestone
Five threads of work landing as one milestone — all individually
verified end-to-end against real data, full release build clean,
46 unit tests pass.
## Phase 16.2 / 16.5 — autotune agent + ingest triggers
`vectord::agent` is a long-running tokio task that watches the trial
journal and autonomously proposes + runs new HNSW configs. Distinct
from `autotune::run_autotune` (synchronous one-shot grid). Triggered
on POST /vectors/agent/enqueue/{idx} or by the periodic wake; ingest
paths now push DatasetAppended events when an index's source dataset
gets re-ingested. Rate-limited (max_trials_per_hour) and cooldown-
gated so it can't saturate Ollama under live load.
The proposer is ε-greedy around the current champion: with prob 0.25
sample random from full bounds, otherwise perturb champion ± small
delta on both axes. Dedup against history. Deterministic — RNG seeded
from history.len() so the same journal state proposes the same next
config (helps offline replay debugging).
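A sketch of that proposer: the perturbation deltas are assumptions, while the 0.25 exploration probability and the seed-from-history determinism are as described (bounds match the ec/es limits in the autotune commit below). Assumes the rand crate:
```rust
use rand::{rngs::StdRng, Rng, SeedableRng};

/// ε-greedy proposal around the current (ec, es) champion: with prob
/// 0.25 sample the full bounds, otherwise perturb the champion by a
/// small delta on both axes. Seeding from history.len() means the same
/// journal state proposes the same next config (offline replay stays
/// deterministic).
fn propose(champion: (u32, u32), history_len: usize) -> (u32, u32) {
    let mut rng = StdRng::seed_from_u64(history_len as u64);
    if rng.gen_bool(0.25) {
        (rng.gen_range(10..=400), rng.gen_range(10..=200)) // explore
    } else {
        let (ec, es) = champion;
        let dec: i64 = rng.gen_range(-20..=20);
        let des: i64 = rng.gen_range(-10..=10);
        (
            (ec as i64 + dec).clamp(10, 400) as u32,
            (es as i64 + des).clamp(10, 200) as u32,
        )
    }
}
```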
`[agent]` config section in lakehouse.toml; opt-in via enabled=true.
## Federation Layer 2 — runtime bucket lifecycle + per-index scoping
`BucketRegistry.buckets` moved to `std::sync::RwLock<HashMap>` so
buckets can be added/removed after startup. POST /storage/buckets
provisions at runtime; DELETE /storage/buckets/{name} unregisters
(refuses primary/rescue with 403). Local-backend buckets get their
root directory auto-created.
`IndexMeta.bucket` (default "primary" via serde) records each index's
home bucket. `TrialJournal` and `PromotionRegistry` now hold
Arc<BucketRegistry> + IndexRegistry; they resolve target store per-
index via IndexMeta.bucket. PromotionRegistry::list_all scans every
bucket and dedups by index_name. Pre-federation indexes keep working
unchanged — they just default to primary.
`ModelProfile.bucket: Option<String>` declares per-profile artifact
home. POST /vectors/profile/{id}/activate auto-provisions the
profile's bucket under storage.profile_root if not yet registered.
EvalSets stay primary-only for now — noted gap, low-risk to extend
later with the same resolver pattern.
## Phase 17 — VRAM-aware two-profile gate
Sidecar gains POST /admin/unload (Ollama keep_alive=0 trick — forces
immediate VRAM release), POST /admin/preload (keep_alive=5m with
empty prompt, takes the slot warm), and GET /admin/vram (combines
nvidia-smi snapshot with Ollama /api/ps). Exposed via aibridge as
unload_model / preload_model / vram_snapshot.
`VectorState.active_profile` is the GPU-slot singleton —
Arc<RwLock<Option<ActiveProfileSlot>>>. activate_profile checks for
a previous profile with a different ollama_name and unloads it
before preloading the new one; same-model reactivations skip the
unload (Ollama no-ops). New routes: POST /vectors/profile/{id}/
deactivate (unload + clear slot), GET /vectors/profile/active.
Verified live: staffing-recruiter (qwen2.5) → docs-assistant
(mistral) swap freed qwen2.5 from VRAM and loaded mistral. nomic-
embed-text persists across swaps because both profiles use it —
free optimization that fell out of the design. Scoped search
correctly 403s cross-profile in both directions.
## MySQL streaming connector
`crates/ingestd/src/my_stream.rs` mirrors pg_stream.rs for MySQL.
Pure-rust `mysql_async` driver (default-features=false to avoid C
deps). Same OFFSET pagination, same Parquet-streaming write shape.
Type mapping per ADR-010: int/bigint → Int32/Int64, decimal/float
→ Float64, tinyint(1)/bool → Boolean, everything else → Utf8 with
fallback parsers for date/time/json/uuid via Display.
POST /ingest/mysql parallel to /ingest/db. Same PII auto-detection,
same lineage capture (source_system="mysql"), same agent-trigger
hook. `redact_dsn` generalized — was hardcoded to "postgresql://"
length, now works for any scheme://user:pass@host/path URL (latent
PII leak fix for MySQL DSNs).
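A sketch of the generalized redaction: a hand-rolled scan over scheme://user:pass@host, not the crate's exact code:
```rust
/// Works for any scheme://user:pass@host/path URL instead of assuming
/// the "postgresql://" prefix length. DSNs without credentials pass
/// through untouched.
fn redact_dsn(dsn: &str) -> String {
    let Some(scheme_end) = dsn.find("://") else { return dsn.to_string() };
    let auth_start = scheme_end + 3;
    // Credentials, if present, end at the '@' before the host.
    let Some(at_rel) = dsn[auth_start..].find('@') else { return dsn.to_string() };
    let at = auth_start + at_rel;
    match dsn[auth_start..at].find(':') {
        Some(colon_rel) => {
            let colon = auth_start + colon_rel;
            format!("{}:****{}", &dsn[..colon], &dsn[at..])
        }
        None => dsn.to_string(), // user only, no password to redact
    }
}
```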
Verified live against MariaDB on localhost: 10 rows × 9 columns of
test data round-tripped through datatypes int/varchar/decimal/
tinyint/datetime/text. PII detection auto-flagged name + email.
Aggregation queries through DataFusion match the source values
exactly.
## Phase 18 — Hybrid Parquet+HNSW ⊕ Lance backend (ADR-019)
`vectord-lance` is a new firewall crate. Lance pulls Arrow 57 and
DataFusion 52 — incompatible with the rest of the workspace's
Arrow 55 / DataFusion 47. The firewall isolates that dep tree:
public API uses only std types (Vec<f32>, Vec<String>, Hit, Row,
*Stats), so no Arrow types cross the crate boundary and nothing
propagates to vectord. The ADR-019 path that didn't ship until now.
`vectord::lance_backend::LanceRegistry` lazy-creates a
LanceVectorStore per index, resolving bucket → URI via the
conventional local-bucket layout. `IndexMeta.vector_backend` and
`ModelProfile.vector_backend` carry the choice (default Parquet so
existing indexes unchanged).
Six routes under /vectors/lance/*:
- migrate/{idx}: convert binary-blob Parquet → Lance FixedSizeList
- index/{idx}: build IVF_PQ
- search/{idx}: vector search (embed via sidecar)
- doc/{idx}/{doc_id}: random row fetch
- append/{idx}: native fragment append
- stats/{idx}: row count + index presence
Verified live on the real resumes_100k_v2 corpus (100K × 768d):
- Migrate: 0.57s
- Build IVF_PQ index: 16.2s (matches ADR-019 bench; 14× faster than
HNSW's 230s for the same data)
- Search end-to-end (Ollama embed + Lance scan): 23-53ms
- Random doc_id fetch: 5-7ms (filter scan; faster than Parquet's
~35ms full-file scan, slower than the bench's 311us positional
take — would close that gap with a scalar btree on doc_id)
- Append 100 rows: 3.3ms / +320KB on disk vs Parquet's required
full ~330MB rewrite — the structural win
- Index survives append; both backends coexist cleanly
## Known follow-ups not in this milestone
- ModelProfile.vector_backend doesn't yet auto-route /vectors/profile/
{id}/search to Lance; callers go through /vectors/lance/* directly
- Scalar btree on doc_id (closes the 5-7ms → ~300us gap)
- vectord-lance built default-features=false → no S3 yet
- IVF_PQ recall not measured (ADR-019 caveat) — needs a Lance-aware
variant of the eval harness
- Watcher-path ingest doesn't push agent triggers (HTTP paths do)
- EvalSets still primary-only (federation gap)
- No PATCH endpoint to move an existing index between buckets
- The pre-existing storaged::append_log doctest fails to compile
(malformed `{prefix}/` parses as code fence) — pre-existing bug,
left for a focused fix
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
4d5c49090c |
Phase 16: Hot-swap generations + autotune agent loop
Closes the self-iteration loop from the PRD reframe: an agent can
tune HNSW configs autonomously and the winner flows through to the
next profile activation without human intervention.
Three primitives:
1. PromotionRegistry (vectord::promotion)
- Per-index current + history at _hnsw_promotions/{index}.json
- promote(index, entry) atomically swaps current, pushes prior
onto history (capped at 50)
- rollback() pops history back onto current; clears current if
history exhausted
- config_or(index, default) — the read side used at build time,
returns promoted config if set else caller's default
- Full cache + persistence; writes are durable on return
2. Autotune (vectord::autotune)
- run_autotune(request, ...) — synchronous agent loop
- Default grid: 5 configs covering the practical range
(ec=20/40/80/80/160, es=30/30/30/60/30) with seed=42 for
reproducibility
- Every trial goes through the existing trial-journal pipeline
so autotune runs land alongside manual trials in the
"trials are data" log
- Winner: max recall first, then min p50 latency; must clear the
  min_recall gate (default 0.9) or no promotion happens (selection
  sketched after this list)
- Config bounds (ec ∈ [10,400], es ∈ [10,200]) reject absurd
values from the request's optional custom grid
- On winner: promote with note "autotune winner: recall=X p50=Y"
3. Wiring
- VectorState gains promotion_registry
- activate_profile now calls promotion_registry.config_or(...)
so newly-promoted configs are picked up on next activation —
the "hot-swap" is: autotune promotes -> profile activates ->
HNSW rebuilt with new config
- New endpoints:
POST /vectors/hnsw/promote/{index}/{trial_id}
    ?promoted_by=...&note=...
POST /vectors/hnsw/rollback/{index}
GET /vectors/hnsw/promoted/{index}
POST /vectors/hnsw/autotune { index_name, harness,
min_recall?, grid? }
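The winner rule from primitive 2, sketched; trials as (recall, p50_us) pairs, names illustrative:
```rust
/// Max recall first, min p50 breaks ties; if even the best trial
/// misses min_recall, nothing is promoted.
fn pick_winner(trials: &[(f64, u64)], min_recall: f64) -> Option<usize> {
    trials
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| a.0.total_cmp(&b.0).then(b.1.cmp(&a.1)))
        .filter(|(_, t)| t.0 >= min_recall) // gate before promoting
        .map(|(i, _)| i)
}
```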
End-to-end verified on threat_intel_v1 (54 vectors):
- autogen harness 'threat_intel_smoke' (10 queries)
- POST /autotune -> 5 trials in 620ms, winner ec=20 es=30
recall=1.00 p50=64us auto-promoted
- Manual promote of ec=80 es=30 -> history depth 1
- Rollback -> back to ec=20 es=30 autotune winner
- Second rollback -> current cleared
- Re-promote + restart -> persistence verified
- Profile activation after promotion logged:
"building HNSW ef_construction=80 ef_search=30 seed=Some(42)"
proving the hot-swap loop is closed.
Deferred:
- Bayesian optimization (random-grid is fine at this config-space size)
- Append-triggered autotune (Phase 17.5 — refresh OnAppend policy
can schedule autotune after appending sufficient new rows)
- Concurrent autotune per index guard (JobTracker integration)
PRD invariants satisfied: invariant 8 (hot-swappable indexes) is now
real code — promote is atomic, rollback is always available, the
active generation is a persistent pointer not a runtime convention.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
a293502265 |
Phase 17: Model profiles + scoped search — the LLM-brain keystone
Implements PRD invariant 9 ("every reader gets its own profile") and
completes the multi-model substrate vision. Local models (or agents)
bind to a named set of datasets; activation pre-loads their vector
indexes into memory; search enforces scope.
Schema (shared::types):
- ModelProfile { id, ollama_name, description, bound_datasets,
hnsw_config, embed_model, created_at, created_by }
- ProfileHnswConfig mirrors vectord::trial::HnswConfig to avoid a
cross-crate dep cycle. Default (ec=80, es=30) matches the Phase 15
trial winner.
- bound_datasets can reference raw dataset names OR AiView names
(both register as DataFusion tables with the same name, so mixing
raw tables and PII-redacted views composes naturally)
Catalog (catalogd::registry):
- put_profile validates id is a slug (alphanumeric + -_ only) and
every binding resolves to an existing dataset or view
- Persistence at _catalog/profiles/{id}.json, loaded on rebuild
- get_profile / list_profiles / delete_profile
HTTP endpoints:
- POST /catalog/profiles (create/update)
- GET /catalog/profiles (list)
- GET/DELETE /catalog/profiles/{id}
- POST /vectors/profile/{id}/activate (HNSW hot-load)
- POST /vectors/profile/{id}/search (scope-enforced)
Activation (vectord::service::activate_profile):
- For each bound dataset, find vector indexes with matching source
- Pre-load embeddings into EmbeddingCache
- Build HNSW with profile's config
- Report warmed indexes + per-binding failures + duration
- Failures on individual bindings don't abort — "substrate keeps
working" per ADR-017
Scoped search (vectord::service::profile_scoped_search):
- Look up profile, verify index.source ∈ profile.bound_datasets
- Returns 403 with allowed bindings list if out-of-scope
- Uses HNSW if index is warm, brute-force cosine otherwise (graceful
degradation — no "must activate first" friction)
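The scope gate, sketched: just the membership check, with the real handler mapping Err into the 403 body quoted below:
```rust
/// The index's source dataset must be among the profile's bound
/// datasets, else the caller gets a 403 listing what IS allowed.
fn check_scope(bound: &[String], index_source: &str) -> Result<(), String> {
    if bound.iter().any(|d| d == index_source) {
        Ok(())
    } else {
        Err(format!(
            "profile is not bound to '{index_source}' — allowed bindings: {bound:?}"
        ))
    }
}
```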
Bug fix surfaced during testing: vectord::refresh::try_update_index_meta
was a no-op for first-time indexes, so threat_intel_v1 and
kb_team_runs_v1 (both built via refresh after Phase C shipped) didn't
show up in the index registry. Now it auto-infers the source from the
index name convention (`{source}_vN`) and registers new metadata with
reasonable defaults.
End-to-end verified:
- Created security-analyst profile bound to [threat_intel]
- POST /vectors/profile/security-analyst/activate → warmed
threat_intel_v1 (54 vectors) in 156ms, HNSW built
- Within-scope search: method=hnsw, returned relevant IP indicators
- Out-of-scope: tried to search resumes_100k_v2 (source=candidates)
→ 403 "profile 'security-analyst' is not bound to 'candidates' —
allowed bindings: [\"threat_intel\"]"
- staffing-recruiter profile created bound to candidates + placements;
search without activation fell through to brute_force (graceful)
Deferred (Phase 17 followups):
- VRAM-aware activation (unload-then-load via Ollama keep_alive=0)
— Ollama already handles this; we don't need to reinvent
- Model-identity in audit trail — Phase 13 has role-based audit;
adding model_id is ~20 LOC when we want it
- Profile bucket pre-load (profile:user bucket mount) — Phase 17.5
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
650f5e97b6 |
Fix chunker UTF-8 boundary panic (causes 120GB OOM in refresh path)
The chunker's &text[start..end] slice could land inside a multi-byte
UTF-8 character (e.g. narrow no-break space \u{202f}, em-dashes, smart
quotes — universal in pg-imported editorial data). Rust panics on
non-boundary string slicing. In the refresh path that panic is caught
by tokio's task machinery but somehow causes linear memory growth at
~540MB/sec until OOM at 120GB+.
Root cause: chunk boundaries computed by byte arithmetic without
checking is_char_boundary(). The existing "look for last sentence / \n
/ space" logic finds ASCII-safe positions, but the *primary* `end`
calculation `(start + chunk_size).min(text.len())` lands wherever.
Fix:
- ceil_char_boundary(s, idx) — forward-scan to the nearest valid
  UTF-8 char boundary (sketched below). Used at end, actual_end, and next_start.
- Iteration cap — break if iterations exceed text.len(). Any
non-progressing loop dies safely instead of burning memory.
- Forced forward advance — if overlap + boundary math produce a
next_start <= start, force +1 char to guarantee termination.
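A sketch of the boundary helper from the first fix item, as a hand-rolled forward scan:
```rust
/// Forward-scan to the nearest valid UTF-8 char boundary at or after
/// idx. A char spans at most 4 bytes, so the loop takes at most 3
/// steps; past the end, the boundary is the string length itself.
fn ceil_char_boundary(s: &str, idx: usize) -> usize {
    if idx >= s.len() {
        return s.len();
    }
    let mut i = idx;
    while !s.is_char_boundary(i) {
        i += 1;
    }
    i
}
```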
Reproduced on kb_team_runs (585 pg-imported prompts with editorial
unicode): previous run grew memory linearly to 124GB over 240s then
OOM-killed. Same request after fix: peaks at <100MB, completes in
~4m42s to produce 12,693 embeddings. /vectors/search returns
relevant results.
Regression tests added:
- handles_multibyte_utf8_at_chunk_boundary — exact \u{202f} repro
- no_infinite_loop_on_no_spaces — 5KB text, no whitespace
- no_infinite_loop_on_degenerate_params — chunk_size == overlap
Surfaced by Phase C, but pre-existed as a latent bug since Phase 7.
Any Ollama-targeted RAG corpus with non-ASCII content would have hit
this once it grew past ~13KB per document.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
97a376482c |
Phase C: Decoupled embedding refresh
Implements the llms3.com-inspired pattern: embeddings refresh
asynchronously, decoupled from transactional row writes. New rows arrive,
ingest marks the vector index stale, a later refresh embeds only the
delta (doc_ids not already in the index).
Schema additions (DatasetManifest):
- last_embedded_at: Option<DateTime> - when the index was last refreshed
- embedding_stale_since: Option<DateTime> - set when data written, cleared on refresh
- embedding_refresh_policy: Option<RefreshPolicy> - Manual | OnAppend | Scheduled
Ingest paths (pipeline::ingest_file + pg_stream) call
registry.mark_embeddings_stale after writing. No-op if the dataset has
never been embedded — stale semantics only kick in once last_embedded_at
is set.
Refresh pipeline (vectord::refresh::refresh_index):
- Reads the dataset Parquet, extracts (doc_id, text) pairs
- Accepts Utf8 / Int32 / Int64 id columns (covers both CSV and pg schemas)
- Loads existing embeddings via EmbeddingCache (empty on first-time build)
- Filters to rows whose doc_id is NOT in the existing set (delta step sketched after this list)
- Chunks (chunker::chunk_column), embeds via Ollama (batches of 32),
writes combined index, clears stale flag
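The delta filter from the pipeline above, sketched with simplified row and set types:
```rust
use std::collections::HashSet;

/// Embed only rows whose doc_id the index hasn't seen. On a
/// first-time build the existing set is empty, so everything passes.
fn delta_rows(
    rows: Vec<(String, String)>, // (doc_id, text) from the Parquet read
    existing: &HashSet<String>,  // doc_ids already in the index
) -> Vec<(String, String)> {
    rows.into_iter()
        .filter(|(doc_id, _)| !existing.contains(doc_id))
        .collect()
}
```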
Endpoints:
- POST /vectors/refresh/{dataset_name} - body {index_name, id_column,
text_column, chunk_size?, overlap?}
- GET /vectors/stale - lists datasets whose embedding_stale_since is set
End-to-end verified on threat_intel (knowledge_base.threat_intel):
- Initial refresh: 20 rows -> 20 chunks -> embedded in 2.1s,
last_embedded_at set
- Idempotent second refresh: 0 new docs -> 1.8ms (pure delta check)
- Re-ingest to 54 rows: mark_embeddings_stale fires -> stale_since set
- /vectors/stale surfaces threat_intel with timestamps + policy
- Delta refresh: 34 new docs embedded in 970ms (6x faster than full
re-embed); stale_cleared = true
Not in MVP scope:
- UPDATE semantics (same doc_id, different content) - would need
per-row content hashing
- OnAppend policy auto-trigger - just declares intent; actual scheduler
deferred
- Scheduler runtime - the Scheduled(cron) variant declares the intent so
operators can see which datasets expect what, but the cron itself is
separate
Per ADR-019: when a profile switches to vector_backend=Lance, this
refresh path benefits — Lance's native append replaces our "read all +
rewrite" Parquet rebuild pattern. Current MVP works well enough at
~500-5K rows to validate the architecture; Lance unblocks the 5M+ case.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
dbe00d018f |
Federation foundation + HNSW trial system + Postgres streaming + PRD reframe
Four shipped features and a PRD realignment, all measured end-to-end:
HNSW trial system (Phase 15 horizon item → complete)
- vectord: EmbeddingCache, harness (eval sets + brute-force ground truth),
TrialJournal, parameterized HnswConfig on build_index_with_config
- /vectors/hnsw/trial, /hnsw/trials/{idx}, /hnsw/trials/{idx}/best,
/hnsw/evals/{name}/autogen, /hnsw/cache/stats
- Measured on resumes_100k_v2 (100K × 768d): brute-force 44ms -> HNSW 873us
at 100% recall@10. ec=80 es=30 locked as HnswConfig::default()
- Lower ec values trade recall for build time: 20/30 = 0.96 recall in 8s,
80/30 = 1.00 recall in 230s
Catalog manifest repair
- catalogd: resync_from_parquet reads parquet footers to restore row_count
and columns on drifted manifests
- POST /catalog/datasets/{name}/resync + POST /catalog/resync-missing
- All 7 staffing tables recovered to PRD-matching 2,469,278 rows
Federation foundation (ADR-017)
- shared::secrets: SecretsProvider trait + FileSecretsProvider (reads
/etc/lakehouse/secrets.toml, enforces 0600 perms)
- storaged::registry::BucketRegistry — multi-bucket resolution with
rescue_bucket read fallback and reachability probing
- storaged::error_journal — bucket op failures visible in one HTTP call
- storaged::append_log — write-once batched append pattern (fixes the RMW
anti-pattern llms3.com calls out; errors and trial journals both use it)
- /storage/buckets, /storage/errors, /storage/bucket-health,
/storage/errors/{flush,compact}
- Bucket-aware I/O at /storage/buckets/{bucket}/objects/{*key} with
X-Lakehouse-Rescue-Used observability headers on fallback
Postgres streaming ingest
- ingestd::pg_stream: DSN parser, batched ORDER BY + LIMIT/OFFSET pagination
into ArrowWriter, lineage redacts password
- POST /ingest/db — verified against live knowledge_base.team_runs
(586 rows × 13 cols, 6 batches, 196ms end-to-end)
PRD realignment (2026-04-16)
- Dual use case: staffing analytics + local LLM knowledge substrate
- Removed "multi-tenancy (single-owner system)" from non-goals
- Added invariants 8-11: indexes hot-swappable, per-reader profiles,
trials-as-data, operational failures findable in one HTTP call
- New phases 16 (hot-swap generations), 17 (model profiles + dataset
bindings), 18 (Lance vs Parquet+sidecar evaluation)
- Known ceilings table documents the 5M vector wall and escape hatches
- ADR-017 (federation), ADR-018 (append-log pattern) added
- EXECUTION_PLAN.md sequences phases B-E with success gates and
decision rules
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
04770c97eb |
HNSW vector index: 100K search in 27ms (58x faster than brute-force)
- instant-distance HNSW implementation for approximate nearest neighbors
- HnswStore: build from stored embeddings, in-memory index, thread-safe
- POST /vectors/hnsw/build — build index from Parquet (100K in 35s release)
- POST /vectors/hnsw/search — fast ANN search
- GET /vectors/hnsw/list — list loaded indexes
Benchmark (100K × 768d, release build):
Brute-force: 1,567ms
HNSW: 31ms (50x)
HNSW warm: 27ms (58x)
Build cost: 35s one-time for 100K vectors (release mode)
ef_construction=40, ef_search=50 — good recall/speed balance
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
6cd1daeb51 |
Phase 11: Embedding versioning — model-proof vector layer
- IndexRegistry: tracks all vector indexes with model metadata
(model_name, model_version, dimensions, build stats)
- Index metadata persisted as JSON in vectors/meta/
- Rebuilt on startup for crash recovery
- GET /vectors/indexes — list all indexes (filter by source/model)
- GET /vectors/indexes/{name} — get index metadata
- Background jobs auto-register metadata on completion
- Multi-version support: same data, different models, coexist
- Per ADR-014: enables incremental re-embed on model upgrade
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
3b695cd592 |
Dual-pipeline supervisor for embedding ingestion
- 4 parallel pipelines (tuned for i9 + A4000)
- Range-based work splitting (2500 chunks per range)
- Round-robin retry on failure (3 attempts before dead-letter)
- Checkpointing to disk every 1000 chunks (crash recovery)
- On restart, loads checkpoint and skips completed ranges
- Dead-letter queue for permanently failed ranges
- Vectors assembled in order after all pipelines finish
- Batch size 64 for GPU throughput
Architecture:
Supervisor → splits 100K chunks into 40 ranges
├── Pipeline 0: grabs range, embeds, reports progress
├── Pipeline 1: grabs range, embeds, reports progress
├── Pipeline 2: grabs range, embeds, reports progress
└── Pipeline 3: grabs range, embeds, reports progress
Failed range → back to queue → next available pipeline retries
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|