2 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
7fe47babd9 |
Phase 45 slice 4: batch scan + T3 drift-correction synthesis
Some checks failed
lakehouse/auditor cloud-flagged gap not in any claim: append_correction test writes a temp file directly instead of invoking the append_correction function, l
Closes the PRD-listed remaining deliverables for Phase 45:
- POST /vectors/playbook_memory/doc_drift/scan
- T3 synthesis writing data/_kb/doc_drift_corrections.jsonl
- Backfill: unit tests for mcp-server/context7_bridge.ts (slice 2
never had any)
crates/vectord/src/drift_synth.rs (NEW, 240 LOC)
- DriftCorrection shape matching the PRD spec exactly
- synthesize(): HTTPS POST to ollama.com/api/generate with
gpt-oss:120b. Prompt explicitly instructs the model to admit
"preview insufficient" rather than fabricate a diff.
- append_correction(): JSONL append to data/_kb/ with mkdir -p
on the parent; atomic at line level on Linux for typical sizes.
- spawn_synthesize_and_append(): fire-and-forget wrapper that never
blocks the handler. No cloud key → synthesis is skipped with a
tracing::warn and no error is surfaced to the caller. Cloud
failure → logged + dropped.
- resolve_cloud_key(): same sources v1/ollama_cloud.rs uses
(env OLLAMA_CLOUD_KEY → /root/llm_team_config.json → env
OLLAMA_CLOUD_API_KEY).
- 5 unit tests: JSON extraction (first object, code fences,
unclosed), prompt composition, jsonl append shape.
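The three JSON-extraction cases listed above (first object, code fences, unclosed) can be illustrated with a minimal sketch. The real helper lives in the Rust file drift_synth.rs and may work differently; this TypeScript version, including the name extractFirstJsonObject, is hypothetical and only shows one way to pull the first balanced object out of free-form model output.

```typescript
// Hypothetical helper (not the drift_synth.rs code): returns the first
// balanced {...} span found in `text`, or null if none closes.
function extractFirstJsonObject(text: string): string | null {
  const start = text.indexOf("{");
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Braces inside string literals must not affect the depth count.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // unclosed object: refuse to guess
}

// Usage: surrounding prose or a Markdown fence is skipped because the
// scan starts at the first "{" and stops at its matching "}".
const fence = "`".repeat(3);
const modelReply = `Sure:\n${fence}json\n{"diff_summary": "preview insufficient"}\n${fence}`;
console.log(extractFirstJsonObject(modelReply));
```

Returning null for an unclosed object, rather than trying to repair it, matches the commit's stance that the pipeline should prefer no correction over a fabricated one.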
crates/vectord/src/service.rs
- /playbook_memory/doc_drift/scan — iterates active entries with
doc_refs, optional (city, state, max_entries) filter. Per-entry:
bridge check → flag if drifted → spawn synthesis per drifted
tool. Honest response: scanned, eligible, drifted, newly_flagged,
unknown, synthesis_spawned, details[].
- /playbook_memory/doc_drift/check/{id}: slice 3 handler now also
spawns synthesis per drifted tool. Response adds
synthesis_spawned: bool.
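How the scan counters might be tallied from per-entry outcomes can be sketched as follows. The field names come from the response listed above, but their exact semantics are assumptions: here scanned counts every examined entry, newly_flagged counts drifted entries that were not flagged before, and synthesis_spawned counts one task per drifted tool. The eligible field is left out because the commit does not define it. The types EntryOutcome and ScanSummary and the function summarizeScan are hypothetical; the actual handler is Rust code in service.rs.

```typescript
// Hypothetical per-entry result of the bridge check.
type EntryOutcome =
  | { id: string; status: "fresh" }
  | { id: string; status: "drifted"; alreadyFlagged: boolean; driftedTools: string[] }
  | { id: string; status: "unknown"; reason: string };

// Subset of the scan response; counter meanings are assumptions.
interface ScanSummary {
  scanned: number;
  drifted: number;
  newly_flagged: number;
  unknown: number;
  synthesis_spawned: number;
  details: EntryOutcome[];
}

function summarizeScan(outcomes: EntryOutcome[]): ScanSummary {
  const summary: ScanSummary = {
    scanned: outcomes.length,
    drifted: 0,
    newly_flagged: 0,
    unknown: 0,
    synthesis_spawned: 0,
    details: outcomes,
  };
  for (const outcome of outcomes) {
    if (outcome.status === "drifted") {
      summary.drifted += 1;
      if (!outcome.alreadyFlagged) summary.newly_flagged += 1;
      // One synthesis task per drifted tool, as the commit describes.
      summary.synthesis_spawned += outcome.driftedTools.length;
    } else if (outcome.status === "unknown") {
      summary.unknown += 1;
    }
  }
  return summary;
}

// Usage: a single drifted entry with one stale docker doc_ref.
const scanDemo = summarizeScan([
  { id: "pb-seed-88abc7d1", status: "drifted", alreadyFlagged: false, driftedTools: ["docker"] },
]);
console.log(scanDemo.scanned, scanDemo.drifted, scanDemo.newly_flagged, scanDemo.synthesis_spawned);
```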
mcp-server/context7_bridge.ts
- Export normalizeTool + hashContent for testing.
- Guard Bun.serve() behind `if (import.meta.main)` so importing the
module from tests doesn't bind :3900 a second time (which would
collide with the systemd service).
mcp-server/context7_bridge.test.ts (NEW, 6 tests)
- normalizeTool: lowercases + trims, preserves internal chars
- hashContent: deterministic, sensitive to 1-char change,
16 hex chars, differs for empty vs whitespace
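The properties these tests check can be reproduced with a minimal sketch of the two exported helpers. The bodies are assumptions inferred from the test descriptions and from the slice 2 note that snippet_hash is a 16-hex-char SHA-256 prefix; the actual code in context7_bridge.ts may differ.

```typescript
import { createHash } from "node:crypto";

// Assumed behaviour: trim surrounding whitespace and lowercase,
// leaving internal characters such as "-" or "." untouched.
function normalizeTool(tool: string): string {
  return tool.trim().toLowerCase();
}

// Assumed behaviour: first 16 hex characters of the SHA-256 digest.
function hashContent(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex").slice(0, 16);
}

// Usage: the same input always yields the same 16-char hash, and a
// one-character change yields a different one.
console.log(normalizeTool("  Docker-Compose "));
console.log(hashContent("docker docs"), hashContent("docker docs!"));
```

A 16-hex-char prefix keeps 64 bits of the digest, which is ample for detecting that a doc snippet changed but is not meant as a security boundary.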
Live verification (after gateway restart):
seed playbook pb-seed-88abc7d1 with doc_refs[docker v23.0.0,
stale hash]
POST /doc_drift/scan {city:"Toledo", state:"OH", max_entries:5}
→ scanned=1 drifted=1 newly_flagged=1
synthesis_spawned=1 unknown=0
wait 30s
cat data/_kb/doc_drift_corrections.jsonl
→ 1 record (603 bytes) with diff_summary + recommended_action
from gpt-oss:120b. Model correctly noted "preview unavailable"
rather than fabricating.
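The append step that produced this record can be sketched in TypeScript: create the parent directory, then append one JSON object per line. The real append_correction() is Rust; the function names, the DriftCorrection fields other than diff_summary and recommended_action, and the sample values below are all hypothetical.

```typescript
import { appendFileSync, mkdirSync, mkdtempSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Only diff_summary and recommended_action are named in the commit;
// `tool` is a placeholder for whatever else the PRD shape contains.
interface DriftCorrection {
  tool: string;
  diff_summary: string;
  recommended_action: string;
}

// mkdir -p on the parent, then append exactly one line per record.
function appendCorrection(path: string, record: DriftCorrection): void {
  mkdirSync(dirname(path), { recursive: true });
  appendFileSync(path, JSON.stringify(record) + "\n", "utf8");
}

// Reads the file back, skipping blank lines.
function readCorrections(path: string): DriftCorrection[] {
  return readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as DriftCorrection);
}

// Usage: write two records into a throwaway directory and read them back.
const demoPath = join(mkdtempSync(join(tmpdir(), "drift-")), "_kb", "doc_drift_corrections.jsonl");
appendCorrection(demoPath, { tool: "docker", diff_summary: "preview unavailable", recommended_action: "re-check later" });
appendCorrection(demoPath, { tool: "bun", diff_summary: "flag renamed", recommended_action: "update playbook" });
const correctionRecords = readCorrections(demoPath);
console.log(correctionRecords.length);
```

JSON.stringify without an indent argument never emits a newline, so each record stays on a single line and the file remains valid JSONL.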
Tests: 6 bridge tests + 6 drift_synth tests + 51 pre-existing
vectord lib tests. All green. Release build clean.
NOT in this PR (deliberately — cohesion review pending):
- Auditor's kb_query check consulting hybrid_search + context7
- Auditor's inference check consuming KB neighbors + drift
corrections as context
- Observer → KB → auditor feedback loop beyond append
- Integration test exercising the full smarter-DB loop
- Python script (sidecar/*, scripts/*) inventory
Those are the cohesion work J flagged — handled on a separate
branch after this merges.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
affab8ac83 |
Phase 45 slice 2: context7 HTTP bridge for doc drift detection
Bun bridge on :3900 that wraps context7's public API and exposes the
surface the gateway consumes for Phase 45 drift checks. It runs on its
own port so a failure here never takes down mcp-server on :3700.
Endpoints:
- GET /health: status + cache stats
- GET /docs/:tool: resolve tool → library_id → fetch docs → return
descriptor {snippet_hash, last_updated, source_url, docs_preview, ...}
- GET /docs/:tool/diff?since=X: compare current snippet_hash to X;
returns {drifted: bool, current, previous, preview if drifted}
- GET /cache: debug dump of cached entries
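The comparison behind the diff endpoint can be sketched as a pure function. The response fields mirror the ones listed above; the interface names and the diffSince function are hypothetical, and the real handler may include more data.

```typescript
// Subset of the descriptor returned by GET /docs/:tool.
interface DocDescriptor {
  snippet_hash: string;
  docs_preview: string;
}

interface DiffResult {
  drifted: boolean;
  current: string;
  previous: string;
  preview?: string; // present only when drifted
}

// Compare the caller's last-seen hash (`since`) with the current one.
function diffSince(descriptor: DocDescriptor, since: string): DiffResult {
  const drifted = descriptor.snippet_hash !== since;
  const result: DiffResult = {
    drifted,
    current: descriptor.snippet_hash,
    previous: since,
  };
  if (drifted) result.preview = descriptor.docs_preview;
  return result;
}

// Usage with the hash reported in the live verification.
const dockerDescriptor: DocDescriptor = {
  snippet_hash: "475a0396ca436bba",
  docs_preview: "(placeholder preview text)",
};
console.log(diffSince(dockerDescriptor, "stale-hash-0000").drifted);
console.log(diffSince(dockerDescriptor, "475a0396ca436bba").drifted);
```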
Implementation notes:
- 5 minute in-memory cache (context7 rate-limits by IP; gateway
drift-checks are the hot caller)
- 1500-token slices from context7 (enough for a drift-meaningful
hash, not so much that we hammer their API)
- snippet_hash = SHA-256 prefix (16 hex chars) of fetched content
- Library resolution prefers "finalized" state; falls back to top
result if none finalized
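The 5 minute in-memory cache noted above could look roughly like this. It is a sketch under assumptions: the class name TtlCache, lazy eviction on read, and the injectable clock are illustrative choices, not details taken from the bridge.

```typescript
// Minimal TTL cache: entries expire `ttlMs` after they are stored.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so tests can control time.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      // Expired entries are evicted lazily, on lookup.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  get size(): number {
    return this.entries.size;
  }
}

// Usage: 300 s TTL, matching the value /health reports.
const docsCache = new TtlCache<string>(300_000);
docsCache.set("docker", "475a0396ca436bba");
console.log(docsCache.get("docker"));
```

Lazy eviction keeps the sketch free of timers; a long-running bridge with many distinct tools might also want a periodic sweep or a size cap.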
Verified live against context7.com:
- /health → ok, 0 cache, 300s TTL
- /docs/docker → library_id /docker/docs, title "Docker", hash
475a0396ca436bba, last updated 2026-04-20
- /docs/docker (again) → cache hit, 0.37ms (5400× speedup)
- /docs/docker/diff?since=stale-hash-0000 → drifted=true, preview
included
- /docs/docker/diff?since=<current hash> → drifted=false, preview
omitted (honest: no drift to show)
Not yet wired:
- Gateway consumer (Phase 45 slice 3):
/vectors/playbook_memory/doc_drift/check/{id} calls this bridge
and updates DocRef.snippet_hash + doc_drift_flagged_at
- Systemd unit (bridge is manual-start for now, same as bot/)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|