profit/lakehouse

Commit Graph

Author	SHA1	Message	Date
profit

7fe47babd9

Phase 45 slice 4: batch scan + T3 drift-correction synthesis

Some checks failed: lakehouse/auditor cloud-flagged gap not in any claim: append_correction test writes a temp file directly instead of invoking the append_correction function, l

Closes the PRD-listed remaining deliverables for Phase 45:
- POST /vectors/playbook_memory/doc_drift/scan
- T3 synthesis writing data/_kb/doc_drift_corrections.jsonl
- Backfill: unit tests for mcp-server/context7_bridge.ts (slice 2
  never had any)

crates/vectord/src/drift_synth.rs (NEW, 240 LOC)
  - DriftCorrection shape matching the PRD spec exactly
  - synthesize(): HTTPS POST to ollama.com/api/generate with
    gpt-oss:120b. Prompt explicitly instructs the model to admit
    "preview insufficient" rather than fabricate a diff.
  - append_correction(): JSONL append to data/_kb/ with mkdir -p
    on the parent; atomic at line level on Linux for typical sizes.
  - spawn_synthesize_and_append(): fire-and-forget wrapper. Never
    blocks the handler. No cloud key → skipped with a
    tracing::warn and no error to the caller. Cloud failure →
    logged + dropped.
  - resolve_cloud_key(): same sources v1/ollama_cloud.rs uses
    (env OLLAMA_CLOUD_KEY → /root/llm_team_config.json → env
    OLLAMA_CLOUD_API_KEY).
  - 5 unit tests: JSON extraction (first object, code fences,
    unclosed), prompt composition, jsonl append shape.
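
The JSON-extraction cases named in the unit tests above (first object, code fences, unclosed) can be illustrated with a small brace-matching scanner. This is a sketch only, written in TypeScript rather than the Rust of drift_synth.rs; the function name extractFirstJsonObject and the null-on-unclosed behavior are assumptions, not taken from the commit.

```typescript
// Hypothetical helper: return the first balanced {...} object found in
// free-form model output, or null if none is found or it never closes.
export function extractFirstJsonObject(text: string): string | null {
  const start = text.indexOf("{");
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Braces inside string literals must not affect nesting depth.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // unclosed object: refuse rather than guess
}
```

Because the scan starts at the first `{`, a Markdown code fence around the model's answer is skipped without special handling.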

crates/vectord/src/service.rs
  - /playbook_memory/doc_drift/scan — iterates active entries with
    doc_refs, optional (city, state, max_entries) filter. Per-entry:
    bridge check → flag if drifted → spawn synthesis per drifted
    tool. Honest response: scanned, eligible, drifted, newly_flagged,
    unknown, synthesis_spawned, details[].
  - /playbook_memory/doc_drift/check/{id}: slice 3 handler now also
    spawns synthesis per drifted tool. Response adds
    synthesis_spawned: bool.

mcp-server/context7_bridge.ts
  - Export normalizeTool + hashContent for testing.
  - Guard Bun.serve() behind `if (import.meta.main)` so imports
    don't double-bind :3900 (collides with systemd service).

mcp-server/context7_bridge.test.ts (NEW, 6 tests)
  - normalizeTool: lowercases + trims, preserves internal chars
  - hashContent: deterministic, sensitive to 1-char change,
    16 hex chars, differs for empty vs whitespace
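
A minimal sketch of the behavior these tests pin down for normalizeTool. The one-line body is an assumption inferred from the test descriptions (lowercase, trim, internal characters untouched); the bridge's actual implementation is not shown in the commit, and the sample inputs are made up for illustration.

```typescript
// Assumed implementation: trim surrounding whitespace, then lowercase.
// Internal characters such as "/", "-", or "." are left untouched.
export function normalizeTool(raw: string): string {
  return raw.trim().toLowerCase();
}

// Checks in the spirit of the listed tests, written as plain comparisons.
const cases: Array<[string, string]> = [
  ["  Docker  ", "docker"],
  ["Next.JS", "next.js"],
  ["docker/compose-v2", "docker/compose-v2"],
];
for (const [input, expected] of cases) {
  if (normalizeTool(input) !== expected) {
    throw new Error(`normalizeTool(${JSON.stringify(input)}) !== ${expected}`);
  }
}
```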

Live verification (after gateway restart):

  seed playbook pb-seed-88abc7d1 with doc_refs[docker v23.0.0,
  stale hash]
  POST /doc_drift/scan {city:"Toledo", state:"OH", max_entries:5}
    → scanned=1 drifted=1 newly_flagged=1
       synthesis_spawned=1 unknown=0
  wait 30s
  cat data/_kb/doc_drift_corrections.jsonl
    → 1 record (603 bytes) with diff_summary + recommended_action
      from gpt-oss:120b. Model correctly noted "preview unavailable"
      rather than fabricating.

Tests: 6 bridge tests + 6 drift_synth tests + 51 pre-existing
vectord lib tests. All green. Release build clean.

NOT in this PR (deliberately — cohesion review pending):
  - Auditor's kb_query check consulting hybrid_search + context7
  - Auditor's inference check consuming KB neighbors + drift
    corrections as context
  - Observer → KB → auditor feedback loop beyond append
  - Integration test exercising the full smarter-DB loop
  - Python script (sidecar/*, scripts/*) inventory

That is the cohesion work J flagged; it will be handled on a separate
branch after this merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-22 17:15:32 -05:00

profit

affab8ac83

Phase 45 slice 2: context7 HTTP bridge for doc drift detection

Bun bridge on :3900 that wraps context7's public API and exposes the
surface the gateway consumes for Phase 45 drift checks. It runs on its
own port so a failure here never tips over mcp-server on :3700.

Endpoints:
  GET /health                    status + cache stats
  GET /docs/:tool                resolve tool → library_id → fetch
                                 docs → return descriptor
                                 {snippet_hash, last_updated,
                                 source_url, docs_preview, ...}
  GET /docs/:tool/diff?since=X   compare current snippet_hash to X;
                                 returns {drifted: bool, current,
                                 previous, preview if drifted}
  GET /cache                     debug dump of cached entries
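
The /docs/:tool/diff contract above reduces to one hash comparison. A minimal sketch, assuming the response's current and previous fields carry hashes and that preview is copied from docs_preview; the type names and buildDiff are hypothetical.

```typescript
// Subset of the descriptor fields listed for GET /docs/:tool.
interface DocDescriptor {
  snippet_hash: string;
  docs_preview: string;
}

// Assumed response shape for GET /docs/:tool/diff?since=X.
interface DiffResponse {
  drifted: boolean;
  current: string;
  previous: string;
  preview?: string; // present only when drifted
}

export function buildDiff(current: DocDescriptor, since: string): DiffResponse {
  const drifted = current.snippet_hash !== since;
  const response: DiffResponse = {
    drifted,
    current: current.snippet_hash,
    previous: since,
  };
  // Omit the preview when nothing changed: there is no drift to show.
  if (drifted) response.preview = current.docs_preview;
  return response;
}
```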

Implementation notes:
- 5-minute in-memory cache (context7 rate-limits by IP; gateway
  drift checks are the hot caller)
- 1500-token slices from context7 (enough for drift-meaningful
  hash, not so much we hammer their API)
- snippet_hash = SHA-256 prefix (16 hex chars) of fetched content
- Library resolution prefers "finalized" state; falls back to top
  result if none finalized
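
The cache and hashing notes above can be sketched as follows. hashContent is the helper name the bridge later exports, but this body, the TtlCache class, and its method signatures are assumptions for illustration; the now parameter exists only to make expiry testable.

```typescript
import { createHash } from "node:crypto";

// First 16 hex characters of the SHA-256 digest of the fetched content.
export function hashContent(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex").slice(0, 16);
}

// Assumed cache entry shape; the real bridge's fields are not shown.
interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

export class TtlCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 5 * 60 * 1000) {
    this.ttlMs = ttlMs; // default: 5 minutes (300s)
  }

  get(key: string, now: number = Date.now()): T | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (now >= hit.expiresAt) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: T, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

A 16-hex-character prefix keeps 64 bits of the digest, which is enough to detect content changes but is not meant as a security boundary.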

Verified live against context7.com:
- /health                                  → ok, 0 cache, 300s TTL
- /docs/docker                             → library_id /docker/docs,
                                             title "Docker", hash
                                             475a0396ca436bba, last
                                             updated 2026-04-20
- /docs/docker (again)                     → cache hit, 0.37ms
                                             (5400× speedup)
- /docs/docker/diff?since=stale-hash-0000  → drifted=true, preview
                                             included
- /docs/docker/diff?since=<current hash>   → drifted=false, preview
                                             omitted (honest: no
                                             drift to show)

Not yet wired:
- Gateway consumer (Phase 45 slice 3):
  /vectors/playbook_memory/doc_drift/check/{id} calls this bridge
  and updates DocRef.snippet_hash + doc_drift_flagged_at
- Systemd unit (bridge is manual-start for now, same as bot/)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-22 03:17:17 -05:00

2 Commits