Phase 45 slice 4: batch scan + T3 drift-correction synthesis #6

Closed
profit wants to merge 1 commit from phase/45-slice-4 into main
Owner

Closes Phase 45 per the PRD spec

PRD Phase 45 deliverables are now:

  • DocRef + doc_refs field on PlaybookEntry (slice 1)
  • context7_bridge.ts (slice 2)
  • /doc_drift/check + /resolve (slice 3)
  • /doc_drift/scan + T3 synthesis → data/_kb/doc_drift_corrections.jsonl (this PR)

What's in this PR

  • crates/vectord/src/drift_synth.rs — 240 LOC, 5 unit tests
  • /doc_drift/scan handler in service.rs
  • Synthesis hook in /doc_drift/check (fire-and-forget per drifted tool)
  • Backfill: unit tests for context7_bridge.ts (slice 2 had none)

Live-verified

Seed → scan Toledo OH → drifted=1 flagged=1 synthesis_spawned=1 → cat doc_drift_corrections.jsonl shows a record with a real diff_summary + recommended_action from gpt-oss:120b (the model honestly noted the preview was unavailable rather than fabricating a diff).

What this PR does NOT ship

The cohesion concern J flagged: the auditor's kb_query + inference checks don't yet consult hybrid_search / context7 / KB neighbors as context. That integration-test work lives on a separate branch. This PR closes Phase 45's stated deliverables cleanly; the meta-index-gets-smarter loop is the next effort.

profit added 1 commit 2026-04-22 22:16:05 +00:00
Phase 45 slice 4: batch scan + T3 drift-correction synthesis
Some checks failed
lakehouse/auditor cloud-flagged gap not in any claim: append_correction test writes a temp file directly instead of invoking the append_correction function, l
7fe47babd9
Closes the PRD-listed remaining deliverables for Phase 45:
- POST /vectors/playbook_memory/doc_drift/scan
- T3 synthesis writing data/_kb/doc_drift_corrections.jsonl
- Backfill: unit tests for mcp-server/context7_bridge.ts (slice 2
  never had any)

crates/vectord/src/drift_synth.rs (NEW, 240 LOC)
  - DriftCorrection shape matching the PRD spec exactly
  - synthesize(): HTTPS POST to ollama.com/api/generate with
    gpt-oss:120b. Prompt explicitly instructs the model to admit
    "preview insufficient" rather than fabricate a diff.
  - append_correction(): JSONL append to data/_kb/ with mkdir -p
    on the parent; atomic at line level on Linux for typical sizes.
  - spawn_synthesize_and_append(): fire-and-forget wrapper. Never
    blocks the handler. No cloud key → skipped silently with a
    tracing::warn. Cloud failure → logged + dropped.
  - resolve_cloud_key(): same sources v1/ollama_cloud.rs uses
    (env OLLAMA_CLOUD_KEY → /root/llm_team_config.json → env
    OLLAMA_CLOUD_API_KEY).
  - 5 unit tests: JSON extraction (first object, code fences,
    unclosed), prompt composition, jsonl append shape.
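As a rough illustration of the append step described above, here is a minimal std-only sketch (the function signature and the two field names are assumptions pulled from this description; the real DriftCorrection struct matches the PRD spec and the real code presumably serializes with serde_json):

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;
use std::path::Path;

// Assumed-shape sketch of the JSONL append: mkdir -p on the parent,
// then one short write to an O_APPEND file, which is atomic at line
// level on Linux for typical record sizes.
fn append_correction(
    path: &Path,
    diff_summary: &str,
    recommended_action: &str,
) -> std::io::Result<()> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?; // mkdir -p on the parent
    }
    // Rust's Debug formatting for &str yields a quoted, escaped string,
    // which is JSON-compatible for ASCII content; real code would use
    // a proper JSON serializer.
    let record = format!(
        "{{\"diff_summary\":{:?},\"recommended_action\":{:?}}}\n",
        diff_summary, recommended_action
    );
    OpenOptions::new()
        .create(true)
        .append(true)
        .open(path)?
        .write_all(record.as_bytes())
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir()
        .join("doc_drift_demo")
        .join("corrections.jsonl");
    let _ = fs::remove_file(&path); // fresh file for the demo
    append_correction(&path, "docker docs changed", "refresh doc_refs hash")?;
    print!("{}", fs::read_to_string(&path)?);
    Ok(())
}
```

Appending one line per record is what keeps the file greppable and cat-able, as in the live verification below.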

crates/vectord/src/service.rs
  - /playbook_memory/doc_drift/scan — iterates active entries with
    doc_refs, optional (city, state, max_entries) filter. Per-entry:
    bridge check → flag if drifted → spawn synthesis per drifted
    tool. Honest response: scanned, eligible, drifted, newly_flagged,
    unknown, synthesis_spawned, details[].
  - /playbook_memory/doc_drift/check/{id}: slice 3 handler now also
    spawns synthesis per drifted tool. Response adds
    synthesis_spawned: bool.
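The fire-and-forget hook both handlers share can be sketched as follows; std::thread stands in here for whatever async runtime the service actually uses, and the function names are taken from the description above, not from the real code:

```rust
use std::thread;

// Assumed-shape sketch: never block the caller. A missing cloud key is
// a silent skip (warn log only); a failed cloud call is logged and
// dropped. The bool return mirrors the synthesis_spawned field the
// handler responses report.
fn spawn_synthesize_and_append(cloud_key: Option<String>, tool: String) -> bool {
    let Some(key) = cloud_key else {
        eprintln!("warn: no cloud key, skipping synthesis for {tool}");
        return false;
    };
    thread::spawn(move || {
        if let Err(e) = synthesize_and_append(&key, &tool) {
            // Cloud failure: logged + dropped, never surfaced to the handler.
            eprintln!("warn: synthesis failed for {tool}: {e}");
        }
    });
    true
}

// Placeholder for the real HTTPS call to the model plus the JSONL append.
fn synthesize_and_append(_key: &str, _tool: &str) -> Result<(), String> {
    Ok(())
}

fn main() {
    assert!(!spawn_synthesize_and_append(None, "docker".into()));
    assert!(spawn_synthesize_and_append(Some("k".into()), "docker".into()));
}
```

Returning the spawned/not-spawned flag to the handler is what lets the scan response report synthesis_spawned honestly instead of assuming every drifted tool produced a task.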

mcp-server/context7_bridge.ts
  - Export normalizeTool + hashContent for testing.
  - Guard Bun.serve() behind `if (import.meta.main)` so imports
    don't double-bind :3900 (collides with systemd service).

mcp-server/context7_bridge.test.ts (NEW, 6 tests)
  - normalizeTool: lowercases + trims, preserves internal chars
  - hashContent: deterministic, sensitive to 1-char change,
    16 hex chars, differs for empty vs whitespace

Live verification (after gateway restart):

  seed playbook pb-seed-88abc7d1 with doc_refs[docker v23.0.0,
  stale hash]
  POST /doc_drift/scan {city:"Toledo", state:"OH", max_entries:5}
    → scanned=1 drifted=1 newly_flagged=1
       synthesis_spawned=1 unknown=0
  wait 30s
  cat data/_kb/doc_drift_corrections.jsonl
    → 1 record (603 bytes) with diff_summary + recommended_action
      from gpt-oss:120b. Model correctly noted "preview unavailable"
      rather than fabricating.

Tests: 6 bridge tests + 6 drift_synth tests + 51 pre-existing
vectord lib tests. All green. Release build clean.

NOT in this PR (deliberately — cohesion review pending):
  - Auditor's kb_query check consulting hybrid_search + context7
  - Auditor's inference check consuming KB neighbors + drift
    corrections as context
  - Observer → KB → auditor feedback loop beyond append
  - Integration test exercising the full smarter-DB loop
  - Python script (sidecar/*, scripts/*) inventory

That is the cohesion work J flagged — handled on a separate
branch after this merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author
Owner

Auditor verdict: approve

One-liner: all checks passed (3 findings, all info)
Head SHA: 7fe47babd92a
Audited at: 2026-04-22T22:17:00.227Z

dynamic — 1 findings (0 block, 0 warn, 1 info)

ℹ️ info — dynamic check skipped — skipped by options

  • skipped by options
inference — 1 findings (0 block, 0 warn, 1 info)

ℹ️ info — cloud review completed (model=gpt-oss:120b, tokens=7313)

  • claim_verdicts: 1, unflagged_gaps: 0
kb_query — 1 findings (0 block, 0 warn, 1 info)

ℹ️ info — KB: 69 recent scenario runs, 209/289 events ok (fail rate 27.7%)

  • most recent: scenario-2026-04-21T05-29-34
  • recent failing sigs: 5745bcd5e4c68591, 5745bcd5e4c68591, caeeeffc69d36009

Metrics

{
  "audit_duration_ms": 9699,
  "findings_total": 3,
  "findings_block": 0,
  "findings_warn": 0,
  "findings_info": 3,
  "claims_strong": 0,
  "claims_moderate": 1,
  "claims_weak": 0,
  "claims_total": 1,
  "diff_bytes": 22729
}

Lakehouse auditor · SHA 7fe47bab · re-audit on new commit flips the status automatically.

Author
Owner

Auditor verdict: approve

One-liner: all checks passed (3 findings, all info)
Head SHA: 7fe47babd92a
Audited at: 2026-04-22T22:20:40.076Z

dynamic — 1 findings (0 block, 0 warn, 1 info)

ℹ️ info — dynamic check skipped — skipped by options

  • skipped by options
inference — 1 findings (0 block, 0 warn, 1 info)

ℹ️ info — cloud review completed (model=gpt-oss:120b, tokens=7103)

  • claim_verdicts: 1, unflagged_gaps: 0
kb_query — 1 findings (0 block, 0 warn, 1 info)

ℹ️ info — KB: 69 recent scenario runs, 209/289 events ok (fail rate 27.7%)

  • most recent: scenario-2026-04-21T05-29-34
  • recent failing sigs: 5745bcd5e4c68591, 5745bcd5e4c68591, caeeeffc69d36009

Metrics

{
  "audit_duration_ms": 9151,
  "findings_total": 3,
  "findings_block": 0,
  "findings_warn": 0,
  "findings_info": 3,
  "claims_strong": 0,
  "claims_moderate": 1,
  "claims_weak": 0,
  "claims_total": 1,
  "diff_bytes": 22729
}

Lakehouse auditor · SHA 7fe47bab · re-audit on new commit flips the status automatically.

Author
Owner

Auditor verdict: ⚠️ request_changes

One-liner: cloud-flagged gap not in any claim: append_correction test writes a temp file directly instead of invoking the append_correction function, leaving the actua
Head SHA: 7fe47babd92a
Audited at: 2026-04-22T22:22:01.111Z

dynamic — 1 findings (0 block, 0 warn, 1 info)

ℹ️ info — dynamic check skipped — skipped by options

  • skipped by options
inference — 2 findings (0 block, 1 warn, 1 info)

ℹ️ info — cloud review completed (model=gpt-oss:120b, tokens=7482)

  • claim_verdicts: 1, unflagged_gaps: 1
    ⚠️ warn — cloud-flagged gap not in any claim: append_correction test writes a temp file directly instead of invoking the append_correction function, leaving the actua
  • location: crates/vectord/src/drift_synth.rs:~150
kb_query — 1 findings (0 block, 0 warn, 1 info)

ℹ️ info — KB: 70 recent scenario runs, 210/290 events ok (fail rate 27.6%)

  • most recent: ?
  • recent failing sigs: 5745bcd5e4c68591, 5745bcd5e4c68591, caeeeffc69d36009

Metrics

{
  "audit_duration_ms": 12700,
  "findings_total": 4,
  "findings_block": 0,
  "findings_warn": 1,
  "findings_info": 3,
  "claims_strong": 0,
  "claims_moderate": 1,
  "claims_weak": 0,
  "claims_total": 1,
  "diff_bytes": 22729
}

Lakehouse auditor · SHA 7fe47bab · re-audit on new commit flips the status automatically.

Author
Owner

Closing — superseded. The /playbook_memory/doc_drift/scan endpoint shipped on main via inline implementation in commit 6cafa7e (Phase 45 closure 2026-04-27). The crates/vectord/src/drift_synth.rs extraction in this PR was not the path that landed; the inline scan_doc_drift handler at service.rs:2608 is the canonical version. Different code, same surface — close.

profit closed this pull request 2026-05-03 03:36:47 +00:00

Pull request closed

Reference: profit/lakehouse#6