Auditor: dynamic + inference checks
auditor/checks/dynamic.ts — wraps runHybridFixture, maps layer
results to Findings. Placeholder-style errors (404/unimplemented/
slice N) → info; other failures → warn. Always emits a summary
finding with real numbers (shipped/placeholder phase counts + per-
layer latency). Live-tested against current stack: 2 info findings,
0 warnings — all shipped layers actually work.
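The result-to-finding mapping above can be sketched as follows; the type names (LayerResult, Finding), the placeholder-error regex, and the helper names are illustrative, not the actual dynamic.ts shapes:

```typescript
// Sketch of the dynamic-check severity mapping. All names here are
// assumptions for illustration, not the real auditor types.

type Severity = "info" | "warn";

interface LayerResult {
  layer: string;
  ok: boolean;
  error?: string;
  latencyMs: number;
}

interface Finding {
  severity: Severity;
  message: string;
}

// Placeholder-style failures (404, unimplemented, "slice N") are expected
// for unshipped layers, so they only rate "info"; anything else warns.
function isPlaceholderError(error: string): boolean {
  return /404|unimplemented|slice \d+/i.test(error);
}

function toFindings(results: LayerResult[]): Finding[] {
  const findings: Finding[] = results
    .filter((r) => !r.ok)
    .map((r) => ({
      severity: isPlaceholderError(r.error ?? "") ? "info" : "warn",
      message: `${r.layer}: ${r.error ?? "unknown error"}`,
    }));

  // Always append the summary finding with real numbers:
  // shipped/placeholder counts plus per-layer latency.
  const shipped = results.filter((r) => r.ok).length;
  const placeholder = results.length - shipped;
  const latency = results.map((r) => `${r.layer}=${r.latencyMs}ms`).join(", ");
  findings.push({
    severity: "info",
    message: `${shipped} shipped, ${placeholder} placeholder; ${latency}`,
  });
  return findings;
}
```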
auditor/checks/inference.ts — wraps the run_codereview reviewer
pattern from llm_team_ui.py, adapted for claim-vs-diff verification.
Calls /v1/chat provider=ollama_cloud model=gpt-oss:120b. Requests
strict JSON response with claim_verdicts[] and unflagged_gaps[]. A
strong claim marked "not backed" by cloud → BLOCK severity; moderate
→ warn; weak → info. Cloud-unreachable or unparseable output → info
(never blocks when the reviewer is down).
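The strength-to-severity mapping and the defensive parse can be sketched as below; the verdict shape (strength/backed fields) is an assumption based on the strict-JSON contract described above, not the exact reviewer schema:

```typescript
// Sketch of the inference-check severity mapping. Field names on
// ClaimVerdict are illustrative assumptions, not the real schema.

type Severity = "block" | "warn" | "info";

interface ClaimVerdict {
  claim: string;
  strength: "strong" | "moderate" | "weak";
  backed: boolean;
}

// Strong unbacked claims block; moderate warn; weak are informational.
const severityByStrength: Record<ClaimVerdict["strength"], Severity> = {
  strong: "block",
  moderate: "warn",
  weak: "info",
};

function verdictSeverity(v: ClaimVerdict): Severity | null {
  return v.backed ? null : severityByStrength[v.strength]; // backed → no finding
}

// Parse the reviewer's strict-JSON reply defensively: unparseable output
// degrades to a single info finding, so the auditor never blocks on the
// reviewer being broken or unreachable.
function parseReviewerReply(raw: string): { severity: Severity; note: string }[] {
  try {
    const parsed = JSON.parse(raw) as { claim_verdicts: ClaimVerdict[] };
    return parsed.claim_verdicts.flatMap((v) => {
      const sev = verdictSeverity(v);
      return sev ? [{ severity: sev, note: v.claim }] : [];
    });
  } catch {
    return [{ severity: "info", note: "reviewer output unparseable" }];
  }
}
```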
Live-tested against PR #1 (this PR, 20 claims, 39KB diff):
- 36.9s round-trip
- 7 block + 23 warn + 2 info findings
- gpt-oss:120b correctly flagged "Fully-functional auditor (tasks
1-9 complete)" as not-backed (only 6/10 tasks done at that
commit) — accurate catch
- Some false positives from the original 15KB truncation threshold
(cloud missed gitea.ts, flagged "no Gitea client present")
- Bumped MAX_DIFF_CHARS from 15000 to 40000 so the full PR diff fits
  in context; reviewer precision should improve accordingly
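A hypothetical sketch of that truncation guard; the MAX_DIFF_CHARS value is from the text, but truncateDiff and the cut-at-file-boundary heuristic are illustrative assumptions:

```typescript
// Hypothetical diff-truncation guard. MAX_DIFF_CHARS matches the text;
// everything else here is an illustrative sketch.

const MAX_DIFF_CHARS = 40000; // bumped from 15000 so a ~39KB PR diff fits

// Cut at a file boundary where possible so the reviewer never sees half
// a file's hunks (the failure mode that produced the gitea.ts false
// positive), and append a marker so it knows content was dropped.
function truncateDiff(diff: string): string {
  if (diff.length <= MAX_DIFF_CHARS) return diff;
  const head = diff.slice(0, MAX_DIFF_CHARS);
  const cut = head.lastIndexOf("\ndiff --git ");
  const kept = cut > 0 ? head.slice(0, cut) : head;
  return kept + "\n[... diff truncated ...]";
}
```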
Tasks 5 + 6 completed. Remaining: #7 (KB query), #8 (verdict +
Gitea poster), #9 (poller), #10 (end-to-end proof), #12 (upsert
UPDATE-drops-doc_refs).