Auditor: PR-claim hard-block reviewer (scaffold) #1
Intent
All-Bun sub-agent that watches open PRs, reads ship-claims in commit
messages + PR bodies, runs four checks (static diff / dynamic hybrid
test / cloud inference / KB query), and hard-blocks merges when the
code doesn't back the claim.
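
As a rough illustration of the hard-block rule described above, the sketch below folds findings from the four checks into a single verdict. The type names, the `decide` function, and the choice to let warn-level findings through are assumptions for illustration; the policy stub in this PR may be shaped differently.

```typescript
// Hypothetical shapes for illustration; the PR's real types may differ.
type Severity = "block" | "warn" | "info";
type CheckName = "static" | "dynamic" | "inference" | "kb_query";

interface Finding {
  check: CheckName;
  severity: Severity;
  summary: string;
}

interface Verdict {
  decision: "approve" | "block";
  oneLiner: string;
}

// Assumed rule: any single block-level finding vetoes the merge;
// warn and info findings are reported but do not stop it.
function decide(findings: Finding[]): Verdict {
  const count = (s: Severity): number =>
    findings.filter((f) => f.severity === s).length;

  const blocks = count("block");
  if (blocks > 0) {
    return {
      decision: "block",
      oneLiner: `${blocks} blocking finding(s) of ${findings.length}`,
    };
  }

  const warns = count("warn");
  const detail = warns > 0 ? `${warns} warn` : "all info";
  return {
    decision: "approve",
    oneLiner: `all checks passed (${findings.length} findings, ${detail})`,
  };
}
```

Keeping the decision in one pure function means the same inputs always give the same verdict, which makes the policy easy to unit-test separately from the checks that produce the findings.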
What this PR contains
Scaffold only — types, Gitea client (live-proven), policy stub, README.
What this PR explicitly does NOT claim to do yet
Why this PR exists at all
The prior 9 commits in today's session went directly to
main. That's exactly the placeholder-code problem this auditor is meant to prevent.
Starting with this branch, no more direct pushes. Every change lands
as a PR. Once this auditor is complete, it'll review its own remaining
completion PRs before merge — recursive self-test.
Merge criteria (what I should defend)
not a rubber-stamp
blocks — proving it actually catches the problem
Merging without these is exactly the kind of 'clicking past' the
auditor is designed to prevent.
🤖 Generated with Claude Code
auditor/claim_parser.ts: reads the PR body and commit messages and extracts ship-claims. Regex-based and intentionally not LLM-driven: the parser's job is to surface claim substrates, not to judge them (that is the inference check's job, which runs later with a cloud model).

Three strength tiers:
- strong: "verified end-to-end", "live-proven", "production-ready", "phase N shipped", "proven"
- moderate: "shipped", "landed", "green", "passing", "works", "complete", "done"
- weak: "should work", "expected to", "probably"

Live-proven against PR #1 (this PR): 4 claims extracted from 1 commit (2 strong, 2 moderate). "live-proven" is correctly tagged as strong (it IS a stronger claim than "shipped").

Next: the static diff check consumes these claims plus the PR diff to find placeholder patterns (empty fns, TODO, unwired fields, etc.).

auditor/checks/static.ts: grep-style scan of PR diffs, no AST, no LLM. High-signal patterns only.

Severity grading:
- BLOCK: unimplemented!(), todo!(), panic!("not implemented"), throw new Error("not implemented")
- WARN: TODO/FIXME/XXX/HACK in added lines; new pub struct fields with <2 mentions in the diff (added but nobody reads it, i.e. placeholder state)
- INFO: hardcoded "placeholder"/"dummy"/"foobar"/"changeme"/"xxx" strings in added lines

Live-proven, the existential test J asked for:
- vs PR #1 (scaffold): 0 findings (all scaffold fields cross-reference within the diff)
- vs commit 2a4b81b (Phase 45 first slice; I half-admitted placeholder): 5 WARN, every DocRef field (tool, version_seen, snippet_hash, source_url, seen_at) added with 0 read-sites in the diff

That's the auditor flagging my own "Phase 45 first slice" commit as state-without-consumer, which is exactly what I half-admitted it was. If PR #1 had required auditor-pass (branch protection), the DocRef commit would have been blocked pre-merge. The auditor works because it agreed with the honest read.
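
The two commit messages above describe a regex claim parser and a grep-style diff scan. A minimal sketch of both follows, using the tier phrases and severity patterns quoted in the messages. The function names, the overlap rule (a phrase already claimed by a stronger tier is not counted again), and the decision to match only quoted placeholder strings are assumptions; the unwired-field check is left out, and the real `claim_parser.ts` and `static.ts` may work differently.

```typescript
type Strength = "strong" | "moderate" | "weak";
type Severity = "block" | "warn" | "info";

// Tier phrases taken from the commit message, checked strongest-first.
const CLAIM_TIERS: [Strength, RegExp][] = [
  ["strong", /verified end-to-end|live-proven|production-ready|phase \d+ shipped|\bproven\b/gi],
  ["moderate", /\b(shipped|landed|green|passing|works|complete|done)\b/gi],
  ["weak", /should work|expected to|\bprobably\b/gi],
];

interface Claim { strength: Strength; phrase: string; index: number }

// Extract claims; text already covered by a stronger tier is skipped
// (assumption), so "phase 5 shipped" yields one strong claim, not two.
function extractClaims(text: string): Claim[] {
  const claims: Claim[] = [];
  for (const [strength, pattern] of CLAIM_TIERS) {
    for (const m of text.matchAll(pattern)) {
      const start = m.index ?? 0;
      const end = start + m[0].length;
      const overlaps = claims.some(
        (c) => start < c.index + c.phrase.length && end > c.index,
      );
      if (!overlaps) claims.push({ strength, phrase: m[0], index: start });
    }
  }
  return claims.sort((a, b) => a.index - b.index);
}

interface StaticFinding { severity: Severity; line: number; text: string }

const BLOCK = /unimplemented!\(\)|todo!\(\)|panic!\("not implemented"\)|throw new Error\("not implemented"\)/;
const WARN = /\b(TODO|FIXME|XXX|HACK)\b/;
const INFO = /"(placeholder|dummy|foobar|changeme|xxx)"/i;

// Scan only added lines of a unified diff; the highest severity wins per line.
function scanDiff(diff: string): StaticFinding[] {
  const findings: StaticFinding[] = [];
  diff.split("\n").forEach((raw, i) => {
    // "+++ b/path" is a file header, not an added line.
    if (!raw.startsWith("+") || raw.startsWith("+++")) return;
    const text = raw.slice(1);
    const severity: Severity | null = BLOCK.test(text)
      ? "block"
      : WARN.test(text)
        ? "warn"
        : INFO.test(text)
          ? "info"
          : null;
    if (severity) findings.push({ severity, line: i + 1, text: text.trim() });
  });
  return findings;
}
```

Because the INFO pattern matches any quoted placeholder word, a scanner built this way would also flag its own source comments that list those words, which is the kind of self-match reported in the audit output further down.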
Next: dynamic hybrid test fixture (task #4), the never-run multi-layer pipeline test.

auditor/fixtures/hybrid_38_40_45.ts: the never-before-run hybrid test. Exercises Phase 38 /v1/chat → Phase 40 Langfuse → Phase 45 slice 1 seed+doc_refs → Phase 45 slice 2 bridge drift → (expected-fail) Phase 45 slice 3 drift-check endpoint.

auditor/fixtures/cli.ts: standalone runner. Human-readable summary to stderr, machine-readable JSON to stdout, exit code 0/1/2 for pass / fail / partial_pass.

Live run results (honest measurements, not hand-waved):

✓ Phase 38 /v1/chat returns 9 visible tokens, 6.7s latency ("docker run is a common Docker command.")
✓ Phase 40 Langfuse trace 18a8a0b7 landed in 2.5s
✗ Phase 45.1 seed endpoint returns empty reply. This uncovered a PRE-EXISTING BUG unrelated to doc_refs: playbook_memory.rs:257 UpsertOutcome has newtype variants Added(String) and Noop(String) under #[serde(tag="mode")], and serde panics on serialize:
    panicked at crates/vectord/src/service.rs:2323: Error("cannot serialize tagged newtype variant UpsertOutcome::Added containing a string")
  Reproduced: curl /seed with AND without doc_refs both get "Empty reply from server" (socket closed mid-response). This bug has existed since Phase 26 shipped (commit 640db8c, 2026-04-21). No test or caller in the repo exercised the response path live against the gateway until this fixture did.
✓ Phase 45.2 context7 bridge confirms drift: current hash 475a0396ca436bba vs our stale input, upstream last updated 2026-04-20
✗ Phase 45.3 /doc_drift/check endpoint: correctly unreachable because layer 3 blocked us from getting a playbook_id; the endpoint still doesn't exist independent of that

Real numbers published: per-layer latency_ms, token counts, trace_age_ms, library_id, current_hash_length. All are stored in the JSON output for downstream audit.

Value delivered: the fixture's first live run found a bug that unit tests, compile checks, and my own "phase shipped" commits all missed.
Exactly the gap J called out: the auditor is doing what it's supposed to do. The bug fix is a SEPARATE concern: new task #11 tracks a separate PR (fix/upsert-outcome-serde) so the audit finding and the fix stay cleanly attributed.

Caught by running a side-test through LLM Team's run_codereview flow (gpt-oss:120b reviewer) against this fixture, 2026-04-22.

BEFORE:
    const ourStart = Date.parse(
      l1.evidence.match(/tokens=/) ? result.ran_at : result.ran_at
    );
    // Both branches return result.ran_at; the ternary is meaningless.
    // result.ran_at is the fixture start time, NOT the moment we fired
    // /v1/chat. Any trace created between fixture-start and chat-fetch
    // would false-negative.

AFTER:
    const chat_request_sent_ms = Date.now(); // captured before layer 1
    // ...
    const recent = items.filter(t =>
      Date.parse(t.timestamp) >= chat_request_sent_ms
    );

Re-ran the fixture against the live stack: layers 1, 2, 4 still pass (no regression); the layer 2 trace matched at age=2494ms, which is within the chat-to-trace propagation window. Layers 3, 5 still fail for the original unrelated reasons (UpsertOutcome serde panic + Phase 45 slice 3 endpoint not built).

First concrete act-on-finding from a code-checker run. The process works.

Auditor verdict: ✅ approve
One-liner: all checks passed (4 findings, all info)
Head SHA: 039ed3241111
Audited at: 2026-04-22T09:01:49.718Z
static — 1 findings (0 block, 0 warn, 1 info)
ℹ️ info — suspicious hardcoded string in auditor/checks/static.ts
auditor/checks/static.ts:+16: // info — hardcoded "test" / "dummy" / "placeholder" strings in

dynamic — 1 findings (0 block, 0 warn, 1 info)
ℹ️ info — dynamic check skipped — skipped by options
skipped by options

inference — 1 findings (0 block, 0 warn, 1 info)
ℹ️ info — cloud returned unparseable output — skipped
head: { "claim_verdicts": [ {"claim_idx": 0, "backed": false, "evidence": "README added, but types.ts, Gitea client (gitea.ts) and policy.ts are missing"}, {"claim_idx":
tokens: 16103

kb_query — 1 findings (0 block, 0 warn, 1 info)
ℹ️ info — KB: 69 recent scenario runs, 209/289 events ok (fail rate 27.7%)
most recent: scenario-2026-04-21T05-29-34
recent failing sigs: 5745bcd5e4c68591, 5745bcd5e4c68591, caeeeffc69d36009

Metrics
Lakehouse auditor · SHA 039ed324 · re-audit on new commit flips the status automatically.

Auditor verdict: ✅ approve
One-liner: all checks passed (4 findings, all info)
Head SHA: c33c1bcbc573
Audited at: 2026-04-22T09:07:57.598Z
static — 1 findings (0 block, 0 warn, 1 info)
ℹ️ info — suspicious hardcoded string in auditor/checks/static.ts
auditor/checks/static.ts:+16: // info — hardcoded "test" / "dummy" / "placeholder" strings in

dynamic — 1 findings (0 block, 0 warn, 1 info)
ℹ️ info — dynamic check skipped — skipped by options
skipped by options

inference — 1 findings (0 block, 0 warn, 1 info)
ℹ️ info — cloud returned unparseable output — skipped
head: { "claim_verdicts": [ { "claim_idx": 0, "backed": true, "evidence": "README added and scaffold files (audit.ts, policy stub, Gitea client imports) are present in the di
tokens: 16178

kb_query — 1 findings (0 block, 0 warn, 1 info)
ℹ️ info — KB: 69 recent scenario runs, 209/289 events ok (fail rate 27.7%)
most recent: scenario-2026-04-21T05-29-34
recent failing sigs: 5745bcd5e4c68591, 5745bcd5e4c68591, caeeeffc69d36009

Metrics
Lakehouse auditor · SHA c33c1bcb · re-audit on new commit flips the status automatically.
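
One way the status flip mentioned in the footer could be wired is through Gitea's commit status endpoint (`POST /api/v1/repos/{owner}/{repo}/statuses/{sha}`). The sketch below only builds the request and sends nothing. The function name, the `auditor` context string, and the mapping from verdict to state are assumptions; this PR does not show how the auditor publishes its result.

```typescript
// Hypothetical helper: builds (but does not send) a Gitea commit-status request.
type Decision = "approve" | "block";

interface StatusRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildStatusRequest(
  baseUrl: string, // e.g. the Gitea instance root, without a trailing slash
  owner: string,
  repo: string,
  sha: string,
  token: string,
  decision: Decision,
  oneLiner: string,
): StatusRequest {
  return {
    url: `${baseUrl}/api/v1/repos/${owner}/${repo}/statuses/${sha}`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `token ${token}`,
    },
    body: JSON.stringify({
      // Assumed mapping: approve -> success, block -> failure.
      state: decision === "approve" ? "success" : "failure",
      context: "auditor", // assumed context name
      description: oneLiner,
    }),
  };
}
```

Sending it would be `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Because the status is keyed by commit SHA, a new head commit starts without one until the auditor runs again, and requiring the assumed `auditor` context in branch protection is what would turn a failure state into a blocked merge.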