lakehouse/auditor 1 blocking issue: cloud: claim not backed — "the proven escalation ladder with learning context, collects"
Observed on PR #8 audit (de11ac4): 7 warn findings, all from the
cloud inference check. Investigation showed two distinct bug classes
that weren't "ship bad code", they were "auditor misreads the diff":
1. Cloud flagged "X not defined in this diff / missing implementation"
for symbols like `tailJsonl` and `stubFinding` that ARE defined —
just not in the added lines of this diff. Fix: extract candidate
symbols from the cloud's gap summary, grep the repo for their
definitions (function/const/let/def/class/struct/enum/trait/fn).
If every named symbol resolves, drop the finding; if some do,
demote to info with the resolution in evidence.
2. Cloud flagged runtime metrics like "58 cloud calls, 306s
end-to-end" as unbacked claims. These are empirical outputs
from running the test, not things a static diff can prove.
Fix: claim_parser now has an `empirical` strength class
matching iteration counts, cloud-call counts, duration metrics,
attempt counts, tier-count phrases. Inference drops empirical
claims from its cloud prompt (verifiable[] subset only) and
claim-index mapping uses verifiable[] so cloud responses still
line up.
Added `claims_empirical` to audit metrics so the verdict is
introspectable: how many claims WERE runtime-only vs how many
are diff-verifiable?
Verified: unit tests confirm empirical classification on 5
sample commit messages; symbol resolver found both false-positive
symbols (tailJsonl + stubFinding) and correctly skipped a known-
fake symbol.
All-Bun sub-agent that watches open PRs on Gitea, reads ship-claims,
and hard-blocks merges when the code doesn't back the claim. First
commit of N; this is the skeleton. Dynamic/static/inference/kb checks
+ poller land in follow-up commits on this same branch.
- auditor/types.ts — Claim, Finding, Verdict, PrSnapshot shapes
- auditor/gitea.ts — minimal API client (listOpenPrs, getPrDiff,
postCommitStatus, postReview). Live-proven: returned 0 open PRs
against our repo (which IS the current state — every commit today
went to main directly, which is the problem this auditor is meant
to prevent)
- auditor/policy.ts — stub `assembleVerdict` + severity rules.
Intentionally conservative defaults: strong claim + zero evidence
= block, not warn.
- auditor/README.md — how to run + the hard-block mechanism
Workflow discipline change: starting with this branch, no more
direct pushes to main. Every change lands as a PR. When this
auditor is fully built and running, it'll review its own
completion PR — the recursive self-test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>