9-run empirical test showed 20 of 27 audit_lessons signatures were
singletons (count=1) — the cloud producing slightly-different summary
phrasings for the SAME underlying claim on each audit, each hashing
to a fresh signature. That's the creep J flagged — not explosive,
but steady ~2 new sigs per run, unbounded over hundreds of runs.
Root cause: temperature=0.2 + think=true was letting variable prose
leak into the classification output. Fix: temp=0 (greedy sample →
identical input yields identical output on same model version),
think=false (no reasoning trace variance), max_tokens 3000→1500
(tighter bound prevents tail wander).
The compounding policy itself was validated by the 9 runs:
- 7 recurring claims (the legitimate signals) all at conf 0.08-0.20
- ratingSeverity() correctly held them at info (below 0.3 threshold)
- cross-PR signal test separately confirmed conf=1.00 → sev=block
Also: LH_AUDIT_RUNS env so the test can validate with smaller N.