2026-05-03 cross-lineage scrum on the subjects_steps_1_to_4 wave
returned 14 distinct findings, 0 convergent. opus verdict was HOLD
with 3 BLOCKs around the audit-chain integrity. All real. Fixed:
──────────────────────────────────────────────────────────────────
BLOCK 1 — opus subject_audit.rs:172 + execution_loop.rs:391
Concurrency race: append_line is read-modify-write; the gateway
hook used tokio::spawn fan-out → two concurrent appends to the
same subject both read the same prev_hash, both compute their
HMAC from the same prev, second write silently overwrites first
→ row lost AND chain broken.
Fix:
- SubjectAuditWriter gains per-subject Mutex map. append() acquires
the subject's lock for the duration of the read-modify-write.
Different subjects still parallelize.
- Gateway hook switches from tokio::spawn to inline await. Per-row
cost is ~1ms (one object_store put); inline is correct AND cheap.
- New regression test: 50 concurrent appends to the same subject,
asserts all 50 land with intact chain.
BLOCK 2 — opus subject_audit.rs:108
Non-deterministic canonicalization: serde_json serializes struct
fields in declaration order. Schema evolution (adding/reordering
fields) silently changes the bytes verify_chain hashes → chain
breaks even when nothing was actually tampered with.
Fix:
- New canonical_json() free fn — recursive value rewrite to sort
object keys alphabetically (BTreeMap projection), arrays preserve
order, scalars pass through. Stable across struct evolution.
- Both append() and verify_chain() now compute HMAC over canonical
bytes, not declaration-order bytes.
- New regression tests: alphabetical-key + array-order-preserved.
WARN — opus execution_loop:401
Audit row's `result` was hardcoded to "success" for every Ok(result)
including payloads like {"error":"not found"}. Misleads compliance.
Fix:
- New audit_result_state() free fn that inspects the payload
top-level for error/denied/not_found/status signals (per spec
§3.2 enum). Defaults to "success" only when no error signal.
- 4 new tests covering each enum case + falsy-signals defense.
WARN — opus registry.rs:735
Storage-key collision: sanitize_view_name(id) is the disk key,
but the in-memory HashMap was keyed by raw candidate_id. Two
distinct ids that sanitize to the same key (e.g. "CAND/1" and
"CAND_1") would collide on disk while appearing distinct in
memory; second put silently overwrites first; rebuild loads only
one.
Fix:
- put_subject() / get_subject() / delete_subject() / rebuild()
all key the in-memory HashMap by sanitize_view_name(id), matching
the storage key shape.
- Collision guard: put_subject() refuses (with clear error) when
the sanitized key matches an EXISTING subject with a DIFFERENT
raw candidate_id.
- New regression test: put("CAND/1") then put("CAND_1") errors
+ first subject survives.
WARN — opus backfill_subjects.rs:189
trim_start_matches strips REPEATED prefixes; the spec wanted
one-shot semantics. Edge case unlikely in practice but real.
Fix:
- Switched to strip_prefix(&prefix).unwrap_or(&cid). One-shot.
INFO — opus subject_audit.rs:131
Per-byte format!("{:02x}", b) allocates each iteration. Hot path
on every append.
Fix:
- Replaced with const HEX lookup table + push() into preallocated
String. Same output bytes, no per-byte allocation.
──────────────────────────────────────────────────────────────────
Test summary post-fix:
catalogd subject_audit: 11/11 PASS (added 4 new — concurrency
race regression, parallel-different-subjects,
canonical-key sort, canonical-array order)
catalogd registry subject: 6/6 PASS (added 1 new — collision guard)
gateway execution_loop subject: 10/10 PASS (added 4 new —
audit_result_state enum coverage)
All 27 subject-related tests green. cargo build --release clean.
The convergent-zero scrum result was misleading on its face — opus
caught real BLOCKs that kimi/qwen missed. Per
feedback_cross_lineage_review.md: opus is the load-bearing reviewer;
single-opus BLOCKs warrant manual verification, which here confirmed
all three were correct.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>