Four threads landing together — all driven by the audit J asked for before
production cutover.
(1) Gate 3b DECIDED: Option C (defer classifications). `BiometricCollection.classifications`
stays `Option<JSON> = None` in v1. `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` status
flipped from "draft / awaits product" to DECIDED. Consent template + retention
schedule revised to remove all "automated facial-classification" / "deepface"
language so disclosed scope matches implemented scope.
(2) Endpoint-path drift reconciled across 3 docs. `PHASE_1_6_BIPA_GATES.md`,
`BIPA_DESTRUCTION_RUNBOOK.md`, and `biometric_retention_schedule_v1.md` had
references to legacy `/v1/identity/subjects/*` paths (proposed under a separate
identityd daemon, never shipped) — corrected to actual shipped routes
`/biometric/subject/*` (catalogd-local). Schema block in PHASE_1_6_BIPA_GATES
rewritten to reflect JSON `SubjectManifest.biometric_collection` substrate
(not the proposed Postgres `subjects` table).
(3) New operational artifacts:
- `scripts/staffing/verify_biometric_erasure.sh` — checks 4 things post-erasure
(manifest cleared, uploads dir empty, audit row matches, chain verified).
Smoke-tested live against WORKER-2.
- `scripts/staffing/biometric_destruction_report.sh` — monthly anonymized
destruction-event aggregation. Smoke-tested clean.
- `scripts/staffing/bundle_counsel_packet.sh` — tarballs the counsel-review
packet with per-file SHA-256 manifest.
- `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md` — formal rotation procedure
operationalized after the 2026-05-05 /tmp wipe incident.
- `docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md` — cover note bundling
all eng-staged BIPA docs for counsel review with per-doc questions, sign-off
checklist, recommended review sequence.
(4) Double-upload file leak fixed in `crates/catalogd/src/biometric_endpoint.rs`.
A `verify_biometric_erasure.sh` smoke run against WORKER-2 surfaced a stranded photo
file. Investigation showed the file was 13-byte test-fixture bytes (zero PII,
no biometric content); audit timeline showed two consecutive uploads followed
by one erasure — the second upload had silently overwritten manifest.data_path,
orphaning the first file. Patched `process_upload` to refuse a second upload
with HTTP 409 + `error: "biometric_already_collected"` when
`biometric_collection.is_some()` on the manifest. Operator must explicitly
POST `/biometric/subject/{id}/erase` first.
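
A minimal sketch of the guard described above, not the actual `biometric_endpoint.rs` code. The type shapes, the `erase` helper, and the `UploadError` enum are hypothetical stand-ins. Only the 409 status, the `biometric_already_collected` error string, and the `biometric_collection.is_some()` check come from this commit.

```rust
/// Hypothetical stand-in for the biometric section of a subject manifest.
#[derive(Debug, Clone, PartialEq)]
struct BiometricCollection {
    data_path: String,
}

/// Hypothetical stand-in for SubjectManifest; only the field used by the guard.
#[derive(Debug, Default)]
struct SubjectManifest {
    biometric_collection: Option<BiometricCollection>,
}

#[derive(Debug, PartialEq)]
enum UploadError {
    AlreadyCollected,
}

impl UploadError {
    /// HTTP status the handler would return for this error.
    fn status(&self) -> u16 {
        match self {
            UploadError::AlreadyCollected => 409,
        }
    }

    /// Machine-readable error string placed in the response body.
    fn code(&self) -> &'static str {
        match self {
            UploadError::AlreadyCollected => "biometric_already_collected",
        }
    }
}

/// Refuse a second upload instead of overwriting the manifest pointer,
/// which is what orphaned the first file before the patch.
fn process_upload(manifest: &mut SubjectManifest, new_path: &str) -> Result<(), UploadError> {
    if manifest.biometric_collection.is_some() {
        return Err(UploadError::AlreadyCollected);
    }
    manifest.biometric_collection = Some(BiometricCollection {
        data_path: new_path.to_string(),
    });
    Ok(())
}

/// Explicit erasure clears the pointer so a later re-collection is legal.
fn erase(manifest: &mut SubjectManifest) -> Option<BiometricCollection> {
    manifest.biometric_collection.take()
}
```

Under these assumptions the upload, erase, upload cycle succeeds, while upload followed directly by upload returns the 409 error and leaves the first `data_path` untouched.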
Tests: new `second_upload_without_erase_returns_409` (asserts 409 + manifest
pointer unchanged + first file untouched on disk). Replaced
`repeated_uploads_grow_the_chain` with `upload_erase_upload_grows_the_chain_cleanly`
(covers the legitimate re-collection cycle: chain grows to 3 rows). Updated
`content_type_with_parameters_accepted` to use 2 distinct subjects (was
using 1 subject with 2 uploads to test Content-Type parsing, which would now 409).
22/22 biometric_endpoint tests + 59/59 catalogd lib tests green post-patch.
Production posture: gateway needs `cargo build --release -p gateway` +
`systemctl restart lakehouse.service` to pick up the new 409 in live traffic.
Counsel calendar is now the only remaining blocker for first real-photo intake.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reset gateway audit substrate after /tmp wipe disabled it on reboot:
- LH_SUBJECT_AUDIT_KEY moved /tmp/lakehouse_audit/ → /etc/lakehouse/
(canonical persistent path per spec line 112; /tmp wipes on reboot
and silently disabled /audit + /biometric endpoints)
- Fresh 32B HMAC + 44-char legal token at /etc/lakehouse/, mode 0400
- Systemd drop-in updated; gateway restarted; both endpoints 200
- Pre-rotation chains for WORKER-{1..5} (backfill data) will now
fail verification as tampered under the new key; expected and correct on rotation
Anchor wave-table backfilled with 3 commits that landed after the
last STATE_OF_PLAY refresh on 2026-05-03 evening:
- 7e0112b: retention_sweep stray indent fix
- 848a458: Phase 1.6 Gate 5 erasure endpoint POST /biometric/.../erase
- 8ec43e0: Phase 1.6 Gate 3b deepface integration design doc
Phase 1.6 status table: Gate 5 → eng-DONE; Gate 3b → design-doc-shipped
(recommends Option C defer). Calendar bottleneck text updated.
.gitignore extended for runtime ephemera that surfaced this session:
- data/biometric/ (BIPA-quarantined photos, regulated data)
- reports/scrum/ (local-only review forensics per feedback_audit_findings_log.md)
- experiments/ (per "experiments stay out of tracked tree" policy)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents the 13-commit wave end-to-end:
- SUBJECT_MANIFESTS_ON_CATALOGD spec Steps 1-8 SHIPPED
- 5/7 Phase 1.6 BIPA gates engineering-complete or eng-staged
- 6th cross-runtime parity probe (subject_audit, 6/6 byte-identical)
Status table for Phase 1.6 with evidence pointers per item. Operational
state captures the LH_SUBJECT_AUDIT_KEY + LH_LEGAL_AUDIT_TOKEN_FILE
systemd configuration so next session knows what's in place.
Cataloged the three runtime-divergence classes the parity probe loop
caught + structurally killed (omitempty stripping, time.RFC3339Nano
truncation, json.Marshal HTML-escape). Future Go↔Rust work can
reference these patterns instead of rediscovering them.
Calendar bottleneck is now counsel review of items 1/2/5/6 — engineering
has staged everything it can without legal sign-off. Engineering long
pole is Gate 3b (deepface), deferred for design conversation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five new entries to prevent today's cleanup from being undone by future
sessions or future PRs that don't read the full context:
- PRD line 70 is load-bearing: local-only on the customer hot path. PR #13's
cloud-routing defaults reverted (d054c0b). Cloud is opt-in dev-only.
- /v1/usage by_provider=ollama is the canary. Anything else for
customer-shape traffic = regression.
- ./scrum is a TOOL, not architecture. Outputs to data/_kb/
scrum_findings.jsonl. Findings inform dev, do NOT auto-fold into
design docs.
- Test code in main is actively being cleaned. Today: 12 files / ~2900
LOC removed (commits 6aafd41 + f4ebd22). Surface more candidates,
don't auto-delete unless clearly orphaned.
The intent: future me (or future Claude session) reads STATE_OF_PLAY
on cold-start, sees these entries, and doesn't re-make the same
mistakes that drifted scope today.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
J flagged that smoke + parity tests prove the surface compiles, NOT
that an audit response can be produced for a specific person — and the
staffing client won't sign without defensible discrimination-claim
response capability.
New docs/AUDIT_TRAIL_PRD.md captures:
- worked example: John Martinez at Warehouse B requests audit
- subject audit response output format (per-decision row schema)
- surface map: where decisions happen today, where the gaps are
- PII handling rules (tokenization, protected-attribute exclusion,
inferred-attribute risk)
- identity service design intent (separate daemon, audited reads)
- retention + right-to-be-forgotten policy intent
- 9-phase implementation sequence with explicit per-phase exit criteria
- cross-runtime requirement (both Rust + Go must satisfy)
- 7 open questions blocking phase 2+ that need J's call
STATE_OF_PLAY + PRD updated with explicit "production-ready blocker"
section pointing at the new doc. The "substrate is shipped" framing
gets a caveat: substrate ≠ production-ready until audit phase 9 exits.
No code changes. This is the planning artifact J asked for before we
start building.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Anchor was 5 days stale. Adds the 12-commit wave (Lance backend hardening,
sidecar drop, observability parity, gitignore cleanup, gray-zone content
add) with verification status for each. Updates DO NOT RELITIGATE with
the 4 new things this wave makes load-bearing:
- python sidecar dropped from hot path (don't wire it back)
- lance gauntlet shipped (don't re-discover the bugs we just fixed)
- 32/32 cross-runtime parity (don't build a 6th probe for already-covered surface)
- ARCHITECTURE_COMPARISON.md is the single source of truth for cross-runtime decisions
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worker cards now ship a real photo per person instead of monogram tiles:
- fetch_face_pool.py pulls 1000 faces from thispersondoesnotexist.com
- tag_face_pool.py runs deepface for gender/race/age, excludes <22yo
- manifest.jsonl: 952 servable, gender/race buckets populated
- /headshots/_thumbs/ pre-resized to 384px webp (587KB -> 11KB,
~53x smaller; without this, Chrome's parallel-connection budget
drops ~75% of tiles in a 40-card grid)
- /headshots/:key gender x race x age intersection bucketing with
gender-only fallback when intersection is sparse
- /headshots/generate/:key ComfyUI on-demand for the contractor
profile spotlight (cold ~1.5s, cached ~1ms; worker-derived
djb2 seed makes faces deterministic-per-worker but unique
across workers sharing the same prompt)
- serve_imagegen.py _cache_key() now includes seed (was caching
by prompt only -> 3 different worker seeds collapsed to 1
cached image; verified fix produces 3 distinct md5s)
- confidence-default name resolution: Xavier->man+hispanic,
Aisha->woman+black, etc. Every worker resolves to a bucket.
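
The cache-key fix can be illustrated with a small sketch. It is written in Rust for illustration only; the real `_cache_key()` lives in `serve_imagegen.py` and is not reproduced here. The `djb2` variant (32-bit, additive), the `cache_key` tuple shape, and the `distinct_cached_images` helper are all assumptions.

```rust
use std::collections::HashMap;

/// Classic additive djb2 over bytes: h = h * 33 + byte, starting at 5381.
/// The 32-bit width is an assumption; the commit only says "djb2 seed".
fn djb2(input: &str) -> u32 {
    input
        .bytes()
        .fold(5381u32, |h, b| h.wrapping_mul(33).wrapping_add(u32::from(b)))
}

/// Hypothetical cache key: the seed is part of the key, so equal prompts
/// with different seeds no longer collide.
fn cache_key(prompt: &str, seed: u32) -> (String, u32) {
    (prompt.to_string(), seed)
}

/// Count how many distinct cache entries a set of workers produces for one
/// shared prompt, with and without the seed in the key.
fn distinct_cached_images(prompt: &str, worker_ids: &[&str], include_seed: bool) -> usize {
    let mut cache: HashMap<(String, u32), String> = HashMap::new();
    for id in worker_ids {
        let seed = djb2(id);
        // Keying by prompt only is modelled by pinning the seed slot to 0.
        let key = if include_seed {
            cache_key(prompt, seed)
        } else {
            cache_key(prompt, 0)
        };
        cache
            .entry(key)
            .or_insert_with(|| format!("image for seed {}", seed));
    }
    cache.len()
}
```

With three worker ids sharing one prompt, the prompt-only key yields a single cached entry, and the prompt-plus-seed key yields three, mirroring the collapse and the fix described above.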
End-to-end: playwright run on /?q=forklift+operators+IL -> 21/21
cards loaded, 0 broken, all 384px webp.
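
The intersection bucketing with gender-only fallback used by `/headshots/:key` can be sketched as follows. The key format (`gender|race|age_band`), the `MIN_BUCKET_SIZE` threshold for "sparse", and the `pick_bucket` name are assumptions made for illustration, not the shipped handler.

```rust
use std::collections::HashMap;

/// Assumed threshold below which an intersection bucket counts as sparse.
const MIN_BUCKET_SIZE: usize = 3;

/// Return the face pool for the full intersection bucket when it is large
/// enough, otherwise fall back to the gender-only bucket.
fn pick_bucket<'a>(
    buckets: &'a HashMap<String, Vec<String>>,
    gender: &str,
    race: &str,
    age_band: &str,
) -> Option<&'a [String]> {
    let full_key = format!("{}|{}|{}", gender, race, age_band);
    if let Some(faces) = buckets.get(&full_key) {
        if faces.len() >= MIN_BUCKET_SIZE {
            return Some(faces.as_slice());
        }
    }
    // Sparse or missing intersection: use the coarser gender-only pool.
    buckets.get(gender).map(|faces| faces.as_slice())
}
```

A lookup returns `None` only when neither the intersection nor the gender-only bucket exists, so every worker that resolves to a gender bucket still gets a face.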
Cache + binary pool gitignored; manifest tracked.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>