lakehouse/docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md
root b2c34b80b3 phase 1.6: lock Gate 3b = C, reconcile docs to shipped state, fix double-upload file leak
Four threads landing together — all driven by the audit J asked for before
production cutover.

(1) Gate 3b DECIDED: Option C (defer classifications). `BiometricCollection.classifications`
    stays `Option<JSON> = None` in v1. `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` status
    flipped from "draft / awaits product" to DECIDED. Consent template + retention
    schedule revised to remove all "automated facial-classification" / "deepface"
    language so disclosed scope matches implemented scope.

(2) Endpoint-path drift reconciled across 3 docs. `PHASE_1_6_BIPA_GATES.md`,
    `BIPA_DESTRUCTION_RUNBOOK.md`, and `biometric_retention_schedule_v1.md` had
    references to legacy `/v1/identity/subjects/*` paths (proposed under a separate
    identityd daemon, never shipped) — corrected to actual shipped routes
    `/biometric/subject/*` (catalogd-local). Schema block in PHASE_1_6_BIPA_GATES
    rewritten to reflect JSON `SubjectManifest.biometric_collection` substrate
    (not the proposed Postgres `subjects` table).

(3) New operational artifacts:
    - `scripts/staffing/verify_biometric_erasure.sh` — checks 4 things post-erasure
      (manifest cleared, uploads dir empty, audit row matches, chain verified).
      Smoke-tested live against WORKER-2.
    - `scripts/staffing/biometric_destruction_report.sh` — monthly anonymized
      destruction-event aggregation. Smoke-tested clean.
    - `scripts/staffing/bundle_counsel_packet.sh` — tarballs the counsel-review
      packet with per-file SHA-256 manifest.
    - `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md` — formal rotation procedure
      operationalized after the 2026-05-05 /tmp wipe incident.
    - `docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md` — cover note bundling
      all eng-staged BIPA docs for counsel review with per-doc questions, sign-off
      checklist, recommended review sequence.

(4) Double-upload file leak fixed in `crates/catalogd/src/biometric_endpoint.rs`.
    `verify_biometric_erasure.sh` smoked WORKER-2 and surfaced a stranded photo
    file. Investigation showed the file was 13-byte test-fixture bytes (zero PII,
    no biometric content); audit timeline showed two consecutive uploads followed
    by one erasure — the second upload had silently overwritten manifest.data_path,
    orphaning the first file. Patched `process_upload` to refuse a second upload
    with HTTP 409 + `error: "biometric_already_collected"` when
    `biometric_collection.is_some()` on the manifest. Operator must explicitly
    POST `/biometric/subject/{id}/erase` first.

    Tests: new `second_upload_without_erase_returns_409` (asserts 409 + manifest
    pointer unchanged + first file untouched on disk). Replaced
    `repeated_uploads_grow_the_chain` with `upload_erase_upload_grows_the_chain_cleanly`
    (covers the legitimate re-collection cycle: chain grows to 3 rows). Updated
    `content_type_with_parameters_accepted` to use 2 distinct subjects (was
    using 1 subject with 2 uploads to test ct parsing — would now 409).

    22/22 biometric_endpoint tests + 59/59 catalogd lib tests green post-patch.

Production posture: gateway needs `cargo build --release -p gateway` +
`systemctl restart lakehouse.service` to pick up the new 409 in live traffic.

Counsel calendar is now the only remaining blocker for first real-photo intake.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 06:19:40 -05:00


# Counsel Review Packet — Phase 1.6 BIPA Pre-Launch
**Date assembled:** 2026-05-05

**For:** outside counsel

**From:** J, operator of record

**Scope:** documents that engineering has staged for legal-sufficiency
review before the staffing platform begins collecting any real
candidate biometric data (BIPA §15(a) and §15(b)).

> **What this packet is.** The Phase 1.6 BIPA gates outline what
> engineering must ship before real-photo intake. As of 2026-05-05,
> all engineering substrate is shipped and verified live (see §1
> below for the inventory). What remains is binding-text authoring
> + counsel sign-off on five documents, plus operational notification
> obligations counsel may want to layer on top.
>
> **What this packet is NOT.** Not a request for counsel to write
> binding text from scratch. The documents are eng-staged in
> reasonable plain language; the request is for counsel to render
> them into legally-sufficient text and attest where signatures
> are required.
---
## 1. Engineering substrate — shipped + verified
For factual context on what counsel is reviewing AGAINST. None of
this requires sign-off here; it's the system the documents bind to.

| Component | Where it lives | Verification |
|---|---|---|
| Subject manifest registry | `crates/catalogd/src/registry.rs`, `data/_catalog/subjects/<id>.json` | 17 unit tests + 100 backfilled WORKER manifests in production |
| Per-subject HMAC audit chain (SHA-256) | `crates/catalogd/src/subject_audit.rs`, `data/_catalog/subjects/<id>.audit.jsonl` | Tamper-detection + concurrent-append race tests pass |
| Photo upload (consent-gated) | `POST /biometric/subject/{id}/photo` | 11 unit tests + live roundtrip 200 |
| Erasure (two-scope) | `POST /biometric/subject/{id}/erase` (`biometric_only` / `full`) | 21 unit tests; transactional rollback on audit failure |
| Legal-tier audit read | `GET /audit/subject/{id}` (X-Lakehouse-Legal-Token header) | Constant-time auth, chain re-verification per request |
| Retention sweep (BIPA-aware clock) | `crates/catalogd/src/bin/retention_sweep` | 8 unit tests; live verified against 100 backfilled subjects |
| Cross-runtime parity (Rust ↔ Go) | `scripts/cutover/parity/subject_audit_parity.sh` | 6/6 byte-identical assertions pass |

**Key insight for counsel:** the audit chain is the BIPA-defensible
substrate. Every state-changing event (consent given, photo uploaded,
photo erased, legal-tier read) appends to a per-subject HMAC-chained
JSONL log. The chain verifies end-to-end on every legal-tier read.
A tampered chain is detectable; forging a chain that verifies requires
the HMAC signing key, which is stored root-only (mode 0400) at
`/etc/lakehouse/subject_audit.key` and rotated per the Legal-Tier Audit
Key Rotation runbook (Document E in §2).
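
The chaining property can be shown with a minimal shell sketch that matches the packet's shell tooling. It is not the shipped `subject_audit.rs` logic. The following are all assumptions made for illustration:

- the row layout (`payload<TAB>mac`)
- the all-zero genesis value
- the `HMAC-SHA256(key, previous_mac || payload)` construction
- the function names `audit_append` and `audit_verify`

```bash
#!/usr/bin/env bash
# Sketch of a per-subject HMAC-chained log. Hypothetical format: one
# "payload<TAB>mac" line per event; payloads must not contain tabs or newlines.
# Requires bash, openssl, and coreutils.
set -euo pipefail

# Assumed genesis value: 64 hex zeros, the same length as a SHA-256 MAC,
# so "prev || payload" has an unambiguous boundary.
GENESIS=$(printf '0%.0s' {1..64})

# hmac_hex KEY DATA -> hex HMAC-SHA256 of DATA under KEY.
# Passing the key on argv is for illustration only; it is visible in ps.
hmac_hex() {
  printf '%s' "$2" | openssl dgst -sha256 -hmac "$1" | awk '{print $NF}'
}

# audit_append LOG KEY PAYLOAD: MAC the payload together with the last MAC.
audit_append() {
  local log=$1 key=$2 payload=$3 prev=$GENESIS
  if [[ -s $log ]]; then
    prev=$(tail -n 1 "$log" | cut -f 2)
  fi
  printf '%s\t%s\n' "$payload" "$(hmac_hex "$key" "$prev$payload")" >> "$log"
}

# audit_verify LOG KEY: recompute every MAC starting from the genesis value.
# Returns 1 at the first row whose stored MAC does not match.
audit_verify() {
  local log=$1 key=$2 prev=$GENESIS payload mac
  while IFS=$'\t' read -r payload mac; do
    [[ $(hmac_hex "$key" "$prev$payload") == "$mac" ]] || return 1
    prev=$mac
  done < "$log"
}
```

Editing a row's payload makes that row's stored MAC fail to recompute, and rewriting the MAC to match requires the key. Verifying the same log under a different key fails at the first row. That is the behavior §4 describes for pre-rotation chains.
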

**Gate 3b (deepface classification), decided 2026-05-05: Option C
(defer).** The system collects only the photograph, not derived
demographic information. The consent template + retention schedule
in this packet were revised the same day to match.

---
## 2. Documents requiring counsel review + sign-off
In recommended review order:

| # | Document | Path | Counsel ask | Sign-off |
|---|---|---|---|---|
| A | Biometric Retention Schedule v1 | `docs/policies/consent/biometric_retention_schedule_v1.md` | Render into binding language; confirm 18-month operational ceiling vs. BIPA 3-year statutory cap | Counsel + J |
| B | Biometric Consent Template v1 | `docs/policies/consent/biometric_consent_template_v1.md` | Render Disclosures 1-3 into binding consent language; specify electronic vs. paper signature mechanism | Counsel + J |
| C | BIPA Destruction Runbook | `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md` | Confirm 30-day SLA from trigger; confirm two-operator (operator + witness) requirement; confirm legal-hold check procedure | Counsel attestation |
| D | BIPA Pre-IdentityD Attestation | `docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md` | Sign as countersigning party; J signs as operator-of-record | Counsel + J |
| E | Legal-Tier Audit Key Rotation | `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md` | Confirm rotation cadence; opine on candidate-notification obligation when rotation is compromise-driven | Counsel notes |
| F | Gate 3b Deepface Design (FYI) | `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` | Decision-of-record showing classifications were *deliberately deferred*, not omitted by oversight. No sign-off needed; provided for audit-trail completeness. | None |

The five documents requiring sign-off are A, B, C, D, E. Document F
is included so the audit trail shows the Gate 3b decision was
deliberate.

---
## 3. Specific questions for counsel — by document
### Document A — Retention Schedule
1. The schedule sets an **18-month** operational ceiling against the
BIPA 3-year statutory cap. Is the safety margin appropriate, or
should we move to a tighter window (12 months) given the
plaintiff-friendly Illinois posture?
2. The schedule references the **catalogd-local** storage substrate
rather than a separate identityd Postgres table. Does the
public-facing language need to mention the storage architecture
at all, or is "we keep the photo and a SHA-256 hash" sufficient?
3. Public publication URL — counsel to specify (placeholder marked
in §7 of the schedule).
4. Confirm whether existing consent under v1 carries forward when
a future v2 is published, or whether re-consent is required.
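
Question 1 weighs two clocks, and a minimal sketch of the earliest-deadline rule follows. Its assumptions:

- The 18-month ceiling runs from the collection date.
- The statutory cap runs from the subject's last interaction, the anchor BIPA §15(a) uses.
- The schedule itself may define start dates differently.
- §15(a)'s other trigger (the initial purpose being satisfied) is not modeled.

`add_months` and `destruction_deadline` are hypothetical helpers, not the shipped `retention_sweep` code.

```bash
#!/usr/bin/env bash
# Sketch: earliest destruction date under an operational ceiling and the
# BIPA 3-year cap. Assumes the ceiling runs from collection and the cap
# from the subject's last interaction. Requires GNU date.
set -euo pipefail

# add_months DATE N -> YYYY-MM-DD that is N months after DATE (UTC)
add_months() { date -u -d "$1 +$2 months" +%F; }

# destruction_deadline COLLECTED LAST_INTERACTION [CEILING_MONTHS]
destruction_deadline() {
  local collected=$1 last=$2 ceiling=${3:-18}
  local op_limit statutory
  op_limit=$(add_months "$collected" "$ceiling")
  statutory=$(add_months "$last" 36)
  # ISO-8601 dates compare correctly as strings; print the earlier one.
  if [[ $op_limit < $statutory ]]; then
    echo "$op_limit"
  else
    echo "$statutory"
  fi
}
```

With the default 18-month ceiling, a photo collected on 2026-05-05 comes due on 2027-11-05, ahead of the statutory cap. Moving to 12 months, as question 1 contemplates, changes only the third argument.
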
### Document B — Consent Template
1. Disclosure 1 says "we do NOT run automated facial-classification
in v1." Does that disclosure need to mention the *possibility* of
future classification, or is silence plus a supersession clause
adequate?
2. Plain-language summary in §1 — counsel to confirm it's appropriate
to include alongside the binding disclosure, or recommend an
alternative comprehension aid.
3. Withdrawal SLA is set to **30 days** in §2. Counsel to confirm
against jurisdiction (Illinois primary; secondary deployments
would inherit).
4. Contact for withdrawal — counsel to specify the channel
(placeholder in §3).
5. Sign-off mechanism: electronic signature service, in-app
click-acceptance with timestamp, paper form? Each has different
evidentiary weight.
### Document C — Destruction Runbook
1. Confirm 30-day SLA from each of four triggers (retention expiry,
consent withdrawal, RTBF, court order). Some interpretations
prefer 7 or 14 days for withdrawal/RTBF.
2. Two-operator requirement (operator-of-record + witness): is the
witness role acceptable for counsel's defensibility view, or
should we elevate to dual-control with cryptographic split-key?
3. Legal-hold check procedure (§2 step 3) — counsel to specify the
actual procedure for confirming no hold is in force before
erasing.
4. Backup-window disclosure (§4) — confirm 30-day backup retention
is acceptable.
5. Candidate notification template (§3 step 4) — counsel to supply.
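
The SLA in question 1 reduces to date arithmetic. The sketch below takes the window as a parameter so the 7- and 14-day alternatives can be compared against 30. It assumes the window counts calendar days from the trigger date. `sla_due` and `sla_met` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch: destruction due date and on-time check for a trigger-based SLA.
# The window defaults to 30 calendar days; pass 7 or 14 to model the
# tighter withdrawal/RTBF interpretations. Requires GNU date.
set -euo pipefail

# sla_due TRIGGER_DATE [DAYS] -> YYYY-MM-DD by which erasure must complete
sla_due() { date -u -d "$1 +${2:-30} days" +%F; }

# sla_met TRIGGER_DATE ERASED_DATE [DAYS] -> exit 0 if erased on or before due
sla_met() {
  local due
  due=$(sla_due "$1" "${3:-30}")
  [[ $2 == "$due" || $2 < "$due" ]]
}
```

For example, a withdrawal received on 2026-05-05 is due by 2026-06-04 under the 30-day window and by 2026-05-12 under a 7-day window.
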
### Document D — Pre-IdentityD Attestation
1. Both signature lines blank — J signs as operator-of-record;
counsel signs as the countersigning legal party.
2. The attestation hash anchors the evidence; once signed, the
hash itself becomes a tamper-evident witness. Counsel to confirm
storage location for the signed copy (firm files?).
### Document E — Key Rotation Runbook
1. Recommended rotation cadence — 90 days suggested in §1.
Counsel to confirm or override.
2. Custody schedule for `/etc/lakehouse/_archived/` raw key files —
§7.2 question; suggested 1-year retention but counsel-driven.
3. Candidate-notification obligation when rotation is
compromise-driven (§7.3) — counsel call.
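
A periodic check against the 90-day cadence in question 1 and the mode 0400 requirement from §1 could look like the sketch below. It makes two simplifications:

- It treats the key file's mtime as the last rotation date. This is an assumption; the rotation event log would be the authoritative record.
- It does not check root ownership.

`key_status` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: report whether the audit key is due for rotation and still 0400.
# Uses the file's mtime as a stand-in for the last rotation date (an
# assumption) and does not check root ownership. Requires GNU stat/date.
set -euo pipefail

# key_status KEYFILE [CADENCE_DAYS] [NOW_EPOCH] -> ok | rotate_due | bad_mode
key_status() {
  local key=$1 cadence=${2:-90} now=${3:-$(date +%s)}
  local mode mtime age_days
  mode=$(stat -c %a "$key")
  if [[ $mode != 400 ]]; then
    echo bad_mode
    return
  fi
  mtime=$(stat -c %Y "$key")
  age_days=$(( (now - mtime) / 86400 ))
  if (( age_days >= cadence )); then
    echo rotate_due
  else
    echo ok
  fi
}
```

Run on a schedule, a `rotate_due` or `bad_mode` result would be the cue to follow the Document E runbook. Counsel's answer to question 1 only changes the second argument.
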
---
## 4. Engineering changes counsel should know about (recent)
These changes reconciled doc/code drift left by a rapid development wave on 2026-05-03:
- **Endpoint paths:** the original v1 spec proposed
`/v1/identity/subjects/*` under a separate identityd daemon. That
daemon was collapsed into catalogd; endpoints actually shipped at
`/biometric/subject/*` (catalogd-local). Documents in this packet
reference the catalogd-local routes; legacy references in
`IDENTITY_SERVICE_DESIGN.md` are flagged "do NOT implement
as-written" in that doc's deprecation header.
- **No identityd Postgres database:** the original spec proposed
encrypted-at-rest Postgres + HashiCorp Vault + S3 Object Lock for
PII storage. The shipped substrate is local JSON manifests +
per-subject HMAC-chained JSONL, sized for J's local-only
deployment per `PRD.md` line 70 ("Everything runs locally — no
cloud APIs").
- **Gate 3b deferral (Option C, 2026-05-05):** classifications
(gender / race / age inference) were deliberately deferred. The
consent template and retention schedule in this packet do NOT
disclose collection of derived demographic data, because we are
not collecting it. If a future product requirement reverses this,
we will publish a v2 consent + v2 retention with re-consent.
- **Key rotation 2026-05-05:** the prior `LH_SUBJECT_AUDIT_KEY` was
lost in a `/tmp` wipe on reboot, which disabled the audit and biometric
endpoints. The new key is at `/etc/lakehouse/subject_audit.key`
(mode 0400). Audit chains written before the rotation fail verification
under the new key and register as tampered; this is correct, expected
behavior, not a bug.
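
Path drift like the mismatch above can be caught mechanically. A sketch follows, assuming the docs live under one directory tree and that `IDENTITY_SERVICE_DESIGN.md` is the only file allowed to keep the legacy paths. `path_drift` is a hypothetical helper, not one of the scripts in this packet.

```bash
#!/usr/bin/env bash
# Sketch: fail if any doc still cites the never-shipped identityd routes.
# IDENTITY_SERVICE_DESIGN.md is excluded because it carries the
# "do NOT implement as-written" deprecation header. Uses GNU grep.
set -euo pipefail

# path_drift DOCS_DIR -> lists offending files; exit 1 if any are found
path_drift() {
  local hits
  hits=$(grep -rlF --exclude=IDENTITY_SERVICE_DESIGN.md -- '/v1/identity/subjects/' "$1" || true)
  if [[ -n $hits ]]; then
    printf '%s\n' "$hits"
    return 1
  fi
}
```

A clean run prints nothing and exits 0. Any listed file is a candidate for the same `/biometric/subject/*` correction.
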
---
## 5. Open eng items NOT awaiting counsel
For transparency. These are engineering work items, not legal items:
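1. **Stranded photo file on `WORKER-2` (fixed; deploy pending).** During
verification of the one historical erasure event (`WORKER-2`), the verify
script surfaced a stranded photo file. The audit timeline showed two
consecutive uploads followed by one erasure: the second upload had
silently overwritten the manifest's `data_path`, orphaning the first
file, so erasure never unlinked it. The file held 13 bytes of
test-fixture data (no PII, no biometric content). `process_upload` in
`crates/catalogd/src/biometric_endpoint.rs` now refuses a second upload
with HTTP 409 (`biometric_already_collected`) until the subject is
explicitly erased; live traffic picks this up after a gateway release
rebuild and service restart. This does NOT affect the current packet:
no real candidate photos have been collected yet (per §1 attestation),
so the residual is from a synthetic test event.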
2. **Phase 1.6 §3 employee training.** Currently deferred per
acknowledgement coverage in §7 of the destruction runbook
(single-operator population). Re-promotes to blocking if the
operator population grows; counsel may want to opine on the
threshold.
---
## 6. Sign-off sequence
Recommended order so a hold-up on one doc doesn't block others:
1. **First wave (parallel):** A (retention schedule) + B (consent
template). These two have the tightest interdependence (consent
v1 references retention v1 by hash); review them together.
2. **Second wave:** C (destruction runbook). Depends on A's retention
period being fixed.
3. **Third wave:** D (pre-identityd attestation). Sign once A + B + C
are settled; the attestation snapshot is the boundary between
pre-Phase-1.6 and post-Phase-1.6 system state.
4. **Fourth wave:** E (key rotation). Independent of A-D; can be
reviewed in parallel any time.
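
The wave ordering can be checked with `tsort(1)`, which prints a dependency-respecting order for pairs of items. The pairs below are one reading of the dependencies stated above: A before C; A, B, and C before D; E independent. They are not an artifact that exists in the repo.

```bash
#!/usr/bin/env bash
# Sketch: derive a review order from the sign-off dependencies with tsort.
# Each pair "X Y" means X must be settled before Y; "E E" declares E as a
# node with no ordering constraints.
set -euo pipefail

review_order() {
  tsort <<'EOF'
A C
A D
B D
C D
E E
EOF
}
```

`tsort` guarantees only a valid order, not the wave grouping: A, B, and E may come out in any relative order.
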
---
## 7. After sign-off — engineering steps
Once each document is signed:

| Document | Engineering action | Trigger |
|---|---|---|
| A retention schedule | Hash + commit; reference in `consent_versions` table | Counsel signature |
| B consent template | Hash + commit; reference in candidate-facing intake UI | Counsel signature |
| C destruction runbook | Adopt; operator acknowledgment recorded in §7 | Counsel attestation |
| D pre-identityd attestation | Anchor hash to filesystem + git; counsel keeps original signature | Both signatures |
| E key rotation | Adopt; rotation event log seeded with counsel-approved cadence | Counsel notes |

The HARD blocker for first real-candidate photo collection is
A + B + D signed. C and E are operationally important but do not
block the *first* photo (they govern destruction and key handling,
which apply to any state, not the boundary state).

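
The HARD-blocker rule reduces to a set check. A minimal sketch using the §2 document letters follows. `first_photo_gate` is hypothetical; it takes the signed documents as arguments rather than reading any real sign-off record.

```bash
#!/usr/bin/env bash
# Sketch: gate for the first real-candidate photo. Arguments are the
# letters (per the §2 table) of documents already signed; succeeds only
# when A, B, and D are all present. C and E are not required here.
set -euo pipefail

first_photo_gate() {
  local signed=" $* " doc
  local -a missing=()
  for doc in A B D; do
    [[ $signed == *" $doc "* ]] || missing+=("$doc")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "blocked: missing ${missing[*]}"
    return 1
  fi
  echo "clear for first photo"
}
```

For example, `first_photo_gate A B C E` reports `blocked: missing D`, even though four of the five sign-off documents are done.
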
---
## 8. Cover-note hash
This packet is itself a snapshot. Future-Claude / future-J will refer
back to this packet to know what counsel saw on 2026-05-05.

**Packet attached files (referenced by path):**

- `docs/policies/consent/biometric_retention_schedule_v1.md`
- `docs/policies/consent/biometric_consent_template_v1.md`
- `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`
- `docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md`
- `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md`
- `docs/specs/GATE_3B_DEEPFACE_DESIGN.md`
- `docs/PHASE_1_6_BIPA_GATES.md` (the spec they all reference)

Per-file SHA-256 hashes are produced by the bundler script (next
section); the bundler also creates a tarball ready for transmission.

---
## 9. Generating the bundle for transmission
```bash
./scripts/staffing/bundle_counsel_packet.sh
```
Produces `reports/counsel/counsel_packet_<DATE>.tar.gz` containing all
referenced documents plus a manifest of per-file SHA-256 hashes.
Counsel can verify file integrity on receipt by running `sha256sum`
over each extracted file and comparing the results against the manifest.
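
A sketch of the receiving-side check follows. It assumes the bundle carries a manifest named `SHA256SUMS` at its root, in `sha256sum` output format with paths relative to that root. The actual file name and layout produced by `bundle_counsel_packet.sh` are not specified here, so adjust to match. `verify_packet` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: verify a counsel packet tarball on receipt. Assumes a SHA256SUMS
# manifest (sha256sum format, relative paths) at the archive root.
# Requires GNU tar and coreutils.
set -euo pipefail

# verify_packet TARBALL -> exit 0 only if every listed file matches its hash
verify_packet() {
  local tarball=$1 dir rc=0
  dir=$(mktemp -d)
  tar -xzf "$tarball" -C "$dir" || rc=$?
  if (( rc == 0 )); then
    # --quiet prints only failures; the exit status is non-zero on mismatch
    (cd "$dir" && sha256sum --quiet -c SHA256SUMS) || rc=$?
  fi
  rm -rf "$dir"
  return "$rc"
}
```

A non-zero exit means at least one file differs from what was hashed at bundling time, or the archive could not be extracted.
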