6 Commits

Author SHA1 Message Date
root
fcd53168a0 phase 1.6: counsel handoff turnkey + seed_consent_version.sh + strict mode live
The remaining production blocker is counsel-calendar bottleneck
(review + sign-off). Engineering can't make counsel move faster,
but it CAN reduce the round-trip overhead:

(1) docs/counsel/COUNSEL_HANDOFF_EMAIL_2026-05-05.md — copy-paste
    email body J can send to outside counsel. Subject line + body
    + tarball attachment instructions + headline asks (A/B/C/D
    in priority order) + post-signature operator runbook. The
    pre-flight checklist + post-signature workflow turn what
    would have been "I'll figure out the email" into "click send."

(2) scripts/staffing/seed_consent_version.sh — turnkey
    post-signature deployment. Takes the path to a (presumably
    counsel-signed) consent template markdown, computes SHA-256,
    atomically merges into /etc/lakehouse/consent_versions.json
    (creating the file if absent, with per-seed audit metadata
    in _meta.seeded_at[]), restarts lakehouse.service, probes
    /biometric/health post-restart. Idempotent: re-running with
    the same hash is a no-op for the versions array but still
    appends a [reseed] entry to the audit metadata.
    Verified live against the eng-staged template — strict mode
    flipped clean, /biometric/health 200 post-restart.

(3) docs/PHASE_1_6_BIPA_GATES.md §6.5 — post-signature deployment
    runbook embedded in the gates doc. Three steps: counsel signs
    + commits → seed_consent_version.sh → strict-mode probe.
    Plus a "pre-counsel demo seed" subsection documenting how to
    exercise strict mode BEFORE counsel signs (using the
    eng-staged template hash) so the deployment workflow is
    proven before the legal critical path closes.

Strict mode flipped live — verified post-restart:
- /etc/lakehouse/consent_versions.json populated with the
  eng-staged template hash:
  8b09591a8dc15f59197affac48909ce943d575eee01705b42303acf3b32f5c56
- POST /biometric/subject/WORKER-1/consent with deadbeef hash:
  HTTP 400 + error="consent_version_unknown"
- POST with the known eng-staged hash: passes version check
  (then 404 subject_not_found on a ghost candidate, proving
  the gate is hash-aware not auth-broken)

The hash currently seeded is the ENG-STAGED template
(pre-counsel-signature). When counsel returns the signed text,
operator runs `seed_consent_version.sh` again with the
counsel-signed markdown — the new hash gets appended; the demo
hash stays in for backwards-compat with any consent records
collected during the pre-counsel demo period (none, today).

Production blocker is now genuinely just counsel calendar:
1. J transmits reports/counsel/counsel_packet_2026-05-05.tar.gz
   per the handoff email
2. Counsel reviews + signs (their billable time)
3. Counsel returns signed text → operator runs seed script
4. Strict mode flips to canonical hash → cutover complete

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:32:16 -05:00
root
b2c34b80b3 phase 1.6: lock Gate 3b = C, reconcile docs to shipped state, fix double-upload file leak
Four threads landing together — all driven by the audit J asked for before
production cutover.

(1) Gate 3b DECIDED: Option C (defer classifications). `BiometricCollection.classifications`
    stays `Option<JSON> = None` in v1. `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` status
    flipped from "draft / awaits product" to DECIDED. Consent template + retention
    schedule revised to remove all "automated facial-classification" / "deepface"
    language so disclosed scope matches implemented scope.

(2) Endpoint-path drift reconciled across 3 docs. `PHASE_1_6_BIPA_GATES.md`,
    `BIPA_DESTRUCTION_RUNBOOK.md`, and `biometric_retention_schedule_v1.md` had
    references to legacy `/v1/identity/subjects/*` paths (proposed under a separate
    identityd daemon, never shipped) — corrected to actual shipped routes
    `/biometric/subject/*` (catalogd-local). Schema block in PHASE_1_6_BIPA_GATES
    rewritten to reflect JSON `SubjectManifest.biometric_collection` substrate
    (not the proposed Postgres `subjects` table).

(3) New operational artifacts:
    - `scripts/staffing/verify_biometric_erasure.sh` — checks 4 things post-erasure
      (manifest cleared, uploads dir empty, audit row matches, chain verified).
      Smoke-tested live against WORKER-2.
    - `scripts/staffing/biometric_destruction_report.sh` — monthly anonymized
      destruction-event aggregation. Smoke-tested clean.
    - `scripts/staffing/bundle_counsel_packet.sh` — tarballs the counsel-review
      packet with per-file SHA-256 manifest.
    - `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md` — formal rotation procedure
      operationalized after the 2026-05-05 /tmp wipe incident.
    - `docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md` — cover note bundling
      all eng-staged BIPA docs for counsel review with per-doc questions, sign-off
      checklist, recommended review sequence.

(4) Double-upload file leak fixed in `crates/catalogd/src/biometric_endpoint.rs`.
    `verify_biometric_erasure.sh` smoked WORKER-2 and surfaced a stranded photo
    file. Investigation showed the file was 13-byte test-fixture bytes (zero PII,
    no biometric content); audit timeline showed two consecutive uploads followed
    by one erasure — the second upload had silently overwritten manifest.data_path,
    orphaning the first file. Patched `process_upload` to refuse a second upload
    with HTTP 409 + `error: "biometric_already_collected"` when
    `biometric_collection.is_some()` on the manifest. Operator must explicitly
    POST `/biometric/subject/{id}/erase` first.

    Tests: new `second_upload_without_erase_returns_409` (asserts 409 + manifest
    pointer unchanged + first file untouched on disk). Replaced
    `repeated_uploads_grow_the_chain` with `upload_erase_upload_grows_the_chain_cleanly`
    (covers the legitimate re-collection cycle: chain grows to 3 rows). Updated
    `content_type_with_parameters_accepted` to use 2 distinct subjects (was
    using 1 subject with 2 uploads to test ct parsing — would now 409).

    22/22 biometric_endpoint tests + 59/59 catalogd lib tests green post-patch.

Production posture: gateway needs `cargo build --release -p gateway` +
`systemctl restart lakehouse.service` to pick up the new 409 in live traffic.

Counsel calendar is now the only remaining blocker for first real-photo intake.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 06:19:40 -05:00
root
f1fa6e4e61 phase 1.6 Gate 3a: photo upload endpoint with consent gate
Per docs/PHASE_1_6_BIPA_GATES.md §1 Gate 3 (consent-gate substrate).
Deepface classification (Gate 3b) deferred to its own session — needs
Python subprocess design conversation after the 2026-05-02 sidecar drop.

What ships:

  shared/types.rs:
    - new BiometricCollection sub-struct: data_path, template_hash,
      collected_at, consent_version_hash, classifications (Option<JSON>)
    - SubjectManifest gains biometric_collection: Option<BiometricCollection>
      with #[serde(default)] so existing on-disk manifests parse and
      re-emit without drift

  catalogd/biometric_endpoint.rs (NEW, ~600 LOC):
    POST /subject/{candidate_id}/photo
      - Auth: X-Lakehouse-Legal-Token, constant-time-eq compared against
        same legal token file as /audit. Same 32-byte minimum.
      - Content-Type: must be image/jpeg or image/png (415 otherwise)
      - Body: raw image bytes, max 10MB
      - 401: missing or wrong token
      - 404: subject not registered
      - 403: consent.biometric.status != "given" (returns current status)
      - 403: subject status in {Withdrawn, Erased, RetentionExpired}
      - 200: writes photo to data/biometric/uploads/<sanitized_id>/<ts>.<ext>
        with mode 0700 dir + 0600 file, updates SubjectManifest with
        BiometricCollection record, appends audit row
        (kind="biometric_collection", purpose="photo_upload"), returns
        UploadResponse with template_hash + audit_row_hmac.

    Logic split: pure async fn process_upload() takes the headers-as-args
    so unit tests exercise every branch without HTTP machinery; the
    axum handler is just glue. 10 tests covering all 4 reject paths +
    happy path + repeated uploads chaining + structural assertion that
    the quarantine path is NOT under data/headshots/ (synthetic faces).

  gateway/main.rs:
    Mounts /biometric on the same condition as /audit — only when the
    SubjectAuditWriter is present AND the legal token loads. Storage
    root configurable via LH_BIOMETRIC_STORAGE_ROOT (default
    ./data/biometric/uploads).

Live verification on the running gateway (post-restart):
  - GET  /biometric/health          → "biometric endpoint ready"
  - POST without token              → 401 auth_failed
  - POST with token, no consent     → 403 consent_required (status=NeverCollected)
  - Flipped WORKER-2 to consent=given, POST → 200 with hash + path
  - File at data/biometric/uploads/WORKER-2/<ts>.jpg, mode 0600
  - Manifest biometric_collection field reflects the upload
  - Audit row chain links cleanly off the prior validator_lookup row
  - GET /audit/subject/WORKER-2 returns chain_verified=true, 2 rows
  - Cross-runtime parity probe still 6/6 byte-identical post-change

Phase 1.6 status table updated: Gate 3a DONE, Gate 3b (deepface)
deferred. Calendar bottleneck remains counsel review of items 1/2/5/6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:55:32 -05:00
root
c7aa607ae4 phase 1.6 BIPA: scrum-driven fixes
Per 2026-05-03 phase_1_6_bipa_gates scrum (13 findings, 0 convergent).
1 BLOCK verified false positive, 4 real fixes shipped:

False positive (verified):
- opus BLOCK on attest:55 — claimed `set -uo pipefail` without `-e`
  makes the post-python3 `if [ $? -ne 0 ]` check unreachable. Verified
  WRONG: `X=$(false); echo $?` prints 1. Bash propagates command-
  substitution exit through $? on the assignment line. The check IS
  the python3 exit gate. Inline comment added to the script noting
  the false positive so future scrums don't re-flag.

Real fixes:
1. opus WARN attestation:18 — schema fingerprint hashed names ONLY,
   missing column-type changes. A column repurposed to hold base64
   photo bytes under its existing name would pass undetected. Now
   hashes "name<TAB>type<TAB>nullable=bool" per row. Re-run produced
   evidence SHA-256 1fdcc9f1... (vs old 230fffeb..., reflecting the
   broader fingerprint scope).

2. opus WARN gate_4_test:60 — definition regex didn't catch
   object-literal property forms (`const t = { FEMALE_NAMES: [...] }`)
   or TypeScript class fields (`class L { public NAMES_X: string[] = [] }`).
   Added two new patterns + a regression test
   (Gate 4: object-literal and class-field bypasses are caught) that
   exercises 5 bypass forms. 4/4 tests green; 1 minor regex tweak
   needed mid-fix to handle single-line class bodies.

3. kimi WARN python3-reliance — script assumed pyarrow installed and
   would emit a stack trace into the attestation if not. Added
   `python3 -c "import pyarrow"` gate at top with clean install
   instructions on failure.

4. opus INFO PHASE_1_6:200 — item 7 (training) silently dropped from
   blocking set with bare "deferred" rationale. Now explicitly states
   the deferral is conditional on small operator population (J + 1-2
   named ops); item 7 re-promotes to blocking if population grows.
   ⚖ COUNSEL marker added.

Skipped (acceptable as ⚖ COUNSEL placeholders by design):
- kimi WARN consent template:30-day-SLA (counsel decides number)
- kimi WARN consent template:email-placeholder (counsel supplies)
- kimi WARN parquet absence (env override exists; redeployment-aware)
- kimi INFO runbook manual-erasure (marked TODO when /erase ships)
- qwen INFO doc path/status nits (already addressed by file moves)

Tests: 4/4 Gate 4 absence test (incl. new bypass-coverage), 3/3
attestation evidence checks pass on live data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:43:17 -05:00
root
4708717f6b phase 1.6 BIPA gates — engineering wave (4 of 7 staged)
Per docs/PHASE_1_6_BIPA_GATES.md. Status table now reflects:

  DONE (engineering-only, no counsel dependency):
  - Gate 4: name→ethnicity inference removed from mcp-server.
    Removal note in search.html:3372 + new Bun absence test
    (mcp-server/phase_1_6_gate_4.test.ts) with 3 assertions:
    walker actually scans files, regex catches synthetic positives,
    no offending DEFINITION patterns in any .html/.ts/.js source.
    3/3 pass.

  ENG-DONE, signature pending:
  - §2 attestation: scripts/staffing/attest_pre_identityd_biometric_state.sh
    runs three checks against the live state:
      1. workers_500k.parquet schema has no biometric/photo/face/image col
      2. data/_kb/*.jsonl + pathway state contain no base64 image magic
         bytes (JPEG /9j/, PNG iVBOR), no data:image/* MIME prefixes,
         no field-name patterns ("photo", "biometric", "deepface_*")
      3. data/headshots/manifest.jsonl is entirely synthetic-tagged
    3/3 evidence checks pass on the live data dir. Generates a
    signed-by-operator+counsel attestation document committed at
    docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md
    with SHA-256 of the evidence summary so post-signature tampering
    is detectable.

  ENG-STAGED, awaiting counsel review:
  - Gate 1 retention schedule scaffold at
    docs/policies/consent/biometric_retention_schedule_v1.md (BIPA
    §15(a)). Engineering facts (categories, 18-month operational
    ceiling vs 3-year statutory cap, destruction procedure pointer
    to Gate 5 runbook) plus ⚖ COUNSEL markers for the binding text.
  - Gate 2 consent template scaffold at
    docs/policies/consent/biometric_consent_template_v1.md (BIPA
    §15(b)(1)-(3)). Required disclosures + plain-language summary +
    withdrawal procedure + the structured fields the consent UI must
    post to identityd.
  - Gate 5 destruction runbook at docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md.
    Triggers, pre-destruction checks (incl. chain-verified gate via
    /audit/subject/{id}), procedure (legal-tier endpoint), automatic
    audit row append (subject_audit.v1 with kind=biometric_erasure),
    backup-window disclosure, monthly reporting cadence, audit-trail
    attestation procedure cross-referencing the cross-runtime parity
    probe.

  BLOCKED on engineering design:
  - Gate 3 photo-upload endpoint. Requires identityd photo intake
    design + deepface integration scope. Deferred to its own session.

  DEFERRED:
  - §3 employee training material. Gate 5 runbook §7 may serve as
    substrate; counsel decides whether a separate program is needed.

Calendar bottleneck is now counsel review. Engineering can stage no
further deliverables until either (a) Gate 3's design conversation
happens or (b) counsel completes review of items 1/2/5/6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:38:49 -05:00
root
cd440d4cee audit phase 1.6: BIPA pre-launch gates — block identity-service backfill
Per IDENTITY_SERVICE_DESIGN v3 §5 Step 0, Phase 1.6 is hard
prerequisite to identityd backfill. This doc specifies the 5 gates +
2 supporting deliverables that must ship before real-photo intake.

Five gates (BIPA §15 compliance):
1. Public retention schedule — counsel writes; engineering files+hash
2. Informed written consent — counsel writes template; engineering
   wires identityd consent-status enforcement
3. Photo-upload endpoint with consent enforcement — POST /v1/identity/
   subjects/{id}/photo with hard 403 when biometric_consent_status
   != 'given'; quarantined storage path; deepface output isolated
   to identityd subjects table (not synthetic-face manifest)
4. Deprecate name → ethnicity inference (mcp-server/search.html
   lookup tables removed; Phase 1.5 §1B finding closed)
5. Destruction runbook — operator-facing; ties to identityd
   /erase endpoint with biometric-specific erasure path; daily
   sweep job for biometric_retention_until expiry

Plus:
- Cryptographic attestation that no biometric data exists
  pre-identityd (per v3-B11) — defends against
  infrastructure-as-notice plaintiff argument
- Employee BIPA-handling training acknowledgment

Engineering effort: ~4-5 days (one week to stage everything ready).
Counsel effort: ~3-6 weeks calendar (review cycles dominate).
Calendar bottleneck is counsel, not engineering.

Phase 1.6 exit = 7 checked gates + signoffs. Until done, identityd
backfill cannot proceed (per identity service design v3 §5 Step 0).

5 open questions for J + counsel: photo-upload UX, consent
mechanism (DocuSign/click/paper), named operator list, named
counsel for sign-off, public privacy policy URL.

No code changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 01:41:29 -05:00