lakehouse

Author	SHA1	Message	Date
root	f1fa6e4e61	phase 1.6 Gate 3a: photo upload endpoint with consent gate

Per docs/PHASE_1_6_BIPA_GATES.md §1 Gate 3 (consent-gate substrate). Deepface classification (Gate 3b) deferred to its own session; needs Python subprocess design conversation after the 2026-05-02 sidecar drop.

What ships:

shared/types.rs:
- new BiometricCollection sub-struct: data_path, template_hash, collected_at, consent_version_hash, classifications (Option<JSON>)
- SubjectManifest gains biometric_collection: Option<BiometricCollection> with #[serde(default)] so existing on-disk manifests parse and re-emit without drift

catalogd/biometric_endpoint.rs (NEW, ~600 LOC):
POST /subject/{candidate_id}/photo
- Auth: X-Lakehouse-Legal-Token, constant-time-eq compared against same legal token file as /audit. Same 32-byte minimum.
- Content-Type: must be image/jpeg or image/png (415 otherwise)
- Body: raw image bytes, max 10MB
- 401: missing or wrong token
- 404: subject not registered
- 403: consent.biometric.status != "given" (returns current status)
- 403: subject status in {Withdrawn, Erased, RetentionExpired}
- 200: writes photo to data/biometric/uploads/<sanitized_id>/<ts>.<ext> with mode 0700 dir + 0600 file, updates SubjectManifest with BiometricCollection record, appends audit row (kind="biometric_collection", purpose="photo_upload"), returns UploadResponse with template_hash + audit_row_hmac.

Logic split: pure async fn process_upload() takes the headers-as-args so unit tests exercise every branch without HTTP machinery; the axum handler is just glue.

10 tests covering all 4 reject paths + happy path + repeated uploads chaining + structural assertion that the quarantine path is NOT under data/headshots/ (synthetic faces).

gateway/main.rs:
Mounts /biometric on the same condition as /audit: only when the SubjectAuditWriter is present AND the legal token loads. Storage root configurable via LH_BIOMETRIC_STORAGE_ROOT (default ./data/biometric/uploads).

Live verification on the running gateway (post-restart):
- GET /biometric/health → "biometric endpoint ready"
- POST without token → 401 auth_failed
- POST with token, no consent → 403 consent_required (status=NeverCollected)
- Flipped WORKER-2 to consent=given, POST → 200 with hash + path
- File at data/biometric/uploads/WORKER-2/<ts>.jpg, mode 0600
- Manifest biometric_collection field reflects the upload
- Audit row chain links cleanly off the prior validator_lookup row
- GET /audit/subject/WORKER-2 returns chain_verified=true, 2 rows
- Cross-runtime parity probe still 6/6 byte-identical post-change

Phase 1.6 status table updated: Gate 3a DONE, Gate 3b (deepface) deferred. Calendar bottleneck remains counsel review of items 1/2/5/6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:55:32 -05:00
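
The reject paths in the commit above can be read as one pure decision function, which is what makes them testable without HTTP machinery. The following is a minimal sketch of that idea in Python, not the Rust `process_upload()` itself. The names `Subject` and `check_upload`, the order of the checks, the 413 status for oversized bodies, the `"Active"` status used in the usage note, and every reason string other than `auth_failed` and `consent_required` are assumptions made for illustration.

```python
import hmac
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_BODY_BYTES = 10 * 1024 * 1024  # "max 10MB" per the commit; exact byte count is assumed
ALLOWED_TYPES = {"image/jpeg", "image/png"}
BLOCKED_STATUSES = {"Withdrawn", "Erased", "RetentionExpired"}


@dataclass
class Subject:
    """Hypothetical stand-in for the manifest fields the gate reads."""
    status: str
    biometric_consent: str


def check_upload(
    token: Optional[str],
    expected_token: str,
    content_type: str,
    body: bytes,
    subject: Optional[Subject],
) -> Tuple[int, str]:
    """Return (http_status, reason). The check order here is an assumption."""
    # Constant-time comparison, so timing does not reveal how much of the token matched.
    if token is None or not hmac.compare_digest(token.encode(), expected_token.encode()):
        return 401, "auth_failed"
    if content_type not in ALLOWED_TYPES:
        return 415, "unsupported_media_type"
    if len(body) > MAX_BODY_BYTES:
        return 413, "body_too_large"  # status code is not stated in the commit
    if subject is None:
        return 404, "subject_not_found"
    if subject.status in BLOCKED_STATUSES:
        return 403, "subject_status_blocked"
    if subject.biometric_consent != "given":
        return 403, f"consent_required (status={subject.biometric_consent})"
    return 200, "ok"
```

Because the function takes the token, content type, and body as plain arguments, each branch can be exercised directly, for example `check_upload(None, secret, "image/jpeg", b"...", subject)` for the missing-token case.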
root	c7aa607ae4	phase 1.6 BIPA: scrum-driven fixes

Per 2026-05-03 phase_1_6_bipa_gates scrum (13 findings, 0 convergent). 1 BLOCK verified false positive, 4 real fixes shipped:

False positive (verified):
- opus BLOCK on attest:55 - claimed `set -uo pipefail` without `-e` makes the post-python3 `if [ $? -ne 0 ]` check unreachable. Verified WRONG: `X=$(false); echo $?` prints 1. Bash propagates command-substitution exit through $? on the assignment line. The check IS the python3 exit gate. Inline comment added to the script noting the false positive so future scrums don't re-flag.

Real fixes:
1. opus WARN attestation:18 - schema fingerprint hashed names ONLY, missing column-type changes. A column repurposed to hold base64 photo bytes under its existing name would pass undetected. Now hashes "name<TAB>type<TAB>nullable=bool" per row. Re-run produced evidence SHA-256 1fdcc9f1... (vs old 230fffeb..., reflecting the broader fingerprint scope).
2. opus WARN gate_4_test:60 - definition regex didn't catch object-literal property forms (`const t = { FEMALE_NAMES: [...] }`) or TypeScript class fields (`class L { public NAMES_X: string[] = [] }`). Added two new patterns + a regression test (Gate 4: object-literal and class-field bypasses are caught) that exercises 5 bypass forms. 4/4 tests green; 1 minor regex tweak needed mid-fix to handle single-line class bodies.
3. kimi WARN python3-reliance - script assumed pyarrow installed and would emit a stack trace into the attestation if not. Added `python3 -c "import pyarrow"` gate at top with clean install instructions on failure.
4. opus INFO PHASE_1_6:200 - item 7 (training) silently dropped from blocking set with bare "deferred" rationale. Now explicitly states the deferral is conditional on small operator population (J + 1-2 named ops); item 7 re-promotes to blocking if population grows. ⚖ COUNSEL marker added.

Skipped (acceptable as ⚖ COUNSEL placeholders by design):
- kimi WARN consent template:30-day-SLA (counsel decides number)
- kimi WARN consent template:email-placeholder (counsel supplies)
- kimi WARN parquet absence (env override exists; redeployment-aware)
- kimi INFO runbook manual-erasure (marked TODO when /erase ships)
- qwen INFO doc path/status nits (already addressed by file moves)

Tests: 4/4 Gate 4 absence test (incl. new bypass-coverage), 3/3 attestation evidence checks pass on live data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:43:17 -05:00
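
Real fix 1 in the commit above comes down to what goes into the hash. The sketch below, in Python, contrasts a names-only fingerprint with one that hashes "name<TAB>type<TAB>nullable=bool" per row. It is an illustration under assumptions, not the attestation script's code: the function names, the newline row separator, the lowercase `true`/`false` rendering of the nullable flag, and the example columns are all hypothetical.

```python
import hashlib


def fingerprint_names_only(columns):
    """Old behaviour as described: only column names feed the hash."""
    payload = "\n".join(name for name, _col_type, _nullable in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def fingerprint_full(columns):
    """New behaviour as described: name, type and nullability per row."""
    rows = [
        f"{name}\t{col_type}\tnullable={str(nullable).lower()}"
        for name, col_type, nullable in columns
    ]
    return hashlib.sha256("\n".join(rows).encode("utf-8")).hexdigest()


# Hypothetical schema: "notes" is repurposed from string to binary under the same name.
before = [("worker_id", "int64", False), ("notes", "string", True)]
after = [("worker_id", "int64", False), ("notes", "binary", True)]
```

With these inputs the names-only fingerprint is identical for `before` and `after`, while the full fingerprint differs, which is the gap the fix closes.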
root	4708717f6b	phase 1.6 BIPA gates - engineering wave (4 of 7 staged)

Per docs/PHASE_1_6_BIPA_GATES.md. Status table now reflects:

DONE (engineering-only, no counsel dependency):
- Gate 4: name→ethnicity inference removed from mcp-server. Removal note in search.html:3372 + new Bun absence test (mcp-server/phase_1_6_gate_4.test.ts) with 3 assertions: walker actually scans files, regex catches synthetic positives, no offending DEFINITION patterns in any .html/.ts/.js source. 3/3 pass.

ENG-DONE, signature pending:
- §2 attestation: scripts/staffing/attest_pre_identityd_biometric_state.sh runs three checks against the live state:
  1. workers_500k.parquet schema has no biometric/photo/face/image col
  2. data/_kb/.jsonl + pathway state contain no base64 image magic bytes (JPEG /9j/, PNG iVBOR), no data:image/ MIME prefixes, no field-name patterns ("photo", "biometric", "deepface_*")
  3. data/headshots/manifest.jsonl is entirely synthetic-tagged
  3/3 evidence checks pass on the live data dir. Generates a signed-by-operator+counsel attestation document committed at docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md with SHA-256 of the evidence summary so post-signature tampering is detectable.

ENG-STAGED, awaiting counsel review:
- Gate 1 retention schedule scaffold at docs/policies/consent/biometric_retention_schedule_v1.md (BIPA §15(a)). Engineering facts (categories, 18-month operational ceiling vs 3-year statutory cap, destruction procedure pointer to Gate 5 runbook) plus ⚖ COUNSEL markers for the binding text.
- Gate 2 consent template scaffold at docs/policies/consent/biometric_consent_template_v1.md (BIPA §15(b)(1)-(3)). Required disclosures + plain-language summary + withdrawal procedure + the structured fields the consent UI must post to identityd.
- Gate 5 destruction runbook at docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md. Triggers, pre-destruction checks (incl. chain-verified gate via /audit/subject/{id}), procedure (legal-tier endpoint), automatic audit row append (subject_audit.v1 with kind=biometric_erasure), backup-window disclosure, monthly reporting cadence, audit-trail attestation procedure cross-referencing the cross-runtime parity probe.

BLOCKED on engineering design:
- Gate 3 photo-upload endpoint. Requires identityd photo intake design + deepface integration scope. Deferred to its own session.

DEFERRED:
- §3 employee training material. Gate 5 runbook §7 may serve as substrate; counsel decides whether a separate program is needed.

Calendar bottleneck is now counsel review. Engineering can stage no further deliverables until either (a) Gate 3's design conversation happens or (b) counsel completes review of items 1/2/5/6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:38:49 -05:00
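
The second attestation check in the commit above (no base64 image magic, no data:image/ prefixes, no biometric-looking field names in JSONL state) can be sketched as a recursive scan over parsed records. This Python version is an illustration only: the function names, the substring matching strategy, and the regex are assumptions, and the real script may scan raw bytes instead. The markers themselves come from the commit: `/9j/` is how the JPEG signature bytes FF D8 FF encode in base64, and `iVBOR` is the start of the base64-encoded PNG signature.

```python
import json
import re

# Markers named in the commit message.
IMAGE_MARKERS = ("/9j/", "iVBOR", "data:image/")
# Hypothetical regex for the field-name patterns "photo", "biometric", "deepface_*".
FIELD_NAME_RE = re.compile(r"photo|biometric|deepface_", re.IGNORECASE)


def find_hits(obj, path=""):
    """Walk a parsed JSON value and return (path, reason) pairs for suspicious content."""
    hits = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}.{key}" if path else key
            if FIELD_NAME_RE.search(key):
                hits.append((child, "field_name"))
            hits.extend(find_hits(value, child))
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            hits.extend(find_hits(item, f"{path}[{index}]"))
    elif isinstance(obj, str):
        # Substring match is deliberately conservative: it may over-flag.
        for marker in IMAGE_MARKERS:
            if marker in obj:
                hits.append((path, marker))
    return hits


def scan_jsonl(lines):
    """Return (line_number, path, reason) for every hit in an iterable of JSONL lines."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if line.strip():
            hits.extend((lineno, p, why) for p, why in find_hits(json.loads(line)))
    return hits
```

An empty result from `scan_jsonl` would correspond to a passing evidence check; any hit carries the line number and JSON path needed to investigate it.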
root	cd440d4cee	audit phase 1.6: BIPA pre-launch gates - block identity-service backfill

Per IDENTITY_SERVICE_DESIGN v3 §5 Step 0, Phase 1.6 is hard prerequisite to identityd backfill. This doc specifies the 5 gates + 2 supporting deliverables that must ship before real-photo intake.

Five gates (BIPA §15 compliance):
1. Public retention schedule - counsel writes; engineering files+hash
2. Informed written consent - counsel writes template; engineering wires identityd consent-status enforcement
3. Photo-upload endpoint with consent enforcement - POST /v1/identity/subjects/{id}/photo with hard 403 when biometric_consent_status != 'given'; quarantined storage path; deepface output isolated to identityd subjects table (not synthetic-face manifest)
4. Deprecate name → ethnicity inference (mcp-server/search.html lookup tables removed; Phase 1.5 §1B finding closed)
5. Destruction runbook - operator-facing; ties to identityd /erase endpoint with biometric-specific erasure path; daily sweep job for biometric_retention_until expiry

Plus:
- Cryptographic attestation that no biometric data exists pre-identityd (per v3-B11); defends against infrastructure-as-notice plaintiff argument
- Employee BIPA-handling training acknowledgment

Engineering effort: ~4-5 days (one week to stage everything ready). Counsel effort: ~3-6 weeks calendar (review cycles dominate). Calendar bottleneck is counsel, not engineering.

Phase 1.6 exit = 7 checked gates + signoffs. Until done, identityd backfill cannot proceed (per identity service design v3 §5 Step 0).

5 open questions for J + counsel: photo-upload UX, consent mechanism (DocuSign/click/paper), named operator list, named counsel for sign-off, public privacy policy URL.

No code changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 01:41:29 -05:00
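
Gate 5 in the commit above mentions a daily sweep job for biometric_retention_until expiry. A minimal Python sketch of the selection step such a sweep might perform follows. Only the field name `biometric_retention_until` comes from the commit; the record shape, the `id` key, the ISO-8601 storage format with an explicit UTC offset, and the function name are assumptions.

```python
from datetime import datetime, timezone


def due_for_destruction(subjects, now):
    """Return the ids of subjects whose retention deadline is at or before `now`.

    `subjects` is an iterable of dicts; records without a deadline are skipped.
    """
    due = []
    for subject in subjects:
        until = subject.get("biometric_retention_until")
        if until is None:
            continue  # no biometric data retained for this subject
        if datetime.fromisoformat(until) <= now:
            due.append(subject["id"])
    return due


# Hypothetical records for illustration.
records = [
    {"id": "WORKER-1", "biometric_retention_until": "2026-01-01T00:00:00+00:00"},
    {"id": "WORKER-2", "biometric_retention_until": "2027-11-03T00:00:00+00:00"},
    {"id": "WORKER-3"},
]
sweep_time = datetime(2026, 5, 3, tzinfo=timezone.utc)
expired = due_for_destruction(records, sweep_time)
```

Passing `now` in as an argument, rather than reading the clock inside the function, keeps the selection deterministic and easy to test. The actual erasure would go through the legal-tier procedure the runbook describes, which this sketch does not cover.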

4 Commits