lakehouse/docs/specs/GATE_3B_DEEPFACE_DESIGN.md
root 8ec43e0721 phase 1.6 Gate 3b: deepface integration design doc (3 options + recommendation)
Per docs/PHASE_1_6_BIPA_GATES.md Gate 3b. Three viable paths for
populating BiometricCollection.classifications, sized + tradeoff'd:

  Option A — Python subprocess per upload (no daemon)
    ~80 LOC, 0.5-1 day. Smallest integration. Reintroduces a Python
    dependency the 2026-05-02 sidecar drop deliberately removed.

  Option B — ONNX models in Rust (no Python at all)
    ~200-400 LOC + model-build pipeline, 5-7 days. Fully consistent
    with sidecar drop. Need pre-trained models with appropriate
    licenses (or train ourselves, multi-week). Adds face detection
    preprocessing in Rust.

  Option C — Defer; classifications field stays None
    0.25 day. BIPA-safest position; substrate is forward-compatible.
    Forces the question "do we actually need classifications?" to be
    answered by a real product requirement, not by spec inertia.

Recommendation: **Option C (defer)**, conditional on confirming the
product requirement. Reasoning:
- All BIPA-load-bearing surfaces (consent + audit + retention +
  erasure) ship without classifications
- Riskiest BIPA position is collecting demographic-derived data
  without a documented business purpose
- Substrate accommodates A or B later in 1-3 days if real demand
  surfaces

Open questions for J are at the bottom of the doc — picking A/B/C is the
gating decision before any engineering happens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 05:25:45 -05:00


Gate 3b — Deepface Classification Integration (Design)

Status: Design draft — 2026-05-03 morning · Companion to: PHASE_1_6_BIPA_GATES.md Gate 3 · Depends on: Gate 3a (photo upload) which is shipped (f1fa6e4)

What this is. Three options for how BiometricCollection.classifications (currently Option<JSON>, always None) gets populated by an automated facial-attribute classifier. Phase 1.6 Gate 3a ships the consent-gated upload + audit chain + transactional rollback; Gate 3b adds the classification step. The substrate is ready — what's missing is the design choice for HOW classification happens.

Why a design doc instead of building it. The Python sidecar that historically hosted deepface was deliberately dropped from the hot path on 2026-05-02 (commit ba928b1 in lakehouse: "drop Python sidecar from hot path; AiClient → direct Ollama"). Restoring an in-process Python pipeline would regress that decision. Three viable paths exist with different tradeoffs; this doc lays them out so the choice is deliberate, not accidental.


What deepface produces

deepface (https://github.com/serengil/deepface) is a Python library wrapping multiple face-analysis backends (VGG-Face, FaceNet, OpenFace, etc.). Given an image path, it returns a JSON-shaped result with keys for age, gender, dominant_race, and emotion (plus per-class probabilities). For Phase 1.6's BIPA-compliance scope we need age, gender, dominant_race — the labels that go into SubjectManifest.biometric_collection.classifications.
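To make the manifest payload concrete, here is a sketch of reducing an analyze() result to the three stored labels. Note that depending on the deepface version, analyze() returns either a single dict or a list of per-face dicts, and the gender label may be under dominant_gender rather than gender — both handled below; verify the exact key names against the pinned deepface version.

```python
def extract_classifications(analyze_result):
    # Reduce a deepface-style analyze() result to the three labels the
    # manifest stores. Some deepface versions return a list of per-face
    # dicts; take the first detected face in that case.
    face = analyze_result[0] if isinstance(analyze_result, list) else analyze_result
    return {
        "age": face["age"],
        # newer versions expose "dominant_gender"; older ones put the label
        # directly under "gender"
        "gender": face.get("dominant_gender", face.get("gender")),
        "race": face["dominant_race"],
    }
```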

Per the runbook (Gate 5 destruction §2 step 3), classifications must be:

  • Stored in the SubjectManifest, NOT in a separate analytics table
  • Cleared during BIPA erasure (already wired — biometric_collection = None clears them along with data_path)
  • NEVER served to LLMs in conversation context (only to authenticated staffer UI)
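The second requirement ("already wired") falls out of the manifest shape: classifications live inside biometric_collection, so erasure clears them for free. A minimal sketch, using the field names this doc describes (the real definition lives in the Rust substrate):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BiometricCollection:
    data_path: str
    template_hash: str
    classifications: Optional[dict] = None  # stays None until Gate 3b ships

@dataclass
class SubjectManifest:
    subject_id: str
    biometric_collection: Optional[BiometricCollection] = None

def bipa_erase(manifest: SubjectManifest) -> None:
    # Gate 5 erasure: dropping the whole biometric_collection clears
    # data_path AND classifications in one move, so no extra erasure wiring
    # is needed if/when classifications start being populated.
    manifest.biometric_collection = None
```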

Option A — Python subprocess per upload (one-shot, no daemon)

Spawn a python3 subprocess from the Rust biometric endpoint after the photo is written, run a thin script that imports deepface.DeepFace.analyze, captures the JSON, exits. No long-running Python process, no IPC, no shared state. The hot path stays Rust + tokio; the subprocess is invoked exactly once per upload.

Implementation shape (~50 LOC additional in biometric_endpoint.rs):

// After photo file is written + manifest committed + audit row appended:
let cls_path = state.storage_root.join(&safe_id).join("_classifications.json");
let status = tokio::time::timeout(
    std::time::Duration::from_secs(30),  // cap total classification time
    tokio::process::Command::new("python3")
        .arg("/etc/lakehouse/deepface_classify.py")
        .arg(&abs_path)              // input photo path
        .arg(&cls_path)              // output JSON path
        .kill_on_drop(true)          // don't leak subprocess on cancellation/timeout
        .status(),
)
.await;
// Read cls_path back, populate biometric_collection.classifications.
// If the subprocess failed or timed out (status is Err or non-zero): leave
// classifications = None, log a warning. The upload itself succeeded;
// classification is best-effort.

The Python script (/etc/lakehouse/deepface_classify.py, ~30 LOC):

#!/usr/bin/env python3
import sys, json
from deepface import DeepFace
result = DeepFace.analyze(sys.argv[1], actions=['age','gender','race'], silent=True)
with open(sys.argv[2], 'w') as f:
    json.dump(result, f)

Pros:

  • Smallest integration: ~80 LOC total
  • Process isolation: a deepface crash can't take down the gateway
  • Easy to disable: rename the script or remove it; classifications just stay None
  • Matches Phase 1.6 spec literally (deepface IS the named classifier)
  • Keeps the hot path Rust-only — subprocess is fired AFTER the response goes out (we can spawn it without awaiting and let it complete async)

Cons:

  • Per-upload cold start (~3-5 seconds for Python interpreter + deepface model load)
  • Operator must keep pip install deepface working on the host (one more dependency to track)
  • The "no Python in hot path" constraint is bent if we await the subprocess inline
  • Reintroduces a Python-dependent code path that the 2026-05-02 sidecar drop was trying to eliminate

Cost estimate: 0.5 day to build + 0.5 day to harden timeout/error paths.
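The hardening half of that estimate is mostly the best-effort error contract: any subprocess failure, timeout, or unreadable output leaves classifications as None. A minimal sketch of that contract, in Python for brevity (the real caller is the Rust snippet above; the command list below is a stand-in, not the real invocation):

```python
import json
import subprocess

def classify_best_effort(cmd, out_path, timeout_s=30):
    # Option A's error contract: any failure or timeout returns None so the
    # caller leaves classifications = None. The upload already succeeded;
    # classification is additive, never load-bearing.
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
        if proc.returncode != 0:
            return None
        with open(out_path) as f:
            return json.load(f)
    except (subprocess.TimeoutExpired, OSError, json.JSONDecodeError):
        return None
```

In the real flow cmd would be ["python3", "/etc/lakehouse/deepface_classify.py", photo_path, cls_path] and out_path would be cls_path.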


Option B — ONNX model in Rust (no Python at all)

Replace deepface entirely with a Rust-native ONNX runtime invocation. Pre-train (or download pre-trained) ONNX models for age + gender + race classification, ship them at /etc/lakehouse/models/{age,gender,race}.onnx, run them through tract or ort (ONNX Runtime Rust bindings). No Python anywhere.

Implementation shape (~200 LOC + a model-build pipeline):

use ort::{Environment, SessionBuilder, Value};
let env = Environment::builder().build()?;
let session = SessionBuilder::new(&env)?.with_model_from_file("/etc/lakehouse/models/age.onnx")?;
let preprocessed = preprocess_image(&abs_path)?;  // resize, normalize, etc.
let output = session.run(vec![Value::from_array(preprocessed)?])?;
let age = decode_age(&output[0])?;
// repeat for gender + race

Pros:

  • Zero Python dependency — fully consistent with the 2026-05-02 sidecar drop
  • Faster per-call: ~100ms inference (no interpreter cold start)
  • Model files are immutable artifacts that can be hashed + signed for provenance
  • Audit-friendly: deterministic output for a given input + model hash
  • BIPA-defensible: model provenance is auditable; no third-party Python library risk

Cons:

  • Need pre-trained ONNX models for our three tasks (age regression, gender classification, race classification). Either find existing public models with appropriate licenses OR train ourselves (multi-week)
  • Image preprocessing in Rust is non-trivial (face detection + alignment + crop + normalize). May need an additional model (e.g. RetinaFace ONNX) just for face detection
  • ~200-400 LOC of new code in a domain we haven't worked in before
  • Cross-runtime parity story: Go side would need its own ONNX integration if it ever wants the same surface, OR we define classifications as Rust-only (acceptable since the substrate is Rust-side anyway)

Cost estimate: 3-5 days to build + 2 days to validate output quality + license review for any pre-trained models we use. Largest of the three options.
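To size the preprocessing con above: the shape of what Rust would have to do is roughly center-crop, normalize, and reorder to the NCHW layout ONNX vision models typically expect. A numpy sketch for clarity (the real version would be Rust, and the face-detect + align + resize steps this assumes to happen upstream are the genuinely hard part):

```python
import numpy as np

def to_onnx_input(rgb_u8: np.ndarray) -> np.ndarray:
    # Sketch of the tail end of Option B's preprocessing. Assumes face
    # detection/alignment and resize-to-model-input already happened; exact
    # normalization constants depend on the chosen pre-trained model.
    h, w, _ = rgb_u8.shape
    s = min(h, w)
    y, x = (h - s) // 2, (w - s) // 2
    crop = rgb_u8[y:y + s, x:x + s, :]          # center-crop to square
    img = crop.astype(np.float32) / 255.0       # normalize to [0, 1]
    return np.transpose(img, (2, 0, 1))[None]   # HWC -> NCHW + batch dim
```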


Option C — Defer classifications until a real demand exists

Keep BiometricCollection.classifications as Option<JSON> permanently None until a concrete consumer needs it. The substrate is ready; we don't add the classifier writer.

What still works without classifications:

  • BIPA consent gate (Gate 3a) ✓
  • Photo storage with template_hash for integrity (Gate 3a) ✓
  • Audit chain proving collection happened (Gate 3a) ✓
  • Erasure flow (Gate 5) ✓
  • Retention sweep on biometric clock (Step 7) ✓

What's missing without classifications:

  • The "demographic data derived from photos" disclosure in the consent template (Gate 2 §1 Disclosure 1) becomes vestigial — we collect photos but don't actually derive any demographic information from them
  • The retention schedule (Gate 1) §2 lists "Facial geometry classifications" as a category we collect; without classifications, that category is empty in practice

Pros:

  • Zero engineering today
  • Forces the question "do we actually need classifications?" to be answered by a real product requirement, not by spec inertia
  • BIPA-safer in the most absolute sense: we collect strictly the photo, nothing derived from it
  • If the real product requirement is "let staffers see the photo at job-site verification," classifications add no value (the staffer's eyes do the verification)

Cons:

  • Requires updating Gates 1 + 2 to remove the "we derive classifications" language (counsel coordination needed)
  • Loses the structured demographic data that audit-of-discrimination workflows might want (per IDENTITY_SERVICE_DESIGN.md v3-A4 disparate-impact analysis)
  • If we later need classifications, we land on Option A or B then anyway — this is a deferral, not a permanent choice

Cost estimate: 0.25 day to update Gate 1 + 2 doc language + ⚖ counsel review of the change.


Recommendation

Option C (defer) is the right answer right now, conditional on confirming the real product requirement.

Reasoning:

  1. Phase 1.6's load-bearing claim is consent + audit + retention + erasure. Gate 3a + Gate 5 + retention sweep + audit chain provide ALL of those WITHOUT classifications.
  2. The riskiest BIPA position is collecting demographic-derived data without a documented business purpose. If staffers identify candidates by looking at the photo (the actual stated use case), classifications are decorative.
  3. Option A reintroduces the Python dependency the 2026-05-02 wave deliberately removed. Option B is multi-day work for unclear benefit.
  4. The Option<JSON> substrate is forward-compatible: if a real demand surfaces (say, the disparate-impact analytics J actually wants), we can ship Option A in about a day or Option B on its 5-7 day estimate then, and the audit chain naturally captures the introduction event.

Next action if Option C is chosen:

  • Update docs/policies/consent/biometric_consent_template_v1.md Disclosure 1 to remove the "facial geometry classifications" claim — adjust to "facial photograph for staff identification"
  • Update docs/policies/consent/biometric_retention_schedule_v1.md §2 to remove the classifications row
  • Mark Gate 3b in PHASE_1_6_BIPA_GATES.md as "deferred — classifications not collected in v1, see GATE_3B_DEEPFACE_DESIGN.md"
  • ⚖ counsel review of the doc changes (consent text changes need their attention regardless)

Next action if Option A is chosen: ~1 day to ship; document the Python dependency carve-out in STATE_OF_PLAY's "WHAT NOT TO RELITIGATE" so future sessions don't try to remove it again.

Next action if Option B is chosen: schedule a multi-day session, source/license pre-trained ONNX models for the three tasks, build face-detection preprocessing in Rust.

⚖ J — pick A / B / C. The substrate accommodates any choice; the cost is the design-doc → counsel-coordination → engineering loop, which differs by an order of magnitude across the options.


Open questions for J

  1. What's the actual product requirement for classifications? If the answer is "I don't know yet," that's a strong vote for Option C.
  2. If classifications: who is the consumer? Staffer UI? Disparate-impact dashboard? An LLM context (NO — Phase 1.6 §3 forbids this)?
  3. If Option A: are we OK with pip install deepface as a host requirement? Or does the operator-of-record need a containerized Python sidecar (which puts us close to where we were before 2026-05-02)?
  4. If Option B: who is the model provenance signer? ⚖ counsel will want a chain of custody from the model author to the ONNX file on disk.

What this doc is NOT

  • Not a final decision. The decision is J's after reading the options.
  • Not a substitute for testing. Whichever option ships, the parity probe subject_audit_parity.sh must continue to pass post-change (the audit chain is the BIPA-load-bearing surface; classifications are additive metadata).
  • Not the only Phase 1.6 work remaining. Counsel-side review of Gates 1/2/5 + the §2 attestation signature are the calendar bottleneck. Gate 3b is the engineering bottleneck.