lakehouse/docs/specs/GATE_3B_DEEPFACE_DESIGN.md
root 8ec43e0721 phase 1.6 Gate 3b: deepface integration design doc (3 options + recommendation)
Per docs/PHASE_1_6_BIPA_GATES.md Gate 3b. Three viable paths for
populating BiometricCollection.classifications, sized + tradeoff'd:

  Option A — Python subprocess per upload (no daemon)
    ~80 LOC, 0.5-1 day. Smallest integration. Reintroduces a Python
    dependency the 2026-05-02 sidecar drop deliberately removed.

  Option B — ONNX models in Rust (no Python at all)
    ~200-400 LOC + model-build pipeline, 5-7 days. Fully consistent
    with sidecar drop. Need pre-trained models with appropriate
    licenses (or train ourselves, multi-week). Adds face detection
    preprocessing in Rust.

  Option C — Defer; classifications field stays None
    0.25 day. BIPA-safest position; substrate is forward-compatible.
    Forces the question "do we actually need classifications?" to be
    answered by a real product requirement, not by spec inertia.

Recommendation: **Option C (defer)**, conditional on confirming the
product requirement. Reasoning:
- All BIPA-load-bearing surfaces (consent + audit + retention +
  erasure) ship without classifications
- Riskiest BIPA position is collecting demographic-derived data
  without a documented business purpose
- Substrate accommodates A or B later in 1-3 days if real demand
  surfaces

Open questions for J are at the bottom of the doc — picking A/B/C is the
gating decision before any engineering happens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 05:25:45 -05:00


Gate 3b — Deepface Classification Integration (Design)

Status: Design draft — 2026-05-03 morning · Companion to: PHASE_1_6_BIPA_GATES.md Gate 3 · Depends on: Gate 3a (photo upload) which is shipped (f1fa6e4)

What this is. Three options for how BiometricCollection.classifications (currently Option<JSON>, always None) gets populated by an automated facial-attribute classifier. Phase 1.6 Gate 3a ships the consent-gated upload + audit chain + transactional rollback; Gate 3b adds the classification step. The substrate is ready — what's missing is the design choice for HOW classification happens.

Why a design doc instead of building it. The Python sidecar that historically hosted deepface was deliberately dropped from the hot path on 2026-05-02 (commit ba928b1 in lakehouse: "drop Python sidecar from hot path; AiClient → direct Ollama"). Restoring an in-process Python pipeline would regress that decision. Three viable paths exist with different tradeoffs; this doc lays them out so the choice is deliberate, not accidental.


What deepface produces

deepface (https://github.com/serengil/deepface) is a Python library wrapping multiple face-analysis backends (VGG-Face, FaceNet, OpenFace, etc.). Given an image path, it returns a JSON-shaped result with keys for age, gender, dominant_race, and emotion (plus per-class probabilities). For Phase 1.6's BIPA-compliance scope we need age, gender, dominant_race — the labels that go into SubjectManifest.biometric_collection.classifications.
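To make the manifest payload concrete, here is a sketch of reducing an analyze() result to the three stored labels. Note that depending on the deepface version, analyze() returns either a single dict or a list of per-face dicts, and the gender label may be under dominant_gender rather than gender — both handled below; verify the exact key names against the pinned deepface version.

```python
def extract_classifications(analyze_result):
    # Reduce a deepface-style analyze() result to the three labels the
    # manifest stores. Some deepface versions return a list of per-face
    # dicts; take the first detected face in that case.
    face = analyze_result[0] if isinstance(analyze_result, list) else analyze_result
    return {
        "age": face["age"],
        # newer versions expose "dominant_gender"; older ones put the label
        # directly under "gender"
        "gender": face.get("dominant_gender", face.get("gender")),
        "race": face["dominant_race"],
    }
```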

Per the runbook (Gate 5 destruction §2 step 3), classifications must be:

  • Stored in the SubjectManifest, NOT in a separate analytics table
  • Cleared during BIPA erasure (already wired — biometric_collection = None clears them along with data_path)
  • NEVER served to LLMs in conversation context (only to authenticated staffer UI)
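The second requirement ("already wired") falls out of the manifest shape: classifications live inside biometric_collection, so erasure clears them for free. A minimal sketch, using the field names this doc describes (the real definition lives in the Rust substrate):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BiometricCollection:
    data_path: str
    template_hash: str
    classifications: Optional[dict] = None  # stays None until Gate 3b ships

@dataclass
class SubjectManifest:
    subject_id: str
    biometric_collection: Optional[BiometricCollection] = None

def bipa_erase(manifest: SubjectManifest) -> None:
    # Gate 5 erasure: dropping the whole biometric_collection clears
    # data_path AND classifications in one move, so no extra erasure wiring
    # is needed if/when classifications start being populated.
    manifest.biometric_collection = None
```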

Option A — Python subprocess per upload (one-shot, no daemon)

Spawn a python3 subprocess from the Rust biometric endpoint after the photo is written, run a thin script that imports deepface.DeepFace.analyze, captures the JSON, exits. No long-running Python process, no IPC, no shared state. The hot path stays Rust + tokio; the subprocess is invoked exactly once per upload.

Implementation shape (~50 LOC additional in biometric_endpoint.rs):

// After photo file is written + manifest committed + audit row appended:
let cls_path = state.storage_root.join(&safe_id).join("_classifications.json");
let status = tokio::time::timeout(
    std::time::Duration::from_secs(30),  // cap total classification time
    tokio::process::Command::new("python3")
        .arg("/etc/lakehouse/deepface_classify.py")
        .arg(&abs_path)              // input photo path
        .arg(&cls_path)              // output JSON path
        .kill_on_drop(true)          // don't leak subprocess on cancellation/timeout
        .status(),
)
.await;
// Read cls_path back, populate biometric_collection.classifications.
// If the subprocess failed or timed out (status is Err or non-zero): leave
// classifications = None, log a warning. The upload itself succeeded;
// classification is best-effort.

The Python script (/etc/lakehouse/deepface_classify.py, ~30 LOC):

#!/usr/bin/env python3
import sys, json
from deepface import DeepFace
result = DeepFace.analyze(sys.argv[1], actions=['age','gender','race'], silent=True)
with open(sys.argv[2], 'w') as f:
    json.dump(result, f)

Pros:

  • Smallest integration: ~80 LOC total
  • Process isolation: a deepface crash can't take down the gateway
  • Easy to disable: rename the script or remove it; classifications just stay None
  • Matches Phase 1.6 spec literally (deepface IS the named classifier)
  • Keeps the hot path Rust-only — subprocess is fired AFTER the response goes out (we can spawn it without awaiting and let it complete async)

Cons:

  • Per-upload cold start (~3-5 seconds for Python interpreter + deepface model load)
  • Operator must keep pip install deepface working on the host (one more dependency to track)
  • The "no Python in hot path" constraint is bent if we await the subprocess inline
  • Reintroduces a Python-dependent code path that the 2026-05-02 sidecar drop was trying to eliminate

Cost estimate: 0.5 day to build + 0.5 day to harden timeout/error paths.
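The hardening half of that estimate is mostly the best-effort error contract: any subprocess failure, timeout, or unreadable output leaves classifications as None. A minimal sketch of that contract, in Python for brevity (the real caller is the Rust snippet above; the command list below is a stand-in, not the real invocation):

```python
import json
import subprocess

def classify_best_effort(cmd, out_path, timeout_s=30):
    # Option A's error contract: any failure or timeout returns None so the
    # caller leaves classifications = None. The upload already succeeded;
    # classification is additive, never load-bearing.
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
        if proc.returncode != 0:
            return None
        with open(out_path) as f:
            return json.load(f)
    except (subprocess.TimeoutExpired, OSError, json.JSONDecodeError):
        return None
```

In the real flow cmd would be ["python3", "/etc/lakehouse/deepface_classify.py", photo_path, cls_path] and out_path would be cls_path.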


Option B — ONNX model in Rust (no Python at all)

Replace deepface entirely with a Rust-native ONNX runtime invocation. Pre-train (or download pre-trained) ONNX models for age + gender + race classification, ship them at /etc/lakehouse/models/{age,gender,race}.onnx, run them through tract or ort (ONNX Runtime Rust bindings). No Python anywhere.

Implementation shape (~200 LOC + a model-build pipeline):

use ort::{Environment, SessionBuilder, Value};
let env = Environment::builder().build()?;
let session = SessionBuilder::new(&env)?.with_model_from_file("/etc/lakehouse/models/age.onnx")?;
let preprocessed = preprocess_image(&abs_path)?;  // resize, normalize, etc.
let output = session.run(vec![Value::from_array(preprocessed)?])?;
let age = decode_age(&output[0])?;
// repeat for gender + race

Pros:

  • Zero Python dependency — fully consistent with the 2026-05-02 sidecar drop
  • Faster per-call: ~100ms inference (no interpreter cold start)
  • Model files are immutable artifacts that can be hashed + signed for provenance
  • Audit-friendly: deterministic output for a given input + model hash
  • BIPA-defensible: model provenance is auditable; no third-party Python library risk

Cons:

  • Need pre-trained ONNX models for our three tasks (age regression, gender classification, race classification). Either find existing public models with appropriate licenses OR train ourselves (multi-week)
  • Image preprocessing in Rust is non-trivial (face detection + alignment + crop + normalize). May need an additional model (e.g. RetinaFace ONNX) just for face detection
  • ~200-400 LOC of new code in a domain we haven't worked in before
  • Cross-runtime parity story: Go side would need its own ONNX integration if it ever wants the same surface, OR we define classifications as Rust-only (acceptable since the substrate is Rust-side anyway)

Cost estimate: 3-5 days to build + 2 days to validate output quality + license review for any pre-trained models we use. Largest of the three options.
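To size the preprocessing con above: the shape of what Rust would have to do is roughly center-crop, normalize, and reorder to the NCHW layout ONNX vision models typically expect. A numpy sketch for clarity (the real version would be Rust, and the face-detect + align + resize steps this assumes to happen upstream are the genuinely hard part):

```python
import numpy as np

def to_onnx_input(rgb_u8: np.ndarray) -> np.ndarray:
    # Sketch of the tail end of Option B's preprocessing. Assumes face
    # detection/alignment and resize-to-model-input already happened; exact
    # normalization constants depend on the chosen pre-trained model.
    h, w, _ = rgb_u8.shape
    s = min(h, w)
    y, x = (h - s) // 2, (w - s) // 2
    crop = rgb_u8[y:y + s, x:x + s, :]          # center-crop to square
    img = crop.astype(np.float32) / 255.0       # normalize to [0, 1]
    return np.transpose(img, (2, 0, 1))[None]   # HWC -> NCHW + batch dim
```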


Option C — Defer classifications until a real demand exists

Keep BiometricCollection.classifications as Option<JSON> permanently None until a concrete consumer needs it. The substrate is ready; we don't add the classifier writer.

What still works without classifications:

  • BIPA consent gate (Gate 3a) ✓
  • Photo storage with template_hash for integrity (Gate 3a) ✓
  • Audit chain proving collection happened (Gate 3a) ✓
  • Erasure flow (Gate 5) ✓
  • Retention sweep on biometric clock (Step 7) ✓

What's missing without classifications:

  • The "demographic data derived from photos" disclosure in the consent template (Gate 2 §1 Disclosure 1) becomes vestigial — we collect photos but don't actually derive any demographic information from them
  • The retention schedule (Gate 1) §2 lists "Facial geometry classifications" as a category we collect; without classifications, that category is empty in practice

Pros:

  • Zero engineering today
  • Forces the question "do we actually need classifications?" to be answered by a real product requirement, not by spec inertia
  • BIPA-safer in the most absolute sense: we collect strictly the photo, nothing derived from it
  • If the real product requirement is "let staffers see the photo at job-site verification," classifications add no value (the staffer's eyes do the verification)

Cons:

  • Requires updating Gates 1 + 2 to remove the "we derive classifications" language (counsel coordination needed)
  • Loses the structured demographic data that audit-of-discrimination workflows might want (per IDENTITY_SERVICE_DESIGN.md v3-A4 disparate-impact analysis)
  • If we later need classifications, we land on Option A or B then anyway — this is a deferral, not a permanent choice

Cost estimate: 0.25 day to update Gate 1 + 2 doc language + ⚖ counsel review of the change.


Recommendation

Option C (defer) is the right answer right now, conditional on confirming the real product requirement.

Reasoning:

  1. Phase 1.6's load-bearing claim is consent + audit + retention + erasure. Gate 3a + Gate 5 + retention sweep + audit chain provide ALL of those WITHOUT classifications.
  2. The riskiest BIPA position is collecting demographic-derived data without a documented business purpose. If staffers identify candidates by looking at the photo (the actual stated use case), classifications are decorative.
  3. Option A reintroduces the Python dependency the 2026-05-02 wave deliberately removed. Option B is multi-day work for unclear benefit.
  4. The Option<JSON> substrate is forward-compatible: if a real demand surfaces (say, the disparate-impact analytics J actually wants), we can ship Option A in about a day or Option B on its 5-7 day estimate then, and the audit chain naturally captures the introduction event.

Next action if Option C is chosen:

  • Update docs/policies/consent/biometric_consent_template_v1.md Disclosure 1 to remove the "facial geometry classifications" claim — adjust to "facial photograph for staff identification"
  • Update docs/policies/consent/biometric_retention_schedule_v1.md §2 to remove the classifications row
  • Mark Gate 3b in PHASE_1_6_BIPA_GATES.md as "deferred — classifications not collected in v1, see GATE_3B_DEEPFACE_DESIGN.md"
  • ⚖ counsel review of the doc changes (consent text changes need their attention regardless)

Next action if Option A is chosen: ~1 day to ship; document the Python dependency carve-out in STATE_OF_PLAY's "WHAT NOT TO RELITIGATE" so future sessions don't try to remove it again.

Next action if Option B is chosen: schedule a multi-day session, source/license pre-trained ONNX models for the three tasks, build face-detection preprocessing in Rust.

⚖ J — pick A / B / C. The substrate accommodates any choice; the cost is the design-doc → counsel-coordination → engineering loop, which differs by an order of magnitude across the options.


Open questions for J

  1. What's the actual product requirement for classifications? If the answer is "I don't know yet," that's a strong vote for Option C.
  2. If classifications: who is the consumer? Staffer UI? Disparate-impact dashboard? An LLM context (NO — Phase 1.6 §3 forbids this)?
  3. If Option A: are we OK with pip install deepface as a host requirement? Or does the operator-of-record need a containerized Python sidecar (which puts us close to where we were before 2026-05-02)?
  4. If Option B: who is the model provenance signer? ⚖ counsel will want a chain of custody from the model author to the ONNX file on disk.

What this doc is NOT

  • Not a final decision. The decision is J's after reading the options.
  • Not a substitute for testing. Whichever option ships, the parity probe subject_audit_parity.sh must continue to pass post-change (the audit chain is the BIPA-load-bearing surface; classifications are additive metadata).
  • Not the only Phase 1.6 work remaining. Counsel-side review of Gates 1/2/5 + the §2 attestation signature are the calendar bottleneck. Gate 3b is the engineering bottleneck.