root fd429f4185 audit phase 1.5: BIPA schema audit + outcomes.jsonl content sample
Two follow-up walks, per AUDIT_PHASE_1_DISCOVERY §10/C4 and the gemini
scrum flag. Read-only; no code changes.

BIPA findings:
- scripts/staffing/tag_face_pool.py uses deepface to extract gender,
  race, and age from face images and persists the output to
  data/headshots/manifest.jsonl. This is fine for synthetic faces; for
  real candidate photos it becomes a regulated biometric database
  (740 ILCS 14/10).
- mcp-server/index.ts:1408: the ComfyUI prompt explicitly embeds
  protected attributes (age, race, gender) into the model prompt,
  encoding protected-attribute features into the AI workflow at the
  system level.
- mcp-server/search.html:3375-3432 hard-codes FEMALE_NAMES /
  MALE_NAMES / NAMES_HISPANIC / SURNAMES_* lookup tables, i.e.
  name-based ethnicity inference. This is a Title VII /
  disparate-impact risk, separate from BIPA.
- data/headshots/manifest.jsonl is TRACKED IN GIT today (synthetic
  classifications). With real photos this would put biometric data
  in version control, a serious failure.
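- No consent flow, no public retention schedule, no deletion
  procedure, and no employee training are documented. BIPA §15(a)
  requires the retention schedule and destruction guidelines, and
  §15(b) requires informed written consent, before real-photo
  intake; training is not named in either subsection.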

outcomes.jsonl sample:
- 39/101 rows persist candidate names in the fills[*].name field today
- Sample names: "Carmen I. Garcia", "Jamal Z. Jones", "Jacob N. Patel"
  (synthetic, but realistically shaped)
- 0 hits for proxy phrases such as "culture fit" or "communication";
  synthetic data doesn't generate them, but real models reasoning
  about real candidates will. Because persistence is append-only,
  RTBF (right to be forgotten) can only be met by cryptographic
  erasure.
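
A minimal sketch of how the name count above could be reproduced. It
assumes each outcomes.jsonl row is a JSON object whose optional "fills"
key holds a list of objects with a "name" key; count_rows_with_names is
a hypothetical helper, not code from this repo.

```python
import json
from typing import Iterable


def count_rows_with_names(lines: Iterable[str]) -> tuple[int, int]:
    """Return (rows_with_names, total_rows) for an iterable of JSONL lines.

    Assumed row shape: {"fills": [{"name": "..."}, ...], ...}.
    Blank lines are skipped and do not count as rows.
    """
    with_names = 0
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        total += 1
        row = json.loads(line)
        fills = row.get("fills") or []
        # A row counts once if any fill carries a non-empty name.
        if any(isinstance(f, dict) and f.get("name") for f in fills):
            with_names += 1
    return with_names, total
```

Passing an open file handle works as well as a list of strings, since
the function only iterates over lines.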

Recommends a new Phase 1.6, BIPA pre-launch gates, between Phase 1.5
and Phase 2: BIPA_COMPLIANCE_POLICY.md, a consent gate at the upload
endpoint, quarantining real-photo classifications to data/biometric/,
deprecating the name->ethnicity lookup tables, and a unit test that
the synthetic manifest stays synthetic. Estimated at 4-8 hours of
design plus one code commit.
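
One possible shape for the "manifest stays synthetic" check, as a
sketch only. It assumes each manifest.jsonl record carries a provenance
field; the field name "source" and the value "synthetic" are
hypothetical, since the actual manifest schema is not shown here.

```python
import json
from typing import Iterable


def non_synthetic_lines(lines: Iterable[str], field: str = "source") -> list[int]:
    """Return 1-based line numbers of records not marked synthetic.

    `field` is a hypothetical provenance key. A record that lacks the
    key is treated as an offender, so the check fails closed.
    """
    offenders = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        if record.get(field) != "synthetic":
            offenders.append(lineno)
    return offenders
```

A unit test could open data/headshots/manifest.jsonl, pass the file
handle to this function, and assert that the result is an empty list.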

5 open questions for J: where real photos enter, whether the deepface
tagging path stays for real photos, consent UX, retention duration
floor, and who the designated privacy officer is.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 01:22:53 -05:00
2026-04-22 02:41:15 -05:00