fetch_face_pool was wiping 952 hand-classified rows when re-run from
a Python without deepface installed (it reset every gender to None).
Now:
- Loads existing manifest by id and overlays only fetch-owned fields,
so gender/race/age/excluded survive a refetch.
- deepface pass tags only records that don't already have a gender;
deepface unavailable means "leave existing tags alone" not "reset".
- New --shrink flag required to drop ids >= --count. Default refuses
to shrink the pool silently.
- Atomic write via tmp + os.replace so an interrupted run can't
corrupt the manifest.
- Dedupes duplicate id lines (root cause of the 2497-row manifest
backing a 1000-face pool).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worker cards now ship a real photo per person instead of monogram tiles:
- fetch_face_pool.py pulls 1000 faces from thispersondoesnotexist.com
- tag_face_pool.py runs deepface for gender/race/age, excludes <22yo
- manifest.jsonl: 952 servable, gender/race buckets populated
- /headshots/_thumbs/ pre-resized to 384px webp (587KB -> 11KB,
60x smaller; without this Chrome's parallel-connection budget
drops ~75% of tiles in a 40-card grid)
- /headshots/:key gender x race x age intersection bucketing with
gender-only fallback when intersection is sparse
- /headshots/generate/:key ComfyUI on-demand for the contractor
profile spotlight (cold ~1.5s, cached ~1ms; worker-derived
djb2 seed makes faces deterministic-per-worker but unique
across workers sharing the same prompt)
- serve_imagegen.py _cache_key() now includes seed (was caching
by prompt only -> 3 different worker seeds collapsed to 1
cached image; verified fix produces 3 distinct md5s)
- confidence-default name resolution: Xavier->man+hispanic,
Aisha->woman+black, etc. Every worker resolves to a bucket.
End-to-end: playwright run on /?q=forklift+operators+IL -> 21/21
cards loaded, 0 broken, all 384px webp.
Cache + binary pool gitignored; manifest tracked.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three layers shipped:
1. SCRIPT — scripts/staffing/fetch_face_pool.py
Pulls N synthetic StyleGAN faces from thispersondoesnotexist.com
into data/headshots/face_NNNN.jpg, writes manifest.jsonl. Idempotent:
re-running skips existing files. Optional gender tagging via deepface
(currently unavailable on this box; the script handles ImportError
gracefully and tags everything as untagged). Fetched 198 faces with
concurrency=3 in ~67s.
2. SERVER — /headshots/:key route in mcp-server/index.ts
Loads manifest at first hit, caches in globalThis._faces. Hashes the
key with djb2-style mixing → pool index → returns the JPG. Same
key always gets the same face (deterministic). Accepts
?g=man|woman&e=caucasian|black|hispanic|south_asian|east_asian|middle_eastern
to bias pool selection — the gender/ethnicity buckets fall back to
the full pool when no tagged matches exist. Cache-Control:
86400 immutable so faces ride the browser cache after first hit.
/headshots/__reload re-reads the manifest without restart.
3. UI — search.html + console.html worker cards
Re-added overlay <img> on top of the monogram .av circle. img.src
= /headshots/<encoded-key>?g=<hint>&e=<hint>. img.onerror removes
the failed image so the monogram stays visible if the face pool
isn't fetched / CDN is blocked. .av now has overflow:hidden +
position:relative to clip the img to a perfect circle.
Forced-confident name resolution (J: "we're CREATING the profile,
created as though you truly have the information Xavier is more
likely Hispanic and he's a male"):
genderFor(name) — looks up MALE_NAMES + FEMALE_NAMES,
falls back to a deterministic hash split
so unknown names spread ~50/50. Sets now
include cross-cultural names: Alejandro/
Andres/Mateo/Santiago/Joaquin/Cesar/Hugo/
Felipe/Gerardo/Salvador/Ramon (Hispanic),
Raj/Anil/Vikram/Krishna/Pradeep (South
Asian), Wei/Yi/Hiroshi/Akira/Hyun (East
Asian), Demetrius/Kareem/DaQuan/Khalil
(Black), Omar/Khalid/Hassan/Ahmed/Bilal
(Middle Eastern). FEMALE_NAMES extended
in parallel.
guessEthnicityFromFirstName(name)
— confident default of 'caucasian' for any
name not in the cultural buckets so every
worker resolves to a category the face
pool can be biased toward. Order: ME → Black
→ Hispanic → South Asian → East Asian →
Caucasian (matters where names overlap,
e.g. Aisha appears in ME + Black, biases
toward ME for visual fit).
Both helpers also ported into console.html so the triage backfills
and try-it-yourself rendering get the same hint stack.
Privacy note in the script + route comments: the synthetic data uses
the worker's name as the seed; production should hash worker_id (not
name) to avoid leaking PII to a third-party CDN. The fetch URL itself
is referenced once per pool build, not per-worker.
.gitignore — added data/headshots/face_*.jpg (~100MB for 198 faces;
the manifest + script are tracked). Re-running the script on a fresh
checkout rebuilds the pool from scratch.
Verified end-to-end via playwright on devop.live/lakehouse:
forklift query → 10 worker cards
10/10 with face images (real synthetic headshots, not monograms)
0/10 broken
Alejandro G. Nelson → ?g=man&e=hispanic
Patricia K. Garcia → ?g=woman&e=caucasian
Each name → unique face, deterministic across loads.
Console triage backfills get the same treatment.