4 Commits

Author SHA1 Message Date
root
a3b65f314e Synthetic face pool — 1000 StyleGAN headshots, ComfyUI hot-swap, 60x smaller thumbs
Worker cards now ship a real photo per person instead of monogram tiles:

  - fetch_face_pool.py pulls 1000 faces from thispersondoesnotexist.com
  - tag_face_pool.py runs deepface for gender/race/age, excludes <22yo
  - manifest.jsonl: 952 servable, gender/race buckets populated
  - /headshots/_thumbs/ pre-resized to 384px webp (587KB -> 11KB,
    60x smaller; without this Chrome's parallel-connection budget
    drops ~75% of tiles in a 40-card grid)
  - /headshots/:key gender x race x age intersection bucketing with
    gender-only fallback when intersection is sparse
  - /headshots/generate/:key ComfyUI on-demand for the contractor
    profile spotlight (cold ~1.5s, cached ~1ms; worker-derived
    djb2 seed makes faces deterministic-per-worker but unique
    across workers sharing the same prompt)
  - serve_imagegen.py _cache_key() now includes seed (was caching
    by prompt only -> 3 different worker seeds collapsed to 1
    cached image; verified fix produces 3 distinct md5s)
  - confidence-default name resolution: Xavier->man+hispanic,
    Aisha->woman+black, etc. Every worker resolves to a bucket.
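
The _cache_key() fix above can be sketched as follows. This is a minimal illustration, not the code in serve_imagegen.py: the function name, its signature, the SHA-256 digest, and the JSON serialization are all assumptions made for the example.

```python
import hashlib
import json


def cache_key(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for _cache_key(): hash prompt and seed together."""
    # Serializing both fields means two requests that share a prompt but use
    # different seeds can no longer land on the same cache entry.
    payload = json.dumps({"prompt": prompt, "seed": seed}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    prompt = "studio headshot, neutral background"  # illustrative prompt
    keys = {cache_key(prompt, seed) for seed in (101, 202, 303)}
    print(len(keys))  # number of distinct keys for three seeds
```

Keying on the prompt alone would make all three calls return the same digest, which is the collapse described in the bullet.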

End-to-end: playwright run on /?q=forklift+operators+IL -> 21/21
cards loaded, 0 broken, all 384px webp.

Cache + binary pool gitignored; manifest tracked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:01:04 -05:00
root
10ed3bc630 demo: real synthetic headshots — fetch pool + serve route + UI wire
Three layers shipped:

1. SCRIPT — scripts/staffing/fetch_face_pool.py
   Pulls N synthetic StyleGAN faces from thispersondoesnotexist.com
   into data/headshots/face_NNNN.jpg, writes manifest.jsonl. Idempotent:
   re-running skips existing files. Optional gender tagging via deepface
   (currently unavailable on this box; the script handles the ImportError
   gracefully and marks every face as untagged). Fetched 198 faces with
   concurrency=3 in ~67s.
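
The idempotent behaviour described above can be sketched roughly as below. This is not fetch_face_pool.py itself: the `download` callable, the manifest field names, and rewriting the manifest on every run are assumptions, and concurrency and deepface tagging are left out.

```python
import json
from pathlib import Path
from typing import Callable


def fetch_pool(out_dir: Path, count: int, download: Callable[[], bytes]) -> int:
    """Fill out_dir with face_0000.jpg onward, skipping files that exist.

    `download` is a hypothetical callable returning one image's bytes; the
    real script fetches over the network. Returns the number of new files.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    fetched = 0
    for i in range(count):
        target = out_dir / f"face_{i:04d}.jpg"
        if target.exists():
            continue  # idempotent: a re-run leaves existing files untouched
        target.write_bytes(download())
        fetched += 1

    # Rebuild the manifest from what is on disk. Every face is marked
    # "untagged" here because tagging is omitted from this sketch.
    with (out_dir / "manifest.jsonl").open("w", encoding="utf-8") as fh:
        for path in sorted(out_dir.glob("face_*.jpg")):
            fh.write(json.dumps({"file": path.name, "gender": "untagged"}) + "\n")
    return fetched
```

Calling `fetch_pool` twice with the same `count` downloads nothing the second time, and raising `count` only fetches the missing tail.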

2. SERVER — /headshots/:key route in mcp-server/index.ts
   Loads manifest at first hit, caches in globalThis._faces. Hashes the
   key with djb2-style mixing → pool index → returns the JPG. Same
   key always gets the same face (deterministic). Accepts
   ?g=man|woman&e=caucasian|black|hispanic|south_asian|east_asian|middle_eastern
   to bias pool selection — the gender/ethnicity buckets fall back to
   the full pool when no tagged matches exist. Cache-Control:
   max-age=86400, immutable so faces ride the browser cache after first hit.
   /headshots/__reload re-reads the manifest without restart.
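
The key-to-face mapping above can be sketched as below, in Python rather than the route's TypeScript. The commit only says "djb2-style mixing", so the classic djb2 constants, the 32-bit mask, and the manifest field names (`gender`, `ethnicity`) are assumptions.

```python
from typing import Optional


def djb2(key: str) -> int:
    """Classic djb2 string hash, kept within 32 bits."""
    h = 5381
    for ch in key:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h


def pick_face(key: str, faces: list, gender: Optional[str] = None,
              ethnicity: Optional[str] = None) -> dict:
    """Pick one manifest entry deterministically for `key`."""
    if not faces:
        raise ValueError("face pool is empty")
    # Narrow to entries matching whichever hints were supplied.
    bucket = [
        f for f in faces
        if (gender is None or f.get("gender") == gender)
        and (ethnicity is None or f.get("ethnicity") == ethnicity)
    ]
    # No tagged matches: fall back to the full pool so every key resolves.
    pool = bucket or faces
    return pool[djb2(key) % len(pool)]
```

Because the index depends only on the key and the pool contents, the same key returns the same face until the manifest changes.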

3. UI — search.html + console.html worker cards
   Re-added overlay <img> on top of the monogram .av circle. img.src
   = /headshots/<encoded-key>?g=<hint>&e=<hint>. img.onerror removes
   the failed image so the monogram stays visible if the face pool
   isn't fetched / CDN is blocked. .av now has overflow:hidden +
   position:relative to clip the img to a perfect circle.

Forced-confident name resolution (J: "we're CREATING the profile,
created as though you truly have the information. Xavier is more
likely Hispanic and he's a male"):

   genderFor(name)        — looks up MALE_NAMES + FEMALE_NAMES,
                            falls back to a deterministic hash split
                            so unknown names spread ~50/50. Sets now
                            include cross-cultural names: Alejandro/
                            Andres/Mateo/Santiago/Joaquin/Cesar/Hugo/
                            Felipe/Gerardo/Salvador/Ramon (Hispanic),
                            Raj/Anil/Vikram/Krishna/Pradeep (South
                            Asian), Wei/Yi/Hiroshi/Akira/Hyun (East
                            Asian), Demetrius/Kareem/DaQuan/Khalil
                            (Black), Omar/Khalid/Hassan/Ahmed/Bilal
                            (Middle Eastern). FEMALE_NAMES extended
                            in parallel.

   guessEthnicityFromFirstName(name)
                          — confident default of 'caucasian' for any
                            name not in the cultural buckets so every
                            worker resolves to a category the face
                            pool can be biased toward. Lookup order: ME →
                            Black → Hispanic → South Asian → East Asian →
                            Caucasian. The order matters where names
                            overlap: Aisha appears in both ME and Black
                            and resolves to ME for visual fit.

   Both helpers also ported into console.html so the triage backfills
   and try-it-yourself rendering get the same hint stack.
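
The two helpers as described in this commit can be sketched as below, in Python rather than the JavaScript used in the pages. The name sets are tiny excerpts, the snake_case function names are illustrative, and the SHA-256 parity split for unknown names is an assumption, since the commit does not say which hash is used.

```python
import hashlib

# Small excerpts only; the real sets are much larger.
MALE_NAMES = {"alejandro", "raj", "wei", "omar", "xavier", "demetrius"}
FEMALE_NAMES = {"patricia", "aisha"}

# Checked in order, so a name listed in two buckets resolves to the earlier one.
ETHNICITY_BUCKETS = [
    ("middle_eastern", {"omar", "khalid", "aisha"}),
    ("black", {"demetrius", "kareem", "aisha"}),
    ("hispanic", {"alejandro", "mateo", "xavier"}),
    ("south_asian", {"raj", "vikram"}),
    ("east_asian", {"wei", "hiroshi"}),
]


def first_name(name: str) -> str:
    parts = name.strip().split()
    return parts[0].lower() if parts else ""


def gender_for(name: str) -> str:
    first = first_name(name)
    if first in MALE_NAMES:
        return "man"
    if first in FEMALE_NAMES:
        return "woman"
    # Unknown name: split on a stable digest so the answer never changes
    # between runs (Python's builtin hash() is salted per process).
    digest = hashlib.sha256(first.encode("utf-8")).digest()
    return "man" if digest[0] % 2 == 0 else "woman"


def guess_ethnicity(name: str) -> str:
    first = first_name(name)
    for label, names in ETHNICITY_BUCKETS:
        if first in names:
            return label
    return "caucasian"  # confident default so every worker gets a bucket


if __name__ == "__main__":
    for worker in ("Alejandro G. Nelson", "Patricia K. Garcia"):
        print(worker, gender_for(worker), guess_ethnicity(worker))
```

Keeping the buckets in an ordered list rather than a dict of sets makes the precedence explicit, which is what settles overlapping names.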

Privacy note in the script + route comments: the synthetic data uses
the worker's name as the seed; production should hash worker_id (not
name) to avoid leaking PII to a third-party CDN. The fetch URL itself
is referenced once per pool build, not per-worker.
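
One way to follow the privacy note is to derive the headshot key from a digest of worker_id. This is a sketch under assumptions: the helper name, the optional salt, and the 16-character truncation are all hypothetical and not part of the script or route.

```python
import hashlib


def headshot_key(worker_id: str, salt: str = "") -> str:
    """Hypothetical helper: an opaque, stable key derived from worker_id."""
    # A server-side salt makes it harder to recover enumerable IDs from the
    # key by brute force; with an empty salt this is a plain SHA-256.
    digest = hashlib.sha256((salt + worker_id).encode("utf-8")).hexdigest()
    return digest[:16]
```

The key is stable per worker, so it keeps the deterministic face assignment while the name never appears in the image URL.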

.gitignore — added data/headshots/face_*.jpg (~100MB for 198 faces;
the manifest + script are tracked). Re-running the script on a fresh
checkout rebuilds the pool from scratch.

Verified end-to-end via playwright on devop.live/lakehouse:
   forklift query → 10 worker cards
   10/10 with face images (real synthetic headshots, not monograms)
   0/10 broken
   Alejandro G. Nelson  → ?g=man&e=hispanic
   Patricia K. Garcia    → ?g=woman&e=caucasian
   Each name → unique face, deterministic across loads.
   Console triage backfills get the same treatment.
2026-04-28 06:01:04 -05:00
root
78266fdd05 Add __pycache__ to gitignore
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 05:54:09 -05:00
root
a52ca841c6 Phase 0: bootstrap Rust workspace
- Cargo workspace with 6 crates: shared, storaged, catalogd, queryd, aibridge, gateway
- shared: types (DatasetId, ObjectRef, SchemaFingerprint, DatasetManifest) + error enum
- gateway: Axum HTTP entrypoint with nested service routers + tracing
- All services expose /health stubs
- justfile with build/test/run recipes
- PRD, phase tracker, and ADR docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 04:59:05 -05:00