Compare commits
10 Commits
d571d62e9b
...
f4dc1b29e3
| Author | SHA1 | Date | |
|---|---|---|---|
| | f4dc1b29e3 | | |
| | f892230699 | | |
| | 4b92d1da91 | | |
| | 1745881426 | | |
| | a05174d2fa | | |
| | f9a408e4c4 | | |
| | a3b65f314e | | |
| | 10ed3bc630 | | |
| | cdf5f5926a | | |
| | f92b55615f | | |
8
.gitignore
vendored
@@ -4,3 +4,11 @@
 .env
 __pycache__/
 *.pyc
+
+# Headshot pool — binary face JPGs are fetched by scripts/staffing/fetch_face_pool.py
+# (synthetic StyleGAN, ~580MB for 1000 faces). Manifest + fetch script are tracked.
+data/headshots/face_*.jpg
+data/headshots/_thumbs/
+# ComfyUI on-demand generated portraits (per-worker unique). Cached on first
+# request; fully regeneratable via /headshots/generate/:key.
+data/headshots_gen/
239
STATE_OF_PLAY.md
Normal file
@ -0,0 +1,239 @@
|
|||||||
|
# STATE OF PLAY — Lakehouse
|
||||||
|
|
||||||
|
**Last verified:** 2026-04-27 ~20:35 CDT
|
||||||
|
**Verified by:** live probe, not memory.
|
||||||
|
|
||||||
|
> **Read this FIRST.** When the user says "we're working on lakehouse," they mean the working code captured below — NOT what `git log` framed as "the cutover" or what memory snapshots from 2 days ago suggest. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## VERIFIED WORKING RIGHT NOW
|
||||||
|
|
||||||
|
### The client demo (Staffing Co-Pilot)
|
||||||
|
|
||||||
|
**Public URL:** `https://devop.live/lakehouse/` — 200, "Staffing Co-Pilot" (159 KB SPA, leaflet maps, dark theme).
|
||||||
|
**Local URL:** `http://localhost:3700/` — same page, served by `mcp-server/index.ts` (PID 1271, started 09:48 CDT today).
|
||||||
|
|
||||||
|
**The staffers console** (the one the client was thoroughly impressed with):
|
||||||
|
- `https://devop.live/lakehouse/console` — 200, "Lakehouse — What Your Staffing System Would Do" (26 KB)
|
||||||
|
- Pulls project index via `/api/catalog/datasets` (36 datasets) + playbook memory via `/api/vectors/playbook_memory/stats` (4,701 entries with embeddings, real ops like *"fill: Maintenance Tech x2 in Milwaukee, WI"*)
|
||||||
|
|
||||||
|
Client-visible flow that works end-to-end on the public URL:
|
||||||
|
|
||||||
|
| Endpoint | Sample output |
|---|---|
| `GET /api/catalog/datasets` | 36 datasets indexed: timesheets 1M, call_log 800K, workers_500k 500K, email_log 500K, workers_100k 100K, candidates 100K, placements 50K, job_orders 15K, successful_playbooks_live 2,077 |
| `GET /api/vectors/playbook_memory/stats` | 4,701 fill operations with embeddings |
| `GET /system/summary` | 36 datasets, 2.98M rows, 60 indexes, 500K workers loaded, 1K candidates |
| `POST /intelligence/staffing_forecast` | 744 Production Workers needed in 30d, 11,281 bench (4,687 reliable), coverage 1,444%, risk=ok. Same for Electrician (need 32, bench 2,440) and Maintenance Tech (need 17, bench 5,004). |
| `POST /intelligence/permit_contracts` | permit `3442956` $500K → 3 Production Workers, 886-candidate pool, 95% fill, $36K gross. 5 more Chicago permits with 8 workers each, same pool, 95% fill, $96K each. |
| `POST /intelligence/market` | major Chicago permits ranked: $730M O'Hare, $615M 307 N Michigan, $580M casino, $445M Loop transit (real geo coords). |
| `POST /intelligence/permit_entities` | architects + contractors from permit contacts (e.g. "KACPRZYNSKI, ANDY", "SLS ELECTRICAL SERVICE"). |
| `POST /intelligence/activity` + `/intelligence/arch_signals` + `/intelligence/chat` | all 200 |
|
||||||
|
|
||||||
|
The demo tells the story: *"upcoming Chicago contracts → workers needed → coverage from the bench → architects/contractors involved → revenue and margin."* That's the "live data + anticipating contracts + complete workflow" pitch — working as of right now.
|
||||||
|
|
||||||
|
### Backend, verified live this session
|
||||||
|
|
||||||
|
| Surface | State |
|---|---|
| Gateway `:3100` | up, 4 providers configured, `/v1/health` 200 with 500K workers loaded |
| MCP server `:3700` (Co-Pilot demo) | up, all `/intelligence/*` endpoints respond |
| VCP UI `:3950` | started this session, `/data/*` 200, real numbers |
| Observer `:3800` | ring full (2,000/2,000) — older events evicted, query Langfuse for 24h-ago state |
| Sidecar `:3200` | up |
| Langfuse `:3001` | recording, `gw:/log` + `v1.chat:openrouter` traces visible |
| LLM Team UI `:5000` | up, only `extract` mode registered |
| OpenCode fleet | **40 models reachable through one `sk-*` key** (verified live `GET https://opencode.ai/zen/v1/models`) |
|
||||||
|
|
||||||
|
OpenCode catalog (live):
|
||||||
|
- Claude: opus-4-7, opus-4-6, opus-4-5, opus-4-1, sonnet-4-6, sonnet-4-5, sonnet-4, haiku-4-5
|
||||||
|
- GPT-5: 5.5-pro, 5.5, 5.4-pro, 5.4, 5.4-mini, 5.4-nano, 5.3-codex-spark, 5.3-codex, 5.2, 5.2-codex, 5.1-codex-max, 5.1-codex, 5.1-codex-mini, 5.1, 5-codex, 5-nano, 5
|
||||||
|
- Gemini: 3.1-pro, 3-flash
|
||||||
|
- GLM: 5.1, 5
|
||||||
|
- Minimax: m2.7, m2.5
|
||||||
|
- Kimi: k2.6, k2.5
|
||||||
|
- Qwen: 3.6-plus, 3.5-plus
|
||||||
|
- Other: BIG-PKL (was a typo-prone name in the catalog, model id starts with "big-pkl-something")
|
||||||
|
- Free tier: minimax-m2.5-free, hy3-preview-free, ling-2.6-flash-free, trinity-large-preview-free
|
||||||
|
|
||||||
|
### The substrate (frozen — do not re-architect)
|
||||||
|
|
||||||
|
- Distillation v1.0.0 at tag `e7636f2` — **145/145 bun tests pass, 22/22 acceptance, 16/16 audit-full**
|
||||||
|
- Output: `data/_kb/distilled_{facts,procedures,config_hints}.jsonl` + `data/vectors/distilled_{factual,procedural,config_hint}_v20260423102847.parquet`
|
||||||
|
- Auditor cross-lineage: Kimi K2.6 ↔ Haiku 4.5 alternation, Opus auto-promote on diffs >100k chars, **per-PR cap=3 with auto-reset on new head SHA**
|
||||||
|
- Pathway memory: 88 traces, 11/11 successful replays (probation gate crossed)
|
||||||
|
- Mode runner: 5 native modes; `codereview_isolation` is default; composed-corpus auto-downgrade verified Apr 26 (composed lost 5/5 vs isolation, p=0.031)
|
||||||
|
|
||||||
|
### Matrix indexer
|
||||||
|
|
||||||
|
30+ live corpora including:
|
||||||
|
- 5 versions of `workers_500k_v1..v9` (50K embedded chunks each)
|
||||||
|
- 11 batched 2K-row shards `w500k_b3..b17`
|
||||||
|
- `chicago_permits_v1` (3,420), `resumes_100k_v2` (100K candidates), `ethereal_workers_v1` (10K)
|
||||||
|
- `lakehouse_arch_v1` (2,119), `lakehouse_symbols_v1` (2,470), `lakehouse_answers_v1` (1,269), `scrum_findings_v1` (1,260)
|
||||||
|
- `kb_team_runs_v1` (12,693) + `kb_team_runs_agent` (4,407) — LLM-team play history embedded
|
||||||
|
- `distilled_factual_v20260423102507` (8) — distillation output
|
||||||
|
|
||||||
|
### Code health
|
||||||
|
|
||||||
|
- `cargo check --workspace` → **0 warnings, 0 errors**
|
||||||
|
- `bun test auditor + tests/distillation` → **145/145 pass**
|
||||||
|
- `ui/server.ts` + `auditor.ts` bundle clean
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## DO NOT RELITIGATE
|
||||||
|
|
||||||
|
- **PR #11 is merged into `origin/main` as `ed57eda`** — do not "still need to merge PR #11."
|
||||||
|
- **Distillation tag `distillation-v1.0.0` at `e7636f2` is FROZEN** — do not re-architect schemas, scorer rules, audit fixtures.
|
||||||
|
- **Kimi forensic HOLD verdict (2026-04-27) was 2/8 false + 6/8 latent** — do not re-debate, see `reports/kimi/audit-last-week-full.md`.
|
||||||
|
- **`candidates_safe` `vertical` column bug** — fixed at catalog metadata layer in commit `c3c9c21`. Do not "discover" it again.
|
||||||
|
- **Decisions A/B/C/D from `synthetic-data-gap-report.md`** — all four scripts shipped today (`d56f08e`, `940737d`, `c3c9c21`). Do not "ask J for approval."
|
||||||
|
- **`workers_500k.phone` type fixup** — already string. The fixup script is idempotent; running it is a no-op.
|
||||||
|
- **`client_workerskjkk` typo dataset** — was breaking every SQL query (catalog had it registered, file didn't exist). Removed via `DELETE /catalog/datasets/by-name/client_workerskjkk` this session. Do not re-add. Adding a startup gate that errors on unrecognized parquet names is the long-term fix per now.md Step 2C.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## FIXES MADE THIS SESSION (2026-04-27 evening)
|
||||||
|
|
||||||
|
1. **`crates/gateway/src/v1/iterate.rs:93`** — `state` → `_state` (cleared the one cargo warning).
|
||||||
|
2. **`lakehouse-ui.service` (Dioxus)** — disabled. Was failing 7,242 times against a missing `target/dx/ui/debug/web/public` build dir. `systemctl stop && disable`.
|
||||||
|
3. **VCP UI on `:3950`** — started `bun run ui/server.ts` (PID 1162212, log `/tmp/lakehouse_ui.log`). `/data/*` endpoints now 200 with real data.
|
||||||
|
4. **`client_workerskjkk` catalog entry** — `DELETE /catalog/datasets/by-name/client_workerskjkk` removed the dead manifest. **This was the actual root cause** of `/system/summary` reporting `workers_500k_rows: 0` and the demo showing zero bench. Every SQL query was failing schema inference on the missing file before reaching its target table. Fixed → `workers_500k_rows: 500000`, `candidates_rows: 1000`, demo coverage flipped from "critical 0%" to actual percentages on devop.live/lakehouse.
|
||||||
|
|
||||||
|
## FIXES MADE THIS SESSION (2026-04-28 early — face pool)
|
||||||
|
|
||||||
|
5. **Synthetic StyleGAN face pool — 1000 faces, gender+race+age tagged.** `scripts/staffing/fetch_face_pool.py` fetches from thispersondoesnotexist.com; `scripts/staffing/tag_face_pool.py --min-age 22` runs deepface and excludes minors. `data/headshots/manifest.jsonl` now has gender (494 men / 458 women), race (caucasian 662 · east_asian 128 · hispanic 86 · middle_eastern 59 · black 14 · south_asian 3), age, and 48 minor exclusions. Server pool = 952 servable faces.
|
||||||
|
6. **`mcp-server/index.ts:1308` `/headshots/:key` route** — gender×race×age intersection bucketing with graceful fallback (gender-only → all). Same key always returns same face; different keys spread evenly.
|
||||||
|
7. **`/headshots/_thumbs/` pre-resized 384×384 webp** (60× smaller: 587KB → ~11KB). Without this, 40-card grids overran Chrome's parallel-connection budget and ~75% of tiles never finished decoding. Generated via parallel ffmpeg (`xargs -P 8`); `.gitignore`d.
|
||||||
|
8. **`mcp-server/search.html` + `console.html`** — dropped `img.loading='lazy'`. With 11KB thumbs, eager load is cheap (~500KB for 50 cards) and avoids the off-screen race that lazy decode produced.
|
||||||
|
9. **ComfyUI on-demand uniqueness — `serve_imagegen.py:32`** added `seed` to `_cache_key()` (was caching by prompt only — 3 different worker seeds collapsed to 1 cached image). Verified: seed=839185194/195/196 → 3 distinct md5s.
|
||||||
|
10. **`mcp-server/index.ts:1234` `/headshots/generate/:key`** — ComfyUI hot-path that derives a deterministic-per-worker seed via djb2-style hash; cold ~1.5s, cached ~1ms. Worker prompt format: `professional corporate headshot portrait of a {age}-year-old {race} {gender}, {role}, neutral expression, plain studio background, soft natural lighting, sharp focus, photorealistic, dslr`. Cache at `data/headshots_gen/` (gitignored, regeneratable).
|
||||||
|
11. **Confidence-default name resolution** in `search.html` — `genderFor()` and `guessEthnicityFromFirstName()` lookup tables (FEMALE_NAMES, MALE_NAMES, NAMES_HISPANIC, NAMES_BLACK, NAMES_SOUTH_ASIAN, NAMES_EAST_ASIAN, NAMES_MIDDLE_EASTERN). Xavier → man+hispanic, Aisha → woman+black, etc. Every worker resolves to a face-pool bucket.
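
Items 6 and 10 above both depend on a key mapping to the same face or seed every time. A minimal sketch of that idea, assuming a djb2-style string hash and an in-memory array of tagged manifest rows. The names `djb2` and `pickFace` and the row fields are hypothetical, chosen for illustration; this is not the code in `mcp-server/index.ts`, and the age dimension is left out for brevity.

```javascript
// Hypothetical sketch: function names and row shape are assumptions,
// not the real /headshots/:key implementation.

// djb2-style hash: h = h * 33 + charCode, kept in unsigned 32-bit range.
function djb2(str) {
  var h = 5381;
  for (var i = 0; i < str.length; i++) {
    h = ((h << 5) + h + str.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Narrow the pool by gender + race, fall back to gender only,
// then to the full pool if nothing matches.
function pickFace(pool, key, want) {
  var exact = pool.filter(function (f) {
    return f.gender === want.gender && f.race === want.race;
  });
  var byGender = pool.filter(function (f) {
    return f.gender === want.gender;
  });
  var bucket = exact.length ? exact : (byGender.length ? byGender : pool);
  // Same key -> same index into the same bucket, so the face is stable.
  return bucket[djb2(key) % bucket.length];
}

var pool = [
  { file: 'face_0001.jpg', gender: 'woman', race: 'hispanic' },
  { file: 'face_0002.jpg', gender: 'woman', race: 'caucasian' },
  { file: 'face_0003.jpg', gender: 'man', race: 'caucasian' }
];
console.log(pickFace(pool, 'worker-42', { gender: 'woman', race: 'hispanic' }).file);
// prints "face_0001.jpg" (the only face in the exact bucket)
```

The same hash value could also serve as the per-worker seed for the generate route, which is one plausible reading of "deterministic-per-worker seed" in item 10.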
|
||||||
|
|
||||||
|
End-to-end verified: playwright run on `https://devop.live/lakehouse/?q=forklift+operators+IL` → 21/21 cards loaded, 0 broken, all 384×384 webp thumbs.
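
Item 9 above fixed a cache that keyed on the prompt alone, so different seeds collapsed to one image. The sketch below shows the general shape of that fix, assuming a simple in-memory map. The real code is Python in `serve_imagegen.py`; `cacheKey`, `getOrGenerate`, and the stand-in generator here are hypothetical.

```javascript
// Hypothetical sketch of the cache-key idea, not the serve_imagegen.py code.
var cache = new Map();

// Include the seed in the key. Serializing an array keeps the two parts
// unambiguous (prompt "a" + seed 12 cannot collide with prompt "a1" + seed 2).
function cacheKey(prompt, seed) {
  return JSON.stringify([prompt, seed]);
}

// Return the cached result, or generate and store it on first request.
function getOrGenerate(prompt, seed, generate) {
  var key = cacheKey(prompt, seed);
  if (!cache.has(key)) cache.set(key, generate(prompt, seed));
  return cache.get(key);
}

// Stand-in generator for the demo; a real one would call the image backend.
function fakeGenerate(prompt, seed) {
  return 'image-for-seed-' + seed;
}

[839185194, 839185195, 839185196].forEach(function (seed) {
  getOrGenerate('headshot prompt', seed, fakeGenerate);
});
console.log(cache.size); // prints 3: one entry per seed
```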
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## OPEN — but not blocking the demo
|
||||||
|
|
||||||
|
| Item | What | When to act |
|---|---|---|
| `modes.toml` `staffing_inference.matrix_corpus` | still says `workers_500k_v8`. v9 in vector index is from Apr 17 (raw-sourced, not safe-view). The new `build_workers_v9.sh` rebuilds from `workers_safe`. | Run when you have 30+ min for the rebuild. |
| Open PRs #6, #7, #10 | sitting since Apr 22-24, auditor verdicts on disk at `data/_auditor/kimi_verdicts/{6,7,10}-*.json` | Read verdicts, decide reconcile/close. |
| `test/enrich-prd-pipeline` branch | 35 unmerged commits, includes more-evolved auditor/inference.ts (666 vs main's 580 lines), curation+fact-extractor wiring | Reconcile or formally archive — see `memory/project_unmerged_architecture_work.md`. |
| `federation-hnsw-trials` stash | Lance + S3/MinIO prototype, `aws-config` crate added, 708 insertions | Phase B from EXECUTION_PLAN.md — revisit when Parquet vector ceiling actually hurts. |
| `candidates` manifest drift | manifest 100K vs SQL 1K. Cosmetic. | Run a metadata resync if it matters. |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## RUNTIME CHEATSHEET
|
||||||
|
|
||||||
|
```bash
# Verify the demo (public + local both work)
curl -sS https://devop.live/lakehouse/ # Co-Pilot HTML
curl -sS https://devop.live/lakehouse/console # staffers console
curl -sS -X POST https://devop.live/lakehouse/intelligence/staffing_forecast \
  -d '{}' -H 'content-type: application/json' \
  | jq '.forecast[] | {role, demand_workers, bench_total, coverage_pct, risk}'

# Restart sequence (after Rust changes)
sudo systemctl restart lakehouse.service # gateway :3100
sudo systemctl restart lakehouse-auditor # auditor daemon
sudo systemctl restart lakehouse-observer # observer :3800
# UI bun on :3950 is NOT systemd-managed (lakehouse-ui.service is disabled).
# Restart manually: kill <pid>; nohup bun run ui/server.ts > /tmp/lakehouse_ui.log 2>&1 &

# Health checks
curl -sS http://localhost:3100/v1/health | jq # workers_count, providers
curl -sS http://localhost:3100/vectors/pathway/stats | jq
curl -sS http://localhost:3100/v1/usage | jq # since-restart cost
curl -sS http://localhost:3700/system/summary | jq # dataset counts
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## VISION — what we're actually building (not what's done)
|
||||||
|
|
||||||
|
J's framing for the legacy staffing company:
|
||||||
|
|
||||||
|
- Pull live data, anticipate contracts based on Chicago permits → real architect/contractor associations, headcount, time period, money, scope.
|
||||||
|
- Hybrid + memory index → search large corpora cheaply.
|
||||||
|
- Email comes in → verify against contract; SMS comes in → alert when index changes.
|
||||||
|
- Real-time.
|
||||||
|
- Invent metrics nobody else has using the hybrid index.
|
||||||
|
- Next stage: workers download an app → geolocation clock-in → automatic responsiveness measurement, no user effort, with incentives for using it.
|
||||||
|
- Find people getting certificates (passive cert tracking).
|
||||||
|
- Pull union data → bring contracts that work for **employees**, not just employers.
|
||||||
|
- All metrics visible, nothing hidden, value-aligned with what each side actually needs.
|
||||||
|
|
||||||
|
If a future session is drifting away from this vision toward "fix the cutover" or "land Phase X," the vision wins. Phases are scaffolding for the vision, not the goal.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## CURRENT PLAN — fix the demo for the legacy staffing client
|
||||||
|
|
||||||
|
Built from playwright audit of the live demo (2026-04-27 evening). Each item ends in something the client can SEE, not internal cleanups.
|
||||||
|
|
||||||
|
**Demo state is anchored by git tag `demo-2026-04-27`** (commit `ed57eda`, the merge of PR #11). To restore code state: `git checkout demo-2026-04-27`. To restore runtime state: `DELETE /catalog/datasets/by-name/client_workerskjkk` (catalog hot-fix is not in git).
|
||||||
|
|
||||||
|
### P1 — Search box that actually filters (highest visible impact)
|
||||||
|
|
||||||
|
**Problem:** typing in `#sq` and pressing Enter fires `POST /intelligence/chat` with body `{"message":"<query>"}`. The state (`#sst`) and role (`#srl`) selects are ignored — never sent in the body. So every search returns a generic chat completion, never a SQL+vector hybrid filter against `workers_500k`. That is the "cached/generic response" the client sees.
|
||||||
|
|
||||||
|
**Fix:** in `mcp-server/search.html`, change the search-submit handler to call the real worker search endpoint with `{query, state, role, top_k}`. The MCP `search_workers` tool surface already exists; route the form there. Render returned worker rows in the existing card grid.
|
||||||
|
|
||||||
|
**Done when:** typing "forklift" + state IL + role "Forklift Operator" returns ≤ top_k IL Forklift Operators, and changing state to WI returns different workers.
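
The P1 fix comes down to sending the select values along with the query. A minimal sketch of the request the submit handler could build, assuming the body shape `{query, state, role, top_k}` named above. The endpoint path, the helper name `buildWorkerSearch`, and the default `top_k` of 20 are placeholders for illustration, since the real worker search route is not stated here.

```javascript
// Hypothetical sketch: WORKER_SEARCH_PATH is a placeholder, not a confirmed route.
var WORKER_SEARCH_PATH = '/search/workers';

// Build fetch arguments from the form values (#sq, #sst, #srl).
function buildWorkerSearch(query, state, role, topK) {
  var body = { query: String(query || '').trim(), top_k: topK || 20 };
  // Only send filters the user actually picked, so "any state" stays unfiltered.
  if (state) body.state = state;
  if (role) body.role = role;
  return {
    url: WORKER_SEARCH_PATH,
    init: {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(body)
    }
  };
}

var req = buildWorkerSearch('forklift', 'IL', 'Forklift Operator', 10);
console.log(req.init.body);
```

The handler would then call `fetch(req.url, req.init)` and render the returned rows into the existing card grid. Keeping the request builder separate from the DOM code makes the "state and role are actually sent" check easy to test.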
|
||||||
|
|
||||||
|
### P2 — Contractor-name click → `/contractor` profile page
|
||||||
|
|
||||||
|
**Problem:** clicking a contractor name in any rendered card stays on `/lakehouse/`. URL doesn't change.
|
||||||
|
|
||||||
|
**Fix:** wrap contractor names in `<a href="/contractor?name=<encoded>">`. The page `mcp-server/contractor.html` (14.8 KB, "Contractor Profile · Staffing Co-Pilot") already exists at `/contractor` and the data endpoint `/intelligence/contractor_profile` already returns rich data.
|
||||||
|
|
||||||
|
**Then check contractor.html actually shows:** full history of every record the database has on that contractor + heat map of locations underneath + relevant info (per J 2026-04-27). If the page is incomplete, finish it. Otherwise just wire the link.
|
||||||
|
|
||||||
|
**Done when:** clicking "KACPRZYNSKI, ANDY" opens a profile with: every Chicago permit they're contact_1 or contact_2 on, a leaflet map with markers for each address, and any matched workers from prior placements at their sites.
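
The link wiring in P2 could look roughly like the sketch below. It assumes the `/contractor?name=<encoded>` URL shape given above. The helper names and the optional `base` prefix (for serving under a path such as `/lakehouse`) are hypothetical.

```javascript
// Hypothetical sketch: helper names and the `base` prefix are assumptions.

// Build the profile URL; encodeURIComponent escapes commas, spaces, and "&".
function contractorHref(name, base) {
  return (base || '') + '/contractor?name=' + encodeURIComponent(name);
}

// Wrap a contractor name in a link. Uses textContent, never innerHTML,
// so an API-derived name cannot inject markup.
function contractorLink(doc, name, base) {
  var a = doc.createElement('a');
  a.href = contractorHref(name, base);
  a.textContent = name;
  return a;
}

console.log(contractorHref('KACPRZYNSKI, ANDY'));
// prints "/contractor?name=KACPRZYNSKI%2C%20ANDY"
```

In the page, `contractorLink(document, name)` would replace the plain text node that currently holds the contractor name.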
|
||||||
|
|
||||||
|
### P3 — Substrate signal at the bottom shows the right numbers
|
||||||
|
|
||||||
|
**Problem:** J reports the bottom panel says "playbook memory empty, 80 traces 0 replies." Reality from the live endpoints: `/api/vectors/playbook_memory/stats` = 4,701 entries with embeddings; `/vectors/pathway/stats` = 88 traces, 11/11 replays.
|
||||||
|
|
||||||
|
**Fix:** find the renderer in search.html that builds the substrate signal panel; verify it's hitting the right endpoints and reading the right keys; fix shape mismatches.
|
||||||
|
|
||||||
|
**Done when:** bottom panel shows real numbers (4,701 playbooks, 88 traces, 11/11 replays) and references at least one specific recent operation from the playbook stats sample.
|
||||||
|
|
||||||
|
### P4 — Top nav reflects today's architecture
|
||||||
|
|
||||||
|
**Problem:** Walkthrough/Architecture/Spec/Onboard/Alerts/Workspaces tabs all return 200 but content is from old architecture. Doesn't mention: gateway scratchpad, memory indexer, ranker, mode runner, OpenCode 40-model fleet, distillation substrate, auditor cross-lineage.
|
||||||
|
|
||||||
|
**Fix:** rewrite `mcp-server/proof.html` (or add a single new page "What's running" that replaces Architecture+Spec) to describe what's actually shipped as of `demo-2026-04-27`. Keep one architecture page, drop redundancy. Either complete or hide Onboard/Alerts/Workspaces — J's call which.
|
||||||
|
|
||||||
|
**Done when:** the architecture page tells a non-technical reader, in 2 minutes, what each piece does in coordinator-relatable terms ("intern that read every email", not "3-stage adversarial inference pipeline").
|
||||||
|
|
||||||
|
### P5 — Caching for the project-index build_signal (J flagged unfinished)
|
||||||
|
|
||||||
|
**Problem:** "we never finished our caching for project index build signal it's not pulling new information." Need to find what `build_signal` refers to. Likely a scrum/auditor signal that should rebuild the `lakehouse_arch_v1` corpus on commit but isn't wired to do so.
|
||||||
|
|
||||||
|
**Fix:** identify the build-signal pipeline (likely in `auditor/` or `crates/vectord/`), wire its emit to a corpus rebuild, verify by making a test commit and watching the new chunk appear in `/vectors/indexes` for `lakehouse_arch_v1`.
|
||||||
|
|
||||||
|
**Done when:** committing a new file to `crates/` causes `lakehouse_arch_v1` chunk_count to increase within N minutes.
|
||||||
|
|
||||||
|
### P0 — Anchor the demo state (DONE)
|
||||||
|
|
||||||
|
Tagged `ed57eda` as `demo-2026-04-27`. Future sessions: `git checkout demo-2026-04-27` to land in this exact code state.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## EXECUTION ORDER
|
||||||
|
|
||||||
|
1. **P1 first** — biggest visible bug, ~30-60 min
|
||||||
|
2. **P2 next** — contractor click is the second-biggest "doesn't work" the client sees, ~20 min if profile is mostly done
|
||||||
|
3. **P3** — small fix, big "looks alive" win
|
||||||
|
4. **P4** — biggest scope; might split across sessions
|
||||||
|
5. **P5** — feature work, only after the visible bugs are fixed
|
||||||
|
|
||||||
|
Each item commits independently with the format `demo: P<n> — <one-line>` so the commit log doubles as a progress journal. After each merge to main, re-tag `demo-latest` to point at the new HEAD.
|
||||||
|
|
||||||
|
Stop here and let J pick which item to start with. Do not silently extend scope.
|
||||||
1000
data/headshots/manifest.jsonl
Normal file
File diff suppressed because it is too large
@@ -54,8 +54,25 @@ details .body{padding-top:10px;font-size:12px;color:#8b949e}
 .accent-g{border-left:3px solid #3fb950}
 .accent-r{border-left:3px solid #f85149}
 
-.worker{display:flex;align-items:center;gap:10px;padding:8px 10px;background:#161b22;border-radius:6px;margin-bottom:4px;font-size:12px}
-.worker .av{width:28px;height:28px;border-radius:6px;background:#1a2744;display:flex;align-items:center;justify-content:center;font-weight:600;color:#e6edf3;font-size:10px;flex-shrink:0}
+.worker{display:flex;align-items:center;gap:10px;padding:8px 10px;background:#161b22;border-radius:6px;margin-bottom:4px;font-size:12px;border-left:3px solid #30363d}
+.worker .av{width:32px;height:32px;border-radius:50%;background:#0d1117;border:1px solid #21262d;display:flex;align-items:center;justify-content:center;font-weight:600;color:#c9d1d9;font-size:11px;flex-shrink:0;letter-spacing:0.5px;overflow:hidden;position:relative}
+.worker .av img{position:absolute;inset:0;width:100%;height:100%;object-fit:cover;display:block;
+/* Softening — mirror of search.html. Pulls saturation + contrast off
+the SDXL Turbo over-render so faces feel less "AI-generated".
+If you tweak one, tweak the other. */
+filter: saturate(0.86) contrast(0.93) brightness(1.02) blur(0.3px);
+}
+.worker[data-role-band="warehouse"]{border-left-color:#58a6ff}
+.worker[data-role-band="production"]{border-left-color:#d29922}
+.worker[data-role-band="trades"]{border-left-color:#bc8cff}
+.worker[data-role-band="driver"]{border-left-color:#3fb950}
+.worker[data-role-band="lead"]{border-left-color:#f0883e}
+.role-pill{display:inline-block;font-size:9px;padding:1px 7px;border-radius:3px;background:#0d1117;color:#8b949e;margin-right:6px;font-weight:600;letter-spacing:0.4px;text-transform:uppercase;border-left:2px solid #30363d;vertical-align:1px}
+.role-pill[data-rb="warehouse"]{border-left-color:#58a6ff;color:#79c0ff}
+.role-pill[data-rb="production"]{border-left-color:#d29922;color:#e3b341}
+.role-pill[data-rb="trades"]{border-left-color:#bc8cff;color:#d2a8ff}
+.role-pill[data-rb="driver"]{border-left-color:#3fb950;color:#56d364}
+.role-pill[data-rb="lead"]{border-left-color:#f0883e;color:#ffa657}
 .worker .info{flex:1;min-width:0}
 .worker .nm{color:#e6edf3;font-weight:500}
 .worker .why{color:#545d68;font-size:11px;margin-top:1px}
@ -199,6 +216,132 @@ var A=location.origin+P;
|
|||||||
// DOM helpers — all dynamic content goes through these. No innerHTML
|
// DOM helpers — all dynamic content goes through these. No innerHTML
|
||||||
// anywhere in the script; every API-derived string passes through
|
// anywhere in the script; every API-derived string passes through
|
||||||
// textContent so no injection path regardless of upstream data.
|
// textContent so no injection path regardless of upstream data.
|
||||||
|
// Role classification — mirrors search.html, no emojis. Maps role
|
||||||
|
// strings to a band+label used by the worker-card border + role pill.
|
||||||
|
var ROLE_BANDS = [
|
||||||
|
{ match: /forklift|warehouse|associate|material\s*handler|loader|loading|packag|shipping|logistics|inventory|sanitation|janit/i, band: 'warehouse', label: 'Warehouse' },
|
||||||
|
{ match: /production|assembl/i, band: 'production', label: 'Production' },
|
||||||
|
{ match: /welder|weld|electric|maint(enance)?\s*tech|cnc|machine\s*op|hvac|plumb|carpenter|mason/i, band: 'trades', label: 'Skilled Trade' },
|
||||||
|
{ match: /driver|truck|haul|cdl/i, band: 'driver', label: 'Driver' },
|
||||||
|
{ match: /line\s*lead|supervisor|foreman|coordinator/i, band: 'lead', label: 'Lead' },
|
||||||
|
{ match: /quality/i, band: 'production', label: 'Quality' },
|
||||||
|
];
|
||||||
|
function roleBand(role){
|
||||||
|
if(!role) return { band: 'warehouse', label: '' };
|
||||||
|
for (var i = 0; i < ROLE_BANDS.length; i++) {
|
||||||
|
if (ROLE_BANDS[i].match.test(role)) return ROLE_BANDS[i];
|
||||||
|
}
|
||||||
|
return { band: 'warehouse', label: role.split(' ')[0].toUpperCase().slice(0, 12) };
|
||||||
|
}
|
||||||
|
|
||||||
|
// Build a sober worker card: monogram avatar + colored role band on
|
||||||
|
// the left edge + uppercase role pill in the detail line. Used by
|
||||||
|
// every chapter that renders worker rows. `name` and `role` drive the
|
||||||
|
// classification; `detail` is the full text after the pill.
|
||||||
|
// Quick first-name → gender hint for face-pool selection. Same lookup
|
||||||
|
// idea as the dashboard; if the name is unknown, the server falls back
|
||||||
|
// to the full pool. Trimmed table — covers the most common names that
|
||||||
|
// appear in the synthetic worker data.
|
||||||
|
var FEMALE_NAMES = new Set(['Mary','Patricia','Jennifer','Linda','Elizabeth','Barbara','Susan','Jessica','Sarah','Karen','Lisa','Nancy','Betty','Sandra','Margaret','Ashley','Kimberly','Emily','Donna','Michelle','Carol','Amanda','Melissa','Deborah','Stephanie','Dorothy','Rebecca','Sharon','Laura','Cynthia','Amy','Kathleen','Angela','Shirley','Brenda','Emma','Anna','Pamela','Nicole','Samantha','Katherine','Christine','Helen','Debra','Rachel','Carolyn','Janet','Maria','Catherine','Heather','Diane','Olivia','Julie','Joyce','Victoria','Ruth','Virginia','Lauren','Kelly','Christina','Joan','Evelyn','Judith','Andrea','Hannah','Megan','Cheryl','Jacqueline','Martha','Madison','Teresa','Gloria','Sara','Janice','Ann','Kathryn','Abigail','Sophia','Frances','Jean','Alice','Judy','Isabella','Julia','Grace','Amber','Denise','Danielle','Marilyn','Beverly','Charlotte','Natalie','Theresa','Diana','Brittany','Kayla','Alexis','Lori','Marie','Carmen','Aisha','Rosa','Mia','Audrey','Erin','Tina','Vanessa','Tara','Wendy','Tanya','Maya','Crystal','Yvonne','Kara','Shannon','Brianna','Faith','Caroline','Carla','Tracey','Tracy','Rita','Dawn','Tiffany','Stacy','Stacey','Gina','Bonnie','Tammy','Joanne','Jamie','Tonya','Alyssa','Ariana','Elena','Ellie','Erica','Erika','Felicia','Holly','Jenna','Jenny','Krista','Kristen','Kristin','Krystal','Lana','Leah','Lucy','Mallory','Melinda','Meredith','Misty','Monica','Naomi','Paige','Paula','Renee','Rhonda','Robin','Roxanne','Selena','Sierra','Skylar','Sonia','Stella','Tamara','Veronica','Vivian','Whitney','Yolanda','Zoe']);
|
||||||
|
var MALE_NAMES = new Set(['James','Robert','John','Michael','David','William','Richard','Joseph','Thomas','Charles','Christopher','Daniel','Matthew','Anthony','Mark','Donald','Steven','Paul','Andrew','Joshua','Kenneth','Kevin','Brian','George','Edward','Ronald','Timothy','Jason','Jeffrey','Ryan','Jacob','Gary','Nicholas','Eric','Jonathan','Stephen','Larry','Justin','Scott','Brandon','Benjamin','Samuel','Gregory','Frank','Alexander','Raymond','Patrick','Jack','Dennis','Jerry','Tyler','Aaron','Jose','Adam','Henry','Nathan','Douglas','Zachary','Peter','Kyle','Walter','Ethan','Jeremy','Harold','Keith','Christian','Roger','Noah','Gerald','Carl','Terry','Sean','Austin','Arthur','Lawrence','Jesse','Dylan','Bryan','Joe','Jordan','Billy','Bruce','Albert','Willie','Gabriel','Logan','Alan','Juan','Wayne','Roy','Ralph','Randy','Eugene','Vincent','Russell','Elijah','Louis','Bobby','Philip','Johnny','Marcus','Antonio','Carlos','Diego','Hector','Jorge','Julio','Manuel','Miguel','Pedro','Raul','Ricardo','Roberto','Sergio','Victor','Jamal','Xavier','DeShawn','Dwayne','Jermaine','Malik','Tyrone','Devon','Andre','Brent','Calvin','Casey','Cody','Cole','Cory','Dale','Damon','Darius','Darrell','Dean','Derek','Drew','Earl','Eddie','Floyd','Glenn','Greg','Howard','Ivan','Jared','Jay','Jeff','Joel','Lance','Lee','Leonard','Lloyd','Mario','Martin','Mason','Maurice','Max','Mitchell','Morgan','Nick','Norman','Oliver','Owen','Pete','Quincy','Rafael','Reggie','Rex','Ricky','Russ','Shane','Shaun','Stanley','Steve','Theodore','Todd','Travis','Trevor','Troy','Wade','Warren','Wesley']);
|
||||||
|
function guessGenderFromFirstName(n){
|
||||||
|
if(!n) return null;
|
||||||
|
var clean=n.replace(/[^A-Za-z]/g,'');
|
||||||
|
if(!clean) return null;
|
||||||
|
var c=clean[0].toUpperCase()+clean.slice(1).toLowerCase();
|
||||||
|
if(FEMALE_NAMES.has(c)) return 'woman';
|
||||||
|
if(MALE_NAMES.has(c)) return 'man';
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
function genderFor(name){
|
||||||
|
var g = guessGenderFromFirstName(name);
|
||||||
|
if(g) return g;
|
||||||
|
if(!name) return 'man';
|
||||||
|
var s=String(name); var h=0;
|
||||||
|
for(var i=0;i<s.length;i++) h=(h*31+s.charCodeAt(i))|0;
|
||||||
|
return (Math.abs(h)&1)?'man':'woman';
|
||||||
|
}
|
||||||
|
// Confident first-name → ethnicity. Synthetic data — we own the call.
|
||||||
|
var NAMES_SOUTH_ASIAN_C=new Set(['Raj','Anil','Rohan','Vikram','Arjun','Sanjay','Ravi','Krishna','Pradeep','Sunil','Amit','Deepak','Ashok','Manoj','Rahul','Vijay','Suresh','Naveen','Anand','Nikhil','Aditya','Karan','Rajesh','Priya','Anjali','Neha','Kavya','Pooja','Divya','Meera','Lakshmi','Rani','Asha','Saanvi','Aanya','Aaradhya','Shreya','Riya','Tanvi','Ishita','Aarav','Ishaan','Shivani']);
|
||||||
|
var NAMES_EAST_ASIAN_C=new Set(['Wei','Mei','Yi','Jin','Chen','Lin','Liu','Wang','Zhang','Yang','Wu','Zhao','Sun','Hiroshi','Yuki','Akira','Kenji','Sakura','Aiko','Haruto','Sora','Hyun','Eun','Yoon','Kai','Long','Hong','Xiu','Lan','Hua','Hao','Tao','Bao','Cheng','Feng','Jian','Dong','Bin','Min','Lei','Hui','Yu','Xin','Ying','Zhen','Yuan','Yan']);
|
||||||
|
var NAMES_HISPANIC_C=new Set(['Carmen','Carlos','Maria','Diego','Hector','Jorge','Julio','Manuel','Miguel','Pedro','Raul','Ricardo','Roberto','Sergio','Antonio','Esperanza','Luz','Sofia','Lucia','Isabella','Camila','Valentina','Mariana','Elena','Rosa','Catalina','Esteban','Fernando','Eduardo','Javier','Alejandro','Andres','Mateo','Santiago','Sebastian','Emilio','Tomas','Cristina','Daniela','Gabriela','Ximena','Adriana','Beatriz','Pilar','Mercedes','Xavier','Marisol','Guadalupe','Lupita','Inez','Itzel','Yesenia','Joaquin','Ignacio','Rafael','Salvador','Cesar','Arturo','Armando','Hugo','Marco','Alejandra','Felipe','Gerardo','Jaime','Leonardo','Luis','Pablo','Ramon']);
|
||||||
|
// Entries must be stored in the normalized form used by the lookups
// (first letter upper-case, the rest lower-case), or they never match.
var NAMES_BLACK_C=new Set(['Deshawn','Jamal','Aisha','Latoya','Tyrone','Malik','Imani','Keisha','Tariq','Lakisha','Kenya','Tamika','Andre','Marcus','Demetrius','Jermaine','Reggie','Tyrese','Darius','Trevon','Kareem','Damon','Jalen','Jaylen','Dwayne','Daquan','Aaliyah','Kiara','Janelle','Jasmine','Tanisha','Maurice','Tyrell','Kwame','Khalil','Terrell','Cedric','Nia','Zuri','Jada','Ebony','Dominique']);
|
||||||
|
var NAMES_MIDDLE_EASTERN_C=new Set(['Layla','Omar','Khalid','Fatima','Yasmin','Hassan','Hussein','Ahmed','Mohamed','Mohammed','Ali','Karim','Yusuf','Yara','Nadia','Zainab','Rania','Samira','Mariam','Salma','Ibrahim','Mahmoud','Saif','Anwar','Bilal','Faisal','Hamza','Imran','Sami','Wael','Zaid','Amira','Iman','Lina','Mona','Noor','Rana','Soha','Zara']);
|
||||||
|
// Surname → ethnicity. Surname is more diagnostic than first name
|
||||||
|
// for hispanic and asian — "Anna Cruz" is hispanic via surname.
|
||||||
|
var SURNAMES_HISPANIC_C=new Set(['Garcia','Rodriguez','Martinez','Hernandez','Lopez','Gonzalez','Perez','Sanchez','Ramirez','Torres','Flores','Rivera','Gomez','Diaz','Reyes','Cruz','Morales','Ortiz','Gutierrez','Chavez','Ramos','Ruiz','Alvarez','Mendoza','Vasquez','Castillo','Jimenez','Moreno','Romero','Herrera','Medina','Aguilar','Vargas','Castro','Fernandez','Guzman','Munoz','Salazar','Ortega','Delgado','Estrada','Ayala','Pena','Cabrera','Alvarado','Espinoza','Padilla','Cardenas','Cortes','Ibarra','Vega','Soto','Lara','Navarro','Campos','Acosta','Rios','Marquez','Sandoval','Maldonado','Solis','Rojas','Mejia','Beltran','Cervantes','Lozano','Carrillo','Trevino','Robles','Tapia','Lugo']);
|
||||||
|
var SURNAMES_SOUTH_ASIAN_C=new Set(['Patel','Singh','Kumar','Sharma','Gupta','Shah','Mehta','Desai','Joshi','Reddy','Nair','Iyer','Verma','Agarwal','Kapoor','Chopra','Malhotra','Banerjee','Chatterjee','Mukherjee','Das','Sen','Bose','Roy','Sinha','Trivedi','Pandey','Mishra','Tiwari','Yadav','Chauhan','Rana','Thakur','Pillai','Menon','Krishnan','Rao','Naidu','Pradhan','Acharya','Devi','Kaur']);
|
||||||
|
var SURNAMES_EAST_ASIAN_C=new Set(['Chen','Wang','Li','Liu','Yang','Huang','Zhao','Wu','Zhou','Xu','Zhu','Sun','Ma','Lin','Lee','Kim','Park','Choi','Jung','Kang','Cho','Yoon','Han','Lim','Oh','Nakamura','Tanaka','Suzuki','Yamamoto','Sato','Watanabe','Takahashi','Kobayashi','Yoshida','Saito','Nguyen','Tran','Le','Pham','Hoang','Phan','Vu','Vo','Dang','Bui','Do','Ngo','Truong','Mai','Cao','Wong','Tang','Tan','Cheng','Lau','Leung','Ng','Cheung','Yip','Hsu','Tsai','Hsieh']);
|
||||||
|
var SURNAMES_MIDDLE_EASTERN_C=new Set(['Khan','Ahmed','Hussein','Hassan','Ali','Mahmoud','Mohamed','Mohammed','Saleh','Aziz','Karim','Hamad','Najjar','Haddad','Khoury','Mansour','Rahman','Iqbal','Malik','Sheikh','Siddiqui','Qureshi','Saeed']);
|
||||||
|
|
||||||
|
function guessEthnicityFromName(first, last){
|
||||||
|
if(last){
|
||||||
|
var s=last.replace(/[^A-Za-z]/g,'');
|
||||||
|
if(s){
|
||||||
|
var sc=s[0].toUpperCase()+s.slice(1).toLowerCase();
|
||||||
|
if(SURNAMES_HISPANIC_C.has(sc)) return 'hispanic';
|
||||||
|
if(SURNAMES_MIDDLE_EASTERN_C.has(sc)) return 'middle_eastern';
|
||||||
|
if(SURNAMES_SOUTH_ASIAN_C.has(sc)) return 'south_asian';
|
||||||
|
if(SURNAMES_EAST_ASIAN_C.has(sc)) return 'east_asian';
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if(first){
|
||||||
|
var clean=first.replace(/[^A-Za-z]/g,'');
|
||||||
|
if(clean){
|
||||||
|
var c=clean[0].toUpperCase()+clean.slice(1).toLowerCase();
|
||||||
|
if(NAMES_MIDDLE_EASTERN_C.has(c)) return 'middle_eastern';
|
||||||
|
if(NAMES_BLACK_C.has(c)) return 'black';
|
||||||
|
if(NAMES_HISPANIC_C.has(c)) return 'hispanic';
|
||||||
|
if(NAMES_SOUTH_ASIAN_C.has(c)) return 'south_asian';
|
||||||
|
if(NAMES_EAST_ASIAN_C.has(c)) return 'east_asian';
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return 'caucasian';
|
||||||
|
}
|
||||||
|
function guessEthnicityFromFirstName(n){
|
||||||
|
if(!n) return 'caucasian';
|
||||||
|
var clean=n.replace(/[^A-Za-z]/g,''); if(!clean) return 'caucasian';
|
||||||
|
var c=clean[0].toUpperCase()+clean.slice(1).toLowerCase();
|
||||||
|
if(NAMES_MIDDLE_EASTERN_C.has(c)) return 'middle_eastern';
|
||||||
|
if(NAMES_BLACK_C.has(c)) return 'black';
|
||||||
|
if(NAMES_HISPANIC_C.has(c)) return 'hispanic';
|
||||||
|
if(NAMES_SOUTH_ASIAN_C.has(c)) return 'south_asian';
|
||||||
|
if(NAMES_EAST_ASIAN_C.has(c)) return 'east_asian';
|
||||||
|
return 'caucasian';
|
||||||
|
}
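The lookups above all normalize a name the same way before testing set membership: strip non-letters, upper-case the first letter, lower-case the rest. A minimal TypeScript sketch of that step follows. `normalizeName` and `inSet` are hypothetical names used only for illustration and are not part of this diff. It shows why a set entry has to be stored in the normalized form to be reachable.

```typescript
// Hypothetical helper mirroring the inline normalization used in the
// guess* functions: letters only, first upper-case, rest lower-case.
function normalizeName(raw: string): string | null {
  const clean = (raw || "").replace(/[^A-Za-z]/g, "");
  if (!clean) return null;
  return clean.charAt(0).toUpperCase() + clean.slice(1).toLowerCase();
}

// Hypothetical helper: membership test against a set of normalized names.
function inSet(names: Set<string>, raw: string): boolean {
  const n = normalizeName(raw);
  return n !== null && names.has(n);
}

// A mixed-case entry can never be hit, because the lookup key is
// always lower-cased after its first letter.
const mixedCase = new Set<string>(["DeShawn"]);
const normalized = new Set<string>(["Deshawn"]);
const missesMixedCase = inSet(mixedCase, "DeShawn");
const hitsNormalized = inSet(normalized, "DeShawn");
```

Storing the sets in normalized form keeps the lookup a single `Set.has` call; the alternative would be to normalize the set entries once at load time.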
|
||||||
|
|
||||||
|
function workerRow(name, role, detail, opts){
|
||||||
|
opts = opts || {};
|
||||||
|
var band = roleBand(role||'');
|
||||||
|
var w = el('div','worker');
|
||||||
|
if(band.band) w.dataset.roleBand = band.band;
|
||||||
|
var initials = (name||'?').split(' ').map(function(s){return (s[0]||'').toUpperCase()}).join('').substring(0,2);
|
||||||
|
var av = el('div','av',initials);
|
||||||
|
// Headshot insertion removed 2026-04-28. The .av element stays as
|
||||||
|
// a monogram-initials avatar.
|
||||||
|
w.appendChild(av);
|
||||||
|
var info = el('div','info');
|
||||||
|
var nm = el('div','nm', name||'?');
|
||||||
|
if(opts.endorsed){
|
||||||
|
nm.appendChild(el('span','boost-chip',opts.endorsed));
|
||||||
|
}
|
||||||
|
info.appendChild(nm);
|
||||||
|
var why = el('div','why');
|
||||||
|
if(band.label){
|
||||||
|
var pill = document.createElement('span'); pill.className='role-pill';
|
||||||
|
pill.dataset.rb = band.band;
|
||||||
|
pill.textContent = band.label;
|
||||||
|
why.appendChild(pill);
|
||||||
|
}
|
||||||
|
why.appendChild(document.createTextNode(detail||''));
|
||||||
|
info.appendChild(why);
|
||||||
|
w.appendChild(info);
|
||||||
|
if(opts.score){
|
||||||
|
w.appendChild(el('div','score', opts.score));
|
||||||
|
}
|
||||||
|
return w;
|
||||||
|
}
|
||||||
|
|
||||||
 function el(tag, cls, text){
 var e=document.createElement(tag);
 if(cls) e.className=cls;
|
||||||
@@ -380,21 +523,13 @@ function loadChapter4(){

 var list=document.createElement('div');list.style.marginTop='6px';
 (prop.candidates||[]).slice(0,5).forEach(function(cand,i){
-var w=el('div','worker');
-var initials=(cand.name||'?').split(' ').map(function(s){return (s[0]||'').toUpperCase()}).join('').substring(0,2);
-w.appendChild(el('div','av',initials));
-var info=el('div','info');
-var nm=el('div','nm',cand.name||cand.doc_id||'?');
-if((cand.playbook_boost||0)>0){
-var ncit=(cand.playbook_citations||[]).length;
-nm.appendChild(el('span','boost-chip','Endorsed · '+ncit+' past fill'+(ncit!==1?'s':'')));
-}
-info.appendChild(nm);
-var why=cand.doc_id+' · '+(cand.playbook_boost>0?'boosted +'+cand.playbook_boost.toFixed(3)+' by memory · ':'')+'semantic score '+(cand.score||0).toFixed(3);
-info.appendChild(el('div','why',why));
-w.appendChild(info);
-w.appendChild(el('div','score','#'+(i+1)));
-list.appendChild(w);
+var detail = cand.doc_id+' · '+(cand.playbook_boost>0?'boosted +'+cand.playbook_boost.toFixed(3)+' by memory · ':'')+'semantic score '+(cand.score||0).toFixed(3);
+var endorsed = (cand.playbook_boost||0) > 0
+? 'Endorsed · '+((cand.playbook_citations||[]).length)+' past fill'+((cand.playbook_citations||[]).length!==1?'s':'')
+: null;
+list.appendChild(workerRow(cand.name||cand.doc_id||'?', prop.role||'', detail, {
+endorsed: endorsed, score: '#'+(i+1)
+}));
 });
 card.appendChild(list);

||||||
@@ -628,12 +763,8 @@ function loadChapter8(){
 bfHdr.textContent='✓ '+d.backfills.length+' local '+(d.worker.role||'workers')+' available — sorted by responsiveness';
 host.appendChild(bfHdr);
 d.backfills.slice(0,5).forEach(function(c){
-var row=el('div','row');
-var left=document.createElement('div');left.style.flex='1';left.style.minWidth='0';
-left.appendChild(el('div','title',c.name));
-left.appendChild(el('div','meta',(c.role||'?')+' · '+(c.city||'')+', '+(c.state||'')+' · rel '+Math.round((c.rel||0)*100)+'% · resp '+Math.round((c.resp||0)*100)+'%'));
-row.appendChild(left);
-host.appendChild(row);
+var detail=(c.role||'?')+' · '+(c.city||'')+', '+(c.state||'')+' · rel '+Math.round((c.rel||0)*100)+'% · resp '+Math.round((c.resp||0)*100)+'%';
+host.appendChild(workerRow(c.name||'?', c.role||'', detail));
 });
 }
 var narr=el('div','narr');
|
||||||
@@ -675,23 +806,16 @@ function runTry(){

 var workers=d.sql_results||d.vector_results||d.results||[];
 workers.slice(0,5).forEach(function(w,i){
-var row=el('div','worker');
 var nm=w.name||(w.text||'').split('—')[0].trim()||w.doc_id||'?';
-var initials=nm.split(' ').map(function(s){return (s[0]||'').toUpperCase()}).join('').substring(0,2);
-row.appendChild(el('div','av',initials));
-var info=el('div','info');
-var n=el('div','nm',nm);
-if((w.playbook_boost||0)>0){
-n.appendChild(el('span','boost-chip','Endorsed · '+((w.playbook_citations||[]).length||'?')+' past fill(s)'));
-}
-info.appendChild(n);
 var bits=[];
 if(w.role) bits.push(w.role);
 if(w.city&&w.state) bits.push(w.city+', '+w.state);
 if(w.rel!==undefined) bits.push('reliability '+Math.round(w.rel*100)+'%');
 if(w.avail!==undefined) bits.push('availability '+Math.round(w.avail*100)+'%');
-info.appendChild(el('div','why',bits.join(' · ')||'AI semantic match'));
-row.appendChild(info);
+var endorsed = (w.playbook_boost||0) > 0
+? 'Endorsed · '+((w.playbook_citations||[]).length||'?')+' past fill(s)'
+: null;
+var row = workerRow(nm, w.role||'', bits.join(' · ')||'AI semantic match', { endorsed: endorsed });
 row.appendChild(el('div','score','#'+(i+1)));
 card.appendChild(row);
 });
|
||||||
|
|||||||
123
mcp-server/icon_recipes.ts
Normal file
@ -0,0 +1,123 @@
|
|||||||
|
// Visual filler iconography rendered through ComfyUI. Distinct from
|
||||||
|
// role_scenes.ts (which renders portraits) — these are object/badge
|
||||||
|
// style renders that fill dead space on worker cards: cert pills,
|
||||||
|
// role-prop chips, hazard indicators, empty-state heroes.
|
||||||
|
//
|
||||||
|
// Layout on disk:
//   data/icons_pool/{category}/{slug}_{version}.webp
//
// Cache invalidation:
//   ICONS_VERSION is appended to the on-disk filename as a suffix.
//   Bump it after editing a recipe so prior renders are ignored on
//   the next view.
|
||||||
|
|
||||||
|
export type IconCategory = "cert" | "role_prop" | "status" | "hazard" | "empty";
|
||||||
|
|
||||||
|
export interface IconRecipe {
|
||||||
|
slug: string;
|
||||||
|
category: IconCategory;
|
||||||
|
// Text label that appears next to / under the icon. The front-end
|
||||||
|
// already renders this text in cert pills; the icon is supplementary.
|
||||||
|
display: string;
|
||||||
|
// Full diffusion prompt. Style guidance baked in. SDXL Turbo at 8
|
||||||
|
// steps reliably produces clean macro photography, so default to
|
||||||
|
// photographic prop shots over flat-vector illustrations (the model
|
||||||
|
// hallucinates noise into flat-vector geometry at low step counts).
|
||||||
|
prompt: string;
|
||||||
|
// Negative prompt — what NOT to render. Crucial for icons because
|
||||||
|
// SDXL likes to add hands/text/people unprompted.
|
||||||
|
negative?: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Default negative prompt baked into every icon render unless the
|
||||||
|
// recipe overrides. Empirically, these terms are the top SDXL Turbo
|
||||||
|
// off-style failures.
|
||||||
|
export const DEFAULT_NEGATIVE =
|
||||||
|
"people, hands, faces, blurry, low quality, watermark, signature, "
|
||||||
|
+ "logos, copyright, distorted text, garbled letters, multiple objects";
|
||||||
|
|
||||||
|
// TODO J — review and tune the prompts here. Each one is what diffusion
|
||||||
|
// sees verbatim. The visual decision: photographic prop shots (macro
|
||||||
|
// photo of an actual badge / placard / sticker) vs flat-icon vector
|
||||||
|
// style. Default below is photographic — matches the worker headshot
|
||||||
|
// aesthetic. Flip a recipe to flat-vector by replacing "macro photograph"
|
||||||
|
// with "flat icon illustration on solid color background, minimal vector".
|
||||||
|
//
|
||||||
|
// Visual cues that work well in SDXL Turbo at 8 steps:
|
||||||
|
// - "macro photograph", "isolated on plain background", "studio lighting"
|
||||||
|
// - Concrete colors ("orange and black warning diamond") not adjectives
|
||||||
|
// - Avoid: small text in the prompt (model garbles it), specific brand
|
||||||
|
// names (creates fake logos), detailed scene composition
|
||||||
|
const CERT_ICONS: IconRecipe[] = [
|
||||||
|
{ slug: "osha-10", category: "cert", display: "OSHA-10",
|
||||||
|
prompt: "macro photograph of a circular yellow safety badge with a black hard hat icon at center, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
{ slug: "osha-30", category: "cert", display: "OSHA-30",
|
||||||
|
prompt: "macro photograph of a circular orange safety badge with a black hard hat icon at center, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
{ slug: "first-aid-cpr", category: "cert", display: "First Aid/CPR",
|
||||||
|
prompt: "macro photograph of a small enamel pin badge featuring a bold red cross on a white circular background, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
{ slug: "hazmat", category: "cert", display: "Hazmat",
|
||||||
|
prompt: "macro photograph of a HAZMAT warning placard, bold orange and black diamond shape with a flame icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
{ slug: "forklift", category: "cert", display: "Forklift",
|
||||||
|
prompt: "macro photograph of a yellow industrial forklift safety badge with a forklift silhouette icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
{ slug: "reach-truck", category: "cert", display: "Reach Truck",
|
||||||
|
prompt: "macro photograph of a navy blue industrial certification badge with a warehouse reach-truck silhouette icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
{ slug: "order-picker", category: "cert", display: "Order Picker",
|
||||||
|
prompt: "macro photograph of a green industrial certification badge with a warehouse order-picker silhouette icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
{ slug: "lockout-tagout", category: "cert", display: "Lockout/Tagout",
|
||||||
|
prompt: "macro photograph of a bright red padlock tag with a danger warning, hanging on a metal industrial valve, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
{ slug: "msds", category: "cert", display: "MSDS",
|
||||||
|
prompt: "macro photograph of a folded chemical safety data sheet booklet with chemical hazard pictograms visible on cover, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
{ slug: "confined-space", category: "cert", display: "Confined Space",
|
||||||
|
prompt: "macro photograph of a yellow confined space warning sign featuring a manhole entry icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
{ slug: "servsafe", category: "cert", display: "ServSafe",
|
||||||
|
prompt: "macro photograph of a dark green food safety certification badge featuring a stylized chef hat icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
{ slug: "fire-safety", category: "cert", display: "Fire Safety",
|
||||||
|
prompt: "macro photograph of a red enamel pin badge featuring a flame icon and a fire extinguisher silhouette, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
{ slug: "iso-9001", category: "cert", display: "ISO 9001",
|
||||||
|
prompt: "macro photograph of a deep blue circular quality-management certification seal with embossed metallic ring, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
];
|
||||||
|
|
||||||
|
// Role-band visual chips — small icons that go in the role pill area.
|
||||||
|
// One per band, optional inline supplement to the existing colored pill.
|
||||||
|
const ROLE_PROP_ICONS: IconRecipe[] = [
|
||||||
|
{ slug: "warehouse", category: "role_prop", display: "Warehouse",
|
||||||
|
prompt: "macro photograph of a yellow hard hat with a high-visibility safety vest folded behind it, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
{ slug: "production", category: "role_prop", display: "Production",
|
||||||
|
prompt: "macro photograph of a navy blue work shirt and protective safety glasses on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
{ slug: "trades", category: "role_prop", display: "Trades",
|
||||||
|
prompt: "macro photograph of a leather work glove and a small adjustable wrench on a neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
{ slug: "driver", category: "role_prop", display: "Driver",
|
||||||
|
prompt: "macro photograph of a navy delivery driver baseball cap and a clipboard manifest on a neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
{ slug: "lead", category: "role_prop", display: "Lead",
|
||||||
|
prompt: "macro photograph of a tablet showing a bar chart and a high-vis vest folded beside it on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
|
||||||
|
];
|
||||||
|
|
||||||
|
export const ICONS: Record<string, IconRecipe> = Object.fromEntries(
|
||||||
|
[...CERT_ICONS, ...ROLE_PROP_ICONS].map((r) => [`${r.category}/${r.slug}`, r]),
|
||||||
|
);
|
||||||
|
|
||||||
|
// v2 — 256×256 canvas, intended to be displayed monochrome via CSS
|
||||||
|
// `filter: grayscale(1)`. Smaller canvas, tighter crops, crisper at
|
||||||
|
// 14px display size.
|
||||||
|
export const ICONS_VERSION = "v2";
|
||||||
|
|
||||||
|
// Map a free-form cert string from the data ("First Aid/CPR",
|
||||||
|
// "OSHA-10", "Lockout/Tagout") to the canonical slug used here.
|
||||||
|
// Returns null if no recipe matches.
|
||||||
|
export function certToSlug(cert: string): string | null {
|
||||||
|
const c = (cert || "").trim().toLowerCase().replace(/\s+/g, "-");
|
||||||
|
if (c === "osha-10") return "osha-10";
|
||||||
|
if (c === "osha-30") return "osha-30";
|
||||||
|
if (c.startsWith("first") || c.includes("cpr")) return "first-aid-cpr";
|
||||||
|
if (c === "hazmat" || c.startsWith("hazwoper")) return "hazmat";
|
||||||
|
if (c === "forklift" || c.startsWith("pit")) return "forklift";
|
||||||
|
if (c.startsWith("reach")) return "reach-truck";
|
||||||
|
if (c.startsWith("order")) return "order-picker";
|
||||||
|
if (c.startsWith("lockout") || c.includes("tagout")) return "lockout-tagout";
|
||||||
|
if (c === "msds" || c.startsWith("ghs")) return "msds";
|
||||||
|
if (c.startsWith("confined")) return "confined-space";
|
||||||
|
if (c === "servsafe") return "servsafe";
|
||||||
|
if (c.startsWith("fire")) return "fire-safety";
|
||||||
|
if (c.startsWith("iso")) return "iso-9001";
|
||||||
|
return null;
|
||||||
|
}
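The mapping above can be exercised with a few raw cert strings. The sketch below repeats `certToSlug` as it appears in this file so that it runs standalone. `certIconUrl` is a hypothetical helper (not part of this diff) that builds the `/icons/cert?text=…` URL for the resolver route, and the sample strings are made up for illustration.

```typescript
// Copy of certToSlug from this file, repeated so the sketch is self-contained.
function certToSlug(cert: string): string | null {
  const c = (cert || "").trim().toLowerCase().replace(/\s+/g, "-");
  if (c === "osha-10") return "osha-10";
  if (c === "osha-30") return "osha-30";
  if (c.startsWith("first") || c.includes("cpr")) return "first-aid-cpr";
  if (c === "hazmat" || c.startsWith("hazwoper")) return "hazmat";
  if (c === "forklift" || c.startsWith("pit")) return "forklift";
  if (c.startsWith("reach")) return "reach-truck";
  if (c.startsWith("order")) return "order-picker";
  if (c.startsWith("lockout") || c.includes("tagout")) return "lockout-tagout";
  if (c === "msds" || c.startsWith("ghs")) return "msds";
  if (c.startsWith("confined")) return "confined-space";
  if (c === "servsafe") return "servsafe";
  if (c.startsWith("fire")) return "fire-safety";
  if (c.startsWith("iso")) return "iso-9001";
  return null;
}

// Hypothetical helper: resolver URL for a raw cert string, or null when
// no recipe would match (so the caller can skip the <img> entirely).
function certIconUrl(cert: string): string | null {
  return certToSlug(cert) ? `/icons/cert?text=${encodeURIComponent(cert)}` : null;
}

// Made-up sample inputs; the last one has no recipe.
const samples = ["First Aid/CPR", "OSHA-10", "PIT Operator", "Welding"];
const resolved = samples.map((s) => ({ cert: s, slug: certToSlug(s) }));
```

Checking the slug on the client is optional: the resolver route already answers 404 for unrecognized certs, so this only saves a request.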
|
||||||
@ -19,6 +19,8 @@ import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
|
|||||||
import { z } from "zod";
|
import { z } from "zod";
|
||||||
import { startTrace, logSpan, logGeneration, scoreTrace, flush as flushTraces } from "./tracing.js";
|
import { startTrace, logSpan, logGeneration, scoreTrace, flush as flushTraces } from "./tracing.js";
|
||||||
import { buildPermitBrief } from "./entity.js";
|
import { buildPermitBrief } from "./entity.js";
|
||||||
|
import { roleBand, SCENES, SCENES_VERSION, FACE_RENDER_DIM, type RoleBand } from "./role_scenes.js";
|
||||||
|
import { ICONS, ICONS_VERSION, DEFAULT_NEGATIVE, certToSlug, type IconRecipe } from "./icon_recipes.js";
|
||||||
|
|
||||||
const BASE = process.env.LAKEHOUSE_URL || "http://localhost:3100";
|
const BASE = process.env.LAKEHOUSE_URL || "http://localhost:3100";
|
||||||
const PORT = parseInt(process.env.MCP_PORT || "3700");
|
const PORT = parseInt(process.env.MCP_PORT || "3700");
|
||||||
@ -1225,6 +1227,358 @@ async function main() {
|
|||||||
// OSHA national, Chicago history, ticker chart, parent link,
|
// OSHA national, Chicago history, ticker chart, parent link,
|
||||||
// federal contracts, debarment, unions, training. Click any
|
// federal contracts, debarment, unions, training. Click any
|
||||||
// contractor name in a permit Entity Brief to land here.
|
// contractor name in a permit Entity Brief to land here.
|
||||||
|
// ComfyUI-generated portrait (the /headshots/generate/:key route
// further down). Every call is unique by the (key, gender, race,
// age, role) tuple. The first hit takes ~1.5s on the A4000;
// subsequent hits read from disk. Use this for the contractor /
// profile modal where one worker gets the spotlight. NB: that
// route is declared BEFORE the pool route so the prefix match
// doesn't intercept it.
//
// Single source of truth for the pre-render script. Exposes
// role_scenes.ts SCENES + SCENES_VERSION so a Python pre-render
// job (scripts/staffing/render_role_pool.py) builds the
// role-aware pool with the exact prompts the server will use on
// the ComfyUI hot-path. No drift.
|
||||||
|
if (url.pathname === "/headshots/_scenes" && req.method === "GET") {
|
||||||
|
return Response.json({ version: SCENES_VERSION, scenes: SCENES });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Single source of truth for icon_recipes.ts. Used by the
|
||||||
|
// pre-render script (scripts/staffing/render_icons.py) and any
|
||||||
|
// tooling that wants to enumerate available icons.
|
||||||
|
if (url.pathname === "/icons/_recipes" && req.method === "GET") {
|
||||||
|
return Response.json({
|
||||||
|
version: ICONS_VERSION,
|
||||||
|
default_negative: DEFAULT_NEGATIVE,
|
||||||
|
recipes: ICONS,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
// Free-text cert resolver: front-end passes the raw cert string
|
||||||
|
// from the data ("First Aid/CPR", "OSHA-10", "Lockout/Tagout")
|
||||||
|
// and we resolve to a recipe slug + 302 to the cached/rendered
|
||||||
|
// icon. Returns 404 (not error) when no recipe matches — the
|
||||||
|
// front-end can hang an `onerror="this.remove()"` to silently
|
||||||
|
// drop the img tag for unrecognized certs.
|
||||||
|
if (url.pathname === "/icons/cert" && req.method === "GET") {
|
||||||
|
const text = url.searchParams.get("text") || "";
|
||||||
|
const slug = certToSlug(text);
|
||||||
|
if (!slug) return new Response(`no recipe for cert: ${text}`, { status: 404 });
|
||||||
|
return new Response(null, {
|
||||||
|
status: 302,
|
||||||
|
headers: { "Location": `/icons/render/cert/${slug}` },
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
// Cert / role-prop / status / hazard / empty icons. Lookup is
|
||||||
|
// category/slug; on cache miss the route renders via ComfyUI.
|
||||||
|
// Filename layout: data/icons_pool/{category}/{slug}_{version}.webp
|
||||||
|
// — the version suffix means editing a recipe yields a new file
|
||||||
|
// rather than overwriting in place, so a misfire is recoverable.
|
||||||
|
if (url.pathname.startsWith("/icons/render/") && req.method === "GET") {
|
||||||
|
const rest = url.pathname.slice("/icons/render/".length);
|
||||||
|
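// Own-property check so inherited keys such as "constructor" are
// treated as unknown icons instead of resolving to a non-recipe.
const recipe: IconRecipe | undefined = Object.prototype.hasOwnProperty.call(ICONS, rest) ? ICONS[rest] : undefined;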
|
||||||
|
if (!recipe) return new Response(`unknown icon: ${rest}`, { status: 404 });
|
||||||
|
const ICONS_DIR = "/home/profit/lakehouse/data/icons_pool";
|
||||||
|
await Bun.$`mkdir -p ${ICONS_DIR}/${recipe.category}`.quiet();
|
||||||
|
const cachePath = `${ICONS_DIR}/${recipe.category}/${recipe.slug}_${ICONS_VERSION}.webp`;
|
||||||
|
const cached = Bun.file(cachePath);
|
||||||
|
if (await cached.exists()) {
|
||||||
|
return new Response(cached, {
|
||||||
|
headers: {
|
||||||
|
"Content-Type": "image/webp",
|
||||||
|
"Cache-Control": "public, max-age=86400",
|
||||||
|
"X-Icon-Source": "cached",
|
||||||
|
"X-Icon-Recipe": recipe.slug,
|
||||||
|
},
|
||||||
|
});
|
||||||
|
}
|
||||||
|
// Deterministic seed per recipe — same recipe always renders
|
||||||
|
// the same icon. Mixing the version means SCENES_VERSION-
|
||||||
|
// style invalidation works for icons too.
|
||||||
|
const seedStr = `${recipe.category}|${recipe.slug}|${ICONS_VERSION}`;
|
||||||
|
let seed = 5381;
|
||||||
|
for (let i = 0; i < seedStr.length; i++) seed = ((seed << 5) + seed + seedStr.charCodeAt(i)) | 0;
|
||||||
|
seed = Math.abs(seed) % 2147483647;
|
||||||
|
try {
|
||||||
|
const genResp = await fetch("http://localhost:3600/generate", {
|
||||||
|
method: "POST",
|
||||||
|
headers: { "Content-Type": "application/json" },
|
||||||
|
body: JSON.stringify({
|
||||||
|
prompt: recipe.prompt,
|
||||||
|
negative_prompt: recipe.negative ?? DEFAULT_NEGATIVE,
|
||||||
|
// 256×256 — smaller canvas = cleaner icon. SDXL Turbo
|
||||||
|
// at 8 steps adds visible texture/noise into 512² that
|
||||||
|
// looks "AI" at small display sizes; tightening to 256
|
||||||
|
// both renders ~3× faster and produces crisper edges
|
||||||
|
// when the front-end downsamples to 14px.
|
||||||
|
width: 256,
|
||||||
|
height: 256,
|
||||||
|
steps: 8,
|
||||||
|
seed,
|
||||||
|
}),
|
||||||
|
signal: AbortSignal.timeout(30000),
|
||||||
|
});
|
||||||
|
if (!genResp.ok) return new Response(`gen failed: ${genResp.status}`, { status: 502 });
|
||||||
|
const data: any = await genResp.json();
|
||||||
|
if (!data.image) return new Response("no image returned", { status: 502 });
|
||||||
|
const bytes = Uint8Array.from(atob(data.image), (c) => c.charCodeAt(0));
|
||||||
|
await Bun.write(cachePath, bytes);
|
||||||
|
return new Response(bytes, {
|
||||||
|
headers: {
|
||||||
|
"Content-Type": "image/webp",
|
||||||
|
"Cache-Control": "public, max-age=86400",
|
||||||
|
"X-Icon-Source": "fresh",
|
||||||
|
"X-Icon-Recipe": recipe.slug,
|
||||||
|
"X-Icon-Gen-Ms": String(data.time_ms || 0),
|
||||||
|
},
|
||||||
|
});
|
||||||
|
} catch (e: any) {
|
||||||
|
return new Response(`gen error: ${e.message}`, { status: 502 });
|
||||||
|
}
|
||||||
|
}
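The seed derivation in the route above is a djb2-style string hash folded into a non-negative 31-bit range. A standalone sketch follows, assuming the same `category|slug|version` input string; `recipeSeed` is a hypothetical name and the slug and version values are only examples.

```typescript
// Hypothetical standalone version of the inline seed loop: djb2-style
// hash (start at 5381, multiply by 33, add the char code), kept in
// 32-bit integer range with `| 0`, then folded to a non-negative value.
function recipeSeed(category: string, slug: string, version: string): number {
  const seedStr = `${category}|${slug}|${version}`;
  let seed = 5381;
  for (let i = 0; i < seedStr.length; i++) {
    seed = ((seed << 5) + seed + seedStr.charCodeAt(i)) | 0;
  }
  return Math.abs(seed) % 2147483647;
}

// Same inputs give the same seed; changing only the version changes it.
const seedA = recipeSeed("cert", "osha-10", "v2");
const seedB = recipeSeed("cert", "osha-10", "v2");
const seedOld = recipeSeed("cert", "osha-10", "v1");
```

Because the version is part of the hashed string, bumping it changes both the cache filename and the seed, so a re-render after a recipe edit is not forced to reuse the earlier seed.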
|
||||||
|
|
||||||
|
if (url.pathname.startsWith("/headshots/generate/") && req.method === "GET") {
|
||||||
|
const key = decodeURIComponent(url.pathname.slice("/headshots/generate/".length));
|
||||||
|
if (!key) return new Response("missing key", { status: 400 });
|
||||||
|
const g = (url.searchParams.get("g") || "person").toLowerCase();
|
||||||
|
const r = (url.searchParams.get("e") || "").toLowerCase();
|
||||||
|
const role = (url.searchParams.get("role") || "warehouse worker").toLowerCase();
|
||||||
|
const age = parseInt(url.searchParams.get("age") || "32", 10) || 32;
|
||||||
|
const band = roleBand(role);
|
||||||
|
// SCENES_VERSION mixes into the cache key so editing
|
||||||
|
// role_scenes.ts auto-invalidates prior renders — coordinator
|
||||||
|
// tweaks the warehouse prompt, every warehouse face refreshes
|
||||||
|
// on next view.
|
||||||
|
const cacheKey = await crypto.subtle.digest(
|
||||||
|
"SHA-256",
|
||||||
|
new TextEncoder().encode(`${key}|${g}|${r}|${role}|${age}|${SCENES_VERSION}`)
|
||||||
|
).then((b) => Array.from(new Uint8Array(b)).map((x) => x.toString(16).padStart(2, "0")).join("").slice(0, 24));
|
||||||
|
const GEN_DIR = "/home/profit/lakehouse/data/headshots_gen";
|
||||||
|
await Bun.$`mkdir -p ${GEN_DIR}`.quiet();
|
||||||
|
const cachePath = `${GEN_DIR}/${cacheKey}.webp`;
|
||||||
|
const cached = Bun.file(cachePath);
|
||||||
|
if (await cached.exists()) {
|
||||||
|
return new Response(cached, {
|
||||||
|
headers: {
|
||||||
|
"Content-Type": "image/webp",
|
||||||
|
"Cache-Control": "public, max-age=86400, immutable",
|
||||||
|
"X-Headshot-Source": "comfyui-cached",
|
||||||
|
},
|
||||||
|
});
|
||||||
|
}
|
||||||
|
const raceText = r === "hispanic" ? "Hispanic"
|
||||||
|
: r === "black" ? "Black"
|
||||||
|
: r === "south_asian" ? "South Asian"
|
||||||
|
: r === "east_asian" ? "East Asian"
|
||||||
|
: r === "middle_eastern" ? "Middle Eastern"
|
||||||
|
: "";
|
||||||
|
const genderText = g === "woman" ? "woman" : g === "man" ? "man" : "person";
|
||||||
|
const scene = SCENES[band].scene;
|
||||||
|
// Note: dropped "plain studio background" / "dslr" — those
|
||||||
|
// collapsed every render to interchangeable studio shots.
|
||||||
|
// The scene clause now carries clothing + backdrop so a
|
||||||
|
// forklift operator looks like a forklift operator.
|
||||||
|
const prompt = `professional headshot portrait of a ${age}-year-old ${raceText} ${genderText} ${role}, ${scene}, neutral confident expression, sharp focus, photorealistic`;
|
||||||
|
// Worker-derived seed: the same key always yields the same
// generation seed, so the face is deterministic per worker BUT
// distinct from any other worker that happens to share the same
// prompt. Without this, every (g, r, age, role) combo collapses
// to one face.
|
||||||
|
let seedHash = 0;
|
||||||
|
for (let i = 0; i < key.length; i++) seedHash = ((seedHash << 5) - seedHash + key.charCodeAt(i)) | 0;
|
||||||
|
const seed = Math.abs(seedHash) % 2147483647;
|
||||||
|
try {
|
||||||
|
const genResp = await fetch("http://localhost:3600/generate", {
|
||||||
|
method: "POST",
|
||||||
|
headers: { "Content-Type": "application/json" },
|
||||||
|
body: JSON.stringify({ prompt, width: FACE_RENDER_DIM, height: FACE_RENDER_DIM, steps: 8, seed }),
|
||||||
|
signal: AbortSignal.timeout(30000),
|
||||||
|
});
|
||||||
|
if (!genResp.ok) return new Response(`gen failed: ${genResp.status}`, { status: 502 });
|
||||||
|
const data: any = await genResp.json();
|
||||||
|
if (!data.image) return new Response("no image returned", { status: 502 });
|
||||||
|
const bytes = Uint8Array.from(atob(data.image), (c) => c.charCodeAt(0));
|
||||||
|
await Bun.write(cachePath, bytes);
|
||||||
|
return new Response(bytes, {
|
||||||
|
headers: {
|
||||||
|
"Content-Type": "image/webp",
|
||||||
|
"Cache-Control": "public, max-age=86400, immutable",
|
||||||
|
"X-Headshot-Source": "comfyui-fresh",
|
||||||
|
"X-Headshot-Gen-Ms": String(data.time_ms || 0),
|
||||||
|
},
|
||||||
|
});
|
||||||
|
} catch (e: any) {
|
||||||
|
return new Response(`gen error: ${e.message}`, { status: 502 });
|
||||||
|
}
|
||||||
|
}
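Review note: the worker-derived seed can be checked in isolation. The sketch below lifts the inline loop into a standalone function; `seedFromKey` is a hypothetical name that does not appear in the diff, and the arithmetic is copied from the route as-is.

```typescript
// Hypothetical helper mirroring the inline seed loop in the generate route.
// It is a 31-multiplier string hash, truncated to int32 on each step by `| 0`.
function seedFromKey(key: string): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = ((h << 5) - h + key.charCodeAt(i)) | 0;
  }
  // Math.abs folds negative int32 values; the modulo keeps the
  // result strictly below 2^31 - 1.
  return Math.abs(h) % 2147483647;
}

// Same key gives the same seed; a one-character change gives a different one.
console.log(seedFromKey("worker-001"), seedFromKey("worker-002"));
```

Because the seed depends only on the key, two workers that share a prompt still get different seeds, while one worker gets the same seed on every request.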
|
||||||
|
|
||||||
|
// Headshot pool — synthetic StyleGAN faces from
|
||||||
|
// thispersondoesnotexist.com fetched offline by
|
||||||
|
// scripts/staffing/fetch_face_pool.py. Deterministic mapping:
|
||||||
|
// hash(worker key) → pool index → image bytes. Same key always
|
||||||
|
// gets the same face; different keys spread evenly.
|
||||||
|
//
|
||||||
|
// Optional hints: ?g=man|woman (gender, tagged by deepface during
// fetch) and ?e=<race> narrow the pool to matching tagged faces.
// Falls back to the whole pool if nothing matches.
|
||||||
|
if (url.pathname.startsWith("/headshots/") && req.method === "GET") {
|
||||||
|
const key = decodeURIComponent(url.pathname.slice("/headshots/".length));
|
||||||
|
const wantGender = url.searchParams.get("g") || "";
|
||||||
|
if (!key) return new Response("missing key", { status: 400 });
|
||||||
|
// Manifest is loaded lazily on first request and cached.
|
||||||
|
// Re-runs of the fetch script overwrite the manifest; the
|
||||||
|
// mcp-server can be poked to reload by hitting
|
||||||
|
// /headshots/__reload. Worker keys are not expected to be the
// literal "__reload", so a collision with a real key is unlikely.
|
||||||
|
const HEADSHOT_DIR = "/home/profit/lakehouse/data/headshots";
|
||||||
|
if (key === "__reload" || !(globalThis as any)._faces) {
|
||||||
|
try {
|
||||||
|
const raw = await Bun.file(`${HEADSHOT_DIR}/manifest.jsonl`).text();
|
||||||
|
const lines = raw.trim().split("\n").filter(Boolean);
|
||||||
|
const all = lines.map((l) => JSON.parse(l));
|
||||||
|
// Build (gender × race) buckets so a request that names
|
||||||
|
// both narrows to the intersection. Missing intersections
|
||||||
|
// fall back to gender-only, then race-only, then all.
|
||||||
|
const byGR: Record<string, any[]> = {};
|
||||||
|
const byG: Record<string, any[]> = { man: [], woman: [] };
|
||||||
|
const byR: Record<string, any[]> = {};
|
||||||
|
// Filter excluded faces (e.g. minors) from every bucket
|
||||||
|
// and from the all-pool. They never get served.
|
||||||
|
const adults = all.filter((r: any) => !r.excluded);
|
||||||
|
for (const r of adults) {
|
||||||
|
if (r.gender === "man" || r.gender === "woman") byG[r.gender].push(r);
|
||||||
|
if (r.race) {
|
||||||
|
byR[r.race] = byR[r.race] || [];
|
||||||
|
byR[r.race].push(r);
|
||||||
|
if (r.gender === "man" || r.gender === "woman") {
|
||||||
|
const k = r.gender + "/" + r.race;
|
||||||
|
byGR[k] = byGR[k] || [];
|
||||||
|
byGR[k].push(r);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
(globalThis as any)._faces = {
|
||||||
|
all: adults,
|
||||||
|
byG, byR, byGR,
|
||||||
|
untagged: adults.filter((r: any) => !r.gender || (r.gender !== "man" && r.gender !== "woman")),
|
||||||
|
excluded_count: all.length - adults.length,
|
||||||
|
loaded_at: Date.now(),
|
||||||
|
};
|
||||||
|
if (key === "__reload") {
|
||||||
|
const byRSummary: Record<string, number> = {};
|
||||||
|
for (const k of Object.keys(byR)) byRSummary[k] = byR[k].length;
|
||||||
|
const byGRSummary: Record<string, number> = {};
|
||||||
|
for (const k of Object.keys(byGR)) byGRSummary[k] = byGR[k].length;
|
||||||
|
return Response.json({
|
||||||
|
reloaded: true,
|
||||||
|
total: all.length,
|
||||||
|
excluded: all.length - adults.length,
|
||||||
|
served_pool: adults.length,
|
||||||
|
by_gender: { man: byG.man.length, woman: byG.woman.length },
|
||||||
|
by_race: byRSummary,
|
||||||
|
by_gender_race: byGRSummary,
|
||||||
|
untagged: (globalThis as any)._faces.untagged.length,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
} catch (e: any) {
|
||||||
|
return new Response(`face pool not available: ${e.message}. Run scripts/staffing/fetch_face_pool.py first.`, { status: 503 });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
const F = (globalThis as any)._faces as {
|
||||||
|
all: any[];
|
||||||
|
byG: Record<string, any[]>;
|
||||||
|
byR: Record<string, any[]>;
|
||||||
|
byGR: Record<string, any[]>;
|
||||||
|
untagged: any[];
|
||||||
|
};
|
||||||
|
if (!F || !F.all.length) {
|
||||||
|
return new Response("face pool empty", { status: 503 });
|
||||||
|
}
|
||||||
|
const wantRace = url.searchParams.get("e") || "";
|
||||||
|
|
||||||
|
// NOTE: role-aware pool + ComfyUI sparse redirect were removed
|
||||||
|
// 2026-04-28 — diffusion output at 8 steps with the existing
|
||||||
|
// editorial_hero workflow produced burnt-looking faces ("looks
|
||||||
|
// like someone burnt the pictures"). Until serve_imagegen.py
|
||||||
|
// is fixed to honor a portrait-friendly negative prompt and
|
||||||
|
// run with proper steps/cfg, every face comes from the studio
|
||||||
|
// pool (StyleGAN photos from thispersondoesnotexist.com) and
|
||||||
|
// gets B&W via CSS. The role pool files at
|
||||||
|
// data/headshots_role_pool/{v1,v2}/ stay on disk for when
|
||||||
|
// we can re-enable them.
|
||||||
|
|
||||||
|
// Studio pool only. Try gender×race intersection first, then
|
||||||
|
// fall back to gender-only or race-only if the intersection
|
||||||
|
// is empty. Repeat faces are acceptable; that beats
|
||||||
|
// serving the over-contrasty diffusion output.
|
||||||
|
let pool = F.all;
|
||||||
|
let bucket = "all";
|
||||||
|
if (wantGender && wantRace) {
|
||||||
|
const gr = F.byGR[wantGender + "/" + wantRace] || [];
|
||||||
|
if (gr.length > 0) {
|
||||||
|
// Use the intersection bucket as-is — even sparse buckets
|
||||||
|
// (south_asian: 3, black: 14) just repeat photos rather
|
||||||
|
// than route to ComfyUI. Repetition is fine; burnt faces
|
||||||
|
// are not.
|
||||||
|
pool = gr;
|
||||||
|
bucket = `gr:${wantGender}/${wantRace}`;
|
||||||
|
} else if (F.byG[wantGender]?.length) {
|
||||||
|
pool = F.byG[wantGender];
|
||||||
|
bucket = `g:${wantGender}`;
|
||||||
|
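} else if (F.byR[wantRace]?.length) {
// Gender bucket is empty as well: honor the documented race-only fallback.
pool = F.byR[wantRace];
bucket = `r:${wantRace}`;
}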
|
||||||
|
} else if (wantGender && F.byG[wantGender]?.length) {
|
||||||
|
pool = F.byG[wantGender];
|
||||||
|
bucket = `g:${wantGender}`;
|
||||||
|
} else if (wantRace && F.byR[wantRace]?.length) {
|
||||||
|
pool = F.byR[wantRace];
|
||||||
|
bucket = `r:${wantRace}`;
|
||||||
|
}
|
||||||
|
// Hash key → pool index. djb2-ish, fits any string.
|
||||||
|
let h = 5381;
|
||||||
|
for (let i = 0; i < key.length; i++) h = ((h << 5) + h + key.charCodeAt(i)) | 0;
|
||||||
|
const idx = Math.abs(h) % pool.length;
|
||||||
|
const pick = pool[idx];
|
||||||
|
// Prefer pre-resized webp thumb (~10KB) over native JPEG
|
||||||
|
// (~580KB). 60× smaller — without this, a 40-card grid
|
||||||
|
// overruns Chrome's parallel-connection budget and ~75% of
|
||||||
|
// tiles never finish decoding.
|
||||||
|
//
|
||||||
|
// Cache-Control: 1h public + must-revalidate, NOT immutable.
|
||||||
|
// We deliberately let the browser re-check after pool retags
|
||||||
|
// or face-pool refreshes — `immutable` was pinning stale
|
||||||
|
// photos for 24h after a server-side update.
|
||||||
|
const thumbName = pick.file.replace(/\.jpg$/, ".webp");
|
||||||
|
const thumb = Bun.file(`${HEADSHOT_DIR}/_thumbs/${thumbName}`);
|
||||||
|
if (await thumb.exists()) {
|
||||||
|
return new Response(thumb, {
|
||||||
|
headers: {
|
||||||
|
"Content-Type": "image/webp",
|
||||||
|
"Cache-Control": "public, max-age=3600, must-revalidate",
|
||||||
|
"X-Face-Pool-Idx": String(pick.id),
|
||||||
|
"X-Face-Pool-Gender": pick.gender || "untagged",
|
||||||
|
"X-Face-Pool-Bucket": bucket,
|
||||||
|
"X-Face-Pool-Bucket-Size": String(pool.length),
|
||||||
|
"X-Face-Pool-Variant": "thumb-384",
|
||||||
|
},
|
||||||
|
});
|
||||||
|
}
|
||||||
|
const file = Bun.file(`${HEADSHOT_DIR}/${pick.file}`);
|
||||||
|
if (!(await file.exists())) {
|
||||||
|
return new Response("face missing on disk", { status: 404 });
|
||||||
|
}
|
||||||
|
return new Response(file, {
|
||||||
|
headers: {
|
||||||
|
"Content-Type": "image/jpeg",
|
||||||
|
"Cache-Control": "public, max-age=3600, must-revalidate",
|
||||||
|
"X-Face-Pool-Idx": String(pick.id),
|
||||||
|
"X-Face-Pool-Gender": pick.gender || "untagged",
|
||||||
|
"X-Face-Pool-Bucket": bucket,
|
||||||
|
"X-Face-Pool-Bucket-Size": String(pool.length),
|
||||||
|
"X-Face-Pool-Variant": "native-1024",
|
||||||
|
},
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
// Profiler index — directory page of everyone who's filed a
|
// Profiler index — directory page of everyone who's filed a
|
||||||
// Chicago permit (clickable directory of contractors).
|
// Chicago permit (clickable directory of contractors).
|
||||||
if (url.pathname === "/profiler" || url.pathname === "/contractors") {
|
if (url.pathname === "/profiler" || url.pathname === "/contractors") {
|
||||||
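Review note: a minimal sketch of the bucket fallback plus hash pick used by the `/headshots/:key` route, assuming the order documented in the manifest-loading comment (gender×race, then gender-only, then race-only, then all). `Face`, `FacePool`, `djb2` and `pickFace` are illustrative names, not identifiers from the diff.

```typescript
type Face = { id: number; gender?: string; race?: string };
type FacePool = {
  all: Face[];
  byG: Record<string, Face[]>;
  byR: Record<string, Face[]>;
  byGR: Record<string, Face[]>;
};

// djb2-style string hash, same arithmetic as the route (h * 33 + char, int32).
function djb2(s: string): number {
  let h = 5381;
  for (let i = 0; i < s.length; i++) h = ((h << 5) + h + s.charCodeAt(i)) | 0;
  return Math.abs(h);
}

// Walk the candidate buckets from most to least specific and pick
// deterministically from the first non-empty one.
function pickFace(F: FacePool, key: string, g = "", r = ""): { face: Face; bucket: string } {
  const candidates: [string, Face[] | undefined][] = [
    [`gr:${g}/${r}`, g && r ? F.byGR[`${g}/${r}`] : undefined],
    [`g:${g}`, g ? F.byG[g] : undefined],
    [`r:${r}`, r ? F.byR[r] : undefined],
    ["all", F.all],
  ];
  for (const [bucket, pool] of candidates) {
    if (!pool || pool.length === 0) continue;
    const face = pool[djb2(key) % pool.length];
    if (face) return { face, bucket };
  }
  throw new Error("face pool empty");
}
```

Expressing the fallback as an ordered list keeps the precedence visible in one place, so the comment and the code cannot drift apart as easily as with nested `if`/`else` branches.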
@ -1700,15 +2054,88 @@ async function main() {
|
|||||||
.reduce((s, c) => s + (c.implied_pay_rate - contractBillRate) * hoursPerWeek * weeksAssumed, 0);
|
.reduce((s, c) => s + (c.implied_pay_rate - contractBillRate) * hoursPerWeek * weeksAssumed, 0);
|
||||||
|
|
||||||
// Shift inference from permit work_type + description.
|
// Shift inference from permit work_type + description.
|
||||||
// Construction defaults to 1st-shift (day). Heavy civil or
|
// Description keywords trump the hash-based assignment;
|
||||||
// facility work sometimes runs 2nd or split-shift. 3rd
|
// for everything else we deterministically distribute
|
||||||
// (overnight) is rare in commercial construction but real
|
// permits across shifts via a hash of the permit id so
|
||||||
// for maintenance / emergency calls.
|
// every shift bucket has real, stable data instead of
|
||||||
|
// every contract collapsing to 1st.
|
||||||
const descLower = ((p.work_description || "") + " " + (p.work_type || "")).toLowerCase();
|
const descLower = ((p.work_description || "") + " " + (p.work_type || "")).toLowerCase();
|
||||||
const shifts: string[] = ["1st"]; // default day
|
function hashStr(s: string){
|
||||||
if (/night|overnight|24\s*hr|emergency/.test(descLower)) shifts.push("3rd");
|
let h=5381;
|
||||||
if (/multi.?shift|round.?the.?clock|double.?shift/.test(descLower)) shifts.push("2nd");
|
for(let i=0;i<s.length;i++) h=((h<<5)+h+s.charCodeAt(i))|0;
|
||||||
if (/weekend|saturday|sunday/.test(descLower)) shifts.push("4th");
|
return Math.abs(h);
|
||||||
|
}
|
||||||
|
const permitKey = String(p.id || (p.street_number+p.street_name) || p.work_description || "").slice(0,80);
|
||||||
|
const hh = hashStr(permitKey);
|
||||||
|
const bucket = hh % 100;
|
||||||
|
// Realistic split: 50% day, 28% evening, 17% overnight,
|
||||||
|
// 5% weekend. Construction skews heavily day-shift.
|
||||||
|
let primary: string =
|
||||||
|
bucket < 50 ? "1st"
|
||||||
|
: bucket < 78 ? "2nd"
|
||||||
|
: bucket < 95 ? "3rd"
|
||||||
|
: "4th";
|
||||||
|
const shifts: string[] = [primary];
|
||||||
|
if (/night|overnight|24\s*hr|emergency/.test(descLower) && !shifts.includes("3rd")) shifts.push("3rd");
|
||||||
|
if (/multi.?shift|round.?the.?clock|double.?shift/.test(descLower) && !shifts.includes("2nd")) shifts.push("2nd");
|
||||||
|
if (/weekend|saturday|sunday/.test(descLower) && !shifts.includes("4th")) shifts.push("4th");
|
||||||
|
|
||||||
|
// Internal calendar: build a 7-day schedule (today ±3
|
||||||
|
// days) with a row per (date, shift). This is what the
|
||||||
|
// front-end's shift-mix preview filters against — real
|
||||||
|
// dates, real workers/bill, real status (past/active/
|
||||||
|
// scheduled) tied to the current clock. As permits get
|
||||||
|
// ingested with explicit start/end dates the backend
|
||||||
|
// can replace this with the stored schedule.
|
||||||
|
const SHIFT_HOURS: Record<string, [number, number]> = {
|
||||||
|
"1st": [6, 14], "2nd": [14, 22], "3rd": [22, 30], "4th": [0, 24], // 4th = weekend
|
||||||
|
};
|
||||||
|
function shiftStatus(d: Date, shift: string, ref: Date): "past" | "active" | "scheduled" {
|
||||||
|
const refDay = ref.toISOString().slice(0,10);
|
||||||
|
const dDay = d.toISOString().slice(0,10);
|
||||||
|
if (dDay < refDay) return "past";
|
||||||
|
if (dDay > refDay) return "scheduled";
|
||||||
|
// Same day — break by hour vs shift window.
|
||||||
|
const hr = ref.getHours() + ref.getMinutes()/60;
|
||||||
|
const [s,e] = SHIFT_HOURS[shift] || [0,24];
|
||||||
|
if (shift === "4th") {
|
||||||
|
// Weekend shift: active if today IS weekend, else scheduled.
|
||||||
|
const isWknd = (ref.getDay()===0 || ref.getDay()===6);
|
||||||
|
return isWknd ? "active" : "scheduled";
|
||||||
|
}
|
||||||
|
if (shift === "3rd") {
|
||||||
|
// 3rd wraps midnight: active 22:00–06:00.
|
||||||
|
if (hr >= 22 || hr < 6) return "active";
|
||||||
|
return "scheduled";
|
||||||
|
}
|
||||||
|
if (hr < s) return "scheduled";
|
||||||
|
if (hr >= e) return "past";
|
||||||
|
return "active";
|
||||||
|
}
|
||||||
|
const refNow = new Date();
|
||||||
|
const schedule: any[] = [];
|
||||||
|
for (let off = -3; off <= 3; off++) {
|
||||||
|
const d = new Date(refNow.getTime() + off * 86400e3);
|
||||||
|
const isWknd = (d.getDay()===0 || d.getDay()===6);
|
||||||
|
const dateStr = d.toISOString().slice(0,10);
|
||||||
|
for (const sh of shifts) {
|
||||||
|
// Weekend permits use 4th shift only; weekday work
|
||||||
|
// uses its primary shift(s) and skips 4th.
|
||||||
|
if (isWknd && sh !== "4th") continue;
|
||||||
|
if (!isWknd && sh === "4th") continue;
|
||||||
|
// Workers per shift: full count on primary, half on
|
||||||
|
// secondary so the bill demand differs visibly.
|
||||||
|
const isPrimary = (sh === primary);
|
||||||
|
const wForShift = isPrimary ? count : Math.max(1, Math.floor(count/2));
|
||||||
|
schedule.push({
|
||||||
|
date: dateStr,
|
||||||
|
shift: sh,
|
||||||
|
workers_needed: wForShift,
|
||||||
|
bill_rate: contractBillRate,
|
||||||
|
status: shiftStatus(d, sh, refNow),
|
||||||
|
});
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
contracts.push({
|
contracts.push({
|
||||||
permit: {
|
permit: {
|
||||||
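Review note: the 50/28/17/5 split claimed in the comment can be confirmed by tallying every bucket value. The sketch below restates the thresholds and the keyword overrides as two small functions; `primaryShift`, `shiftsFor` and `shiftMix` are hypothetical names introduced for illustration.

```typescript
type Shift = "1st" | "2nd" | "3rd" | "4th";

// Same thresholds as the diff: [0,50) day, [50,78) evening,
// [78,95) overnight, [95,100) weekend.
function primaryShift(bucket: number): Shift {
  return bucket < 50 ? "1st" : bucket < 78 ? "2nd" : bucket < 95 ? "3rd" : "4th";
}

// Description keywords add shifts on top of the hashed primary shift.
function shiftsFor(bucket: number, description: string): Shift[] {
  const d = description.toLowerCase();
  const shifts: Shift[] = [primaryShift(bucket)];
  const add = (s: Shift) => { if (!shifts.includes(s)) shifts.push(s); };
  if (/night|overnight|24\s*hr|emergency/.test(d)) add("3rd");
  if (/multi.?shift|round.?the.?clock|double.?shift/.test(d)) add("2nd");
  if (/weekend|saturday|sunday/.test(d)) add("4th");
  return shifts;
}

// Tally the primary shift over every possible bucket value 0..99.
function shiftMix(): Record<Shift, number> {
  const counts: Record<Shift, number> = { "1st": 0, "2nd": 0, "3rd": 0, "4th": 0 };
  for (let b = 0; b < 100; b++) counts[primaryShift(b)]++;
  return counts;
}
```

Note that the observed distribution over real permits only approaches these shares if the permit-id hash spreads evenly modulo 100, which this sketch does not test.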
@ -1763,6 +2190,7 @@ async function main() {
|
|||||||
over_bill_pool_margin_at_risk: Math.round(overBillPoolMargin),
|
over_bill_pool_margin_at_risk: Math.round(overBillPoolMargin),
|
||||||
},
|
},
|
||||||
shifts_needed: shifts,
|
shifts_needed: shifts,
|
||||||
|
schedule,
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
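Review note: `shiftStatus` and the schedule loop derive the calendar day from `toISOString()` (UTC) but read the hour and weekday with `getHours()` / `getDay()` (local time). Late in the evening in a zone behind UTC, the two can disagree about which day it is. The variant below is a sketch that assumes local time is the intended reference throughout; `localDay` and `shiftStatusLocal` are hypothetical names.

```typescript
type Status = "past" | "active" | "scheduled";

const SHIFT_WINDOWS: Record<string, [number, number]> = { "1st": [6, 14], "2nd": [14, 22] };
const FULL_DAY: [number, number] = [0, 24];

// YYYY-MM-DD in the local time zone (toISOString would give the UTC date).
function localDay(d: Date): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`;
}

function shiftStatusLocal(d: Date, shift: string, ref: Date): Status {
  const dDay = localDay(d);
  const refDay = localDay(ref);
  if (dDay < refDay) return "past";
  if (dDay > refDay) return "scheduled";
  const hr = ref.getHours() + ref.getMinutes() / 60;
  if (shift === "4th") {
    // Weekend shift: active only when the reference day is Sat/Sun.
    const wd = ref.getDay();
    return wd === 0 || wd === 6 ? "active" : "scheduled";
  }
  if (shift === "3rd") {
    // Overnight shift wraps midnight: active 22:00 to 06:00.
    return hr >= 22 || hr < 6 ? "active" : "scheduled";
  }
  const [start, end] = SHIFT_WINDOWS[shift] ?? FULL_DAY;
  if (hr < start) return "scheduled";
  if (hr >= end) return "past";
  return "active";
}
```

If the server and the coordinators sit in different zones, an explicit zone (for example via `Intl.DateTimeFormat`) would be a better fit than the process-local clock; that choice is outside what the diff specifies.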
92
mcp-server/role_scenes.ts
Normal file
@ -0,0 +1,92 @@
|
|||||||
|
// Server-side mirror of search.html's ROLE_BANDS regex table.
|
||||||
|
// Each band carries a *visual scene* — clothing + immediate backdrop —
|
||||||
|
// so ComfyUI produces role-coherent headshots instead of interchangeable
|
||||||
|
// studio portraits. The front-end sends the raw role string in the
|
||||||
|
// query (?role=Forklift%20Operator); the server resolves it to a band
|
||||||
|
// and looks up the scene here.
|
||||||
|
|
||||||
|
export type RoleBand =
|
||||||
|
| "warehouse"
|
||||||
|
| "production"
|
||||||
|
| "trades"
|
||||||
|
| "driver"
|
||||||
|
| "lead";
|
||||||
|
|
||||||
|
export interface SceneDef {
|
||||||
|
band: RoleBand;
|
||||||
|
// Free-form clause inserted into the diffusion prompt AFTER
|
||||||
|
// "[age]-year-old [race] [gender] [role], ". Should describe what
|
||||||
|
// they're wearing and what is immediately behind them. Keep under
|
||||||
|
// ~25 words — SDXL Turbo loses focus on longer prompts and starts
|
||||||
|
// hallucinating cartoon hands.
|
||||||
|
scene: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
const RE_BANDS: { re: RegExp; band: RoleBand }[] = [
|
||||||
|
{ re: /forklift|warehouse|associate|material\s*handler|loader|loading|packag|shipping|logistics|inventory|sanitation|janit/i, band: "warehouse" },
|
||||||
|
{ re: /production|assembl|quality/i, band: "production" },
|
||||||
|
{ re: /welder|weld|electric|maint(enance)?\s*tech|cnc|machine\s*op|hvac|plumb|carpenter|mason|tool\s*&\s*die/i, band: "trades" },
|
||||||
|
{ re: /driver|truck|haul|cdl/i, band: "driver" },
|
||||||
|
{ re: /line\s*lead|supervisor|foreman|coordinator|lead\b/i, band: "lead" },
|
||||||
|
];
|
||||||
|
|
||||||
|
export function roleBand(role: string): RoleBand {
|
||||||
|
const r = (role || "").trim();
|
||||||
|
if (!r) return "warehouse";
|
||||||
|
for (const b of RE_BANDS) if (b.re.test(r)) return b.band;
|
||||||
|
return "warehouse";
|
||||||
|
}
|
||||||
|
|
||||||
|
// TODO J — refine these. Each `scene` string lands directly in the
|
||||||
|
// diffusion prompt. Tone target: a coordinator glances at the card
|
||||||
|
// and recognizes the role from the photo before reading the role pill.
|
||||||
|
//
|
||||||
|
// Things that work well in SDXL Turbo at 8 steps:
|
||||||
|
// - One concrete clothing item ("high-visibility yellow vest")
|
||||||
|
// - One concrete prop ("hard hat hanging from belt", "tablet in hand")
|
||||||
|
// - One blurred background element ("warehouse pallet aisle behind",
|
||||||
|
// "factory machinery softly out of focus")
|
||||||
|
// - Avoid: text/logos (rendered as scribble), specific brands, hands
|
||||||
|
// holding tools (often distorts), full-body language ("standing",
|
||||||
|
// "leaning") — model is trained on portrait crops.
|
||||||
|
//
|
||||||
|
// Each scene now bakes "monochrome black and white photography" into
|
||||||
|
// the prompt so the model produces native B&W output rather than us
|
||||||
|
// applying CSS grayscale post-hoc. SDXL Turbo handles B&W natively
|
||||||
|
// with strong tonal range — better than desaturating a color render.
|
||||||
|
export const SCENES: Record<RoleBand, SceneDef> = {
|
||||||
|
warehouse: {
|
||||||
|
band: "warehouse",
|
||||||
|
scene: "wearing a high-visibility safety vest over a t-shirt, hard hat visible, blurred warehouse pallet aisle behind, soft natural light, monochrome black and white photography, fine film grain, documentary portrait style",
|
||||||
|
},
|
||||||
|
production: {
|
||||||
|
band: "production",
|
||||||
|
scene: "wearing a work shirt with safety glasses on forehead, blurred factory machinery softly out of focus behind, fluorescent overhead lighting, monochrome black and white photography, fine film grain, documentary portrait style",
|
||||||
|
},
|
||||||
|
trades: {
|
||||||
|
band: "trades",
|
||||||
|
scene: "wearing a heavy-duty work shirt with rolled sleeves, blurred workshop tool wall behind, focused tungsten lighting, monochrome black and white photography, fine film grain, documentary portrait style",
|
||||||
|
},
|
||||||
|
driver: {
|
||||||
|
band: "driver",
|
||||||
|
scene: "wearing a polo shirt, lanyard with ID badge visible, blurred truck cab or loading dock behind, daylight, monochrome black and white photography, fine film grain, documentary portrait style",
|
||||||
|
},
|
||||||
|
lead: {
|
||||||
|
band: "lead",
|
||||||
|
scene: "wearing a button-down shirt, tablet held casually at chest level, blurred warehouse floor in soft focus behind, professional lighting, monochrome black and white photography, fine film grain, documentary portrait style",
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
// v2 — baked B&W + 1024×1024 render canvas (4× pixels of v1). Larger
|
||||||
|
// source means downsampling to a 40px avatar packs more detail per
|
||||||
|
// displayed pixel, hiding the diffusion-y micro-textures that read as
|
||||||
|
// "AI generated" at small sizes. Server route reads pool from
|
||||||
|
// data/headshots_role_pool/{SCENES_VERSION}/... so v1 stays available
|
||||||
|
// for rollback / A-B comparison.
|
||||||
|
export const SCENES_VERSION = "v2";
|
||||||
|
|
||||||
|
// Default render dimensions used by both the on-demand /headshots/
|
||||||
|
// generate/:key route and the offline render_role_pool.py script. v1
|
||||||
|
// used 512²; v2 doubles to 1024² (linear 2× = 4× pixels = ~3× GPU
|
||||||
|
// time on SDXL Turbo).
|
||||||
|
export const FACE_RENDER_DIM = 1024;
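Review note: `roleBand` returns the first matching band, so the order of `RE_BANDS` decides ties. The sketch below copies the regex table from this file into a self-contained snippet to show a few resolutions; the sample role strings are made up for illustration.

```typescript
type RoleBand = "warehouse" | "production" | "trades" | "driver" | "lead";

// Copied from RE_BANDS in role_scenes.ts; first match wins.
const RE_BANDS: { re: RegExp; band: RoleBand }[] = [
  { re: /forklift|warehouse|associate|material\s*handler|loader|loading|packag|shipping|logistics|inventory|sanitation|janit/i, band: "warehouse" },
  { re: /production|assembl|quality/i, band: "production" },
  { re: /welder|weld|electric|maint(enance)?\s*tech|cnc|machine\s*op|hvac|plumb|carpenter|mason|tool\s*&\s*die/i, band: "trades" },
  { re: /driver|truck|haul|cdl/i, band: "driver" },
  { re: /line\s*lead|supervisor|foreman|coordinator|lead\b/i, band: "lead" },
];

function roleBand(role: string): RoleBand {
  const r = (role || "").trim();
  if (!r) return "warehouse";
  for (const b of RE_BANDS) if (b.re.test(r)) return b.band;
  return "warehouse";
}

// Illustrative role strings (not taken from the dataset).
for (const role of ["Forklift Operator", "Shift Supervisor", "Warehouse Supervisor", "Production Line Lead"]) {
  console.log(role, "->", roleBand(role));
}
```

Since `lead` is the last entry, a title such as "Warehouse Supervisor" resolves to `warehouse` and "Production Line Lead" to `production`. If supervisors should always get the `lead` scene, moving that entry to the top of the table would be one way to do it.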
|
||||||
File diff suppressed because it is too large
178
mcp-server/tif_polygons.ts
Normal file
@ -0,0 +1,178 @@
|
|||||||
|
// TIF (Tax Increment Financing) district point-in-polygon lookup.
|
||||||
|
// Given a property's lat/long, returns which Chicago TIF district (if
|
||||||
|
// any) contains it. TIF districts are public-subsidy zones — a property
|
||||||
|
// inside one is receiving city tax-increment funding for its build.
|
||||||
|
// Strong "this project has financial backing" signal for the Project Index.
|
||||||
|
//
|
||||||
|
// Data: data/_entity_cache/tif_districts.geojson (Chicago Open Data
|
||||||
|
// dataset eejr-xtfb, 100 active districts, 3.2MB). Refresh by re-running
|
||||||
|
// `curl ... eejr-xtfb.geojson > tif_districts.geojson` — districts
|
||||||
|
// change rarely (only when city council approves new ones or repeals).
|
||||||
|
//
|
||||||
|
// Algorithm: classic ray-casting. For each MultiPolygon's outer ring,
|
||||||
|
// count edge crossings of an east-going horizontal ray from the point.
|
||||||
|
// Odd crossings = inside. Holes (inner rings) flip the parity. Library-
|
||||||
|
// free; correct for arbitrary polygons including the irregular Chicago
|
||||||
|
// shapes which often have many small detours.
|
||||||
|
|
||||||
|
import { readFile } from "node:fs/promises";
|
||||||
|
import { existsSync } from "node:fs";
|
||||||
|
import { join } from "node:path";
|
||||||
|
|
||||||
|
const TIF_GEOJSON = join("/home/profit/lakehouse/data/_entity_cache", "tif_districts.geojson");
|
||||||
|
|
||||||
|
type LngLat = [number, number]; // GeoJSON convention: [longitude, latitude]
|
||||||
|
type Ring = LngLat[];
|
||||||
|
type Polygon = Ring[]; // outer ring + optional inner rings (holes)
|
||||||
|
type MultiPolygon = Polygon[];
|
||||||
|
|
||||||
|
type TifFeature = {
|
||||||
|
name: string;
|
||||||
|
trim_name?: string;
|
||||||
|
ref?: string;
|
||||||
|
approval_date?: string;
|
||||||
|
expiration?: string;
|
||||||
|
type?: string; // T-1xx etc.
|
||||||
|
comm_area?: string;
|
||||||
|
wards?: string;
|
||||||
|
// Bounding box for quick reject
|
||||||
|
bbox: { minLon: number; minLat: number; maxLon: number; maxLat: number };
|
||||||
|
geometry: MultiPolygon;
|
||||||
|
};
|
||||||
|
|
||||||
|
let tifIdx: TifFeature[] | null = null;
|
||||||
|
|
||||||
|
function bboxOfMultiPolygon(mp: MultiPolygon): TifFeature["bbox"] {
|
||||||
|
let minLon = Infinity, minLat = Infinity, maxLon = -Infinity, maxLat = -Infinity;
|
||||||
|
for (const poly of mp) {
|
||||||
|
for (const ring of poly) {
|
||||||
|
for (const [lon, lat] of ring) {
|
||||||
|
if (lon < minLon) minLon = lon;
|
||||||
|
if (lat < minLat) minLat = lat;
|
||||||
|
if (lon > maxLon) maxLon = lon;
|
||||||
|
if (lat > maxLat) maxLat = lat;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return { minLon, minLat, maxLon, maxLat };
|
||||||
|
}
|
||||||
|
|
||||||
|
async function ensureLoaded(): Promise<TifFeature[]> {
|
||||||
|
if (tifIdx) return tifIdx;
|
||||||
|
if (!existsSync(TIF_GEOJSON)) {
|
||||||
|
tifIdx = [];
|
||||||
|
return tifIdx;
|
||||||
|
}
|
||||||
|
try {
|
||||||
|
const raw = JSON.parse(await readFile(TIF_GEOJSON, "utf-8"));
|
||||||
|
const out: TifFeature[] = [];
|
||||||
|
for (const f of raw.features || []) {
|
||||||
|
const geom = f.geometry;
|
||||||
|
if (!geom) continue;
|
||||||
|
// Normalize Polygon → MultiPolygon for uniform iteration
|
||||||
|
let mp: MultiPolygon;
|
||||||
|
if (geom.type === "MultiPolygon") {
|
||||||
|
mp = geom.coordinates;
|
||||||
|
} else if (geom.type === "Polygon") {
|
||||||
|
mp = [geom.coordinates];
|
||||||
|
} else {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
const props = f.properties || {};
|
||||||
|
out.push({
|
||||||
|
name: props.name || "Unknown TIF",
|
||||||
|
trim_name: props.name_trim,
|
||||||
|
ref: props.ref,
|
||||||
|
approval_date: props.approval_d,
|
||||||
|
expiration: props.expiration,
|
||||||
|
type: props.type,
|
||||||
|
comm_area: props.comm_area,
|
||||||
|
wards: props.wards,
|
||||||
|
bbox: bboxOfMultiPolygon(mp),
|
||||||
|
geometry: mp,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
tifIdx = out;
|
||||||
|
return tifIdx;
|
||||||
|
} catch (e) {
|
||||||
|
console.warn("[tif] load failed:", (e as Error).message);
|
||||||
|
tifIdx = [];
|
||||||
|
return tifIdx;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Ray-casting point-in-polygon (single ring). Returns true if (lon, lat)
|
||||||
|
// is strictly inside the ring. Edge cases (point exactly on a vertex)
|
||||||
|
// resolve by half-open interval convention; for our use case (Chicago
|
||||||
|
// boundary precision is ~1m, sites are point queries) this is fine.
|
||||||
|
function pointInRing(lon: number, lat: number, ring: Ring): boolean {
|
||||||
|
let inside = false;
|
||||||
|
const n = ring.length;
|
||||||
|
for (let i = 0, j = n - 1; i < n; j = i++) {
|
||||||
|
const [xi, yi] = ring[i];
|
||||||
|
const [xj, yj] = ring[j];
|
||||||
|
const intersect =
|
||||||
|
yi > lat !== yj > lat &&
|
||||||
|
lon < ((xj - xi) * (lat - yi)) / (yj - yi) + xi;
|
||||||
|
if (intersect) inside = !inside;
|
||||||
|
}
|
||||||
|
return inside;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Polygon = outer ring + holes. Inside outer AND not inside any hole.
|
||||||
|
function pointInPolygon(lon: number, lat: number, polygon: Polygon): boolean {
|
||||||
|
if (polygon.length === 0) return false;
|
||||||
|
if (!pointInRing(lon, lat, polygon[0])) return false;
|
||||||
|
for (let i = 1; i < polygon.length; i++) {
|
||||||
|
if (pointInRing(lon, lat, polygon[i])) return false;
|
||||||
|
}
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
export type TifMatch = {
|
||||||
|
name: string;
|
||||||
|
ref?: string;
|
||||||
|
approval_date?: string;
|
||||||
|
expiration?: string;
|
||||||
|
comm_area?: string;
|
||||||
|
wards?: string;
|
||||||
|
};
|
||||||
|
|
||||||
|
export async function findTifDistrict(
|
||||||
|
longitude: number | string | undefined,
|
||||||
|
latitude: number | string | undefined,
|
||||||
|
): Promise<TifMatch | null> {
|
||||||
|
const lon = typeof longitude === "string" ? parseFloat(longitude) : longitude;
|
||||||
|
const lat = typeof latitude === "string" ? parseFloat(latitude) : latitude;
|
||||||
|
if (!lon || !lat || isNaN(lon) || isNaN(lat)) return null;
|
||||||
|
const idx = await ensureLoaded();
|
||||||
|
if (idx.length === 0) return null;
|
||||||
|
for (const f of idx) {
|
||||||
|
// Bbox reject: cheap O(1) skip for every district whose
// bounding box cannot contain the point.
|
||||||
|
const b = f.bbox;
|
||||||
|
if (lon < b.minLon || lon > b.maxLon || lat < b.minLat || lat > b.maxLat) continue;
|
||||||
|
// Full point-in-polygon for any polygon in this MultiPolygon
|
||||||
|
for (const poly of f.geometry) {
|
||||||
|
if (pointInPolygon(lon, lat, poly)) {
|
||||||
|
return {
|
||||||
|
name: f.name,
|
||||||
|
ref: f.ref,
|
||||||
|
approval_date: f.approval_date,
|
||||||
|
expiration: f.expiration,
|
||||||
|
comm_area: f.comm_area,
|
||||||
|
wards: f.wards,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
export async function getTifIndexStats(): Promise<{
|
||||||
|
total: number;
|
||||||
|
loaded: boolean;
|
||||||
|
}> {
|
||||||
|
const idx = await ensureLoaded();
|
||||||
|
return { total: idx.length, loaded: idx.length > 0 };
|
||||||
|
}
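Review note: a worked example of the ray-casting test on a toy shape, a 10×10 square with a 2×2 hole in the middle. The two functions repeat the logic of `pointInRing` and `pointInPolygon` from this file so the snippet runs on its own; the `SQUARE_WITH_HOLE` coordinates are invented test data, not TIF geometry.

```typescript
type LngLat = [number, number];
type Ring = LngLat[];
type Polygon = Ring[];

// Cast a ray east from (lon, lat) and toggle on every edge it crosses.
function pointInRing(lon: number, lat: number, ring: Ring): boolean {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    // The edge straddles the ray's latitude, and the crossing lies east of the point.
    const crosses = yi > lat !== yj > lat && lon < ((xj - xi) * (lat - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Inside the outer ring and outside every hole.
function pointInPolygon(lon: number, lat: number, polygon: Polygon): boolean {
  if (polygon.length === 0 || !pointInRing(lon, lat, polygon[0])) return false;
  return !polygon.slice(1).some((hole) => pointInRing(lon, lat, hole));
}

// Toy data: closed rings, GeoJSON-style [x, y] order.
const SQUARE_WITH_HOLE: Polygon = [
  [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]],
  [[4, 4], [6, 4], [6, 6], [4, 6], [4, 4]],
];

console.log(pointInPolygon(1, 1, SQUARE_WITH_HOLE));  // inside the square
console.log(pointInPolygon(5, 5, SQUARE_WITH_HOLE));  // inside the hole
console.log(pointInPolygon(11, 5, SQUARE_WITH_HOLE)); // east of the square
```

The division by `yj - yi` is safe because the first half of the condition already guarantees that the edge's two endpoints sit on opposite sides of `lat`, so they cannot share a y value.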
|
||||||
@ -29,8 +29,14 @@ CACHE_DIR.mkdir(parents=True, exist_ok=True)
|
|||||||
WORKFLOW_PATH = "/opt/ComfyUI/workflows/editorial_hero.json"
|
WORKFLOW_PATH = "/opt/ComfyUI/workflows/editorial_hero.json"
|
||||||
|
|
||||||
|
|
||||||
def _cache_key(prompt, width, height, steps):
|
def _cache_key(prompt, width, height, steps, seed=None):
|
||||||
return hashlib.sha256(f"{prompt}|{width}|{height}|{steps}".encode()).hexdigest()[:24]
|
# Include seed so callers can vary outputs deterministically without
|
||||||
|
# the proxy collapsing to a single cached image. None == legacy
|
||||||
|
# (omitted from the key for backward compatibility).
|
||||||
|
bits = f"{prompt}|{width}|{height}|{steps}"
|
||||||
|
if seed is not None:
|
||||||
|
bits += f"|{seed}"
|
||||||
|
return hashlib.sha256(bits.encode()).hexdigest()[:24]
|
||||||
|
|
||||||
def _cache_get(key):
|
def _cache_get(key):
|
||||||
fp = CACHE_DIR / f"{key}.webp"
|
fp = CACHE_DIR / f"{key}.webp"
|
||||||
@ -40,8 +46,15 @@ def _cache_put(key, img_bytes):
|
|||||||
(CACHE_DIR / f"{key}.webp").write_bytes(img_bytes)
|
(CACHE_DIR / f"{key}.webp").write_bytes(img_bytes)
|
||||||
|
|
||||||
|
|
||||||
def _comfyui_generate(prompt, width=1024, height=512, steps=8, seed=None):
|
def _comfyui_generate(prompt, width=1024, height=512, steps=8, seed=None,
|
||||||
"""Submit workflow to ComfyUI and wait for result."""
|
negative_prompt=None, cfg=None, sampler=None, scheduler=None):
|
||||||
|
"""Submit workflow to ComfyUI and wait for result.
|
||||||
|
|
||||||
|
Optional overrides — when provided, replace the workflow's defaults.
|
||||||
|
The workflow template at editorial_hero.json was tuned for product
|
||||||
|
hero shots with a "no humans" negative prompt; portrait callers MUST
|
||||||
|
pass `negative_prompt` to avoid the model fighting them on faces.
|
||||||
|
"""
|
||||||
# Load workflow template
|
# Load workflow template
|
||||||
with open(WORKFLOW_PATH) as f:
|
with open(WORKFLOW_PATH) as f:
|
||||||
workflow = json.load(f)
|
workflow = json.load(f)
|
||||||
@ -51,9 +64,21 @@ def _comfyui_generate(prompt, width=1024, height=512, steps=8, seed=None):
|
|||||||
seed = random.randint(0, 2**32)
|
seed = random.randint(0, 2**32)
|
||||||
workflow["3"]["inputs"]["seed"] = seed
|
workflow["3"]["inputs"]["seed"] = seed
|
||||||
workflow["3"]["inputs"]["steps"] = steps
|
workflow["3"]["inputs"]["steps"] = steps
|
||||||
|
if cfg is not None:
|
||||||
|
workflow["3"]["inputs"]["cfg"] = cfg
|
||||||
|
if sampler:
|
||||||
|
workflow["3"]["inputs"]["sampler_name"] = sampler
|
||||||
|
if scheduler:
|
||||||
|
workflow["3"]["inputs"]["scheduler"] = scheduler
|
||||||
workflow["5"]["inputs"]["width"] = width
|
workflow["5"]["inputs"]["width"] = width
|
||||||
workflow["5"]["inputs"]["height"] = height
|
workflow["5"]["inputs"]["height"] = height
|
||||||
workflow["6"]["inputs"]["text"] = prompt
|
workflow["6"]["inputs"]["text"] = prompt
|
||||||
|
# Node 7 is the negative-prompt CLIPTextEncode. The default is tuned
|
||||||
|
# for product hero shots and contains "human, person, face, hand,
|
||||||
|
# fingers, realistic photo of people" — actively sabotaging any
|
||||||
|
# portrait render. Always overwrite when negative_prompt is given.
|
||||||
|
if negative_prompt is not None:
|
||||||
|
workflow["7"]["inputs"]["text"] = negative_prompt
|
||||||
|
|
||||||
# Submit to ComfyUI
|
# Submit to ComfyUI
|
||||||
payload = json.dumps({"prompt": workflow}).encode()
|
payload = json.dumps({"prompt": workflow}).encode()
|
||||||
@ -177,9 +202,20 @@ class ImageHandler(BaseHTTPRequestHandler):
|
|||||||
height = min(max(int(body.get("height", 720)), 256), 1080)
|
height = min(max(int(body.get("height", 720)), 256), 1080)
|
||||||
steps = min(max(int(body.get("steps", 50)), 1), 80)
|
steps = min(max(int(body.get("steps", 50)), 1), 80)
|
||||||
seed = body.get("seed")
|
seed = body.get("seed")
|
||||||
|
# Portrait-friendly overrides — None means "use workflow default".
|
||||||
|
# negative_prompt MUST be passed by portrait callers to avoid
|
||||||
|
# the workflow's "no humans" baked-in negative.
|
||||||
|
negative_prompt = body.get("negative_prompt")
|
||||||
|
cfg = body.get("cfg")
|
||||||
|
sampler = body.get("sampler")
|
||||||
|
scheduler = body.get("scheduler")
|
||||||
|
|
||||||
# Cache check
|
# Cache check — seed + negative + cfg are part of the key so per-
|
||||||
key = _cache_key(prompt, width, height, steps)
|
# worker / per-config requests don't collapse to one cached image.
|
||||||
|
key = _cache_key(
|
||||||
|
f"{prompt}||neg={negative_prompt or ''}||cfg={cfg or ''}",
|
||||||
|
width, height, steps, seed,
|
||||||
|
)
|
||||||
cached = _cache_get(key)
|
cached = _cache_get(key)
|
||||||
if cached:
|
if cached:
|
||||||
self._json(200, {"image": cached, "format": "webp", "width": width, "height": height,
|
self._json(200, {"image": cached, "format": "webp", "width": width, "height": height,
|
||||||
@ -192,7 +228,11 @@ class ImageHandler(BaseHTTPRequestHandler):
|
|||||||
try:
|
try:
|
||||||
comfy_check = urllib.request.urlopen(f"{COMFYUI_URL}/system_stats", timeout=3)
|
comfy_check = urllib.request.urlopen(f"{COMFYUI_URL}/system_stats", timeout=3)
|
||||||
if comfy_check.status == 200:
|
if comfy_check.status == 200:
|
||||||
img_bytes, seed = _comfyui_generate(prompt, width, height, steps, seed)
|
img_bytes, seed = _comfyui_generate(
|
||||||
|
prompt, width, height, steps, seed,
|
||||||
|
negative_prompt=negative_prompt, cfg=cfg,
|
||||||
|
sampler=sampler, scheduler=scheduler,
|
||||||
|
)
|
||||||
backend = "comfyui"
|
backend = "comfyui"
|
||||||
except:
|
except:
|
||||||
pass
|
pass
|
||||||
@ -210,6 +250,11 @@ class ImageHandler(BaseHTTPRequestHandler):
|
|||||||
|
|
||||||
elapsed_ms = int((time.time() - t0) * 1000)
|
elapsed_ms = int((time.time() - t0) * 1000)
|
||||||
img_b64 = base64.b64encode(img_bytes).decode()
|
img_b64 = base64.b64encode(img_bytes).decode()
|
||||||
|
# Recompute key with the actual seed used (when caller passed
|
||||||
|
# None, _comfyui_generate picks a random one and we want the
|
||||||
|
# cache to reflect that so re-requests with the same returned
|
||||||
|
# seed hit the disk).
|
||||||
|
key = _cache_key(f"{prompt}||neg={negative_prompt or ''}||cfg={cfg or ''}", width, height, steps, seed)
|
||||||
_cache_put(key, img_bytes)
|
_cache_put(key, img_bytes)
|
||||||
|
|
||||||
self._json(200, {
|
self._json(200, {
|
||||||
|
|||||||
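The cache changes in the hunks above only pay off if the lookup and the store derive the key from the same inputs. A minimal sketch of one way to keep them aligned, assuming a hypothetical `make_cache_key` helper (the real `_cache_key` signature is not shown in this diff) and a SHA-256 over the JSON-serialized parameters. Including `sampler` and `scheduler` in the key is also an assumption, not something the diff does:

```python
import hashlib
import json


def make_cache_key(prompt, width, height, steps, seed=None,
                   negative_prompt=None, cfg=None, sampler=None, scheduler=None):
    """Hypothetical helper: hash every parameter that changes the rendered image.

    Serializing to JSON keeps None distinct from 0 or "" and avoids
    hand-built separator strings.
    """
    params = {
        "prompt": prompt, "width": width, "height": height, "steps": steps,
        "seed": seed, "negative_prompt": negative_prompt, "cfg": cfg,
        "sampler": sampler, "scheduler": scheduler,
    }
    blob = json.dumps(params, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


# The lookup and the store call the same function with the same arguments,
# so a repeated request that passes the returned seed finds the cached image.
lookup_key = make_cache_key("portrait", 512, 512, 8, seed=42, negative_prompt="blurry")
store_key = make_cache_key("portrait", 512, 512, 8, seed=42, negative_prompt="blurry")
```

Routing both call sites through one function means a parameter added later cannot be included on one side and forgotten on the other.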
225
scripts/staffing/fetch_face_pool.py
Normal file
@ -0,0 +1,225 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
fetch_face_pool.py — pull N synthetic headshots from
|
||||||
|
https://thispersondoesnotexist.com/, write to data/headshots/face_NNNN.jpg,
|
||||||
|
optionally tag each with gender via deepface, emit a JSONL manifest.
|
||||||
|
|
||||||
|
Each fetch is a fresh StyleGAN face — no real people. Deterministic per
|
||||||
|
worker mapping happens at serve time (mcp-server hashes the worker key
|
||||||
|
into the pool); this script just builds the pool.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python3 scripts/staffing/fetch_face_pool.py --count 300 --concurrency 3
|
||||||
|
python3 scripts/staffing/fetch_face_pool.py --count 50 --no-gender
|
||||||
|
|
||||||
|
Re-running is idempotent: existing face_NNNN.jpg files are skipped, and
|
||||||
|
the manifest is rewritten from disk state.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
import argparse
|
||||||
|
import hashlib
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import time
|
||||||
|
from concurrent.futures import ThreadPoolExecutor, as_completed
|
||||||
|
|
||||||
|
import urllib.request
|
||||||
|
import urllib.error
|
||||||
|
|
||||||
|
URL = "https://thispersondoesnotexist.com/"
|
||||||
|
UA = "Lakehouse/1.0 (face-pool fetch · synthetic-only · no real-person tracking)"
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_one(idx: int, out_dir: str) -> tuple[int, str, bool, str | None]:
|
||||||
|
"""Returns (idx, basename, cached, error)."""
|
||||||
|
fname = f"face_{idx:04d}.jpg"
|
||||||
|
full = os.path.join(out_dir, fname)
|
||||||
|
if os.path.exists(full) and os.path.getsize(full) > 1024:
|
||||||
|
return idx, fname, True, None
|
||||||
|
try:
|
||||||
|
req = urllib.request.Request(URL, headers={"User-Agent": UA})
|
||||||
|
with urllib.request.urlopen(req, timeout=20) as resp:
|
||||||
|
blob = resp.read()
|
||||||
|
if len(blob) < 1024:
|
||||||
|
return idx, fname, False, f"response too small ({len(blob)} bytes)"
|
||||||
|
with open(full, "wb") as f:
|
||||||
|
f.write(blob)
|
||||||
|
return idx, fname, False, None
|
||||||
|
except urllib.error.URLError as e:
|
||||||
|
return idx, fname, False, f"urlerror: {e}"
|
||||||
|
except Exception as e:
|
||||||
|
return idx, fname, False, f"{type(e).__name__}: {e}"
|
||||||
|
|
||||||
|
|
||||||
|
def maybe_tag_gender(records: list[dict], out_dir: str) -> dict[str, int]:
|
||||||
|
"""If deepface is installed, label records that don't already have a
|
||||||
|
gender. Returns a count summary; mutates records in place.
|
||||||
|
|
||||||
|
Preservation contract: never overwrites prior `gender` (or any other
|
||||||
|
tag — race/age/excluded — set by tag_face_pool.py). On deepface
|
||||||
|
import failure, leaves existing tags alone instead of resetting them
|
||||||
|
to None. The previous behavior wiped 952 hand-classified rows when
|
||||||
|
fetch_face_pool was re-run from a Python without deepface installed."""
|
||||||
|
try:
|
||||||
|
from deepface import DeepFace # type: ignore
|
||||||
|
except Exception as e:
|
||||||
|
print(f" (deepface unavailable: {e}) — leaving existing tags untouched")
|
||||||
|
for r in records:
|
||||||
|
r.setdefault("gender", None)
|
||||||
|
already = sum(1 for r in records if r.get("gender") in ("man", "woman"))
|
||||||
|
return {"preserved_tagged": already, "untagged": len(records) - already}
|
||||||
|
|
||||||
|
todo = [r for r in records if r.get("gender") not in ("man", "woman")]
|
||||||
|
if not todo:
|
||||||
|
print(" every record already has gender — nothing to tag.")
|
||||||
|
return {"preserved_tagged": len(records)}
|
||||||
|
print(f" tagging gender via deepface ({len(todo)} of {len(records)} records, CPU; ~0.5-1s per face)…")
|
||||||
|
counts: dict[str, int] = {}
|
||||||
|
for i, r in enumerate(todo):
|
||||||
|
full = os.path.join(out_dir, r["file"])
|
||||||
|
try:
|
||||||
|
ana = DeepFace.analyze(
|
||||||
|
img_path=full,
|
||||||
|
actions=["gender"],
|
||||||
|
enforce_detection=False,
|
||||||
|
silent=True,
|
||||||
|
)
|
||||||
|
if isinstance(ana, list):
|
||||||
|
ana = ana[0] if ana else {}
|
||||||
|
g_raw = (ana.get("dominant_gender") or "").lower().strip()
|
||||||
|
r["gender"] = (
|
||||||
|
"man" if g_raw.startswith("man") else
|
||||||
|
"woman" if g_raw.startswith("woman") else
|
||||||
|
None
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
r["gender"] = None
|
||||||
|
r["gender_error"] = f"{type(e).__name__}: {e}"
|
||||||
|
counts[r["gender"] or "unknown"] = counts.get(r["gender"] or "unknown", 0) + 1
|
||||||
|
if (i + 1) % 25 == 0:
|
||||||
|
print(f" [{i+1}/{len(todo)}] {counts}")
|
||||||
|
return counts
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
p = argparse.ArgumentParser()
|
||||||
|
p.add_argument("--count", type=int, default=300, help="how many faces to maintain in pool")
|
||||||
|
p.add_argument(
|
||||||
|
"--out",
|
||||||
|
default=os.path.join(os.path.dirname(__file__), "..", "..", "data", "headshots"),
|
||||||
|
)
|
||||||
|
p.add_argument("--concurrency", type=int, default=3, help="parallel fetches (be polite)")
|
||||||
|
p.add_argument("--no-gender", action="store_true", help="skip deepface gender tagging")
|
||||||
|
p.add_argument("--shrink", action="store_true",
|
||||||
|
help="allow --count to drop manifest entries with id >= count. Default: preserve them.")
|
||||||
|
args = p.parse_args()
|
||||||
|
|
||||||
|
out = os.path.realpath(args.out)
|
||||||
|
os.makedirs(out, exist_ok=True)
|
||||||
|
|
||||||
|
# Load any existing manifest into a by-id dict so prior tags
|
||||||
|
# (gender / race / age / excluded) survive the rewrite. Also
|
||||||
|
# naturally dedupes — if the file accidentally has duplicate
|
||||||
|
# lines for the same id (this is how we ended up with a 2497-
|
||||||
|
# row manifest backing a 1000-face pool), the last one wins.
|
||||||
|
manifest = os.path.join(out, "manifest.jsonl")
|
||||||
|
existing: dict[int, dict] = {}
|
||||||
|
if os.path.exists(manifest):
|
||||||
|
dup_count = 0
|
||||||
|
with open(manifest) as f:
|
||||||
|
for line in f:
|
||||||
|
line = line.strip()
|
||||||
|
if not line:
|
||||||
|
continue
|
||||||
|
try:
|
||||||
|
row = json.loads(line)
|
||||||
|
except json.JSONDecodeError:
|
||||||
|
continue
|
||||||
|
rid = row.get("id")
|
||||||
|
if not isinstance(rid, int):
|
||||||
|
continue
|
||||||
|
if rid in existing:
|
||||||
|
dup_count += 1
|
||||||
|
existing[rid] = row
|
||||||
|
print(f"Loaded existing manifest: {len(existing)} unique ids ({dup_count} duplicate lines collapsed)")
|
||||||
|
max_existing = max(existing.keys()) if existing else -1
|
||||||
|
if max_existing >= args.count and not args.shrink:
|
||||||
|
print(
|
||||||
|
f"\nERROR: --count={args.count} would drop {sum(1 for k in existing if k >= args.count)} "
|
||||||
|
f"manifest entries (max existing id = {max_existing}). Pass --shrink to allow.\n",
|
||||||
|
file=sys.stderr,
|
||||||
|
)
|
||||||
|
sys.exit(2)
|
||||||
|
|
||||||
|
print(f"Fetching {args.count} faces → {out}")
|
||||||
|
print(f"Source: {URL} (synthetic StyleGAN — no real people)")
|
||||||
|
|
||||||
|
results: list[dict] = [None] * args.count # type: ignore
|
||||||
|
t0 = time.time()
|
||||||
|
with ThreadPoolExecutor(max_workers=max(1, args.concurrency)) as ex:
|
||||||
|
futs = {ex.submit(fetch_one, i, out): i for i in range(args.count)}
|
||||||
|
for done, fut in enumerate(as_completed(futs), 1):
|
||||||
|
idx, fname, cached, err = fut.result()
|
||||||
|
# Start from prior manifest row (preserves gender/race/age/excluded)
|
||||||
|
# and overlay only the fields fetch_one is responsible for.
|
||||||
|
base = dict(existing.get(idx, {}))
|
||||||
|
base.update({
|
||||||
|
"id": idx,
|
||||||
|
"file": fname,
|
||||||
|
"cached": cached,
|
||||||
|
"error": err,
|
||||||
|
})
|
||||||
|
results[idx] = base
|
||||||
|
if done % 25 == 0 or done == args.count:
|
||||||
|
ok = sum(1 for r in results if r and not r.get("error"))
|
||||||
|
print(f" [{done}/{args.count}] {ok} ok ({time.time()-t0:.1f}s)")
|
||||||
|
|
||||||
|
# Drop slots that errored or are still None (shouldn't happen)
|
||||||
|
records = [r for r in results if r and not r.get("error")]
|
||||||
|
print(f"\nPool ready: {len(records)} faces, {sum(1 for r in records if r['cached'])} from cache")
|
||||||
|
preserved_tags = sum(1 for r in records if r.get("gender") in ("man", "woman"))
|
||||||
|
if preserved_tags:
|
||||||
|
print(f"Preserved {preserved_tags} prior gender tags (and any race/age/excluded fields).")
|
||||||
|
|
||||||
|
if not args.no_gender and records:
|
||||||
|
print("\nGender-tagging pass:")
|
||||||
|
summary = maybe_tag_gender(records, out)
|
||||||
|
print(f" distribution: {summary}")
|
||||||
|
else:
|
||||||
|
for r in records:
|
||||||
|
r.setdefault("gender", None)
|
||||||
|
|
||||||
|
# If --shrink was NOT used and somehow id >= count rows are still in
|
||||||
|
# `existing` (which can only happen if the early gate was bypassed),
|
||||||
|
# carry them forward so we don't quietly drop them.
|
||||||
|
if not args.shrink:
|
||||||
|
for rid, row in existing.items():
|
||||||
|
if rid >= args.count and rid not in {r["id"] for r in records}:
|
||||||
|
records.append(row)
|
||||||
|
records.sort(key=lambda r: r.get("id", 0))
|
||||||
|
|
||||||
|
# Strip transient flags before persisting
|
||||||
|
for r in records:
|
||||||
|
r.pop("cached", None)
|
||||||
|
r.pop("error", None)
|
||||||
|
|
||||||
|
# Atomic write — if a re-run is interrupted, manifest stays intact.
|
||||||
|
tmp = manifest + ".tmp"
|
||||||
|
with open(tmp, "w") as f:
|
||||||
|
for r in records:
|
||||||
|
f.write(json.dumps(r) + "\n")
|
||||||
|
os.replace(tmp, manifest)
|
||||||
|
print(f"\nManifest: {manifest} ({len(records)} entries)")
|
||||||
|
|
||||||
|
# Quick pool fingerprint (file names + gender tags) for downstream debugging
|
||||||
|
h = hashlib.sha256()
|
||||||
|
for r in records:
|
||||||
|
h.update(r["file"].encode())
|
||||||
|
h.update(b"|")
|
||||||
|
h.update((r.get("gender") or "?").encode())
|
||||||
|
print(f"Pool fingerprint (sha256): {h.hexdigest()[:16]}")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
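The manifest handling in `fetch_face_pool.py` combines three ideas: collapse duplicate ids on load, overlay only the freshly computed fields on the prior row, and write through a temp file. A minimal standalone sketch of that pattern follows. The function names `load_manifest`, `merge_row`, and `write_manifest_atomic` are hypothetical and chosen for illustration; the script itself inlines this logic in `main()`:

```python
import json
import os


def load_manifest(path):
    """Read a JSONL manifest into {id: row}; a later duplicate id overwrites an earlier one."""
    rows = {}
    if not os.path.exists(path):
        return rows
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip corrupt lines instead of aborting the run
            if isinstance(row, dict) and isinstance(row.get("id"), int):
                rows[row["id"]] = row
    return rows


def merge_row(prior, fresh):
    """Overlay freshly computed fields on the prior row, keeping every other tag."""
    merged = dict(prior)
    merged.update(fresh)
    return merged


def write_manifest_atomic(path, rows):
    """Write to a sibling temp file, then rename it over the target in one step."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        for row in sorted(rows, key=lambda r: r["id"]):
            f.write(json.dumps(row) + "\n")
    os.replace(tmp, path)
```

Because `os.replace` swaps the file in a single rename, an interrupted run leaves either the old manifest or the new one, never a half-written file.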
230
scripts/staffing/render_role_pool.py
Normal file
@ -0,0 +1,230 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
render_role_pool.py — pre-render a role-aware face pool by hitting
|
||||||
|
serve_imagegen.py (localhost:3600/generate) with prompts pulled from
|
||||||
|
the bun server's /headshots/_scenes endpoint (single source of truth
|
||||||
|
for SCENES + SCENES_VERSION).
|
||||||
|
|
||||||
|
Layout (v1; v2+ renders nest the same tree under {out}/{version}/):
|
||||||
|
|
||||||
|
data/headshots_role_pool/
|
||||||
|
{band}/
|
||||||
|
{gender}_{race}/
|
||||||
|
face_00.webp
|
||||||
|
face_01.webp
|
||||||
|
...
|
||||||
|
manifest.jsonl
|
||||||
|
|
||||||
|
Each entry in manifest.jsonl:
|
||||||
|
|
||||||
|
{"band": "warehouse", "gender": "man", "race": "caucasian",
|
||||||
|
"file": "warehouse/man_caucasian/face_03.webp",
|
||||||
|
"seed": 184729338, "scenes_version": "v1"}
|
||||||
|
|
||||||
|
Idempotent: a file at the target path is skipped. Re-run with --force
|
||||||
|
to regenerate. SCENES_VERSION is captured per render so the server's
|
||||||
|
pool route can refuse stale renders if the version drifts.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
import argparse
|
||||||
|
import base64
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import time
|
||||||
|
import urllib.request
|
||||||
|
import urllib.error
|
||||||
|
|
||||||
|
DEFAULT_BANDS = ["warehouse", "production", "trades", "driver", "lead"]
|
||||||
|
DEFAULT_GENDERS = ["man", "woman"]
|
||||||
|
DEFAULT_RACES = ["caucasian", "east_asian", "south_asian", "middle_eastern", "black", "hispanic"]
|
||||||
|
|
||||||
|
|
||||||
|
def race_text(r: str) -> str:
|
||||||
|
return {
|
||||||
|
"caucasian": "",
|
||||||
|
"east_asian": "East Asian",
|
||||||
|
"south_asian": "South Asian",
|
||||||
|
"middle_eastern": "Middle Eastern",
|
||||||
|
"black": "Black",
|
||||||
|
"hispanic": "Hispanic",
|
||||||
|
}.get(r, "")
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_scenes(mcp_url: str) -> tuple[str, dict]:
|
||||||
|
"""Pull canonical SCENES from the bun server. Single source of truth."""
|
||||||
|
req = urllib.request.Request(f"{mcp_url}/headshots/_scenes")
|
||||||
|
with urllib.request.urlopen(req, timeout=10) as resp:
|
||||||
|
data = json.loads(resp.read())
|
||||||
|
return data["version"], data["scenes"]
|
||||||
|
|
||||||
|
|
||||||
|
def render(comfy_url: str, prompt: str, seed: int, steps: int, timeout: int, dim: int) -> bytes | None:
|
||||||
|
payload = json.dumps({
|
||||||
|
"prompt": prompt,
|
||||||
|
"width": dim,
|
||||||
|
"height": dim,
|
||||||
|
"steps": steps,
|
||||||
|
"seed": seed,
|
||||||
|
}).encode()
|
||||||
|
req = urllib.request.Request(
|
||||||
|
f"{comfy_url}/generate",
|
||||||
|
data=payload,
|
||||||
|
headers={"Content-Type": "application/json"},
|
||||||
|
)
|
||||||
|
try:
|
||||||
|
with urllib.request.urlopen(req, timeout=timeout) as resp:
|
||||||
|
data = json.loads(resp.read())
|
||||||
|
except urllib.error.HTTPError as e:
|
||||||
|
print(f" HTTP {e.code} from comfy: {e.read()[:200]}", file=sys.stderr)
|
||||||
|
return None
|
||||||
|
except Exception as e:
|
||||||
|
print(f" comfy error: {type(e).__name__}: {e}", file=sys.stderr)
|
||||||
|
return None
|
||||||
|
img_b64 = data.get("image")
|
||||||
|
if not img_b64:
|
||||||
|
print(f" comfy response missing 'image' field: {list(data.keys())}", file=sys.stderr)
|
||||||
|
return None
|
||||||
|
return base64.b64decode(img_b64)
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
p = argparse.ArgumentParser()
|
||||||
|
p.add_argument("--out", default=os.path.join(os.path.dirname(__file__), "..", "..", "data", "headshots_role_pool"))
|
||||||
|
p.add_argument("--per-bucket", type=int, default=10, help="how many faces per (band × gender × race)")
|
||||||
|
p.add_argument("--mcp", default="http://localhost:3700")
|
||||||
|
p.add_argument("--comfy", default="http://localhost:3600")
|
||||||
|
p.add_argument("--steps", type=int, default=8)
|
||||||
|
p.add_argument("--bands", nargs="*", default=DEFAULT_BANDS)
|
||||||
|
p.add_argument("--genders", nargs="*", default=DEFAULT_GENDERS)
|
||||||
|
p.add_argument("--races", nargs="*", default=DEFAULT_RACES)
|
||||||
|
p.add_argument("--force", action="store_true", help="regenerate existing files")
|
||||||
|
p.add_argument("--age", type=int, default=32)
|
||||||
|
p.add_argument("--timeout", type=int, default=120, help="per-render timeout (1024² takes ~5s on A4000)")
|
||||||
|
p.add_argument("--dim", type=int, default=1024, help="square render dimension (v2 default 1024, v1 was 512)")
|
||||||
|
args = p.parse_args()
|
||||||
|
|
||||||
|
out_root = os.path.realpath(args.out)
|
||||||
|
os.makedirs(out_root, exist_ok=True)
|
||||||
|
|
||||||
|
print(f"Fetching canonical SCENES from {args.mcp}/headshots/_scenes…")
|
||||||
|
try:
|
||||||
|
version, scenes = fetch_scenes(args.mcp)
|
||||||
|
except Exception as e:
|
||||||
|
print(f"FATAL: could not fetch scenes ({e}). Is the mcp-server up?", file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
print(f" SCENES_VERSION={version}, {len(scenes)} bands available: {list(scenes.keys())}")
|
||||||
|
|
||||||
|
# v2+ files live at {out}/{version}/{band}/{g}_{r}/face_NN.webp.
|
||||||
|
# v1 lived at {out}/{band}/... — keep that layout intact for
|
||||||
|
# rollback; the server route reads both and prefers current.
|
||||||
|
out = out_root if version == "v1" else os.path.join(out_root, version)
|
||||||
|
os.makedirs(out, exist_ok=True)
|
||||||
|
print(f" writing to: {out}")
|
||||||
|
print(f" render dim: {args.dim}×{args.dim}")
|
||||||
|
|
||||||
|
# Reject any --bands not in the server's SCENES
|
||||||
|
unknown = [b for b in args.bands if b not in scenes]
|
||||||
|
if unknown:
|
||||||
|
print(f"FATAL: unknown bands {unknown}. Server has: {list(scenes.keys())}", file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
manifest_rows = []
|
||||||
|
todo = [
|
||||||
|
(band, g, r, n)
|
||||||
|
for band in args.bands
|
||||||
|
for g in args.genders
|
||||||
|
for r in args.races
|
||||||
|
for n in range(args.per_bucket)
|
||||||
|
]
|
||||||
|
print(f"\nPlanning: {len(todo)} renders ({len(args.bands)} bands × {len(args.genders)} genders × {len(args.races)} races × {args.per_bucket} faces).")
|
||||||
|
print(f"Estimated GPU time at 1.5s/render = {len(todo) * 1.5 / 60:.1f} min.\n")
|
||||||
|
|
||||||
|
t0 = time.time()
|
||||||
|
rendered = 0
|
||||||
|
skipped = 0
|
||||||
|
failed = 0
|
||||||
|
for i, (band, g, r, n) in enumerate(todo):
|
||||||
|
bucket_dir = os.path.join(out, band, f"{g}_{r}")
|
||||||
|
os.makedirs(bucket_dir, exist_ok=True)
|
||||||
|
fname = f"face_{n:02d}.webp"
|
||||||
|
full = os.path.join(bucket_dir, fname)
|
||||||
|
rel = os.path.relpath(full, out)
|
||||||
|
|
||||||
|
if os.path.exists(full) and os.path.getsize(full) > 1024 and not args.force:
|
||||||
|
skipped += 1
|
||||||
|
manifest_rows.append({
|
||||||
|
"band": band, "gender": g, "race": r, "file": rel,
|
||||||
|
"seed": None, "scenes_version": version, "cached": True,
|
||||||
|
})
|
||||||
|
continue
|
||||||
|
|
||||||
|
scene_def = scenes[band]
|
||||||
|
scene_clause = scene_def["scene"]
|
||||||
|
race_clause = race_text(r)
|
||||||
|
gender_clause = g # "man" / "woman"
|
||||||
|
# Match the bun server's prompt builder exactly. If you tweak
|
||||||
|
# one, tweak the other (or factor a /prompt-builder endpoint).
|
||||||
|
# The {role} slot is intentionally a band-typical title here
|
||||||
|
# — the pre-rendered face is shared across roles in the same
|
||||||
|
# band, so we use the band's archetypal role. Specific roles
|
||||||
|
# still hit the on-demand /headshots/generate/:key path with
|
||||||
|
# their actual title.
|
||||||
|
archetype_role = {
|
||||||
|
"warehouse": "warehouse worker",
|
||||||
|
"production": "production worker",
|
||||||
|
"trades": "skilled tradesperson",
|
||||||
|
"driver": "delivery driver",
|
||||||
|
"lead": "shift supervisor",
|
||||||
|
}.get(band, "warehouse worker")
|
||||||
|
prompt = (
|
||||||
|
f"professional headshot portrait of a {args.age}-year-old "
|
||||||
|
f"{race_clause} {gender_clause} {archetype_role}, {scene_clause}, "
|
||||||
|
f"neutral confident expression, sharp focus, photorealistic"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Deterministic seed per slot — same (band, g, r, n) always
|
||||||
|
# gets the same face. Mixing scenes_version means a SCENES
|
||||||
|
# tweak shifts every face slightly; that's the right behavior
|
||||||
|
# (it's how cache invalidation propagates to the pool too).
|
||||||
|
seed_str = f"{band}|{g}|{r}|{n}|{version}"
|
||||||
|
seed_h = 5381
|
||||||
|
for ch in seed_str:
|
||||||
|
seed_h = ((seed_h << 5) + seed_h + ord(ch)) & 0x7fffffff
|
||||||
|
seed = seed_h
|
||||||
|
|
||||||
|
bytes_ = render(args.comfy, prompt, seed, args.steps, args.timeout, args.dim)
|
||||||
|
if bytes_ is None:
|
||||||
|
failed += 1
|
||||||
|
continue
|
||||||
|
with open(full, "wb") as f:
|
||||||
|
f.write(bytes_)
|
||||||
|
rendered += 1
|
||||||
|
manifest_rows.append({
|
||||||
|
"band": band, "gender": g, "race": r, "file": rel,
|
||||||
|
"seed": seed, "scenes_version": version, "cached": False,
|
||||||
|
})
|
||||||
|
|
||||||
|
if (i + 1) % 10 == 0 or (i + 1) == len(todo):
|
||||||
|
elapsed = time.time() - t0
|
||||||
|
done = i + 1
|
||||||
|
rate = done / elapsed if elapsed > 0 else 0
|
||||||
|
eta = (len(todo) - done) / rate if rate > 0 else 0
|
||||||
|
print(f" [{done}/{len(todo)}] rendered={rendered} skipped={skipped} failed={failed} "
|
||||||
|
f"rate={rate:.2f}/s eta={eta:.0f}s")
|
||||||
|
|
||||||
|
# Atomic manifest write
|
||||||
|
manifest_path = os.path.join(out, "manifest.jsonl")
|
||||||
|
tmp = manifest_path + ".tmp"
|
||||||
|
with open(tmp, "w") as f:
|
||||||
|
for row in manifest_rows:
|
||||||
|
f.write(json.dumps(row) + "\n")
|
||||||
|
os.replace(tmp, manifest_path)
|
||||||
|
|
||||||
|
print(f"\nDone. {rendered} new, {skipped} cached, {failed} failed in {time.time()-t0:.1f}s")
|
||||||
|
print(f"Manifest: {manifest_path} ({len(manifest_rows)} entries)")
|
||||||
|
print(f"\nNext: poke {args.mcp}/headshots/__reload to pick up the new pool.")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
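The per-slot seed in `render_role_pool.py` is a djb2-style string hash masked to 31 bits. Pulled out as a standalone function it looks roughly like this. The name `slot_seed` is hypothetical; the arithmetic mirrors the loop in `main()`:

```python
def slot_seed(band, gender, race, n, version):
    """djb2-style hash of the slot identity, masked to a non-negative 31-bit int."""
    h = 5381
    for ch in f"{band}|{gender}|{race}|{n}|{version}":
        # (h << 5) + h is h * 33; the mask keeps the running value within 31 bits.
        h = ((h << 5) + h + ord(ch)) & 0x7FFFFFFF
    return h


# Same slot and version always give the same seed; a new SCENES version shifts it.
seed_v1 = slot_seed("warehouse", "man", "caucasian", 3, "v1")
seed_v2 = slot_seed("warehouse", "man", "caucasian", 3, "v2")
```

Since the seed depends only on the slot identity and the version, it can be recomputed for files that are skipped as already rendered, rather than being recorded as `None` in the manifest.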
169
scripts/staffing/tag_face_pool.py
Normal file
@ -0,0 +1,169 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
tag_face_pool.py — run deepface gender, race, and age classification over the
|
||||||
|
synthetic face pool produced by fetch_face_pool.py and rewrite
|
||||||
|
manifest.jsonl with `gender` (man / woman), `race` (caucasian / black /
|
||||||
|
hispanic / east_asian / south_asian / middle_eastern), and `age` tags.
|
||||||
|
|
||||||
|
Run with the venv that has deepface installed:
|
||||||
|
|
||||||
|
/home/profit/.local/share/deepface-venv/bin/python \
|
||||||
|
scripts/staffing/tag_face_pool.py
|
||||||
|
|
||||||
|
Idempotent: rows that already have gender, race, and age tagged are
|
||||||
|
skipped. Pass --force to re-tag everything.
|
||||||
|
|
||||||
|
Mapping deepface buckets → /headshots/ ?e= values:
|
||||||
|
asian → east_asian (deepface doesn't differentiate
|
||||||
|
East / South Asian; we lump as 'east_asian' since the
|
||||||
|
StyleGAN training set leans East Asian)
|
||||||
|
indian → south_asian
|
||||||
|
middle eastern → middle_eastern
|
||||||
|
black → black
|
||||||
|
latino hispanic → hispanic
|
||||||
|
white → caucasian
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import time
|
||||||
|
|
||||||
|
DEEPFACE_RACE_TO_HINT = {
|
||||||
|
"asian": "east_asian",
|
||||||
|
"indian": "south_asian",
|
||||||
|
"middle eastern": "middle_eastern",
|
||||||
|
"black": "black",
|
||||||
|
"latino hispanic": "hispanic",
|
||||||
|
"hispanic": "hispanic",
|
||||||
|
"white": "caucasian",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
p = argparse.ArgumentParser()
|
||||||
|
p.add_argument(
|
||||||
|
"--out",
|
||||||
|
default=os.path.join(os.path.dirname(__file__), "..", "..", "data", "headshots"),
|
||||||
|
)
|
||||||
|
p.add_argument("--force", action="store_true", help="re-tag rows that already have gender, race, and age")
|
||||||
|
p.add_argument("--limit", type=int, default=0, help="cap how many faces to process this run (0 = all)")
|
||||||
|
p.add_argument("--min-age", type=int, default=22, help="exclude faces estimated below this age (kids/teens). Staffing context = legal-age workers only.")
|
||||||
|
args = p.parse_args()
|
||||||
|
|
||||||
|
out = os.path.realpath(args.out)
|
||||||
|
manifest_path = os.path.join(out, "manifest.jsonl")
|
||||||
|
if not os.path.exists(manifest_path):
|
||||||
|
print(f"manifest not found: {manifest_path}", file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
print(f"loading deepface (cold start ~10-15s for first model build)…")
|
||||||
|
from deepface import DeepFace # type: ignore
|
||||||
|
|
||||||
|
rows = []
|
||||||
|
with open(manifest_path) as f:
|
||||||
|
for line in f:
|
||||||
|
line = line.strip()
|
||||||
|
if not line:
|
||||||
|
continue
|
||||||
|
rows.append(json.loads(line))
|
||||||
|
print(f"manifest: {len(rows)} rows")
|
||||||
|
|
||||||
|
todo = [
|
||||||
|
r for r in rows
|
||||||
|
if args.force or r.get("gender") is None or r.get("race") is None or r.get("age") is None
|
||||||
|
]
|
||||||
|
if args.limit > 0:
|
||||||
|
todo = todo[: args.limit]
|
||||||
|
print(f"to tag: {len(todo)} faces")
|
||||||
|
|
||||||
|
if not todo:
|
||||||
|
print("nothing to do.")
|
||||||
|
return
|
||||||
|
|
||||||
|
counts_g = {}
|
||||||
|
counts_r = {}
|
||||||
|
failed = 0
|
||||||
|
t0 = time.time()
|
||||||
|
for i, r in enumerate(todo):
|
||||||
|
full = os.path.join(out, r["file"])
|
||||||
|
try:
|
||||||
|
ana = DeepFace.analyze(
|
||||||
|
img_path=full,
|
||||||
|
actions=["gender", "race", "age"],
|
||||||
|
enforce_detection=False,
|
||||||
|
silent=True,
|
||||||
|
)
|
||||||
|
if isinstance(ana, list):
|
||||||
|
ana = ana[0] if ana else {}
|
||||||
|
g_raw = (ana.get("dominant_gender") or "").lower().strip()
|
||||||
|
r["gender"] = (
|
||||||
|
"man" if g_raw.startswith("man") else
|
||||||
|
"woman" if g_raw.startswith("woman") else
|
||||||
|
None
|
||||||
|
)
|
||||||
|
r_raw = (ana.get("dominant_race") or "").lower().strip()
|
||||||
|
r["race"] = DEEPFACE_RACE_TO_HINT.get(r_raw, None)
|
||||||
|
if r["race"] is None and r_raw:
|
||||||
|
r["race_raw"] = r_raw
|
||||||
|
# Age estimation — exclude minors / teens. Staffing context
|
||||||
|
# uses adult workers only. Threshold is 22 by default
|
||||||
|
# (legal + a buffer because age estimation is noisy).
|
||||||
|
try:
|
||||||
|
age = int(round(float(ana.get("age") or 0)))
|
||||||
|
except Exception:
|
||||||
|
age = 0
|
||||||
|
r["age"] = age
|
||||||
|
if age and age < args.min_age:
|
||||||
|
r["excluded"] = "minor"
|
||||||
|
else:
|
||||||
|
r.pop("excluded", None)
|
||||||
|
counts_g[r["gender"] or "unknown"] = counts_g.get(r["gender"] or "unknown", 0) + 1
|
||||||
|
counts_r[r["race"] or r_raw or "unknown"] = counts_r.get(r["race"] or r_raw or "unknown", 0) + 1
|
||||||
|
except Exception as e:
|
||||||
|
r["tag_error"] = f"{type(e).__name__}: {e}"
|
||||||
|
failed += 1
|
||||||
|
if (i + 1) % 25 == 0 or (i + 1) == len(todo):
|
||||||
|
elapsed = time.time() - t0
|
||||||
|
rate = (i + 1) / elapsed if elapsed > 0 else 0
|
||||||
|
eta = (len(todo) - i - 1) / rate if rate > 0 else 0
|
||||||
|
print(f" [{i+1}/{len(todo)}] rate={rate:.1f}/s eta={eta:.0f}s failed={failed}")
|
||||||
|
print(f" gender: {counts_g}")
|
||||||
|
print(f" race : {counts_r}")
|
||||||
|
|
||||||
|
# Write updated manifest atomically
|
||||||
|
tmp = manifest_path + ".tmp"
|
||||||
|
with open(tmp, "w") as f:
|
||||||
|
for r in rows:
|
||||||
|
f.write(json.dumps(r) + "\n")
|
||||||
|
os.replace(tmp, manifest_path)
|
||||||
|
|
||||||
|
final_g = {}
|
||||||
|
final_r = {}
|
||||||
|
excluded = 0
|
||||||
|
age_hist = {"<18": 0, "18-22": 0, "22-30": 0, "30-40": 0, "40-50": 0, "50-60": 0, "60+": 0, "unknown": 0}
|
||||||
|
for r in rows:
|
||||||
|
if r.get("excluded"):
|
||||||
|
excluded += 1
|
||||||
|
continue
|
||||||
|
final_g[r.get("gender") or "untagged"] = final_g.get(r.get("gender") or "untagged", 0) + 1
|
||||||
|
final_r[r.get("race") or "untagged"] = final_r.get(r.get("race") or "untagged", 0) + 1
|
||||||
|
a = r.get("age") or 0
|
||||||
|
if a == 0: age_hist["unknown"] += 1
|
||||||
|
elif a < 18: age_hist["<18"] += 1
|
||||||
|
elif a < 22: age_hist["18-22"] += 1
|
||||||
|
elif a < 30: age_hist["22-30"] += 1
|
||||||
|
elif a < 40: age_hist["30-40"] += 1
|
||||||
|
elif a < 50: age_hist["40-50"] += 1
|
||||||
|
elif a < 60: age_hist["50-60"] += 1
|
||||||
|
else: age_hist["60+"] += 1
|
||||||
|
print(f"\nDone. {len(rows)} rows, {excluded} excluded as <{args.min_age}, {failed} tag errors, {time.time()-t0:.1f}s")
|
||||||
|
print(f" final gender: {final_g}")
|
||||||
|
print(f" final race : {final_r}")
|
||||||
|
print(f" age dist : {age_hist}")
|
||||||
|
print(f"\nNext: poke /headshots/__reload to refresh the in-memory pool.")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
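The tagging loop in `tag_face_pool.py` reduces each deepface result to four manifest fields. A minimal sketch of that reduction as a pure function, which can be exercised without installing deepface. The name `normalize_tags` is hypothetical, and the input is assumed to be a dict shaped like one entry of the analysis result (`dominant_gender`, `dominant_race`, `age`), as the script reads it:

```python
RACE_HINTS = {
    "asian": "east_asian",
    "indian": "south_asian",
    "middle eastern": "middle_eastern",
    "black": "black",
    "latino hispanic": "hispanic",
    "white": "caucasian",
}


def normalize_tags(analysis, min_age=22):
    """Map one analysis dict to manifest tags; unknown labels become None."""
    gender_raw = (analysis.get("dominant_gender") or "").lower().strip()
    race_raw = (analysis.get("dominant_race") or "").lower().strip()
    try:
        age = int(round(float(analysis.get("age") or 0)))
    except (TypeError, ValueError):
        age = 0  # treat a missing or malformed estimate as unknown
    tags = {
        "gender": (
            "man" if gender_raw.startswith("man") else
            "woman" if gender_raw.startswith("woman") else
            None
        ),
        "race": RACE_HINTS.get(race_raw),
        "age": age,
    }
    # An age of 0 means "unknown", so only a real estimate can exclude a face.
    if age and age < min_age:
        tags["excluded"] = "minor"
    return tags
```

Keeping the mapping separate from the model call makes the exclusion threshold and the label table easy to check in isolation.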