# ADR-021: Sparse Data Trust Path - start with nothing, earn everything

**Status:** Proposed (2026-04-17)

**Triggered by:** Legacy staffing company pushback: "we don't have that data"

**Owner:** J

---

## The Problem

We demonstrated a system with rich worker profiles (18 fields, behavioral scores, certifications, communication history). The staffing company said: "We don't have any of that. We have a name, a phone number, and a contract."

They're right. Our demo assumed data that doesn't exist in their world. Showing AI that only works with perfect data is worse than useless: it builds distrust.

## Their Reality

Day 1 data for a typical worker:

- Name
- Phone number
- Maybe: city, role, one or two skills mentioned on a call

Day 1 data for a contract:

- Client name
- Role needed
- Headcount
- Start date
- Maybe: required certs, location

That's it. No reliability scores. No availability metrics. No communication history. No certifications in a database. The staffing coordinator carries that knowledge in their HEAD.

## The Trust Path

### Phase 1: Work with what they have (Day 1)

- Accept sparse profiles: name + phone + role. That's enough.
- Match contracts to workers by role + location only. No scores.
- The system is useful immediately: "here are the 12 people you have tagged as Forklift Operators in IL." That's already faster than scrolling a spreadsheet.
- Don't show empty fields. Don't show 0% bars. Don't show what's missing; show what's THERE.

### Phase 2: Earn data through use (Week 1-4)

- Every placement generates a timesheet → reliability starts building
- Every call logged → communication history accumulates
- Every check-in → availability becomes real
- Every cert verified → certification database grows
- The staffer doesn't "enter data". They just do their job, and the system learns from it.
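
The first two phases can be sketched in a few lines of Python. This is a minimal illustration, not the actual implementation: the `Worker` class, the `match_workers`, `record_timesheet`, and `reliability` functions, and the `min_shifts` threshold are all hypothetical names and values chosen for the example. It assumes location is stored as a state code and that a worker with no known location is left out when a location is requested.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Worker:
    # Day 1: only name and phone are required; everything else is optional.
    name: str
    phone: str
    role: Optional[str] = None
    state: Optional[str] = None
    # Phase 2: counters filled in as a side effect of normal work.
    shifts_scheduled: int = 0
    shifts_worked: int = 0


def match_workers(workers, role, state=None):
    """Phase 1 matching: role plus optional location, no scores."""
    wanted_role = role.strip().lower()
    matches = []
    for w in workers:
        if w.role is None or w.role.strip().lower() != wanted_role:
            continue
        # Assumption: when a state is requested, unknown locations are skipped.
        if state is not None and (w.state is None or w.state.upper() != state.upper()):
            continue
        matches.append(w)
    return matches


def record_timesheet(worker, showed_up):
    """Phase 2: each timesheet line updates the counters; nobody types a score."""
    worker.shifts_scheduled += 1
    if showed_up:
        worker.shifts_worked += 1


def reliability(worker, min_shifts=3):
    """Return a ratio only once there is enough history, otherwise None."""
    # min_shifts is an illustrative threshold, not a value from this ADR.
    if worker.shifts_scheduled < min_shifts:
        return None
    return worker.shifts_worked / worker.shifts_scheduled
```

Returning `None` rather than `0.0` for a worker with no history keeps "unknown" distinct from "unreliable", which is the distinction the UI rules in this ADR depend on.
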
### Phase 3: AI starts helping (Month 2+)

- Enough data to actually score workers → show reliability
- Enough history to predict availability → surface it
- Enough placements to know client preferences → suggest matches
- The enrichment happened BECAUSE they used the system, not as a prerequisite TO use it.

## What This Means for the UI

- Worker cards must gracefully degrade. Name only? Show name only. Name + role? Show name + role. Full profile? Show everything.
- Never show empty metrics. No "Reliability: 0%", which looks like the worker is unreliable. Just don't show it until there's data.
- Lead with what the staffer KNOWS: "you placed this worker at Company X last month". That's information they trust because they lived it.

## What This Means for the Architecture

- The vector index works on whatever text exists. A name + role is well under 200 characters, and that's still enough for an embedding. As more data arrives, the embeddings get richer and search gets better.
- The hybrid search works with sparse SQL filters too. "role = 'Forklift Operator'" is a valid filter even without reliability.
- The playbook system captures the staffer's decisions: "you picked this worker for this contract" → that IS the training data for future AI recommendations.

## Implementation

1. Sparse profile ingest: accept CSV with as few as 2 columns (name, phone). Everything else optional.
2. Graceful UI degradation: worker cards only show fields that exist.
3. Progressive enrichment: timesheet ingest → auto-calculate reliability; check-in ingest → auto-calculate availability.
4. Trust indicators: "based on 3 placements", not "Reliability: 87%". Show WHERE the number comes from.
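
As a rough sketch of implementation items 1, 2, and 4, the snippet below ingests a CSV in which only `name` and `phone` are required, renders a card from whichever fields exist, and phrases reliability as evidence rather than a bare percentage. The function names `ingest_profiles` and `render_card`, the optional column names `role` and `city`, and the card wording are assumptions made for illustration; the real ingest format and card layout are not specified in this ADR.

```python
import csv
import io

REQUIRED = ("name", "phone")


def ingest_profiles(csv_text):
    """Parse a CSV where only name and phone are required columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    profiles = []
    for row in reader:
        # Drop blank cells so downstream code never sees an "empty" field.
        profile = {k: v.strip() for k, v in row.items() if k and v and v.strip()}
        # Rows without a name or phone are skipped in this sketch.
        if all(c in profile for c in REQUIRED):
            profiles.append(profile)
    return profiles


def render_card(profile, shifts_worked=0, shifts_scheduled=0, placements=0):
    """Render only the fields that exist; never print an empty metric."""
    lines = [profile["name"]]
    for key in ("role", "city", "phone"):
        if key in profile:
            lines.append(f"{key.title()}: {profile[key]}")
    # Trust indicator: show where the number comes from, and only once
    # there is at least one placement behind it.
    if placements > 0 and shifts_scheduled > 0:
        pct = round(100 * shifts_worked / shifts_scheduled)
        noun = "placement" if placements == 1 else "placements"
        lines.append(
            f"Showed up for {shifts_worked} of {shifts_scheduled} shifts "
            f"({pct}%), based on {placements} {noun}"
        )
    return "\n".join(lines)
```

A worker imported with only a name and phone yields a two-line card, and the same function produces a richer card as timesheets accumulate, so the UI needs no separate "sparse" and "full" templates.
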