From 13b01fee9f14055ad12f21403fa3bc08d802b80c Mon Sep 17 00:00:00 2001
From: root
Date: Fri, 17 Apr 2026 15:32:06 -0500
Subject: [PATCH] =?UTF-8?q?ADR-021:=20Sparse=20data=20trust=20path=20?=
 =?UTF-8?q?=E2=80=94=20start=20with=20nothing,=20earn=20everything?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The staffing company said: 'we don't have any of that data.'
They're right. We showed a demo with 18-field profiles and they
have a name and a phone number.

This ADR documents the trust path:
Phase 1 (Day 1): Work with name + phone + role. That's enough.
Phase 2 (Week 1-4): Timesheets → reliability. Calls → history.
Phase 3 (Month 2+): AI starts helping with real earned data.

Key principles:
- Never show empty fields or 0% bars
- Show what's THERE, not what's missing
- Trust indicators: 'based on 3 placements' not just 'Reliability: 87%'
- The system earns data by being useful, not by demanding it upfront

Also created sparse_workers dataset (200 workers, 74% have role,
34% have notes, 5 have ONLY name+phone) for realistic testing.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 docs/ADR-021-sparse-data-trust.md | 93 +++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/ADR-021-sparse-data-trust.md

diff --git a/docs/ADR-021-sparse-data-trust.md b/docs/ADR-021-sparse-data-trust.md
new file mode 100644
index 0000000..652c736
--- /dev/null
+++ b/docs/ADR-021-sparse-data-trust.md
@@ -0,0 +1,93 @@
+# ADR-021: Sparse Data Trust Path — start with nothing, earn everything
+
+**Status:** Proposed — 2026-04-17
+**Triggered by:** Legacy staffing company pushback: "we don't have that data"
+**Owner:** J
+
+---
+
+## The Problem
+
+We demonstrated a system with rich worker profiles (18 fields,
+behavioral scores, certifications, communication history). The
+staffing company said: "We don't have any of that. We have a name,
+a phone number, and a contract."
+
+They're right. Our demo assumed data that doesn't exist in their
+world. Showing AI that only works with perfect data is worse than
+useless — it builds distrust.
+
+## Their Reality
+
+Day 1 data for a typical worker:
+- Name
+- Phone number
+- Maybe: city, role, one or two skills mentioned on a call
+
+Day 1 data for a contract:
+- Client name
+- Role needed
+- Headcount
+- Start date
+- Maybe: required certs, location
+
+That's it. No reliability scores. No availability metrics. No
+communication history. No certifications in a database. The staffing
+coordinator carries that knowledge in their HEAD.
+
+## The Trust Path
+
+### Phase 1: Work with what they have (Day 1)
+- Accept sparse profiles: name + phone + role. That's enough.
+- Match contracts to workers by role + location only. No scores.
+- The system is useful immediately: "here are the 12 people you
+  have tagged as Forklift Operators in IL." That's already faster
+  than scrolling a spreadsheet.
+- Don't show empty fields. Don't show 0% bars. Don't show what's
+  missing — show what's THERE.
+
+### Phase 2: Earn data through use (Week 1-4)
+- Every placement generates a timesheet → reliability starts building
+- Every call logged → communication history accumulates
+- Every check-in → availability becomes real
+- Every cert verified → certification database grows
+- The staffer doesn't "enter data" — they just do their job, and
+  the system learns from it.
+
+### Phase 3: AI starts helping (Month 2+)
+- Enough data to actually score workers → show reliability
+- Enough history to predict availability → surface it
+- Enough placements to know client preferences → suggest matches
+- The enrichment happened BECAUSE they used the system, not as a
+  prerequisite TO use it.
+
+## What This Means for the UI
+
+- Worker cards must gracefully degrade. Name only? Show name only.
+  Name + role? Show name + role. Full profile? Show everything.
+- Never show empty metrics. No "Reliability: 0%" — that looks like
+  the worker is unreliable. Just don't show it until there's data.
+- Lead with what the staffer KNOWS: "you placed this worker at
+  Company X last month" — that's information they trust because they
+  lived it.
+
+## What This Means for the Architecture
+
+- The vector index works on whatever text exists. A name + role is
+  200 characters. That's enough for an embedding. As more data
+  arrives, the embeddings get richer and search gets better.
+- The hybrid search works with sparse SQL filters too. "role =
+  'Forklift Operator'" is a valid filter even without reliability.
+- The playbook system captures the staffer's decisions: "you picked
+  this worker for this contract" → that IS the training data for
+  future AI recommendations.
+
+## Implementation
+
+1. Sparse profile ingest: accept CSV with as few as 2 columns
+   (name, phone). Everything else optional.
+2. Graceful UI degradation: worker cards only show fields that exist
+3. Progressive enrichment: timesheet ingest → auto-calculate
+   reliability; check-in ingest → auto-calculate availability
+4. Trust indicators: "based on 3 placements" not "Reliability: 87%"
+   — show WHERE the number comes from
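
Reviewer sketch (not part of the patch): implementation steps 1, 2, and 4 could look roughly like the Python below. Everything here is an assumption for illustration: the `Worker` fields, the column names `name`, `phone`, `role`, `city`, the function names, and above all the reliability formula (on-time shifts divided by shifts worked), which the ADR does not define.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    # Only name and phone are required; everything else is optional.
    name: str
    phone: str
    role: Optional[str] = None
    city: Optional[str] = None
    # Hypothetical counters that timesheet ingest would fill in later.
    placements: int = 0
    shifts_worked: int = 0
    shifts_on_time: int = 0


def _clean(value: Optional[str]) -> Optional[str]:
    """Treat missing columns and blank cells the same way: as absent."""
    value = (value or "").strip()
    return value or None


def ingest_sparse_csv(text: str) -> List[Worker]:
    """Step 1: accept a CSV with as few as two columns (name, phone)."""
    workers = []
    for row in csv.DictReader(io.StringIO(text)):
        name, phone = _clean(row.get("name")), _clean(row.get("phone"))
        if name is None or phone is None:
            continue  # a row without name + phone is not a usable profile
        workers.append(Worker(name=name, phone=phone,
                              role=_clean(row.get("role")),
                              city=_clean(row.get("city"))))
    return workers


def card_lines(w: Worker) -> List[str]:
    """Steps 2 and 4: render only fields that exist, and say where a
    number comes from instead of showing a bare percentage."""
    lines = [w.name, w.phone]
    if w.role:
        lines.append(f"Role: {w.role}")
    if w.city:
        lines.append(f"City: {w.city}")
    if w.shifts_worked > 0:  # no data yet means no reliability line at all
        pct = round(100 * w.shifts_on_time / w.shifts_worked)
        noun = "placement" if w.placements == 1 else "placements"
        lines.append(f"Reliability: {pct}% (based on {w.placements} {noun})")
    return lines


if __name__ == "__main__":
    for worker in ingest_sparse_csv("name,phone\nAna Ruiz,555-0101\n"):
        print(card_lines(worker))
```

The point of the sketch is that a Day 1 worker with only a name and phone produces a two-line card, never a "Reliability: 0%" line. The reliability line appears only once timesheet-derived counters are non-zero.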