From 13b01fee9f14055ad12f21403fa3bc08d802b80c Mon Sep 17 00:00:00 2001
From: root <root@island37.com>
Date: Fri, 17 Apr 2026 15:32:06 -0500
Subject: [PATCH] =?UTF-8?q?ADR-021:=20Sparse=20data=20trust=20path=20?=
 =?UTF-8?q?=E2=80=94=20start=20with=20nothing,=20earn=20everything?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The staffing company said: 'we don't have any of that data.'
They're right. We showed a demo with 18-field profiles and they
have a name and a phone number.

This ADR documents the trust path:
  Phase 1 (Day 1): Work with name + phone + role. That's enough.
  Phase 2 (Weeks 1-4): Timesheets → reliability. Calls → history.
  Phase 3 (Month 2+): AI starts helping with real earned data.

Key principles:
- Never show empty fields or 0% bars
- Show what's THERE, not what's missing
- Trust indicators: 'based on 3 placements' not just 'Reliability: 87%'
- The system earns data by being useful, not by demanding it upfront

Also created sparse_workers dataset (200 workers, 74% have role,
34% have notes, 5 have ONLY name+phone) for realistic testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 docs/ADR-021-sparse-data-trust.md | 93 +++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/ADR-021-sparse-data-trust.md

diff --git a/docs/ADR-021-sparse-data-trust.md b/docs/ADR-021-sparse-data-trust.md
new file mode 100644
index 0000000..652c736
--- /dev/null
+++ b/docs/ADR-021-sparse-data-trust.md
@@ -0,0 +1,93 @@
+# ADR-021: Sparse Data Trust Path — start with nothing, earn everything
+
+**Status:** Proposed — 2026-04-17
+**Triggered by:** Legacy staffing company pushback: "we don't have that data"
+**Owner:** J
+
+---
+
+## The Problem
+
+We demonstrated a system with rich worker profiles (18 fields,
+behavioral scores, certifications, communication history). The
+staffing company said: "We don't have any of that. We have a name,
+a phone number, and a contract."
+
+They're right. Our demo assumed data that doesn't exist in their
+world. Showing AI that only works with perfect data is worse than
+useless — it builds distrust.
+
+## Their Reality
+
+Day 1 data for a typical worker:
+- Name
+- Phone number
+- Maybe: city, role, one or two skills mentioned on a call
+
+Day 1 data for a contract:
+- Client name
+- Role needed
+- Headcount
+- Start date
+- Maybe: required certs, location
+
+That's it. No reliability scores. No availability metrics. No
+communication history. No certifications in a database. The staffing
+coordinator carries that knowledge in their HEAD.
+
+## The Trust Path
+
+### Phase 1: Work with what they have (Day 1)
+- Accept sparse profiles: name + phone + role. That's enough.
+- Match contracts to workers by role (+ location when known). No scores.
+- The system is useful immediately: "here are the 12 people you
+  have tagged as Forklift Operators in IL." That's already faster
+  than scrolling a spreadsheet.
+- Don't show empty fields. Don't show 0% bars. Don't show what's
+  missing — show what's THERE.
+
+### Phase 2: Earn data through use (Weeks 1-4)
+- Every placement generates a timesheet → reliability starts building
+- Every call logged → communication history accumulates
+- Every check-in → availability becomes real
+- Every cert verified → certification database grows
+- The staffer doesn't "enter data" — they just do their job, and
+  the system learns from it.
+
+### Phase 3: AI starts helping (Month 2+)
+- Enough data to actually score workers → show reliability
+- Enough history to predict availability → surface it
+- Enough placements to know client preferences → suggest matches
+- The enrichment happened BECAUSE they used the system, not as a
+  prerequisite TO use it.
+
+## What This Means for the UI
+
+- Worker cards must gracefully degrade. Name only? Show name only.
+  Name + role? Show name + role. Full profile? Show everything.
+- Never show empty metrics. No "Reliability: 0%" — that looks like
+  the worker is unreliable. Just don't show it until there's data.
+- Lead with what the staffer KNOWS: "you placed this worker at
+  Company X last month" — that's information they trust because they
+  lived it.
+
+## What This Means for the Architecture
+
+- The vector index works on whatever text exists. A name + role is
+  well under 200 characters, and that's still enough for an embedding.
+  As more data arrives, the embeddings get richer and search gets better.
+- The hybrid search's SQL filters work on sparse profiles too. "role =
+  'Forklift Operator'" is a valid filter even without reliability.
+- The playbook system captures the staffer's decisions: "you picked
+  this worker for this contract" → that IS the training data for
+  future AI recommendations.
+
+## Implementation
+
+1. Sparse profile ingest: accept CSV with as few as 2 columns
+   (name, phone). Everything else optional.
+2. Graceful UI degradation: worker cards only show fields that exist
+3. Progressive enrichment: timesheet ingest → auto-calculate
+   reliability; check-in ingest → auto-calculate availability
+4. Trust indicators: "Reliability: 87%, based on 3 placements", not a
+   bare "Reliability: 87%". Show WHERE the number comes from.
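
Implementation steps 1, 2, and 4 above could be sketched roughly as follows. This is a minimal illustration placed outside the diff, not code from the patch. The optional column names (`role`, `city`, `notes`), the `min_placements` threshold of 3, and the definition of reliability as shifts worked divided by shifts scheduled are all assumptions made for the example.

```python
import csv
import io

# Assumption: only these two columns are required; everything else is optional.
REQUIRED = ("name", "phone")


def load_sparse_workers(csv_text):
    """Parse worker rows, keeping only the fields that actually have a value."""
    workers = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Drop empty cells so later code never sees a blank field.
        profile = {k: v.strip() for k, v in row.items() if k and v and v.strip()}
        if all(field in profile for field in REQUIRED):
            workers.append(profile)
    return workers


def reliability_label(shifts_worked, shifts_scheduled, placements, min_placements=3):
    """Return a label that names its source, or None when history is too thin."""
    if placements < min_placements or shifts_scheduled == 0:
        return None  # Nothing is shown rather than "Reliability: 0%".
    pct = round(100 * shifts_worked / shifts_scheduled)
    return f"Reliability: {pct}% (based on {placements} placements)"


def render_card(profile, label=None):
    """Build card lines from fields that exist; missing fields are skipped."""
    lines = [profile["name"], profile["phone"]]
    for field in ("role", "city", "notes"):  # hypothetical optional fields
        if field in profile:
            lines.append(f"{field.title()}: {profile[field]}")
    if label:
        lines.append(label)
    return lines


sample = (
    "name,phone,role,city\n"
    "Ana Ruiz,555-0101,Forklift Operator,\n"
    "Bo Chen,555-0102,,\n"
)
workers = load_sparse_workers(sample)
cards = [render_card(w) for w in workers]
```

A worker with only a name and phone yields a two-line card, and a worker with fewer than `min_placements` placements gets no reliability line at all, which matches the "never show empty metrics" rule in the UI section.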