lakehouse/docs/PYTHON_INVENTORY.md

# Python inventory — what's wired vs dead

Generated 2026-04-22. Enumerates every `.py` file in the repo and classifies each by actual integration status. Goal: surface the "ambiguous names that don't hook in" that J flagged.

## Classification

- **🟢 Production** — running as a systemd service, or imported by code that does
- **🟡 Documented** — referenced in `docs/PRD.md` / `docs/PHASES.md` / `docs/CONTROL_PLANE_PRD.md` or called from a shell/TS orchestrator
- **🟠 Manual** — runnable as a one-off utility but not in any automated pipeline
- **🔴 Dead** — not imported, not referenced, not called anywhere
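The four statuses form a precedence order: the strongest evidence found for a file decides its row. Below is a minimal sketch of that rule. It assumes the evidence is gathered by hand into a hypothetical `Evidence` record. The field names are illustrative and do not correspond to an existing tool in the repo.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PRODUCTION = "🟢 Production"
    DOCUMENTED = "🟡 Documented"
    MANUAL = "🟠 Manual"
    DEAD = "🔴 Dead"


@dataclass
class Evidence:
    # Hypothetical evidence fields, filled in by hand while auditing a file.
    systemd_exec_start: bool = False      # named in a unit's ExecStart
    imported_by_production: bool = False  # imported by a 🟢 module
    referenced_in_docs: bool = False      # PRD.md / PHASES.md / CONTROL_PLANE_PRD.md
    called_by_orchestrator: bool = False  # invoked from a shell/TS orchestrator
    runnable: bool = False                # usable as a one-off CLI utility


def classify(ev: Evidence) -> Status:
    # Strongest evidence wins: Production > Documented > Manual > Dead.
    if ev.systemd_exec_start or ev.imported_by_production:
        return Status.PRODUCTION
    if ev.referenced_in_docs or ev.called_by_orchestrator:
        return Status.DOCUMENTED
    if ev.runnable:
        return Status.MANUAL
    return Status.DEAD
```

For example, `classify(Evidence(runnable=True))` lands in 🟠 Manual, while a file with no evidence at all falls through to 🔴 Dead.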

## Results

### Sidecar (`sidecar/sidecar/`)

The Python FastAPI sidecar on `:3200`. Runs as `lakehouse-sidecar.service`.

| File | Status | Evidence |
|---|---|---|
| `main.py` | 🟢 Production | systemd ExecStart: `uvicorn sidecar.sidecar.main:app` |
| `admin.py` | 🟢 Production | imported by `main.py` as `admin_router` |
| `embed.py` | 🟢 Production | imported by `main.py` as `embed_router` |
| `generate.py` | 🟢 Production | imported by `main.py` as `generate_router` |
| `rerank.py` | 🟢 Production | imported by `main.py` as `rerank_router` |
| `ollama.py` | 🟢 Production | imported by `embed.py` / `generate.py` / `rerank.py`, so reached transitively from `main.py` |
| `__init__.py` | 🟢 Production | package marker |
| `lab_ui.py` | 🔴 **Dead** | **not imported by any file in `sidecar/`. `scripts/serve_lab.py` only uses stdlib — doesn't import this.** Mtime 2026-04-16 (pre-session). Safe to delete. |
| `pipeline_lab.py` | 🔴 **Dead** | **same — not imported anywhere. Mtime 2026-04-16.** Docstring says "iterative embedding/LLM pipeline experimentation" — looks like an abandoned prototype. Safe to delete. |
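The "not imported anywhere" evidence for `lab_ui.py` and `pipeline_lab.py` can be rechecked mechanically. Below is a minimal sketch using the stdlib `ast` module. `find_importers` is a hypothetical helper. It only sees static `import` and `from ... import` statements. Dynamic imports (`importlib.import_module`, `__import__`) and string references, such as a systemd `ExecStart`, still need a separate `grep`.

```python
import ast
from pathlib import Path


def find_importers(root: Path, module: str) -> list[Path]:
    """Return .py files under `root` that import `module` (matched by last dotted name).

    Catches `import a.b.lab_ui`, `from a.b import lab_ui`,
    `from .lab_ui import x`, and `from . import lab_ui`.
    """
    hits: list[Path] = []
    for path in sorted(root.rglob("*.py")):
        if path.stem == module:
            continue  # skip files named after the module itself
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                base = node.module or ""  # None for `from . import x`
                names = [base] + [f"{base}.{a.name}" if base else a.name for a in node.names]
            else:
                continue
            if any(n.split(".")[-1] == module for n in names if n):
                hits.append(path)
                break  # one hit per file is enough
    return hits
```

Pointing it at the repo root rather than `sidecar/` also covers `scripts/serve_lab.py`. An empty result for both `lab_ui` and `pipeline_lab` would be consistent with the 🔴 rows above.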

### Scripts (`scripts/`)

17 files. Only **3** are production (behind systemd), **2** are documented in phase/PRD docs, and **12** are manual-only (not in any automated pipeline).

| File | Status | Evidence |
|---|---|---|
| `serve_ui.py` | 🟢 Production | `lakehouse-ui.service` ExecStart |
| `serve_imagegen.py` | 🟢 Production | `imagegen.service` ExecStart |
| `serve_lab.py` | 🟢 Production | `pipeline-lab.service` ExecStart |
| `kb_measure.py` | 🟡 Documented | referenced in `docs/PHASES.md` (Phase 22 aggregator) |
| `kb_staffer_report.py` | 🟡 Documented | referenced in `docs/PHASES.md` + `docs/PRD.md` (Phase 23), also called by `scripts/run_staffer_demo.sh` |
| `autonomous_agent.py` | 🟠 Manual | "Autonomous stress-test agent" — no docs refs, no callers. Mtime 2026-04-17. |
| `copilot.py` | 🟠 Manual | "Staffing Co-Pilot — the anticipation layer" — no docs refs. Writes briefing to `/tmp/copilot_briefing.json` on run. |
| `generate_demo.py` | 🟠 Manual | "realistic demo datasets" — mtime 2026-03-27, never touched this session |
| `generate_workers.py` | 🟠 Manual | "worker profiles at scale" — usage example in docstring: `python3 generate_workers.py 100000 > /tmp/workers_100k.csv`. Still useful, just not auto-run. |
| `lance_tune.py` | 🟠 Manual | "IVF_PQ parameter sweep" — one-shot experiment tool. |
| `qwen3_plan.py` | 🟠 Manual | "Qwen 3 agent plan — structured test with playbook building" |
| `quality_eval.py` | 🟠 Manual | "Quality evaluation pipeline — tests whether the system gives CORRECT" |
| `scale_test.py` | 🟠 Manual | called by `scripts/scale_10m_test.sh` (not in systemd). 2.5M row test. |
| `staffing_day.py` | 🟠 Manual | "Real-world staffing agency day simulation" — no docs refs. Overlaps conceptually with `tests/multi-agent/scenario.ts`. |
| `staffing_demo.py` | 🟠 Manual | "Realistic staffing company data generator" — demo fixture. |
| `staffing_simulation.py` | 🟠 Manual | "multi-agent stress test" — overlaps with `tests/multi-agent/scenario.ts`. |
| `stress_test.py` | 🟠 Manual | "prove this architecture is sound or find where it breaks" — overlaps with `tests/multi-agent/run_stress.ts`. |

## Overlap with TypeScript / Rust equivalents

Several Python scripts overlap conceptually with newer TS/Rust code. This is the "ambiguous names that don't hook in" pattern — the Python originals were superseded but not removed.

| Python script | Newer equivalent | Integration drift |
|---|---|---|
| `scripts/staffing_simulation.py` | `tests/multi-agent/scenario.ts` | `scenario.ts` ships and is observer-integrated (Phase 24); the Python version is unwired. |
| `scripts/staffing_day.py` | `tests/multi-agent/scenario.ts` | same. |
| `scripts/stress_test.py` | `tests/multi-agent/run_stress.ts` | TS version is the one the auditor fixture exercises. |
| `scripts/autonomous_agent.py` | `bot/` + `auditor/` Bun sub-agents | new Bun pattern supersedes. Python version predates the control-plane pivot. |
| `scripts/qwen3_plan.py` | `tests/multi-agent/agent.ts` | TS agent is the one with Phase 21 continuation + cloud routing. |

**Recommended cleanup actions (not in this PR — separate decision):**

1. **Delete the two confirmed dead sidecar files:** `sidecar/sidecar/lab_ui.py`, `sidecar/sidecar/pipeline_lab.py`. Nothing calls them; removal is safe.
2. **Archive the 5 overlapping scripts** (staffing_simulation, staffing_day, stress_test, autonomous_agent, qwen3_plan) — either move to `scripts/legacy/` with a README explaining they predate the TS versions, or delete if J confirms no future use.
3. **Keep the one-off utilities** (generate_demo, generate_workers, lance_tune, quality_eval, copilot, scale_test, staffing_demo) — useful as manual tools. Maybe add a `scripts/README.md` documenting what each does and when to run.
4. **Promote `kb_measure.py` and `kb_staffer_report.py`** to be actually callable from the KB flow — they're documented but only run manually after scenario batches.
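Cleanup action 2 is mechanical enough to script. Below is a minimal sketch that assumes the `scripts/legacy/` layout proposed in that action. `archive_legacy` is hypothetical. By default it only returns the planned moves without touching anything. It also skips any listed file that no longer exists.

```python
import shutil
from pathlib import Path

# The five overlapping scripts named in cleanup action 2.
LEGACY = [
    "staffing_simulation.py",
    "staffing_day.py",
    "stress_test.py",
    "autonomous_agent.py",
    "qwen3_plan.py",
]


def archive_legacy(scripts_dir: Path, dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Plan, and unless dry_run, perform moves of legacy scripts into scripts/legacy/."""
    legacy_dir = scripts_dir / "legacy"
    plan = [
        (scripts_dir / name, legacy_dir / name)
        for name in LEGACY
        if (scripts_dir / name).exists()  # skip files already moved or deleted
    ]
    if not dry_run:
        legacy_dir.mkdir(exist_ok=True)
        for src, dst in plan:
            shutil.move(str(src), str(dst))
        readme = legacy_dir / "README.md"
        if not readme.exists():
            readme.write_text(
                "# Legacy scripts\n\n"
                "These Python scripts predate the TypeScript versions under "
                "tests/multi-agent/ and the Bun sub-agents; they are not wired "
                "into any pipeline.\n",
                encoding="utf-8",
            )
    return plan
```

The dry-run default fits the "separate decision" framing. The plan can be reviewed in the PR before anything moves, and the delete-instead option stays open until J confirms.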

## Honest next step

The cleanup above isn't this session's work — it's a subsequent PR. This doc lands first so the state is visible and operator-auditable.