Three issues from J 2026-04-30:
1. Silent fail2ban-client / nginx subprocess failures
Pre-fix _execute_ban called fail2ban-client with capture_output=True
but threw away the result. _nginx_ban had bare `except: pass`
swallowing everything. So a non-zero fail2ban exit (jail not
configured, IP already banned, IPv6 quirk) or PermissionError
on /etc/nginx/banned_ips.conf still logged "AI_BAN" while the
attacker walked through unimpeded. These were the "errors in logs"
J was seeing.
Now every subprocess failure surfaces:
- FAIL2BAN_FAILED rc=N stderr=... — non-zero exit
- FAIL2BAN_TIMEOUT — client didn't return in 5s
- FAIL2BAN_NOT_INSTALLED — binary missing
- NGINX_BAN_WRITE_DENIED — permission error on conf file
- NGINX_RELOAD_FAILED rc=N stderr=... — systemctl reload non-zero
- NGINX_RELOAD_TIMEOUT / NGINX_RELOAD_NO_SYSTEMCTL — runtime gaps
sec_log.error records each of these, so journalctl -u llm-team-ui
shows the actual reason a ban didn't stick.
2. AI auto-scan failure callback when model is busy
Pre-fix Ollama unreachable / busy / timeout silently preserved
log position + skipped the scan. Operator only learned about
the gap by manually checking sentinel-status. Now:
- 1 retry inside same scan after SENTINEL_AI_RETRY_DELAY_SECS
(30s) on connection error / timeout / 429 / 503
- 4xx errors that won't recover (404 model missing, 400 bad
prompt) fail fast without retrying
- consecutive_ai_failures counter in _sentinel_stats
- On 3+ consecutive failures, send_security_alert() fires —
"Sentinel AI unreachable" email with last error + endpoint
+ model name. One alert per outage (ai_busy_alerted flag);
clears on first successful scan so flapping doesn't spam.
- AI_RECOVERED log line on first scan after a streak.
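The retry/alert flow above can be sketched as follows. This is illustrative only: `call_ai` stands in for the real Ollama request, and every name except those quoted in the commit (`SENTINEL_AI_RETRY_DELAY_SECS`, `consecutive_ai_failures`, `ai_busy_alerted`, `send_security_alert`) is an assumption.

```python
import time

SENTINEL_AI_RETRY_DELAY_SECS = 30      # retry delay from the commit
RETRYABLE = {429, 503}                 # plus connection errors / timeouts

def scan_with_retry(call_ai, stats, send_security_alert,
                    delay=SENTINEL_AI_RETRY_DELAY_SECS):
    """One retry on transient errors, fail-fast on unrecoverable 4xx,
    alert once per outage, clear on first success."""
    status, result = None, None
    for attempt in (1, 2):
        try:
            status, result = call_ai()
        except (ConnectionError, TimeoutError) as exc:
            status, result = None, exc
        else:
            if status == 200:
                if stats.get("consecutive_ai_failures", 0):
                    print("AI_RECOVERED")      # first scan after a streak
                stats["consecutive_ai_failures"] = 0
                stats["ai_busy_alerted"] = False
                return result
            if status not in RETRYABLE:
                break                          # 404 / 400: won't recover
        if attempt == 1:
            time.sleep(delay)                  # single in-scan retry
    stats["consecutive_ai_failures"] = \
        stats.get("consecutive_ai_failures", 0) + 1
    if (stats["consecutive_ai_failures"] >= 3
            and not stats.get("ai_busy_alerted")):
        send_security_alert("Sentinel AI unreachable", detail=str(result))
        stats["ai_busy_alerted"] = True        # one alert per outage
    return None
```

The alert-once flag plus reset-on-success is what keeps a flapping model from spamming email while still guaranteeing exactly one notification per sustained outage.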
3. Sentinel ban path still substring-matched 192.168
Same vulnerability class as admin_ban_ip had — only protected
one /16. Replaced 4 sites with is_allowlisted(ip):
- threat-list display filter (line 7638): now hides ALL
allowlisted IPs from the panel
- mass-ban API (line 8016): refuses ban for any allowlisted IP
- sentinel analysis filter (line 12786): saves AI tokens by
never sending allowlisted-IP traffic to the judge
- sentinel ban verdict gate (line 12949): defense in depth —
even if the AI says "ban" on an allowlisted IP, this catches it
Combined with the layered defenses in b09b73c (track_violation,
_auto_escalate, _nginx_ban, admin_ban_ip), there is now no
code path that can ban an allowlisted IP. Operator self-ban
is structurally impossible.
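The difference between the old substring match and a real containment check, sketched with the stdlib `ipaddress` module (the allowlist contents here are illustrative; the real `is_allowlisted` presumably reads its networks from config):

```python
import ipaddress

# Illustrative allowlist; the real set is configured elsewhere.
ALLOWLIST_NETS = [
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("10.0.0.0/8"),
]

def is_allowlisted(ip: str) -> bool:
    """Proper network containment. Unlike `"192.168" in ip`, this
    cannot be fooled by addresses that merely contain the substring,
    and it covers every configured range, not one /16."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # malformed input is never allowlisted
    return any(addr in net for net in ALLOWLIST_NETS)
```

The substring version would also have matched a public address like 1.192.168.2, and missed every private range other than 192.168.0.0/16, which is exactly the class of bug the four call sites shared.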
Privilege note: the systemd unit at /root/llm-team-ui/llm-team-ui.service
runs as User=root, so subprocess.run(["fail2ban-client", ...]) and
systemctl reload nginx have permission. The "errors in logs" J was
seeing weren't permission-denied; they were silent non-zero exits.
The new subprocess wrappers surface those.
If the operator later splits the app into a non-root tier
(Opus OB-3 architectural recommendation, deferred), this same
infrastructure still works — the wrappers will then surface
"PermissionError" with full path + uid context, telling the
operator exactly which command needs sudo NOPASSWD or
PolicyKit rule.
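A wrapper that surfaces that context might look like this (a sketch under the assumption of a Unix host; the function name and log format are illustrative, and note that a non-root fail2ban-client typically fails with a non-zero exit on the socket rather than a PermissionError at exec time):

```python
import os
import subprocess

def run_privileged(cmd, sec_log, timeout=5):
    """Run a command and, on a permission failure, log which command
    failed and under which uid/euid, so the operator knows exactly
    what needs sudo NOPASSWD or a PolicyKit rule."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout)
    except PermissionError as exc:
        sec_log.error("PERMISSION_DENIED cmd=%s uid=%d euid=%d err=%s",
                      cmd[0], os.getuid(), os.geteuid(), exc)
        return None
```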
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>