llm_team_ui: ban-system defense in depth (extends OB-4)

The OB-4 fix at 939dfdd was incomplete. It bypassed the path-regex EXPLOIT check for allowlisted-IP admins, but _track_violation is called from THREE sites (exploit_scan, rate_limit, login_fail) and only the exploit_scan path had the bypass. An admin who hit a rate limit + had a session timeout + entered a wrong password within 60s could still self-ban via the OTHER paths, ending up locked out through the back door. This commit adds 4 layers of defense in depth, each independently sufficient to stop an allowlisted IP from being banned: 1. _track_violation: bail early if is_allowlisted(ip). Allowlisted IPs never accumulate violations from ANY path. Plus eviction sweep when _violation_tracker grows >10K (same shape as the _rate_limit eviction at 266de61). 2. _auto_escalate: re-check is_allowlisted before issuing any ban. Defense in depth — if a future call path bypasses #1, this catches it. 3. _nginx_ban: refuse to write the deny rule for allowlisted IPs, even if a buggy caller reached this far. Last write before nginx reload; last place to stop a bad ban. 4. admin_ban_ip: replace `ip.startswith("192.168.")` substring check with the canonical ALLOWLIST_IPS membership test. Pre-fix this only protected one LAN; 10.0.0.0/8, IPv6 loopback ::1, and custom allowlist entries (e.g. an external monitoring IP) were all banable by manual admin error. Now uses the same allowlist as the auto-ban paths. Operationally the admin can no longer self-ban through any path. The auto-escalate ban audit log entries get a corresponding "AUTO_ESCALATE_BLOCKED ip=... — allowlisted" entry instead of the ban firing silently. Same for nginx_ban: NGINX_BAN_BLOCKED entries log the saved bullet for operator review. Builds on 939dfdd (OB-4 path-regex bypass) + 266de61 (rate_limit eviction + auth_login IP gate). Together these three commits close the LLM Team UI's ban-system self-foot-shoot vulnerability surface. Outstanding from the scrum (architectural, separate session): - OB-3 root-running web app + privileged shell calls - Sentinel prompt-injection WARN Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 03:24:01 -05:00 · 2026-04-30 03:24:01 -05:00 · b09b73c409
commit b09b73c409
parent 266de613b2
1 changed files with 53 additions and 5 deletions
--- a/llm_team_ui.py
+++ b/llm_team_ui.py
@ -109,8 +109,24 @@ _original_sentinel_interval = None  # stash the normal interval during high-aler


 def _track_violation(ip, event_type="unknown"):
-    """Record a security violation. If velocity threshold exceeded, auto-escalate."""
+    """Record a security violation. If velocity threshold exceeded, auto-escalate.
+
+    Cross-lineage scrum 2026-04-30 follow-up: OB-4's path-bypass for
+    admins was incomplete — _track_violation is called from 3 sites
+    (exploit_scan, rate_limit, login_fail) and only the exploit one
+    had the bypass. An admin hitting rate-limit + login-typo + a
+    legit URL containing 'UNION' within 60s could still self-ban
+    via the OTHER paths. Now is_allowlisted bails early so allowlisted
+    IPs never accumulate violations from ANY path. Defense in depth —
+    _auto_escalate also re-checks below.
+
+    Eviction sweep when tracker grows >10K entries (same pattern as
+    _rate_limit; both had identical unbounded-dict WARNs)."""
+    if is_allowlisted(ip):
+        return False
    now = time.time()
+    if len(_violation_tracker) > 10000:
+        _evict_stale_violation_tracker(now)
    if ip not in _violation_tracker:
        _violation_tracker[ip] = []
    _violation_tracker[ip].append(now)
@ -123,8 +139,26 @@ def _track_violation(ip, event_type="unknown"):
    return False


+def _evict_stale_violation_tracker(now):
+    """Drop _violation_tracker entries whose newest timestamp is past
+    the window. Called from _track_violation only when dict exceeds
+    10K — the hot path stays untouched for normal traffic."""
+    cutoff = now - VELOCITY_WINDOW
+    stale = [ip for ip, ts in _violation_tracker.items() if not ts or max(ts) < cutoff]
+    for ip in stale:
+        del _violation_tracker[ip]
+
+
 def _auto_escalate(ip, violation_count, event_type):
-    """Auto-ban IP and switch sentinel to high-alert mode."""
+    """Auto-ban IP and switch sentinel to high-alert mode.
+
+    Defense in depth (2026-04-30): _track_violation already short-
+    circuits for allowlisted IPs, but if a future code path calls
+    _auto_escalate directly we still want the allowlist guard. Bail
+    early; nothing escalates against a trusted IP."""
+    if is_allowlisted(ip):
+        sec_log.info("AUTO_ESCALATE_BLOCKED ip=%s — allowlisted, refused to ban", ip)
+        return
    global _original_sentinel_interval, SENTINEL_INTERVAL
    sec_log.warning("AUTO_ESCALATE ip=%s violations=%d/%ds type=%s", ip, violation_count, VELOCITY_WINDOW, event_type)
    _sentinel_log_entry(f"AUTO_ESCALATE ip={ip} violations={violation_count}/{VELOCITY_WINDOW}s type={event_type}")
@ -7600,7 +7634,16 @@ def _kill_connections(ip):
        pass

 def _nginx_ban(ip):
-    """Add IP to nginx deny list and reload."""
+    """Add IP to nginx deny list and reload.
+
+    Defense in depth (2026-04-30): refuse to write allowlisted IPs
+    to the deny list under ANY circumstance — even a buggy caller
+    that bypassed _track_violation's allowlist check. The deny list
+    is the last write before nginx reload; this is the last place
+    we can stop a bad ban."""
+    if is_allowlisted(ip):
+        sec_log.info("NGINX_BAN_BLOCKED ip=%s — allowlisted, refused to write deny rule", ip)
+        return
    import subprocess
    try:
        line = f"deny {ip};\n"
@ -7642,8 +7685,13 @@ def admin_ban_ip():
    action = data.get("action", "ban")
    if not ip:
        return jsonify({"error": "IP required"}), 400
-    if ip.startswith("192.168."):
-        return jsonify({"error": "Cannot ban LAN addresses"}), 400
+    # Defense in depth (2026-04-30): use the canonical ALLOWLIST_IPS
+    # check rather than a substring on "192.168." which would let
+    # 10.0.0.0/8 LANs and IPv6 loopback ::1 through. Same allowlist
+    # the auto-ban paths now respect — operator can't accidentally
+    # cut off their own LAN gateway.
+    if is_allowlisted(ip):
+        return jsonify({"error": f"refusing to ban allowlisted IP {ip} (in ALLOWLIST_IPS)"}), 400
    try:
        if action == "ban":
            subprocess.run(["fail2ban-client", "set", "llm-team-exploit", "banip", ip],