llm_team_ui: 4 fixes from 2026-04-30 cross-lineage scrum
Cross-lineage scrum (Opus 4.7 + Kimi K2.6 + Qwen3-coder via the
local-review-harness chatd) on this codebase surfaced 5 BLOCK-class
issues from Opus + a convergent finding from the harness. This
commit lands the 4 surgical fixes; OB-3 (web app runs as root with
fail2ban-client + systemctl reload nginx + writes to
/etc/nginx/banned_ips.conf) needs an architectural split into a
non-root web tier + a privileged sudo wrapper, deferred for its
own session.
OB-1 — log file open at import crashes app on perm error
Pre-fix:
`_sec_handler = logging.FileHandler("/var/log/llm-team-security.log")`
raised PermissionError at import time on any non-root or
fresh-install run, killing the app before Flask started. The
failure was silent: no Flask process existed whose logs could be
inspected.
Fix: try/except, fall back to StreamHandler(sys.stderr) when
the path is unwritable. App starts; sec_log events still land
in journald via stderr. LLM_TEAM_SECURITY_LOG env var lets
operators override the path.
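A minimal sketch of the fallback described above, written as a standalone function so it can be exercised in isolation. The helper name `make_security_handler` is hypothetical; the commit inlines this logic at module level.

```python
import logging
import sys


def make_security_handler(path: str) -> logging.Handler:
    """Return a FileHandler for `path`, or a stderr StreamHandler if it can't be opened."""
    try:
        # FileHandler opens the file immediately (delay=False), so an
        # unwritable or missing path raises here, not on first log call.
        return logging.FileHandler(path)
    except OSError as err:
        # PermissionError and FileNotFoundError are both OSError subclasses.
        print(f"[security] WARNING: can't open {path} ({err}); "
              f"falling back to stderr.", file=sys.stderr, flush=True)
        return logging.StreamHandler(sys.stderr)
```

A caller would pass `os.environ.get("LLM_TEAM_SECURITY_LOG", "/var/log/llm-team-security.log")` as `path` and attach the result to the `security` logger.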
OB-2 — DB password hardcoded in source (CONVERGENT FINDING)
The `kbuser` Postgres credential
`IPbLBA0EQI8u4TeM2YZrbm1OAy5nSwqC` was leaked in source here
AND in voice-ai/audiosocket_bridge.py + voice-ai/sales_assistant.py.
Caught independently by harness LLM phase (qwen3.5 local) on
voice-ai earlier today AND Opus on this file just now. Same
password, same DB (`knowledge_base`) shared between services,
three reviewers converged.
Fix: source from LLM_TEAM_DB_DSN env var, with no fallback to
the leaked literal. When unset, a warning goes to stderr at
import and DB operations fail.
Operator follow-ups:
1. Rotate the password in Postgres (still in git history;
redacting source doesn't un-leak it).
2. Set LLM_TEAM_DB_DSN in /etc/llm-team-ui.env (mode 0600,
loaded via systemd EnvironmentFile=).
3. Same DSN env-var pattern needs applying to
voice-ai/audiosocket_bridge.py:47 once that branch's
workspace_context WIP lands.
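For comparison, a stricter variant that refuses to start instead of warning. This is a sketch under assumptions, not what the commit does: the diff prints a warning and lets `psycopg2.connect` fail later. `load_dsn` and `MissingDSNError` are hypothetical names.

```python
import os


class MissingDSNError(RuntimeError):
    """Raised when the DSN environment variable is unset or empty."""


def load_dsn(env=None, var="LLM_TEAM_DB_DSN"):
    """Return the DSN from `env` (defaults to os.environ) or raise if missing."""
    env = os.environ if env is None else env
    dsn = env.get(var, "").strip()
    if not dsn:
        # No fallback literal: a missing DSN is a deployment error.
        raise MissingDSNError(f"{var} is not set; refusing to start without a DB DSN")
    return dsn
```

Raising at import trades availability for visibility: the service would not come up at all without the env file, which may or may not be desirable for pages that never touch the database.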
OB-5 — demo_mode default=True ships public access on first boot
Pre-fix: `_demo_mode = {"active": True, ...}` + the demo branch
in login_required let users through without a session. Combined
with /api/run + /api/imagegen proxies, fresh installs were an
open LLM/compute abuse surface from first boot.
Fix: default to False; LLM_TEAM_DEMO_MODE=1 env override exists
for the public devop.live deployment systemd unit so the demo
doesn't need a manual flip on every restart, but everywhere else
defaults closed.
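The default-closed behaviour can be sketched as a small function. The comparison against `"1"` mirrors the expression in the diff; the wrapper name `initial_demo_mode` and the injectable `env` argument are hypothetical, added only to make the logic testable.

```python
import os


def initial_demo_mode(env=None):
    """Build the boot-time demo state; only the exact value '1' enables demo mode."""
    env = os.environ if env is None else env
    enabled = env.get("LLM_TEAM_DEMO_MODE", "0") == "1"
    return {
        "active": enabled,
        "started_by": "boot" if enabled else "off",
        "showcase": enabled,
    }
```

Because the check is an exact string match, values such as `true` or `yes` leave demo mode off, which keeps the failure direction closed.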
OB-4 — EXPLOIT_PATTERNS LAN/admin lockout
Pre-fix: regex matched on `request.path` + query string against
patterns like UNION, SELECT, ;--, <script, and /admin.php. Admin
URLs containing those keywords in legitimate ways (e.g. a team
name "select-rebrand" or a docs link /admin/select_a_mode) hit
3 violations in 60s and auto-banned the admin's IP. The scanner
had no allowlist.
Fix: bypass the path and query-string check for authenticated
admins connecting from an ALLOWLIST_IPS address. Body/UA checks
still apply (the prompt-injection-as-DoS WARN in the scrum is
separate). Requiring both conditions prevents self-ban without
weakening the broader scanner defense.
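The bypass condition can be sketched as a pure function. The regex here is illustrative only, since the real EXPLOIT_PATTERNS is not shown in this commit, and `should_flag` is a hypothetical name; the ALLOWLIST_IPS values are the ones visible in the diff context.

```python
import re

# Illustrative pattern only; the real EXPLOIT_PATTERNS is not part of this diff.
EXPLOIT_PATTERNS = re.compile(r"union|select|;--|<script|/admin\.php", re.IGNORECASE)
ALLOWLIST_IPS = {"127.0.0.1", "::1", "192.168.1.1"}


def should_flag(ip, role, path, query):
    """Return True when the request should count as an exploit-scan violation."""
    # Both conditions are required: an allowlisted IP without an admin
    # session, or an admin session from elsewhere, is still scanned.
    if ip in ALLOWLIST_IPS and role == "admin":
        return False
    return bool(EXPLOIT_PATTERNS.search(path) or EXPLOIT_PATTERNS.search(query))
```

With this shape, an admin on the LAN opening /admin/select_a_mode is not counted, while the same URL from an outside address still is.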
Plus a .gitignore entry for .memory/: the local-review-harness
writes JSONL findings under <repo>/.memory/ when scanning. The
harness's own gitignore is at the harness repo root, not here, so
without this the .memory/ dir would show up as untracked on every
harness run against this tree.
Other Opus WARNs deferred:
- Sentinel feeds attacker-controlled UA into LLM prompt → can
steer ban verdicts. Fix needs prompt-template hardening or
output-validation gate.
- CSP `'unsafe-inline'` defeats most XSS protection (removing it
would break inline scripts; needs HTML refactor).
- _rate_limit unbounded dict + per-worker (needs eviction loop or
Redis-backed counter).
- auth_login first-time setup gated only by COUNT(*)==0 (needs
network-source restriction or a setup token).
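One possible shape for the eviction option in the `_rate_limit` item above, as a sketch under assumptions: a sliding-window counter whose key table is capped, dropping the least recently seen key when full. `BoundedRateLimiter` and its parameters are hypothetical, and this stays per-process, so it addresses the unbounded growth but not the per-worker split (that would still need a shared store such as Redis).

```python
import time
from collections import OrderedDict, deque


class BoundedRateLimiter:
    """Sliding-window limiter with a hard cap on the number of tracked keys."""

    def __init__(self, limit, window, max_keys=10_000, clock=time.monotonic):
        self.limit = limit        # max hits allowed per window
        self.window = window      # window length in seconds
        self.max_keys = max_keys  # cap on tracked keys (e.g. client IPs)
        self.clock = clock
        self._hits = OrderedDict()  # key -> deque of timestamps, oldest key first

    def allow(self, key):
        now = self.clock()
        hits = self._hits.pop(key, None)
        if hits is None:
            hits = deque()
        self._hits[key] = hits  # re-insert so this key is most recently seen
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        # Evict least recently seen keys once the cap is exceeded.
        while len(self._hits) > self.max_keys:
            self._hits.popitem(last=False)
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Evicting a key forgets its history, so a client that has been idle long enough to be evicted starts with a fresh window; that seems an acceptable trade for bounded memory.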
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 205eff64b4
commit 939dfddb93
--- a/.gitignore
+++ b/.gitignore
@@ -2,3 +2,4 @@ __pycache__/
 *.pyc
 .env
 *.log
+.memory/
@@ -3,6 +3,7 @@
 
 import json
 import os
+import sys
 import time
 import threading
 import secrets
@@ -36,8 +37,23 @@ app.config["SESSION_COOKIE_HTTPONLY"] = True
 app.config["SESSION_COOKIE_SAMESITE"] = "Lax"
 
 # ─── SECURITY LOGGING ─────────────────────────────────────────
-# Dedicated security log for fail2ban and audit trail
-_sec_handler = logging.FileHandler("/var/log/llm-team-security.log")
+# Dedicated security log for fail2ban and audit trail.
+#
+# Cross-lineage scrum 2026-04-30 (Opus BLOCK OB-1): wrapped in
+# try/except. Pre-fix this raised PermissionError at import time
+# when the service user couldn't write /var/log/llm-team-security.log,
+# crashing the app before Flask started. Now falls back to stderr;
+# sec_log still works (ban events still land in journald via stderr),
+# but the app starts. Operator should still create the file with
+# proper perms on production deploy. Path is overridable via
+# LLM_TEAM_SECURITY_LOG env var.
+_LOG_PATH = os.environ.get("LLM_TEAM_SECURITY_LOG", "/var/log/llm-team-security.log")
+try:
+    _sec_handler = logging.FileHandler(_LOG_PATH)
+except (PermissionError, FileNotFoundError, OSError) as _log_err:
+    print(f"[security] WARNING: can't open {_LOG_PATH} ({_log_err}); "
+          f"falling back to stderr.", file=sys.stderr, flush=True)
+    _sec_handler = logging.StreamHandler(sys.stderr)
 _sec_handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
 sec_log = logging.getLogger("security")
 sec_log.addHandler(_sec_handler)
@@ -150,8 +166,19 @@ def _check_high_alert_expiry():
 
 # IPs that never get rate-limited (your LAN, localhost)
 ALLOWLIST_IPS = {"127.0.0.1", "::1", "192.168.1.1"}
-# Demo mode state — toggled by admin at runtime
-_demo_mode = {"active": True, "started_by": "boot", "showcase": True}
+# Demo mode state — toggled by admin at runtime.
+#
+# Cross-lineage scrum 2026-04-30 (Opus BLOCK OB-5): pre-fix this
+# defaulted to active=True, meaning fresh installs shipped with
+# public unauthenticated access enabled — login_required let demo
+# users straight through. Combined with /api/run + /api/imagegen
+# proxies, that was an open LLM/compute abuse surface from first
+# boot. Now defaults to active=False; operators flip it on
+# explicitly via the admin UI or LLM_TEAM_DEMO_MODE=1 env override
+# (the env override exists for the demo systemd unit so the public
+# devop.live deployment doesn't need a manual toggle on every restart).
+_DEMO_DEFAULT = os.environ.get("LLM_TEAM_DEMO_MODE", "0") == "1"
+_demo_mode = {"active": _DEMO_DEFAULT, "started_by": "boot" if _DEMO_DEFAULT else "off", "showcase": _DEMO_DEFAULT}
 
 # Routes that demo users CAN trigger (read-like POSTs — enrichment, self-analysis, team runs)
 DEMO_ALLOWED_POSTS = {
@@ -560,8 +587,24 @@ def security_checks():
     # Check high-alert expiry
     _check_high_alert_expiry()
 
-    # Exploit scanner detection — log, alert, track velocity, block
-    if EXPLOIT_PATTERNS.search(path) or EXPLOIT_PATTERNS.search(request.query_string.decode("utf-8", errors="ignore")):
+    # Exploit scanner detection — log, alert, track velocity, block.
+    #
+    # Cross-lineage scrum 2026-04-30 (Opus BLOCK OB-4): pre-fix the
+    # path regex matched on substrings like UNION, SELECT, ;-- and
+    # auto-banned after 3 hits. Admin URLs containing those keywords
+    # in query strings (e.g. an LLM team named "select-rebrand" or a
+    # docs link to /admin/select_a_mode) self-banned the admin's IP.
+    # Now: skip the path-based check for authenticated admins from
+    # an allowlisted IP. The user-agent + body checks (sentinel) still
+    # apply. Allowlisted-IP admins clicking weird URLs no longer
+    # lock themselves out.
+    _skip_exploit_check = False
+    if ip in ALLOWLIST_IPS and session.get("role") == "admin":
+        _skip_exploit_check = True
+    if not _skip_exploit_check and (
+        EXPLOIT_PATTERNS.search(path) or
+        EXPLOIT_PATTERNS.search(request.query_string.decode("utf-8", errors="ignore"))
+    ):
         sec_log.warning("EXPLOIT_SCAN ip=%s path=%s ua=%s", ip, path, ua)
         _track_violation(ip, "exploit_scan")
         send_security_alert(
@@ -1888,7 +1931,18 @@ def get_api_key(provider_name):
     env_map = {"openrouter": "OPENROUTER_API_KEY", "openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY", "ollama_cloud": "OLLAMA_CLOUD_API_KEY"}
     return os.environ.get(env_map.get(provider_name, ""), "")
 
-DB_DSN = "dbname=knowledge_base user=kbuser password=IPbLBA0EQI8u4TeM2YZrbm1OAy5nSwqC host=localhost"
+# Cross-lineage scrum 2026-04-30 (Opus BLOCK OB-2 + harness LLM
+# convergent finding): DB_DSN previously had the password hardcoded
+# in source. Same `kbuser`/`knowledge_base` DSN was leaked in
+# voice-ai's audiosocket_bridge.py + sales_assistant.py — confirmed
+# canonical leak by 3 independent reviewers across 2 sessions. Now
+# sourced from env (set via systemd EnvironmentFile=/etc/llm-team-ui.env).
+# No silent fallback to the leaked literal — fail loud. The leaked
+# password is in git history regardless; rotate it in Postgres.
+DB_DSN = os.environ.get("LLM_TEAM_DB_DSN", "")
+if not DB_DSN:
+    print("[llm-team-ui] WARNING: LLM_TEAM_DB_DSN not set — DB ops will fail. "
+          "Set in systemd EnvironmentFile or shell env.", file=sys.stderr, flush=True)
 
 def get_db():
     return psycopg2.connect(DB_DSN)