Bugs fixed:
- Ratchet loop had no trial cap — experiment #1 ran 3762 trials
unchecked. Now capped at max_trials=50 per start cycle.
- meta_pipelines, meta_runs, self_reports tables had no GRANT
for kbuser — fixed permissions for all tables and sequences.
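The trial cap can be sketched as a bounded loop (function and parameter names here are illustrative, not the app's actual code):

```python
def run_ratchet(trial_fn, max_trials=50):
    """Run improvement trials under a hard cap per start cycle —
    prevents unbounded loops like the 3762-trial runaway fixed here."""
    best_score, trials = None, 0
    while trials < max_trials:          # hard cap: no more unchecked loops
        score = trial_fn(trials)        # one trial; returns its score
        trials += 1
        if best_score is None or score > best_score:
            best_score = score
    return best_score, trials
```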
All 4 running experiments auto-paused on restart.
Stress test confirms all tables accessible, all models responding,
meta-pipeline creation working, self-report save/retrieve working.
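The permission fix amounts to GRANT statements per table plus its id sequence; a sketch that generates them (sequence naming assumes PostgreSQL's default `<table>_id_seq` convention):

```python
# Tables from this commit; role name kbuser as in the fix.
TABLES = ["meta_pipelines", "meta_runs", "self_reports"]

def grant_statements(role="kbuser"):
    """Emit the GRANTs needed for full table + sequence access."""
    stmts = []
    for t in TABLES:
        stmts.append(f"GRANT SELECT, INSERT, UPDATE, DELETE ON {t} TO {role};")
        stmts.append(f"GRANT USAGE, SELECT ON SEQUENCE {t}_id_seq TO {role};")
    return stmts
```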
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Engine:
- Chains modes in sequence: extract → research → validate → debate → synthesize
- Each stage feeds its output to the next as input
- Runs the same pipeline with different model sets (one model per iteration)
- Auto-scores final output using judge model (1-10)
- Keeps best result across all iterations
- All stage results + final outputs saved to meta_runs table
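The engine loop can be sketched as follows — `query()` and `judge()` stand in for the real model calls, so names and shapes here are assumptions:

```python
STAGES = ["extract", "research", "validate", "debate", "synthesize"]

def run_pipeline(source_text, models, query, judge):
    """One model per iteration; each stage feeds the next; keep best by score."""
    best = {"score": -1}
    for model in models:
        text = source_text
        stage_results = []
        for stage in STAGES:                  # output of each stage -> next input
            text = query(model, stage, text)
            stage_results.append({"stage": stage, "output": text})
        score = judge(text)                   # judge model scores 1-10
        if score > best["score"]:
            best = {"score": score, "model": model, "stages": stage_results}
    return best
```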
4 preset pipelines:
1. Security Deep Dive — security logs through 5-stage analysis
2. Run History Insights — team run data through 4-stage extraction
3. Threat Intel Enrichment — profiled IPs through 5-stage analysis
4. Cross-Report Synthesis — past self-reports through 4-stage debate
Database:
- meta_pipelines: name, source, stages, status, best_score, iterations
- meta_runs: per-iteration stage results, final output, score, models
API:
- POST /api/meta-pipeline — create pipeline from preset
- POST /api/meta-pipeline/:id/start — run in background
- POST /api/meta-pipeline/:id/stop — halt execution
- GET /api/meta-pipelines — list all with live status
- GET /api/meta-pipeline/:id — full detail with all iteration results
UI (Lab page):
- Magenta-bordered Meta-Pipeline card with 4 clickable presets
- Click preset → creates + auto-starts pipeline
- Pipeline list with live status dots, progress, scores
- Click pipeline → drill-down with per-iteration results
- Each stage expandable (click to show output)
- Best output highlighted in green border
- Auto-refreshes every 5 seconds during runs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Database:
- self_reports table: report_type, model, report text, data_size, timestamp
- Reports auto-saved on generation (no extra step needed)
API:
- GET /api/self-reports — list all past reports (id, type, model, size, date)
- GET /api/self-reports/:id — full report text
UI:
- "✓ Saved as report #N" indicator after generation
- "Past Reports (N)" section below self-analysis buttons
- Click any past report → expands inline (toggle on/off)
- Shows: type, model, timestamp for each saved report
- Reports persist across page refreshes and restarts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4 one-click self-analysis reports in Lab:
1. Threat Intelligence Report — security logs → attack taxonomy,
attacker profiling, predictive analysis, recommendations
2. Model Performance Analysis — 96 team runs → usage patterns,
model workload, response efficiency, optimization opportunities
3. Usage Analytics — nginx access logs → traffic patterns, feature
usage, user journey mapping, UX recommendations
4. Security Posture Assessment — combined audit of security logs,
sentinel verdicts, fail2ban, threat intel DB → risk rating
API: POST /api/self-analyze
- type: threat_intel|model_performance|access_patterns|security_posture
- model: which local model to use (default qwen2.5)
- Returns structured report from real system data
Lab UI:
- Green-bordered Self-Analysis card above experiment templates
- Click any report → runs analysis in background → result panel
expands inline with full report (scrollable, closeable)
- Loading state shows "Analyzing..." during generation
Each report analyzes REAL data: actual security logs, actual run
history, actual nginx access patterns — not synthetic test data.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Templates section below experiment list:
BASIC — Better Summaries (3 eval cases)
Optimize summarization quality. Tests across biology, history,
and technical content. Shows the simplest Lab workflow.
INTERMEDIATE — Code Explainer (4 eval cases)
Find the best prompt+model to explain code to non-programmers.
Tests loops, recursion, error handling, comprehensions.
Shows how the ratchet evolves system prompts.
ADVANCED — Security Analyst Persona (5 eval cases)
Evolve a cybersecurity AI across threat classification, executive
summaries, developer education, incident response, and forensics.
Tests multi-audience adaptation and domain expertise.
Click any template → auto-fills the create form with name, objective,
metric, all eval cases, and selects all available models. User can
modify before creating.
Each template card shows: level badge (green/amber/red), name,
eval case count, and a description explaining what the experiment
does and why it matters.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Full theme swap: amber accents, JetBrains Mono, 2px borders
- Animated dot-grid background + scanlines
- Backdrop-filter blur on cards
- Status pills: square with borders (was rounded)
- Model chips: square with 2px borders
- Chart wrappers: dark background with 2px borders
- Trial items: monospace numbers and scores
- Best config box: monospace with green border
- Nav bar with links to Team, History, Admin, Logs
- Toast: monospace with fade-out animation
- Config textarea: monospace font with dark background
- Responsive: tabs compact on mobile
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New /history page (replaces slide-out panel):
- Full-page data table: ID, Mode, Prompt, Models, Tags, Date
- Active/Archived/All view toggle
- Filter by mode, tag, or search text
- Checkbox select for bulk archive/restore
- Click any row → detail panel with full responses
Per-run detail:
- Inline tag editor: add tags (Enter), remove tags (click ✕)
- Notes textarea with auto-save (1s debounce)
- Archive/Restore/Delete buttons
- Collapsible response cards (click header to expand)
Database:
- tags TEXT[] column with GIN index for fast tag queries
- notes TEXT column for freeform annotations
APIs:
- POST /api/runs/:id/tags — update tags and/or notes
- GET /api/runs/tags — list all unique tags in use
- GET /api/runs/vectors — structured text documents for AI/embedding
Returns: mode, prompt, models, date, tags, notes + all response text
Filters: ?mode=, ?tag=, ?limit=
Each doc includes token estimate for embedding planning
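A sketch of how one run becomes a vectors document with its chars/4 token estimate (field names are illustrative):

```python
def run_to_document(run):
    """Flatten a run into one embeddable text blob plus a token estimate."""
    parts = [
        f"mode: {run['mode']}",
        f"prompt: {run['prompt']}",
        f"tags: {', '.join(run.get('tags', []))}",
        f"notes: {run.get('notes', '')}",
    ]
    parts += [f"response ({r['model']}): {r['text']}" for r in run["responses"]]
    text = "\n".join(parts)
    return {"text": text, "token_estimate": len(text) // 4}  # ~chars/4
```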
Main UI: History button now links to /history page
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Database:
- Added 'archived' boolean column to team_runs (indexed)
- Active runs filtered by archived=false by default
API:
- GET /api/runs?show=active|archived|all
- POST /api/runs/:id/archive — archive single run
- POST /api/runs/:id/restore — restore single run
- POST /api/runs/bulk-archive — archive/restore by IDs or date
History panel UI:
- Active/Archived toggle tabs at top
- Per-run Archive button (magenta) in detail view
- Per-run Restore button (green) in detail view for archived runs
- "Archive All" bulk button when viewing active runs
- "Restore All" bulk button when viewing archived runs
- Archived runs hidden from active view, accessible anytime
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Sentinel thread sets next_scan_ts = time.time() + interval BEFORE sleeping
- API returns next_scan_in derived from real next_scan_ts, not estimated
- Frontend calculates server clock offset and counts down to the actual
target timestamp — refresh shows the same remaining time, not a reset
- Shows ✓ in green when scan fires, resumes countdown on next poll
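The backend half of this fix can be sketched as below — the key point is recording the absolute target timestamp before sleeping, so the API reports real remaining time (class and method names are illustrative):

```python
import time

class Sentinel:
    def __init__(self, interval=300):
        self.interval = interval
        self.next_scan_ts = time.time() + interval   # set BEFORE sleeping

    def tick(self):
        """One loop iteration: publish the target, then sleep and scan."""
        self.next_scan_ts = time.time() + self.interval
        # time.sleep(self.interval); self.scan()     # omitted in this sketch

    def next_scan_in(self):
        """Derived from the real target timestamp, never estimated."""
        return max(0, self.next_scan_ts - time.time())
```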
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Entire sentinel status fits in one header row now
- Mini 28px countdown ring (was 64px) inline with title
- Scans/bans counts inline as text, not grid boxes
- Verdicts collapsed by default — click to expand
- Card padding reduced (8px vs 14px)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SVG progress ring shows time until next scan (magenta arc)
- Countdown ticks every second: "245s → 244s → ... → scanning..."
- Ring fills as time progresses, resets on scan
- Turns green and shows "scanning..." when timer hits 0
- Stats grid: Scans count, AI Bans count, Last Run time, Interval
- Backend API returns elapsed_since_scan and next_scan_in
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Database:
- threat_intel table with full enrichment data per IP
- UPSERT on IP — re-enriching updates existing record
- Stores: geo, AI analysis, web-check results, indicators, raw JSON
- Indexed on IP (unique), threat_level, enriched_at
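The UPSERT relies on the unique IP index; a sketch of the statement (column list abbreviated and illustrative, based on the fields named in this commit):

```python
# ON CONFLICT on the unique ip column updates in place — no duplicate rows
# when an IP is re-enriched.
UPSERT_SQL = """
INSERT INTO threat_intel (ip, threat_level, geo, analysis, raw, enriched_at)
VALUES (%s, %s, %s, %s, %s, now())
ON CONFLICT (ip) DO UPDATE SET
    threat_level = EXCLUDED.threat_level,
    geo          = EXCLUDED.geo,
    analysis     = EXCLUDED.analysis,
    raw          = EXCLUDED.raw,
    enriched_at  = now();
"""
```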
Auto-save:
- Every enrichment auto-saves to DB (step 5 in enrichment pipeline)
- "Saved to Wall of Shame database" indicator in enrichment panel
- No duplicate scans — re-enrich updates the existing record
Wall of Shame tab (/logs):
- Stats bar: Total Profiled, Critical, High, Proxies, Automated
- Sortable table: IP, Threat, Type, Summary, Country, Ports
- Click any row to expand full detail:
ISP, Org, ASN, City, Proxy/Hosting flags, Confidence,
Blocklist count, Pattern, Recommendation, Indicators
- All data persists across restarts — no re-scanning needed
API:
- /api/admin/wall-of-shame — list all enriched IPs with sorting/filtering
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enrichment AI prompt:
- Explicitly states this is a PRIVATE application
- Strict threat level rules: 10+ blocklists = always critical,
exploit scans = always critical, SSH-only = suspicious
- Added "compromised_host" classification option
- Recommendation options: ban permanently, ban 24h, monitor, ignore
Sentinel batch prompt:
- "Err on the side of banning" directive
- .env.production/.env.local probing = targeted recon, instant ban
- When in doubt, BAN — private server has no public scanning excuse
- Tighter rules for automated UA detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Now queries 6 web-check endpoints per IP:
- ports — open port scan
- dns — reverse DNS / PTR records
- block-lists — DNS blocklist check (AdGuard, CloudFlare, etc.)
- trace-route — full network path with per-hop latency
- headers — HTTP response headers (server, powered-by, etc.)
- status — HTTP status code and response time
Frontend rendering:
- Traceroute displayed as hop chips with latency: IP (45ms) → IP (56ms)
- HTTP status with response time
- Server headers inline
- Errors silently skipped (many endpoints fail on raw IPs)
AI analysis now includes:
- Blocklist count and names in prompt
- Traceroute hops in prompt for network path analysis
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Web-check (ports, DNS, blocklists) now runs as step 3, AI analysis
as step 4. AI prompt includes open ports and blocklist status for
richer threat verdicts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Setup:
- lissy93/web-check running in Docker on port 3000
- Queries ports, DNS, and blocklist endpoints per IP
Enrichment now includes 4 layers:
1. Geolocation (ip-api.com) — country, ISP, proxy/hosting flags
2. Web-Check deep scan — open ports, DNS/PTR, blocklist status
3. Security log aggregation — all activity for that IP
4. AI analysis (qwen2.5) — gets ALL above data as context
Frontend rendering:
- Open ports displayed in red (security risk indicators)
- Blocklist status: "3/8 blocked (AdGuard, AdGuard Family, ...)"
- Reverse DNS (PTR records)
- All data feeds into AI analysis prompt for richer verdicts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sorting:
- Sort by: hits, threat level, recent activity, banned status
- Active sort button highlighted in amber
Mass operations:
- Checkbox per IP for multi-select
- "Ban Selected" / "Unban Selected" buttons with confirmation
- /api/admin/security/mass-ban endpoint handles batch operations
- Selection counter shows "N selected"
IP Enrichment (click "Enrich" button per IP):
- Geolocation via ip-api.com (country, city, ISP, org, AS number)
- Proxy/hosting/mobile detection flags (red for proxy/hosting)
- AI threat analysis via local qwen2.5:
- Threat level, classification, confidence score
- Attack pattern description
- Specific indicators list
- Automated detection flag
- Actionable recommendation
- Enrichment panel expands inline below the IP card (toggle)
Per-IP drill-down:
- Expandable raw log lines per IP (click to show/hide)
- User agent listing with count
- First seen / last seen timestamps
- HTTP method breakdown (GET:5 POST:2)
- AI sentinel verdicts shown inline
- Jail information for banned IPs
Enhanced backend:
- Security API returns per-IP log lines, first_seen, methods, event_types
- AI verdicts attached to IP records
- Multiple user-agent detection (rotating-UA fingerprint flags scanners)
- Sort parameter support (?sort=threat|hits|recent|banned)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Background thread runs qwen2.5 to analyze new security log entries:
- Aggregates new entries by IP since last scan
- Sends batch to local LLM with security analysis prompt
- LLM classifies each IP: threat level, action, attack type, reason
- Auto-bans IPs the AI recommends banning (via fail2ban)
- Logs all verdicts and bans to /var/log/llm-team-sentinel.log
- Logs AI bans to security log as AI_BAN events
API:
- /api/admin/sentinel — sentinel status, stats, recent verdicts
Threat Intel tab enhancement:
- Sentinel status card with magenta accent (distinct from threat cards)
- Shows: model, scan count, ban count, last run, interval
- Recent AI verdicts table: action, IP, attack type, reason
- Errors displayed inline
Security prompt tuning:
- Explicit rules for common attack patterns
- Low temperature (0.1) for consistent classification
- JSON-only response format for reliable parsing
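The JSON-only format plus low temperature makes verdicts machine-parseable; a sketch of the parsing side (`parse_verdicts` is a hypothetical helper, and the fence-stripping is an assumption about model output):

```python
import json

OPTIONS = {"temperature": 0.1}   # low temp for consistent classification

def parse_verdicts(llm_text):
    """Parse the JSON-only LLM reply into per-IP verdicts, tolerating
    markdown code fences the model may wrap around its answer."""
    cleaned = llm_text.strip().strip("`")
    if cleaned.startswith("json"):
        cleaned = cleaned[4:]
    verdicts = json.loads(cleaned)
    return [v for v in verdicts if v.get("action") in ("ban", "monitor", "ignore")]
```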
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Highlander pattern — one view at each level, clean transitions:
Level 1 - Run List:
- Active runs (live, with progress bars)
- Recent runs (in-memory session runs)
- History from DB (all saved runs, click to drill down)
Level 2 - Pipeline Detail (click any DB run):
- Breadcrumb nav: Monitor → mode #id
- Header card with mode, models, timestamp, full prompt
- Step timeline with dot indicators on a vertical line
- Each step shows: model, role tag, character count, token estimate
- Green dots for completed, red for errors
Level 3 - Response Text (click any step):
- Accordion expand/collapse on click
- Full response text in monospace scrollable container
- Smooth max-height transition
Architecture ready for Level 4 (future AI comparison):
- Responses are individually addressable by step index
- Role-based grouping visible in timeline
- Side-by-side view can be added per-step
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New /logs page with 5 tabs:
- App Log (journalctl for llm-team-ui service)
- Run History (all completed runs with errors inline)
- Nginx Errors (with red highlighting)
- Nginx Access (with color-coded status codes)
- Security Log (fail2ban/exploit detection)
Features:
- Live text filter (grep-style)
- Configurable line limit (50-500)
- Auto-refresh every 10s
- Run history shows mode, user, duration, response count, errors
- Error lines highlighted red, warnings amber
- Status codes color-coded (2xx green, 3xx blue, 4xx amber, 5xx red)
Error linking:
- Stream errors in main UI link to /admin/monitor
- Error response cards have "View error details" link
- Error cards styled with red border and monospace body
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: query_ollama() sent no num_ctx option, so Ollama defaulted
to 2048 tokens. Research mode with 15 questions builds prompts that
exceed model context windows, causing Ollama to hang until the 300s
timeout.
Fix:
- Calculate num_ctx from prompt size + 1024 token response buffer
- Cap at model's actual context limit
- Truncate prompts that exceed context window minus 512 response tokens
- Uses smart_truncate() to preserve start + end of prompt
- Updated MODEL_CONTEXT map with accurate limits for all local models
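The num_ctx calculation can be sketched as below (context limits shown are illustrative, not the actual MODEL_CONTEXT values):

```python
MODEL_CONTEXT = {"qwen2.5": 32768, "llama3.1": 131072}  # illustrative limits
RESPONSE_BUFFER = 1024

def pick_num_ctx(prompt, model, default_limit=8192):
    """Size the context from the prompt plus a response buffer,
    capped at the model's actual limit — no more 2048-token default."""
    prompt_tokens = len(prompt) // 4                 # rough chars/4 estimate
    limit = MODEL_CONTEXT.get(model, default_limit)
    return min(prompt_tokens + RESPONSE_BUFFER, limit)
```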
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
8 real-time metrics in the progress panel:
- Elapsed time (updates every 500ms)
- Models active/total (tracks unique models as they respond)
- Responses received (count)
- Estimated tokens (~chars/4)
- Data received (formatted KB)
- SSE events (total protocol events)
- Errors (turns red if > 0)
- Heartbeats (keepalive count)
Metrics update every 500ms during run. On completion, all
metric values turn green. Magenta/purple theme for metric
values, micro labels underneath.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Border: magenta (#d946ef) with purple glow
- Fill: gradient from magenta → purple → cyan
- Shimmer animation sweeps across the fill
- Step indicators: magenta active pulse with glow
- Completed steps: magenta→green gradient
- Phase labels: bright green with gradient fade line
- Completion: green→cyan gradient with green glow
- 8px height track (was 6px) for better visibility
- All text in progress panel uses purple/pink tones
- Clearly distinct from the amber UI elements
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Progress panel is now position:sticky at top of output — always visible
- Phase labels (─── scouting ───, ─── researching ───, etc.) appear
between response cards when the pipeline role changes
- Auto-scroll to latest response card as they arrive
- Completion state shows response count and fades after 5s
- Clear previous errors: all 'input stream' errors were caused by
service restarts during in-flight runs, not code bugs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Enable Flask threaded=True for concurrent request handling
- Refactor generate() to use producer-consumer queue pattern:
- Runner executes in background thread, pushes events to queue
- Heartbeat thread sends keepalive every 10s independently
- Generator reads from queue — stream never goes silent
- Brainstorm mode: stream responses as they arrive (was waiting for all)
- Prevents nginx/browser timeout during long model queries
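The queue pattern can be sketched as follows — event formats and helper names are illustrative:

```python
import queue
import threading
import time

def make_stream(runner, heartbeat_every=10):
    """Runner and heartbeat both push into one queue; the SSE generator
    drains it, so the stream never goes silent during long model calls."""
    q = queue.Queue()

    def produce():
        for event in runner():
            q.put(event)
        q.put(None)                       # sentinel: run finished

    def heartbeat():
        while True:
            time.sleep(heartbeat_every)
            q.put(": keepalive\n\n")      # SSE comment line

    threading.Thread(target=produce, daemon=True).start()
    threading.Thread(target=heartbeat, daemon=True).start()

    def generate():
        while (event := q.get()) is not None:
            yield event
    return generate()
```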
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend:
- Active run tracking with step/substep/error state
- SSE keepalive heartbeat every 15s to prevent nginx timeout
- Run log (last 100 completed runs with timing/errors)
- Research mode: per-question progress, context caps, graceful failures
- Hard cap on research questions (15), answer truncation (8K chars)
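A sketch of the truncation idea (the real smart_truncate() may differ; the 2/3 head split and marker are assumptions):

```python
def smart_truncate(text, max_chars, marker="\n...[truncated]...\n"):
    """Keep the start and end of an over-long prompt, eliding the middle,
    so instructions and the latest context both survive truncation."""
    if len(text) <= max_chars:
        return text
    keep = max_chars - len(marker)
    head = keep * 2 // 3                  # favor the start slightly
    return text[:head] + marker + text[-(keep - head):]
```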
Frontend:
- Real progress bar with step segments, elapsed time, event counter
- Progress shimmer animation, step completion indicators
- Improved error display with timing context
- Green completion state with fade
Admin:
- /admin/monitor — live process dashboard
- Stats: active runs, completed, errors, avg duration
- Active run cards with live progress, substep detail, errors
- Recent run history with error traces
- Auto-polls every 3 seconds
- Full retro-brutalist theme matching main UI
Nginx:
- proxy_read_timeout 600s, proxy_send_timeout 600s
- proxy_buffering off for SSE streaming
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New color palette: amber/gold accent, deep black backgrounds
- JetBrains Mono for headings, labels, and system text
- 2px borders, 2px border-radius (brutalist)
- Animated dot-grid background canvas with random scanline artifacts
- CRT scanline overlay + vignette effect
- Backdrop-filter blur on panels for glass depth
- Pulsing status dot, amber glow effects
- Login page: full retro treatment with sys-tag footer
- All functional elements preserved
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three demo prompts per mode (basic/mid/advanced) that showcase each
orchestration pattern's unique value. Clickable chips below the prompt
textarea auto-fill on click with green flash feedback. Prompts swap
dynamically when switching modes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Security logging to /var/log/llm-team-security.log for fail2ban
- Email alerts for security events via SMTP
- Exploit pattern detection (scanner probes, SQL injection, path traversal)
- Use X-Real-IP header for accurate client IP behind nginx
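Both pieces can be sketched together — the patterns below are illustrative examples, not the app's actual detection list:

```python
import re

EXPLOIT_PATTERNS = [
    re.compile(r"\.\./"),                 # path traversal
    re.compile(r"(?i)union\s+select"),    # SQL injection probe
    re.compile(r"(?i)/wp-admin|/\.env"),  # common scanner probes
]

def client_ip(headers, remote_addr):
    """Behind nginx, X-Real-IP carries the true client address;
    fall back to the socket address when absent."""
    return headers.get("X-Real-IP", remote_addr)

def looks_like_exploit(path):
    return any(p.search(path) for p in EXPLOIT_PATTERNS)
```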
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- brain-backup: daily borg + pg_dump, 7d/4w/3m retention, cron at 3AM
- brain-triage: full system health check (services, ports, firewall,
headers, kernel, app, DB, disk, backups, security scan)
- brain-recover: restore from backup (full/db/configs/app) + emergency
lockdown mode that blocks all external access except LAN SSH
All accessible via /usr/local/bin/brain-{backup,triage,recover}
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Nginx configs with security headers (X-Frame-Options, CSP, etc.)
- fail2ban jails for nginx (botsearch, bad-request, forbidden)
- Kernel hardening via sysctl (rp_filter, no redirects, log martians)
- SSH hardening (no root, max 3 attempts, no X11)
- UFW rules export
- Idempotent setup.sh to restore all configs on fresh install
- Flask bound to 127.0.0.1 (nginx-only access)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Output panel renders first on mobile (CSS order swap)
- Prompt + Run button immediately below output
- Mode/config hidden behind "Mode: Brainstorm" collapsible toggle
- Tapping toggle expands full mode grid + model config
- Compact header nav with smaller text
- 3-column mode grid on mobile (was 4)
- Larger run button (16px font, 14px padding) for touch
- Full-width repipe modal and history panel on mobile
- Desktop layout unchanged (toggle hidden, collapse always open)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- GoAccess installed and running as systemd service (goaccess.service)
- Real-time HTML report at /var/www/html/report.html
- /logs route serves GoAccess dashboard, protected by @admin_required
- "Logs" link added to admin panel nav (orange)
- Auto-starts on boot, reads nginx access.log
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Consistent nav across all pages (Team UI / Lab / Admin / Logout)
- Main header: separator between nav and auth actions, smaller text
- Login box: subtle purple glow shadow, wider card
- Demo banner: gradient background, bolder text, larger font
- Lab + Admin: matching nav with logout link
- Reduced visual clutter in main header
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Demo mode toggle: admin can enable public access without login
- Demo users can view/run everything but cannot modify admin settings
- Admin write routes (config saves, API keys) blocked for non-admins in demo
- IP allowlist: LAN (192.168.1.*) and localhost never rate-limited
- Admin panel gets Security tab: demo toggle, allowlist management
- Main UI shows "Demo ON" button (green) + top banner when active
- Demo mode state is in-memory, resets on restart (safe default)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>