Phase 1 — Run Quality Scoring:
- Auto-score every run in background via qwen2.5 judge (1-10)
- Thumbs up/down vote buttons on output cards
- POST /api/runs/<id>/score for user feedback
- run_saved SSE event enables vote buttons after run completes
- User votes override auto-scores (race-condition safe)
- DB: quality_score, score_method, score_metadata on team_runs
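A minimal sketch of the override semantics, assuming psycopg2-style cursors; the guarded WHERE clause is the assumed race-safety mechanism, not a confirmed implementation detail:

    def save_user_score(conn, run_id, score):
        # User votes always win: write unconditionally, mark method 'user'.
        with conn.cursor() as cur:
            cur.execute(
                "UPDATE team_runs SET quality_score=%s, score_method='user' WHERE id=%s",
                (score, run_id))
        conn.commit()

    def save_auto_score(conn, run_id, score, metadata):
        # The judge only writes if no user vote landed first (race-safe).
        with conn.cursor() as cur:
            cur.execute(
                "UPDATE team_runs SET quality_score=%s, score_method='auto', score_metadata=%s "
                "WHERE id=%s AND score_method IS DISTINCT FROM 'user'",
                (score, metadata, run_id))
        conn.commit()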
Phase 1 — Analytics Dashboard:
- GET /api/admin/analytics: score-by-mode, score-by-model, heatmap, trend
- New Analytics tab on Admin page with bar charts, heatmap table, trend sparkline
- Scoring coverage tracker (scored vs total runs)
- Model × Mode heatmap with color-coded cells
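For illustration, the score-by-mode aggregation behind the dashboard might look like this; the table and column names follow the commit above, the query shape is an assumption:

    def score_by_mode(conn):
        # Average score per mode, plus scoring coverage (scored vs total).
        with conn.cursor() as cur:
            cur.execute("""
                SELECT mode,
                       ROUND(AVG(quality_score)::numeric, 2) AS avg_score,
                       COUNT(quality_score)                  AS scored,
                       COUNT(*)                              AS total
                FROM team_runs
                GROUP BY mode
                ORDER BY avg_score DESC NULLS LAST""")
            return cur.fetchall()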
Phase 2 — Reactive Pipeline:
- _assess_stage(): orchestrator evaluates each stage's output mid-run
- _reactive_decide(): can insert/skip stages based on assessment
- Dynamic stage loop replaces fixed iteration in run_refine()
- Budget tracking prevents infinite loops (max_stages hard cap)
- Reactive decisions render as dashed notification bars between cards
- Pipeline adjusts in real time: "Inserting VALIDATE — high severity gaps found"
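A sketch of the dynamic loop under stated assumptions: _assess_stage and _reactive_decide are named above, while run_stage and the decision dict shape are hypothetical.

    def run_refine(prompt, stages, max_stages=10):
        pending = list(stages)
        executed, output = [], prompt
        while pending and len(executed) < max_stages:  # budget: hard cap, no infinite loops
            stage = pending.pop(0)
            output = run_stage(stage, output)          # hypothetical stage executor
            executed.append(stage)
            assessment = _assess_stage(stage, output)
            decision = _reactive_decide(assessment, pending)
            if decision.get("insert"):                 # e.g. add VALIDATE on severe gaps
                pending.insert(0, decision["insert"])
            for skip in decision.get("skip", []):
                if skip in pending:
                    pending.remove(skip)
        return output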
Phase 3 — Cross-Run Learning:
- _build_routing_table(): queries historical scores for model×mode performance
- Best stage sequences per content_type from pipeline_runs
- Routing table cached with 30-min TTL
- Auto-Refine strategist prompt augmented with historical data
- GET /api/suggest-models?mode=X returns top 3 models for that mode
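The 30-minute TTL cache could be as simple as the sketch below; the module-level state and helper signature are assumptions:

    import time

    _cache = {"table": None, "built_at": 0.0}
    ROUTING_TTL = 30 * 60   # 30-minute TTL in seconds

    def get_routing_table(conn):
        if _cache["table"] is None or time.time() - _cache["built_at"] > ROUTING_TTL:
            _cache["table"] = _build_routing_table(conn)   # queries historical scores
            _cache["built_at"] = time.time()
        return _cache["table"]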
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each mode now has {basic: [...], mid: [...], advanced: [...]} with 5
prompts per difficulty level. Renderer picks one random prompt from
each tier on every mode switch, so users see fresh examples each time.
315 hand-crafted prompts designed to highlight each mode's strengths:
- brainstorm: creative problem-solving at increasing scale
- pipeline: multi-step transformations from simple to complex
- debate: ethical dilemmas with escalating nuance
- validator: common myths to complex historical misconceptions
- roundrobin: writing tasks that benefit from iterative refinement
- redteam: security vulnerabilities from obvious to systemic
- consensus: opinion questions from clear to deeply contested
- codereview: coding tasks from functions to distributed systems
- ladder: concepts that scale from kindergarten to PhD
- tournament: creative competitions from one-liners to algorithms
- evolution: optimization targets from names to city infrastructure
- blindassembly: decomposable projects from explanations to systems
- staircase: progressive constraints from party planning to treaties
- drift: factual claims from simple dates to complex event sequences
- mesh: stakeholder analysis from office policies to life-or-death
- hallucination: fact-checkable claims from simple to obscure
- timeloop: cascading failures from restaurants to civilization
- research: deep dives from single topics to geopolitical analysis
- eval: benchmark prompts from trivia to formal proofs
- extract: structured extraction from sentences to legal documents
- refine: documents from product blurbs to architecture specs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Auto-Refine mode (21st mode):
- AI strategist analyzes content type and quality
- Selects 3-5 optimal refinement stages from 8 available
(validate, critique, expand, structure, stakeholder, clarity,
edge_cases, align)
- Executes stages sequentially with output chaining
- Final synthesis produces polished version
- Stages are content-aware — a PRD gets a different pipeline than an essay
- Saved to pipeline_runs DB
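A hedged sketch of the stage-selection step; the JSON contract with the strategist and the fallback choice are assumptions, the stage names come from the commit:

    import json

    STAGES = ["validate", "critique", "expand", "structure",
              "stakeholder", "clarity", "edge_cases", "align"]

    def pick_stages(strategist_reply):
        # Filter to known stages and clamp to the 3-5 range.
        wanted = [s for s in json.loads(strategist_reply) if s in STAGES]
        return wanted[:5] if len(wanted) >= 3 else ["validate", "critique", "clarity"]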
Composer UX overhaul:
- Initial state: full-screen centered composer overlay
- Mode grid + models + prompt front-and-center for new users
- On Run: composer closes, output takes full screen width
- "New Prompt" button in header nav bar (not floating)
- Close button (×) on composer overlay
- Works across all 4 themes + mobile
Dropdown fixes:
- Dark theme: select options get solid #1a1d23 bg
- Modern theme: select options get solid #18181b bg
- Light/Reddit: select options get white bg with dark text
- Native <option> elements now readable in all themes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Theme system (Dark/Light/Reddit/Modern):
- Injectable CSS/JS via after_request — zero template changes
- Dark: original gold accent on black
- Light: warm off-white with indigo accent, readable buttons
- Reddit: bluish-gray bg, orange accent, pill buttons, 8px corners
- Modern: glassmorphism dark, blue accent, frosted cards, 16px corners
- Toggle cycles all 4 themes, persists via localStorage
- Button injected into every page header automatically
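A minimal sketch of the after_request hook, assuming the theme assets are plain strings; the </body> splice point is an assumption:

    from flask import Flask

    app = Flask(__name__)
    THEME_CSS = "<style>/* theme css */</style>"
    THEME_JS = "<script>/* theme toggle */</script>"

    @app.after_request
    def inject_theme(resp):
        # Only rewrite HTML responses; JSON and SSE streams pass through untouched.
        if (resp.content_type and resp.content_type.startswith("text/html")
                and not resp.direct_passthrough):
            html = resp.get_data(as_text=True)
            resp.set_data(html.replace("</body>", THEME_CSS + THEME_JS + "</body>"))
        return resp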
Enrichment panel fix:
- threat-card changed from display:flex to display:grid
- enrich-panel now spans full width via grid-column:1/-1
- Added .enrich-section/.enrich-title/.enrich-grid CSS classes
- Sections (Geo, Deep Scan, AI) visually separated with dividers
Iterate/repipe modal themed for all modes:
- Light themes get white modal bg, proper contrast
- Reddit gets rounded corners + orange accent
- Modern gets glassmorphism modal with blue glow
Scrollbar styling across all themes:
- Rounded, properly sized (6-8px), theme-colored thumbs
- macOS-style inset look via background-clip
Layout improvements:
- Output area min-height 400px, padding-bottom 40px
- Empty state centered with more breathing room
- Docker + containerd enabled at boot for web-check survival
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fail2ban was using the nftables ban action while UFW uses iptables-nft,
so bans were recorded but never enforced. Added three-layer ban enforcement:
1. nginx deny list (/etc/nginx/banned_ips.conf) for instant 403
2. ss -K to kill existing TCP connections on ban
3. Auto-sync nginx deny file on ban/unban (manual, mass, AI sentinel)
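Sketch of the sync step; the deny-file path comes from above, while the reload and connection-kill invocations are assumed:

    import subprocess

    DENY_FILE = "/etc/nginx/banned_ips.conf"

    def sync_nginx_bans(banned_ips):
        with open(DENY_FILE, "w") as f:
            f.writelines(f"deny {ip};\n" for ip in sorted(banned_ips))
        subprocess.run(["nginx", "-s", "reload"], check=False)    # pick up the new list
        for ip in banned_ips:
            subprocess.run(["ss", "-K", "dst", ip], check=False)  # kill live TCP conns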
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Off: login required for everything
Demo: public gets Team UI + run modes + admin page (browse only)
Blocked: /logs, /admin/monitor, /history, threat intel APIs,
sentinel, wall-of-shame, meta-pipelines, self-reports, vectors
Showcase: public gets full read-only access to ALL pages
Allowed: admin, monitor, logs, threat intel, enrichment,
lab, history, self-analysis, meta-pipelines
Blocked: config changes, bans, deletes, bulk operations
Admin (logged in): full access to everything always
SHOWCASE_ONLY_ROUTES set defines which pages/APIs are
blocked in basic demo but allowed in showcase mode.
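The gating might look roughly like this; the DEMO state dict, session key, and set contents are assumptions:

    from flask import Flask, request, session, abort

    app = Flask(__name__)
    DEMO = {"active": False, "showcase": False}
    SHOWCASE_ONLY_ROUTES = {"/logs", "/admin/monitor", "/history"}  # excerpt

    @app.before_request
    def demo_gate():
        if session.get("admin"):
            return                    # logged-in admin: full access always
        if not DEMO["active"]:
            abort(401)                # off: login required for everything
        if request.path in SHOWCASE_ONLY_ROUTES and not DEMO["showcase"]:
            abort(403)                # blocked in basic demo, open in showcase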
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Problem: plain toggle set showcase=true, so demo always became showcase.
No way to enable basic demo mode separately.
Fix:
- Three explicit buttons: [Demo] [Showcase] [Off]
- Demo mode: active=true, showcase=false (team UI only)
- Showcase mode: active=true, showcase=true (full read-only admin)
- Off: both false
- Plain toggle cycles demo on/off without touching showcase
- Clear status text shows which mode and what it means
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The demo toggle route was in DEMO_BLOCKED_POSTS, so once showcase
was enabled, the before_request handler blocked the toggle POST
even for admins (the before_request check ran before the route's
own admin check could verify the session).
Fix: removed /api/demo/toggle from blocked list. The route already
has its own admin-only check (line 460).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The before_request handler still referenced the old variable name.
Updated it to use DEMO_BLOCKED_POSTS with a simpler path-in-set check.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New mode: Showcase (replaces basic demo mode for client demos)
- Visitors see EVERYTHING: Admin, Monitor, Logs, Threat Intel,
Lab, History, Meta-Pipelines — all without logging in
- Read-only: all GET requests allowed on all routes
- Allowed POSTs: team runs, self-analysis, IP enrichment
(read-like operations that don't modify system config)
- Blocked POSTs: config changes, bans, deletes, bulk archive
Admin UI (Security tab):
- "Enable Showcase" button (magenta) — one click to activate
- "Turn Off" button appears when active
- Clear description of what visitors can and can't do
- Status shows "SHOWCASE MODE" with magenta styling
Banner:
- Magenta gradient banner on all pages when showcase is active
- Shows: "Showcase Mode — Full Read-Only Access — Admin · Monitor · Logs · Lab · History"
- Demo button in nav shows "Showcase" in magenta
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Auto-refresh now skips when any detail panel is open (checks for
meta-detail-* elements), so the panel stays stable while you read results.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each pipeline card now shows:
- Status dot + name + status tag + best score
- Stop button (red) when running
- Restart button (green) when stopped/completed
- Results button (magenta) to drill into iterations
- Live progress text when running
- Stages and iteration count on info line
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bugs fixed:
- Ratchet loop had no trial cap — experiment #1 ran 3762 trials
unchecked. Now capped at max_trials=50 per start cycle.
- meta_pipelines, meta_runs, self_reports tables had no GRANT
for kbuser — fixed permissions for all tables and sequences.
All 4 running experiments auto-paused on restart.
Stress test confirms all tables accessible, all models responding,
meta-pipeline creation working, self-report save/retrieve working.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Engine:
- Chains modes in sequence: extract → research → validate → debate → synthesize
- Each stage feeds its output to the next as input
- Runs the same pipeline with different model sets (one model per iteration)
- Auto-scores final output using judge model (1-10)
- Keeps best result across all iterations
- All stage results + final outputs saved to meta_runs table
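One way the iteration loop could read; query_model, stage_prompt, and judge_score are hypothetical helper names:

    def run_meta_pipeline(source_text, stages, model_sets, judge="qwen2.5"):
        best = {"score": 0, "output": None, "models": None}
        for models in model_sets:                    # one model set per iteration
            text = source_text
            for stage, model in zip(stages, models):
                text = query_model(model, stage_prompt(stage, text))  # output chaining
            score = judge_score(judge, text)         # judge model scores 1-10
            if score > best["score"]:                # keep best across iterations
                best = {"score": score, "output": text, "models": models}
        return best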
4 preset pipelines:
1. Security Deep Dive — security logs through 5-stage analysis
2. Run History Insights — team run data through 4-stage extraction
3. Threat Intel Enrichment — profiled IPs through 5-stage analysis
4. Cross-Report Synthesis — past self-reports through 4-stage debate
Database:
- meta_pipelines: name, source, stages, status, best_score, iterations
- meta_runs: per-iteration stage results, final output, score, models
API:
- POST /api/meta-pipeline — create pipeline from preset
- POST /api/meta-pipeline/:id/start — run in background
- POST /api/meta-pipeline/:id/stop — halt execution
- GET /api/meta-pipelines — list all with live status
- GET /api/meta-pipeline/:id — full detail with all iteration results
UI (Lab page):
- Magenta-bordered Meta-Pipeline card with 4 clickable presets
- Click preset → creates + auto-starts pipeline
- Pipeline list with live status dots, progress, scores
- Click pipeline → drill-down with per-iteration results
- Each stage expandable (click to show output)
- Best output highlighted in green border
- Auto-refreshes every 5 seconds during runs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Database:
- self_reports table: report_type, model, report text, data_size, timestamp
- Reports auto-saved on generation (no extra step needed)
API:
- GET /api/self-reports — list all past reports (id, type, model, size, date)
- GET /api/self-reports/:id — full report text
UI:
- "✓ Saved as report #N" indicator after generation
- "Past Reports (N)" section below self-analysis buttons
- Click any past report → expands inline (toggle on/off)
- Shows: type, model, timestamp for each saved report
- Reports persist across page refreshes and restarts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4 one-click self-analysis reports in Lab:
1. Threat Intelligence Report — security logs → attack taxonomy,
attacker profiling, predictive analysis, recommendations
2. Model Performance Analysis — 96 team runs → usage patterns,
model workload, response efficiency, optimization opportunities
3. Usage Analytics — nginx access logs → traffic patterns, feature
usage, user journey mapping, UX recommendations
4. Security Posture Assessment — combined audit of security logs,
sentinel verdicts, fail2ban, threat intel DB → risk rating
API: POST /api/self-analyze
- type: threat_intel|model_performance|access_patterns|security_posture
- model: which local model to use (default qwen2.5)
- Returns structured report from real system data
Lab UI:
- Green-bordered Self-Analysis card above experiment templates
- Click any report → runs analysis in background → result panel
expands inline with full report (scrollable, closeable)
- Loading state shows "Analyzing..." during generation
Each report analyzes REAL data: actual security logs, actual run
history, actual nginx access patterns — not synthetic test data.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Templates section below experiment list:
BASIC — Better Summaries (3 eval cases)
Optimize summarization quality. Tests across biology, history,
and technical content. Shows the simplest Lab workflow.
INTERMEDIATE — Code Explainer (4 eval cases)
Find the best prompt+model to explain code to non-programmers.
Tests loops, recursion, error handling, comprehensions.
Shows how the ratchet evolves system prompts.
ADVANCED — Security Analyst Persona (5 eval cases)
Evolve a cybersecurity AI across threat classification, executive
summaries, developer education, incident response, and forensics.
Tests multi-audience adaptation and domain expertise.
Click any template → auto-fills the create form with name, objective,
metric, all eval cases, and selects all available models. User can
modify before creating.
Each template card shows: level badge (green/amber/red), name,
eval case count, and a description explaining what the experiment
does and why it matters.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Full theme swap: amber accents, JetBrains Mono, 2px borders
- Animated dot-grid background + scanlines
- Backdrop-filter blur on cards
- Status pills: square with borders (was rounded)
- Model chips: square with 2px borders
- Chart wraps: dark background with 2px borders
- Trial items: monospace numbers and scores
- Best config box: monospace with green border
- Nav bar with links to Team, History, Admin, Logs
- Toast: monospace with fade-out animation
- Config textarea: monospace font with dark background
- Responsive: tabs compact on mobile
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New /history page (replaces slide-out panel):
- Full-page data table: ID, Mode, Prompt, Models, Tags, Date
- Active/Archived/All view toggle
- Filter by mode, tag, or search text
- Checkbox select for bulk archive/restore
- Click any row → detail panel with full responses
Per-run detail:
- Inline tag editor: add tags (Enter), remove tags (click ✕)
- Notes textarea with auto-save (1s debounce)
- Archive/Restore/Delete buttons
- Collapsible response cards (click header to expand)
Database:
- tags TEXT[] column with GIN index for fast tag queries
- notes TEXT column for freeform annotations
APIs:
- POST /api/runs/:id/tags — update tags and/or notes
- GET /api/runs/tags — list all unique tags in use
- GET /api/runs/vectors — structured text documents for AI/embedding
Returns: mode, prompt, models, date, tags, notes + all response text
Filters: ?mode=, ?tag=, ?limit=
Each doc includes token estimate for embedding planning
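Sketch of one vector document; the run dict shape is an assumption, and the token estimate reuses the ~4 chars/token heuristic seen elsewhere in this log:

    def build_vector_doc(run):
        parts = [
            f"mode: {run['mode']}",
            f"prompt: {run['prompt']}",
            f"models: {', '.join(run['models'])}",
            f"tags: {', '.join(run.get('tags') or [])}",
            f"notes: {run.get('notes') or ''}",
        ]
        parts += [r["text"] for r in run["responses"]]
        doc = "\n".join(parts)
        return {"text": doc, "token_estimate": len(doc) // 4}  # ~4 chars per token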
Main UI: History button now links to /history page
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Database:
- Added 'archived' boolean column to team_runs (indexed)
- Active runs filtered by archived=false by default
API:
- GET /api/runs?show=active|archived|all
- POST /api/runs/:id/archive — archive single run
- POST /api/runs/:id/restore — restore single run
- POST /api/runs/bulk-archive — archive/restore by IDs or date
History panel UI:
- Active/Archived toggle tabs at top
- Per-run Archive button (magenta) in detail view
- Per-run Restore button (green) in detail view for archived runs
- "Archive All" bulk button when viewing active runs
- "Restore All" bulk button when viewing archived runs
- Archived runs hidden from active view, accessible anytime
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Sentinel thread sets next_scan_ts = time.time() + interval BEFORE sleeping
- API returns next_scan_in derived from real next_scan_ts, not estimated
- Frontend calculates server clock offset and counts down to the actual
target timestamp — refresh shows the same remaining time, not a reset
- Shows ✓ in green when scan fires, resumes countdown on next poll
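A sketch of the timing contract; run_scan and the state dict are assumed names, next_scan_ts comes from the commit:

    import time

    def sentinel_loop(state, interval):
        while True:
            state["next_scan_ts"] = time.time() + interval   # set BEFORE sleeping
            time.sleep(interval)
            run_scan(state)                                  # hypothetical scan step

    def sentinel_status(state):
        now = time.time()
        return {
            "next_scan_in": max(0, state["next_scan_ts"] - now),  # real, not estimated
            "server_now": now,   # frontend computes its clock offset from this
        }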
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Entire sentinel status fits in one header row now
- Mini 28px countdown ring (was 64px) inline with title
- Scans/bans counts inline as text, not grid boxes
- Verdicts collapsed by default — click to expand
- Card padding reduced (8px vs 14px)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SVG progress ring shows time until next scan (magenta arc)
- Countdown ticks every second: "245s → 244s → ... → scanning..."
- Ring fills as time progresses, resets on scan
- Turns green and shows "scanning..." when timer hits 0
- Stats grid: Scans count, AI Bans count, Last Run time, Interval
- Backend API returns elapsed_since_scan and next_scan_in
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Database:
- threat_intel table with full enrichment data per IP
- UPSERT on IP — re-enriching updates existing record
- Stores: geo, AI analysis, web-check results, indicators, raw JSON
- Indexed on IP (unique), threat_level, enriched_at
Auto-save:
- Every enrichment auto-saves to DB (step 5 in enrichment pipeline)
- "Saved to Wall of Shame database" indicator in enrichment panel
- No duplicate scans — re-enrich updates the existing record
Wall of Shame tab (/logs):
- Stats bar: Total Profiled, Critical, High, Proxies, Automated
- Sortable table: IP, Threat, Type, Summary, Country, Ports
- Click any row to expand full detail:
ISP, Org, ASN, City, Proxy/Hosting flags, Confidence,
Blocklist count, Pattern, Recommendation, Indicators
- All data persists across restarts — no re-scanning needed
API:
- /api/admin/wall-of-shame — list all enriched IPs with sorting/filtering
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enrichment AI prompt:
- Explicitly states this is a PRIVATE application
- Strict threat level rules: 10+ blocklists = always critical,
exploit scans = always critical, SSH-only = suspicious
- Added "compromised_host" classification option
- Recommendation options: ban permanently, ban 24h, monitor, ignore
Sentinel batch prompt:
- "Err on the side of banning" directive
- .env.production/.env.local probing = targeted recon, instant ban
- When in doubt, BAN — there is no legitimate reason to scan a private server
- Tighter rules for automated UA detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Now queries 6 web-check endpoints per IP:
- ports — open port scan
- dns — reverse DNS / PTR records
- block-lists — DNS blocklist check (AdGuard, CloudFlare, etc.)
- trace-route — full network path with per-hop latency
- headers — HTTP response headers (server, powered-by, etc.)
- status — HTTP status code and response time
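A sketch of the fan-out over those endpoints; the /api/<endpoint>?url= shape for web-check is an assumption:

    import requests

    WEBCHECK = "http://127.0.0.1:3000/api"
    ENDPOINTS = ["ports", "dns", "block-lists", "trace-route", "headers", "status"]

    def webcheck_scan(ip):
        results = {}
        for ep in ENDPOINTS:
            try:
                r = requests.get(f"{WEBCHECK}/{ep}", params={"url": ip}, timeout=20)
                r.raise_for_status()
                results[ep] = r.json()
            except Exception:
                pass   # many endpoints fail on raw IPs; skip silently
        return results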
Frontend rendering:
- Traceroute displayed as hop chips with latency: IP (45ms) → IP (56ms)
- HTTP status with response time
- Server headers inline
- Errors silently skipped (many endpoints fail on raw IPs)
AI analysis now includes:
- Blocklist count and names in prompt
- Traceroute hops in prompt for network path analysis
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Web-check (ports, DNS, blocklists) now runs as step 3, AI analysis
as step 4. AI prompt includes open ports and blocklist status for
richer threat verdicts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Setup:
- lissy93/web-check running in Docker on port 3000
- Queries ports, DNS, and blocklist endpoints per IP
Enrichment now includes 4 layers:
1. Geolocation (ip-api.com) — country, ISP, proxy/hosting flags
2. Web-Check deep scan — open ports, DNS/PTR, blocklist status
3. Security log aggregation — all activity for that IP
4. AI analysis (qwen2.5) — gets ALL above data as context
Frontend rendering:
- Open ports displayed in red (security risk indicators)
- Blocklist status: "3/8 blocked (AdGuard, AdGuard Family, ...)"
- Reverse DNS (PTR records)
- All data feeds into AI analysis prompt for richer verdicts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sorting:
- Sort by: hits, threat level, recent activity, banned status
- Active sort button highlighted in amber
Mass operations:
- Checkbox per IP for multi-select
- "Ban Selected" / "Unban Selected" buttons with confirmation
- /api/admin/security/mass-ban endpoint handles batch operations
- Selection counter shows "N selected"
IP Enrichment (click "Enrich" button per IP):
- Geolocation via ip-api.com (country, city, ISP, org, AS number)
- Proxy/hosting/mobile detection flags (red for proxy/hosting)
- AI threat analysis via local qwen2.5:
- Threat level, classification, confidence score
- Attack pattern description
- Specific indicators list
- Automated detection flag
- Actionable recommendation
- Enrichment panel expands inline below the IP card (toggle)
Per-IP drill-down:
- Expandable raw log lines per IP (click to show/hide)
- User agent listing with count
- First seen / last seen timestamps
- HTTP method breakdown (GET:5 POST:2)
- AI sentinel verdicts shown inline
- Jail information for banned IPs
Enhanced backend:
- Security API returns per-IP log lines, first_seen, methods, event_types
- AI verdicts attached to IP records
- Multiple UA detection (fingerprint: rotating scanner)
- Sort parameter support (?sort=threat|hits|recent|banned)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Background thread runs qwen2.5 to analyze new security log entries:
- Aggregates new entries by IP since last scan
- Sends batch to local LLM with security analysis prompt
- LLM classifies each IP: threat level, action, attack type, reason
- Auto-bans IPs the AI recommends banning (via fail2ban)
- Logs all verdicts and bans to /var/log/llm-team-sentinel.log
- Logs AI bans to security log as AI_BAN events
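A sketch of one scan cycle; build_security_prompt, log_verdict, fail2ban_ban, and log_security_event are hypothetical names, and the query_ollama signature is assumed:

    import json

    def sentinel_scan(entries_by_ip):
        prompt = build_security_prompt(entries_by_ip)        # batch prompt, JSON-only
        reply = query_ollama("qwen2.5", prompt, temperature=0.1)
        for verdict in json.loads(reply):
            log_verdict(verdict)                             # -> llm-team-sentinel.log
            if verdict.get("action") == "ban":
                fail2ban_ban(verdict["ip"])                  # enforce via fail2ban
                log_security_event("AI_BAN", verdict["ip"], verdict.get("reason"))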
API:
- /api/admin/sentinel — sentinel status, stats, recent verdicts
Threat Intel tab enhancement:
- Sentinel status card with magenta accent (distinct from threat cards)
- Shows: model, scan count, ban count, last run, interval
- Recent AI verdicts table: action, IP, attack type, reason
- Errors displayed inline
Security prompt tuning:
- Explicit rules for common attack patterns
- Low temperature (0.1) for consistent classification
- JSON-only response format for reliable parsing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Highlander pattern — one view at each level, clean transitions:
Level 1 - Run List:
- Active runs (live, with progress bars)
- Recent runs (in-memory session runs)
- History from DB (all saved runs, click to drill down)
Level 2 - Pipeline Detail (click any DB run):
- Breadcrumb nav: Monitor → mode #id
- Header card with mode, models, timestamp, full prompt
- Step timeline with dot indicators on a vertical line
- Each step shows: model, role tag, character count, token estimate
- Green dots for completed, red for errors
Level 3 - Response Text (click any step):
- Accordion expand/collapse on click
- Full response text in monospace scrollable container
- Smooth max-height transition
Architecture ready for Level 4 (future AI comparison):
- Responses are individually addressable by step index
- Role-based grouping visible in timeline
- Side-by-side view can be added per-step
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New /logs page with 5 tabs:
- App Log (journalctl for llm-team-ui service)
- Run History (all completed runs with errors inline)
- Nginx Errors (with red highlighting)
- Nginx Access (with color-coded status codes)
- Security Log (fail2ban/exploit detection)
Features:
- Live text filter (grep-style)
- Configurable line limit (50-500)
- Auto-refresh every 10s
- Run history shows mode, user, duration, response count, errors
- Error lines highlighted red, warnings amber
- Status codes color-coded (2xx green, 3xx blue, 4xx amber, 5xx red)
Error linking:
- Stream errors in main UI link to /admin/monitor
- Error response cards have "View error details" link
- Error cards styled with red border and monospace body
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: query_ollama() sent no num_ctx option, so Ollama defaulted
to 2048 tokens. Research mode with 15 questions builds prompts far
larger than that window, causing Ollama to hang until the 300s timeout.
Fix:
- Calculate num_ctx from prompt size + 1024 token response buffer
- Cap at model's actual context limit
- Truncate prompts that exceed context window minus 512 response tokens
- Uses smart_truncate() to preserve start + end of prompt
- Updated MODEL_CONTEXT map with accurate limits for all local models
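Sketch of the sizing logic; the constants follow the commit, while the context values and the fallback limit are illustrative assumptions:

    RESPONSE_BUFFER = 1024     # tokens reserved for the reply
    TRUNCATE_RESERVE = 512     # tokens reserved when truncating the prompt
    MODEL_CONTEXT = {"qwen2.5": 32768, "llama3.1": 131072}   # illustrative limits

    def sized_options(model, prompt):
        limit = MODEL_CONTEXT.get(model, 8192)    # assumed fallback
        prompt_tokens = len(prompt) // 4          # ~4 chars per token
        num_ctx = min(prompt_tokens + RESPONSE_BUFFER, limit)
        if prompt_tokens > limit - TRUNCATE_RESERVE:
            # smart_truncate (named in the commit) keeps the start + end
            prompt = smart_truncate(prompt, (limit - TRUNCATE_RESERVE) * 4)
        return prompt, {"num_ctx": num_ctx}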
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
8 real-time metrics in the progress panel:
- Elapsed time (updates every 500ms)
- Models active/total (tracks unique models as they respond)
- Responses received (count)
- Estimated tokens (~chars/4)
- Data received (formatted KB)
- SSE events (total protocol events)
- Errors (turns red if > 0)
- Heartbeats (keepalive count)
Metrics update every 500ms during the run. On completion, all
metric values turn green. Metric values use the magenta/purple
theme, with micro labels underneath.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Border: magenta (#d946ef) with purple glow
- Fill: gradient from magenta → purple → cyan
- Shimmer animation sweeps across the fill
- Step indicators: magenta active pulse with glow
- Completed steps: magenta→green gradient
- Phase labels: bright green with gradient fade line
- Completion: green→cyan gradient with green glow
- 8px height track (was 6px) for better visibility
- All text in progress panel uses purple/pink tones
- Clearly distinct from the amber UI elements
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Progress panel is now position:sticky at top of output — always visible
- Phase labels (─── scouting ───, ─── researching ───, etc.) appear
between response cards when the pipeline role changes
- Auto-scroll to latest response card as they arrive
- Completion state shows response count and fades after 5s
- Clears up previous errors: all 'input stream' errors were caused by
  service restarts during in-flight runs, not code bugs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Enable Flask threaded=True for concurrent request handling
- Refactor generate() to use producer-consumer queue pattern:
- Runner executes in background thread, pushes events to queue
- Heartbeat thread sends keepalive every 10s independently
- Generator reads from queue — stream never goes silent
- Brainstorm mode: stream responses as they arrive (was waiting for all)
- Prevents nginx/browser timeout during long model queries
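A sketch of the producer-consumer generator; execute_run and the event framing are assumptions:

    import queue, threading

    def generate(run_args):
        q = queue.Queue()
        stop = threading.Event()

        def runner():
            for event in execute_run(run_args):   # producer: background run
                q.put(event)
            q.put(None)                           # sentinel: run finished

        def heartbeat():
            while not stop.wait(10):              # keepalive every 10s
                q.put({"event": "heartbeat"})

        threading.Thread(target=runner, daemon=True).start()
        threading.Thread(target=heartbeat, daemon=True).start()
        while True:                               # consumer: stream never goes silent
            event = q.get()
            if event is None:
                stop.set()
                break
            yield f"data: {event}\n\n"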
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend:
- Active run tracking with step/substep/error state
- SSE keepalive heartbeat every 15s to prevent nginx timeout
- Run log (last 100 completed runs with timing/errors)
- Research mode: per-question progress, context caps, graceful failures
- Hard cap on research questions (15), answer truncation (8K chars)
Frontend:
- Real progress bar with step segments, elapsed time, event counter
- Progress shimmer animation, step completion indicators
- Improved error display with timing context
- Green completion state with fade
Admin:
- /admin/monitor — live process dashboard
- Stats: active runs, completed, errors, avg duration
- Active run cards with live progress, substep detail, errors
- Recent run history with error traces
- Auto-polls every 3 seconds
- Full retro-brutalist theme matching main UI
Nginx:
- proxy_read_timeout 600s, proxy_send_timeout 600s
- proxy_buffering off for SSE streaming
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New color palette: amber/gold accent, deep black backgrounds
- JetBrains Mono for headings, labels, and system text
- 2px borders, 2px border-radius (brutalist)
- Animated dot-grid background canvas with random scanline artifacts
- CRT scanline overlay + vignette effect
- Backdrop-filter blur on panels for glass depth
- Pulsing status dot, amber glow effects
- Login page: full retro treatment with sys-tag footer
- All functional elements preserved
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three demo prompts per mode (basic/mid/advanced) that showcase each
orchestration pattern's unique value. Clickable chips below the prompt
textarea fill it on click, with a green flash for feedback. Prompts
swap dynamically when switching modes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Security logging to /var/log/llm-team-security.log for fail2ban
- Email alerts for security events via SMTP
- Exploit pattern detection (scanner probes, SQL injection, path traversal)
- Use X-Real-IP header for accurate client IP behind nginx
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- brain-backup: daily borg + pg_dump, 7d/4w/3m retention, cron at 3AM
- brain-triage: full system health check (services, ports, firewall,
headers, kernel, app, DB, disk, backups, security scan)
- brain-recover: restore from backup (full/db/configs/app) + emergency
lockdown mode that blocks all external access except LAN SSH
All accessible via /usr/local/bin/brain-{backup,triage,recover}
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Nginx configs with security headers (X-Frame-Options, CSP, etc.)
- fail2ban jails for nginx (botsearch, bad-request, forbidden)
- Kernel hardening via sysctl (rp_filter, no redirects, log martians)
- SSH hardening (no root, max 3 attempts, no X11)
- UFW rules export
- Idempotent setup.sh to restore all configs on fresh install
- Flask bound to 127.0.0.1 (nginx-only access)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Output panel renders first on mobile (CSS order swap)
- Prompt + Run button immediately below output
- Mode/config hidden behind "Mode: Brainstorm" collapsible toggle
- Tapping toggle expands full mode grid + model config
- Compact header nav with smaller text
- 3-column mode grid on mobile (was 4)
- Larger run button (16px font, 14px padding) for touch
- Full-width repipe modal and history panel on mobile
- Desktop layout unchanged (toggle hidden, collapsed section always open)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- GoAccess installed and running as systemd service (goaccess.service)
- Real-time HTML report at /var/www/html/report.html
- /logs route serves GoAccess dashboard, protected by @admin_required
- "Logs" link added to admin panel nav (orange)
- Auto-starts on boot, reads nginx access.log
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Consistent nav across all pages (Team UI / Lab / Admin / Logout)
- Main header: separator between nav and auth actions, smaller text
- Login box: subtle purple glow shadow, wider card
- Demo banner: gradient background, bolder text, larger font
- Lab + Admin: matching nav with logout link
- Reduced visual clutter in main header
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Demo mode toggle: admin can enable public access without login
- Demo users can view/run everything but cannot modify admin settings
- Admin write routes (config saves, API keys) blocked for non-admins in demo
- IP allowlist: LAN (192.168.1.*) and localhost never rate-limited
- Admin panel gets Security tab: demo toggle, allowlist management
- Main UI shows "Demo ON" button (green) + top banner when active
- Demo mode state is in-memory, resets on restart (safe default)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>