55 Commits

Author SHA1 Message Date
root
7b9b7f6641 Add optimization history, reconnect, and duplicate prevention
History detail panel now shows optimization results:
- If a run has been optimized, shows results section with best score,
  original score, and link to view the winning variation
- Fetches full optimization history via GET /api/optimize-history/<id>
- Shows count of optimizations run and child variation count
- Button changes to "Re-Optimize" for already-optimized runs

Reconnect to active optimizations:
- If optimization is already running, returns job_id in error response
- Frontend detects this and reconnects to the SSE stream
- No more losing progress when navigating away and coming back
- Refactored startOptimize() into startOptimize() + _showOptimizeStream()
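
Sketch of the duplicate-prevention and reconnect handshake described above. This is a minimal illustration, not the committed code: the in-memory registry, the function names, and the 202/409 status codes are all assumptions.

```python
import threading
import uuid

# Hypothetical in-memory registry: run_id -> job_id of the active optimization.
_active_jobs = {}
_lock = threading.Lock()


def start_optimize(run_id):
    """Start an optimization, or report the job that is already running."""
    with _lock:
        existing = _active_jobs.get(run_id)
        if existing is not None:
            # Duplicate request: return the job_id so the client can reconnect
            # to the existing SSE stream instead of starting a second job.
            return 409, {"error": "optimization already running", "job_id": existing}
        job_id = uuid.uuid4().hex
        _active_jobs[run_id] = job_id
    return 202, {"job_id": job_id}


def finish_optimize(run_id):
    """Clear the registry entry once the job ends (success or failure)."""
    with _lock:
        _active_jobs.pop(run_id, None)
```

A client that receives the error payload can open the stream for the returned job_id rather than showing a failure.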

New endpoint: GET /api/optimize-history/<run_id>
- Returns all pipeline_runs where pipeline='optimize' for that parent
- Returns all child team_runs created by optimization
- Includes scores, strategies, rankings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 07:20:01 -05:00
root
bc2ad7c1a9 Fix Lab UX: visual selection, auto-navigate, live status, stuck detection
Lab experiment selection:
- Selected experiment now highlighted with accent border + glow
- Clicking auto-navigates to relevant tab (config if idle, monitor if running)
- No more silent toast-only feedback

Live status display:
- SSE "status" events now rendered in monitor (were silently dropped before)
- Shows real-time: "Proposing change... (trial 3/50)" during execution
- Error messages displayed inline instead of just toast

Stuck experiment fix:
- On app startup, reset all "running" experiments to "paused"
- Prevents ghost "running" status after service restart
- Fixed experiments 2, 3, 4 that showed running but had dead threads

Trial cap fix:
- Changed from lifetime cap (trial_num < 50) to per-run cap (trials_this_run < 50)
- Prevents runaway experiments like #1 that accumulated 3762 trials
- Shows trial progress in status: "trial 3/50"
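
Sketch of the per-run cap, assuming a hypothetical run_trial callback and status callback. Only the cap of 50 and the "trial 3/50" status format come from this commit.

```python
MAX_TRIALS_PER_RUN = 50  # cap from the commit message


def run_experiment(run_trial, lifetime_trials, on_status=print):
    """Run at most MAX_TRIALS_PER_RUN trials, whatever the lifetime total is."""
    trials_this_run = 0
    # The counter restarts at 0 on every start, so an experiment with
    # thousands of lifetime trials can still run, but only 50 at a time.
    while trials_this_run < MAX_TRIALS_PER_RUN:
        trials_this_run += 1
        on_status(f"trial {trials_this_run}/{MAX_TRIALS_PER_RUN}")
        run_trial(lifetime_trials + trials_this_run)  # lifetime trial number
    return lifetime_trials + trials_this_run
```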

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 07:14:12 -05:00
root
3b4fa449f1 Add Auto-Optimize: AI agent for history-driven prompt improvement
When viewing any past run in History, click "Optimize" to trigger an
automated workflow that:

1. Analyzes the original prompt + responses + score
2. Identifies improvement strategies (clarity, depth, specificity, etc.)
3. Generates 3-5 improved prompt variations
4. Tests each variation across original mode + brainstorm
5. Auto-scores all results via background judge
6. Ranks results and highlights the winner
7. "Use This" button loads winning prompt into composer

Architecture:
- _run_optimize(job_id, run_id): background thread, 5-phase engine
- POST /api/runs/<id>/optimize: starts optimization job
- GET /api/optimize/<job_id>/stream: SSE for live progress
- Budget-capped at 15 model calls per optimization
- Child runs saved as real team_runs (source: "optimize")
- Auto-scored → feeds into analytics + routing table automatically
- Results saved to pipeline_runs (pipeline: "optimize")

Frontend:
- "Optimize" button in history detail panel (accent-colored)
- startOptimize(runId): replaces detail view with live optimization stream
- Phase cards: Analysis → Variations → Testing → Ranked Results
- Score bars with color coding (green/amber/red)
- Winner row highlighted with star + "Use This" button

Closes the learning loop: system studies its own history → generates
better prompts → tests them → scores results → routing table improves.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 07:03:27 -05:00
root
8ad221b41f Add self-improving pipeline: auto-scoring, analytics, reactive refine, routing intelligence
Phase 1 — Run Quality Scoring:
- Auto-score every run in background via qwen2.5 judge (1-10)
- Thumbs up/down vote buttons on output cards
- POST /api/runs/<id>/score for user feedback
- run_saved SSE event enables vote buttons after run completes
- User votes override auto-scores (race-condition safe)
- DB: quality_score, score_method, score_metadata on team_runs
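
Sketch of the "user votes override auto-scores" rule, assuming score_method takes the hypothetical values "user" and "auto". The real check would presumably be a conditional UPDATE in the database; a dict stands in for the row here.

```python
def apply_score(run, score, method):
    """Update a run dict in place; return True if the score was written.

    method is "user" or "auto" (assumed values for score_method).
    """
    if method == "auto" and run.get("score_method") == "user":
        # A background judge finishing late must not clobber a user vote.
        return False
    run["quality_score"] = score
    run["score_method"] = method
    return True
```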

Phase 1 — Analytics Dashboard:
- GET /api/admin/analytics: score-by-mode, score-by-model, heatmap, trend
- New Analytics tab on Admin page with bar charts, heatmap table, trend sparkline
- Scoring coverage tracker (scored vs total runs)
- Model × Mode heatmap with color-coded cells

Phase 2 — Reactive Pipeline:
- _assess_stage(): orchestrator evaluates each stage's output mid-run
- _reactive_decide(): can insert/skip stages based on assessment
- Dynamic stage loop replaces fixed iteration in run_refine()
- Budget tracking prevents infinite loops (max_stages hard cap)
- Reactive decisions render as dashed notification bars between cards
- Pipeline adjusts in real-time: "Inserting VALIDATE — high severity gaps found"
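
Sketch of a dynamic stage loop with a hard budget. The callbacks, the decision tuple format, and the default max_stages=8 are assumptions for illustration; they are not the signatures of _assess_stage() or _reactive_decide().

```python
def run_reactive(stages, run_stage, decide, max_stages=8):
    """Run stages in order, letting decide() insert or skip stages.

    decide(stage, output) returns ("insert", name), ("skip", name) or None.
    max_stages is a hard cap on executed stages, so a decider that keeps
    inserting stages cannot loop forever.
    """
    queue = list(stages)
    executed = []
    text = None
    while queue and len(executed) < max_stages:
        stage = queue.pop(0)
        text = run_stage(stage, text)  # previous output feeds the next stage
        executed.append(stage)
        decision = decide(stage, text)
        if decision is None:
            continue
        action, name = decision
        if action == "insert":
            queue.insert(0, name)  # run the inserted stage next
        elif action == "skip" and name in queue:
            queue.remove(name)
    return executed, text
```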

Phase 3 — Cross-Run Learning:
- _build_routing_table(): queries historical scores for model×mode performance
- Best stage sequences per content_type from pipeline_runs
- Routing table cached with 30-min TTL
- Auto-Refine strategist prompt augmented with historical data
- GET /api/suggest-models?mode=X returns top 3 models for that mode
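
Sketch of a 30-minute TTL cache for the routing table. The build callback stands in for _build_routing_table(); the cache layout and the injectable clock are assumptions.

```python
import time

ROUTING_TTL_SECONDS = 30 * 60  # 30-minute TTL from the commit message
_cache = {"table": None, "built_at": None}


def get_routing_table(build, now=time.monotonic):
    """Return the cached table, rebuilding it once the TTL has elapsed."""
    built_at = _cache["built_at"]
    if built_at is None or now() - built_at >= ROUTING_TTL_SECONDS:
        _cache["table"] = build()  # expensive query over historical scores
        _cache["built_at"] = now()
    return _cache["table"]
```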

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 06:18:32 -05:00
root
c2cc211f21 Expand sample prompts to 5 per tier across all 21 modes (315 total)
Each mode now has {basic: [...], mid: [...], advanced: [...]} with 5
prompts per difficulty level. Renderer picks one random prompt from
each tier on every mode switch, so users see fresh examples each time.
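
Sketch of the per-tier random pick, written in Python for illustration; the actual renderer may well be frontend JavaScript. The function name and the injectable rng are assumptions.

```python
import random

TIERS = ("basic", "mid", "advanced")


def pick_samples(mode_prompts, rng=random):
    """Pick one random prompt per tier from {tier: [prompts...]}."""
    return {tier: rng.choice(mode_prompts[tier]) for tier in TIERS}
```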

315 hand-crafted prompts designed to highlight each mode's strengths:
- brainstorm: creative problem-solving at increasing scale
- pipeline: multi-step transformations from simple to complex
- debate: ethical dilemmas with escalating nuance
- validator: common myths to complex historical misconceptions
- roundrobin: writing tasks that benefit from iterative refinement
- redteam: security vulnerabilities from obvious to systemic
- consensus: opinion questions from clear to deeply contested
- codereview: coding tasks from functions to distributed systems
- ladder: concepts that scale from kindergarten to PhD
- tournament: creative competitions from one-liners to algorithms
- evolution: optimization targets from names to city infrastructure
- blindassembly: decomposable projects from explanations to systems
- staircase: progressive constraints from party planning to treaties
- drift: factual claims from simple dates to complex event sequences
- mesh: stakeholder analysis from office policies to life-or-death
- hallucination: fact-checkable claims from simple to obscure
- timeloop: cascading failures from restaurants to civilization
- research: deep dives from single topics to geopolitical analysis
- eval: benchmark prompts from trivia to formal proofs
- extract: structured extraction from sentences to legal documents
- refine: documents from product blurbs to architecture specs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 05:22:35 -05:00
root
0d09bb5293 Add Auto-Refine mode, composer UX, select dropdown fixes
Auto-Refine mode (21st mode):
- AI strategist analyzes content type and quality
- Selects 3-5 optimal refinement stages from 8 available
  (validate, critique, expand, structure, stakeholder, clarity,
  edge_cases, align)
- Executes stages sequentially with output chaining
- Final synthesis produces polished version
- Stages are content-aware — PRD gets different pipeline than essay
- Saved to pipeline_runs DB

Composer UX overhaul:
- Initial state: full-screen centered composer overlay
- Mode grid + models + prompt front-and-center for new users
- On Run: composer closes, output takes full screen width
- "New Prompt" button in header nav bar (not floating)
- Close button (×) on composer overlay
- Works across all 4 themes + mobile

Dropdown fixes:
- Dark theme: select options get solid #1a1d23 bg
- Modern theme: select options get solid #18181b bg
- Light/Reddit: select options get white bg with dark text
- Native <option> elements now readable in all themes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 05:12:35 -05:00
root
713f18a65f Add 4-theme system, fix enrichment panel layout, enable Docker on boot
Theme system (Dark/Light/Reddit/Modern):
- Injectable CSS/JS via after_request — zero template changes
- Dark: original gold accent on black
- Light: warm off-white with indigo accent, readable buttons
- Reddit: bluish-gray bg, orange accent, pill buttons, 8px corners
- Modern: glassmorphism dark, blue accent, frosted cards, 16px corners
- Toggle cycles all 4 themes, persists via localStorage
- Button injected into every page header automatically

Enrichment panel fix:
- threat-card changed from display:flex to display:grid
- enrich-panel now spans full width via grid-column:1/-1
- Added .enrich-section/.enrich-title/.enrich-grid CSS classes
- Sections (Geo, Deep Scan, AI) visually separated with dividers

Iterate/repipe modal themed for all themes:
- Light themes get white modal bg, proper contrast
- Reddit gets rounded corners + orange accent
- Modern gets glassmorphism modal with blue glow

Scrollbar styling across all themes:
- Rounded, properly sized (6-8px), theme-colored thumbs
- macOS-style inset look via background-clip

Layout improvements:
- Output area min-height 400px, padding-bottom 40px
- Empty state centered with more breathing room
- Docker + containerd enabled at boot for web-check survival

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 04:31:01 -05:00
root
411040f206 Fix IP banning: nginx deny list + connection kill for instant enforcement
fail2ban was using nftables action while UFW uses iptables-nft, so bans
were recorded but never enforced. Added three-layer ban enforcement:
1. nginx deny list (/etc/nginx/banned_ips.conf) for instant 403
2. ss -K to kill existing TCP connections on ban
3. Auto-sync nginx deny file on ban/unban (manual, mass, AI sentinel)
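
Sketch of rendering the nginx deny file contents from the current ban list. The function name is hypothetical; writing the result to /etc/nginx/banned_ips.conf and reloading nginx are left out.

```python
import ipaddress


def render_deny_list(banned_ips):
    """Render nginx `deny` directives, one per valid banned IP, sorted."""
    valid = set()
    for ip in banned_ips:
        try:
            valid.add(ipaddress.ip_address(ip.strip()))
        except ValueError:
            continue  # never write unvalidated text into an nginx config
    # Sort by IP version first so IPv4 and IPv6 addresses are never compared.
    ordered = sorted(valid, key=lambda a: (a.version, int(a)))
    return "".join(f"deny {addr};\n" for addr in ordered)
```

Regenerating the whole file on every ban/unban keeps it in sync without tracking individual line edits.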

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 13:05:49 -05:00
root
eea8ff46db Three-tier access: Off → Demo → Showcase
Off: login required for everything

Demo: public gets Team UI + run modes + admin page (browse only)
  Blocked: /logs, /admin/monitor, /history, threat intel APIs,
  sentinel, wall-of-shame, meta-pipelines, self-reports, vectors

Showcase: public gets full read-only access to ALL pages
  Allowed: admin, monitor, logs, threat intel, enrichment,
  lab, history, self-analysis, meta-pipelines
  Blocked: config changes, bans, deletes, bulk operations

Admin (logged in): full access to everything always

SHOWCASE_ONLY_ROUTES set defines which pages/APIs are
blocked in basic demo but allowed in showcase mode.
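
Sketch of the three-tier decision as a pure function. The contents of both route sets below are assumptions chosen for illustration; the real SHOWCASE_ONLY_ROUTES and DEMO_BLOCKED_POSTS are not shown in this log.

```python
# Assumed contents, for illustration only.
SHOWCASE_ONLY_ROUTES = {"/logs", "/admin/monitor", "/history"}
DEMO_BLOCKED_POSTS = {"/api/admin/security/ban", "/api/runs/bulk-archive"}


def is_allowed(path, method, logged_in, active, showcase):
    """Decide access for a request under the Off / Demo / Showcase tiers."""
    if logged_in:
        return True   # Admin: full access to everything, always
    if not active:
        return False  # Off: login required for everything
    if method == "POST" and path in DEMO_BLOCKED_POSTS:
        return False  # config changes, bans, deletes, bulk operations
    if path in SHOWCASE_ONLY_ROUTES:
        return showcase  # blocked in Demo, allowed in Showcase
    return True
```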

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:29:39 -05:00
root
ffd5e43709 Fix demo/showcase toggle: separate buttons, distinct modes
Problem: plain toggle set showcase=true, so demo always became showcase.
No way to enable basic demo mode separately.

Fix:
- Three explicit buttons: [Demo] [Showcase] [Off]
- Demo mode: active=true, showcase=false (team UI only)
- Showcase mode: active=true, showcase=true (full read-only admin)
- Off: both false
- Plain toggle cycles demo on/off without touching showcase
- Clear status text shows which mode and what it means

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:26:48 -05:00
root
732f29d836 Fix showcase toggle: remove /api/demo/toggle from blocked POSTs
The demo toggle route was in DEMO_BLOCKED_POSTS, so once showcase
was enabled, the before_request handler blocked the toggle POST
even for admins (the before_request check ran before the route's
own admin check could verify the session).

Fix: removed /api/demo/toggle from blocked list. The route already
has its own admin-only check (line 460).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:24:45 -05:00
root
f0cf69b4bd Fix NameError: ADMIN_WRITE_ROUTES renamed to DEMO_BLOCKED_POSTS
before_request handler still referenced old variable name.
Updated to use DEMO_BLOCKED_POSTS with simpler path-in-set check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:23:01 -05:00
root
9f48a050c8 Showcase Mode: full read-only admin access for client demos
New mode: Showcase (replaces basic demo mode for client demos)
- Visitors see EVERYTHING: Admin, Monitor, Logs, Threat Intel,
  Lab, History, Meta-Pipelines — all without logging in
- Read-only: all GET requests allowed on all routes
- Allowed POSTs: team runs, self-analysis, IP enrichment
  (read-like operations that don't modify system config)
- Blocked POSTs: config changes, bans, deletes, bulk archive

Admin UI (Security tab):
- "Enable Showcase" button (magenta) — one click to activate
- "Turn Off" button appears when active
- Clear description of what visitors can and can't do
- Status shows "SHOWCASE MODE" with magenta styling

Banner:
- Magenta gradient banner on all pages when showcase is active
- Shows: "Showcase Mode — Full Read-Only Access — Admin · Monitor · Logs · Lab · History"
- Demo button in nav shows "Showcase" in magenta

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:19:41 -05:00
root
dfab02f114 Fix meta-pipeline detail panel collapsing on auto-refresh
Auto-refresh now skips when any detail panel is open (checks for
meta-detail-* elements). Panel stays stable while reading results.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:07:47 -05:00
root
c9901dbc94 Meta-pipeline UI: add Stop/Restart/Results controls per pipeline
Each pipeline card now shows:
- Status dot + name + status tag + best score
- Stop button (red) when running
- Restart button (green) when stopped/completed
- Results button (magenta) to drill into iterations
- Live progress text when running
- Stages and iteration count on info line

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:06:24 -05:00
root
28df789745 Fix runaway experiments: cap at 50 trials, fix DB permissions
Bugs fixed:
- Ratchet loop had no trial cap — experiment #1 ran 3762 trials
  unchecked. Now capped at max_trials=50 per start cycle.
- meta_pipelines, meta_runs, self_reports tables had no GRANT
  for kbuser — fixed permissions for all tables and sequences.

All 4 running experiments auto-paused on restart.
Stress test confirms all tables accessible, all models responding,
meta-pipeline creation working, self-report save/retrieve working.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:56:37 -05:00
root
4dc561af12 Meta-Pipeline: self-improving multi-mode chains on real system data
Engine:
- Chains modes in sequence: extract → research → validate → debate → synthesize
- Each stage feeds its output to the next as input
- Runs same pipeline with different model sets (one model per iteration)
- Auto-scores final output using judge model (1-10)
- Keeps best result across all iterations
- All stage results + final outputs saved to meta_runs table
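
Sketch of the engine loop above: chain the stages, repeat per model set, score the final output, keep the best. The run_stage and judge callbacks and the result dict layout are assumptions; persistence to meta_runs is omitted.

```python
def run_meta_pipeline(stages, model_sets, run_stage, judge, source_text):
    """Chain stages per model set and keep the best-scoring final output."""
    best = None
    runs = []
    for models in model_sets:
        text = source_text
        stage_results = []
        for stage in stages:
            text = run_stage(stage, models, text)  # output feeds the next stage
            stage_results.append((stage, text))
        score = judge(text)  # judge model returns a 1-10 score
        run = {"models": models, "stages": stage_results, "final": text, "score": score}
        runs.append(run)
        if best is None or score > best["score"]:
            best = run
    return best, runs
```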

4 preset pipelines:
1. Security Deep Dive — security logs through 5-stage analysis
2. Run History Insights — team run data through 4-stage extraction
3. Threat Intel Enrichment — profiled IPs through 5-stage analysis
4. Cross-Report Synthesis — past self-reports through 4-stage debate

Database:
- meta_pipelines: name, source, stages, status, best_score, iterations
- meta_runs: per-iteration stage results, final output, score, models

API:
- POST /api/meta-pipeline — create pipeline from preset
- POST /api/meta-pipeline/:id/start — run in background
- POST /api/meta-pipeline/:id/stop — halt execution
- GET /api/meta-pipelines — list all with live status
- GET /api/meta-pipeline/:id — full detail with all iteration results

UI (Lab page):
- Magenta-bordered Meta-Pipeline card with 4 clickable presets
- Click preset → creates + auto-starts pipeline
- Pipeline list with live status dots, progress, scores
- Click pipeline → drill-down with per-iteration results
- Each stage expandable (click to show output)
- Best output highlighted in green border
- Auto-refreshes every 5 seconds during runs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:54:08 -05:00
root
804898b658 Auto-save self-analysis reports to DB with browsable history
Database:
- self_reports table: report_type, model, report text, data_size, timestamp
- Reports auto-saved on generation (no extra step needed)

API:
- GET /api/self-reports — list all past reports (id, type, model, size, date)
- GET /api/self-reports/:id — full report text

UI:
- "✓ Saved as report #N" indicator after generation
- "Past Reports (N)" section below self-analysis buttons
- Click any past report → expands inline (toggle on/off)
- Shows: type, model, timestamp for each saved report
- Reports persist across page refreshes and restarts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:49:17 -05:00
root
28e641f939 Self-Analysis: AI reports from system's own data + Lab experiments API
4 one-click self-analysis reports in Lab:
1. Threat Intelligence Report — security logs → attack taxonomy,
   attacker profiling, predictive analysis, recommendations
2. Model Performance Analysis — 96 team runs → usage patterns,
   model workload, response efficiency, optimization opportunities
3. Usage Analytics — nginx access logs → traffic patterns, feature
   usage, user journey mapping, UX recommendations
4. Security Posture Assessment — combined audit of security logs,
   sentinel verdicts, fail2ban, threat intel DB → risk rating

API: POST /api/self-analyze
- type: threat_intel|model_performance|access_patterns|security_posture
- model: which local model to use (default qwen2.5)
- Returns structured report from real system data

Lab UI:
- Green-bordered Self-Analysis card above experiment templates
- Click any report → runs analysis in background → result panel
  expands inline with full report (scrollable, closeable)
- Loading state shows "Analyzing..." during generation

Each report analyzes REAL data: actual security logs, actual run
history, actual nginx access patterns — not synthetic test data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:42:07 -05:00
root
ca660cbd10 Lab: add 3 experiment templates with auto-fill
Templates section below experiment list:

BASIC — Better Summaries (3 eval cases)
  Optimize summarization quality. Tests across biology, history,
  and technical content. Shows the simplest Lab workflow.

INTERMEDIATE — Code Explainer (4 eval cases)
  Find the best prompt+model to explain code to non-programmers.
  Tests loops, recursion, error handling, comprehensions.
  Shows how the ratchet evolves system prompts.

ADVANCED — Security Analyst Persona (5 eval cases)
  Evolve a cybersecurity AI across threat classification, executive
  summaries, developer education, incident response, and forensics.
  Tests multi-audience adaptation and domain expertise.

Click any template → auto-fills the create form with name, objective,
metric, all eval cases, and selects all available models. User can
modify before creating.

Each template card shows: level badge (green/amber/red), name,
eval case count, and a description explaining what the experiment
does and why it matters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:32:39 -05:00
root
f34e05168b Retheme Lab page: retro-brutalist matching all other pages
- Full theme swap: amber accents, JetBrains Mono, 2px borders
- Animated dot-grid background + scanlines
- Backdrop-filter blur on cards
- Status pills: square with borders (was rounded)
- Model chips: square with 2px borders
- Chart wraps: dark background with 2px borders
- Trial items: monospace numbers and scores
- Best config box: monospace with green border
- Nav bar with links to Team, History, Admin, Logs
- Toast: monospace with fade-out animation
- Config textarea: monospace font with dark background
- Responsive: tabs compact on mobile

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:28:29 -05:00
root
efa547bb68 Full history page with tags, notes, vector API, and bulk ops
New /history page (replaces slide-out panel):
- Full-page data table: ID, Mode, Prompt, Models, Tags, Date
- Active/Archived/All view toggle
- Filter by mode, tag, or search text
- Checkbox select for bulk archive/restore
- Click any row → detail panel with full responses

Per-run detail:
- Inline tag editor: add tags (Enter), remove tags (click ✕)
- Notes textarea with auto-save (1s debounce)
- Archive/Restore/Delete buttons
- Collapsible response cards (click header to expand)

Database:
- tags TEXT[] column with GIN index for fast tag queries
- notes TEXT column for freeform annotations

APIs:
- POST /api/runs/:id/tags — update tags and/or notes
- GET /api/runs/tags — list all unique tags in use
- GET /api/runs/vectors — structured text documents for AI/embedding
  Returns: mode, prompt, models, date, tags, notes + all response text
  Filters: ?mode=, ?tag=, ?limit=
  Each doc includes token estimate for embedding planning

Main UI: History button now links to /history page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:19:22 -05:00
root
aeab1f0194 Archive/restore history: soft-delete with toggle and bulk ops
Database:
- Added 'archived' boolean column to team_runs (indexed)
- Active runs filtered by archived=false by default

API:
- GET /api/runs?show=active|archived|all
- POST /api/runs/:id/archive — archive single run
- POST /api/runs/:id/restore — restore single run
- POST /api/runs/bulk-archive — archive/restore by IDs or date

History panel UI:
- Active/Archived toggle tabs at top
- Per-run Archive button (magenta) in detail view
- Per-run Restore button (green) in detail view for archived runs
- "Archive All" bulk button when viewing active runs
- "Restore All" bulk button when viewing archived runs
- Archived runs hidden from active view, accessible anytime

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:13:33 -05:00
root
7948089f04 Fix sentinel countdown: sync to actual scan schedule, not page load
- Sentinel thread sets next_scan_ts = time.time() + interval BEFORE sleeping
- API returns next_scan_in derived from real next_scan_ts, not estimated
- Frontend calculates server clock offset and counts down to the actual
  target timestamp — refresh shows the same remaining time, not a reset
- Shows ✓ in green when scan fires, resumes countdown on next poll
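
Sketch of the schedule bookkeeping behind this fix. The class and the server_time field are assumptions; next_scan_ts and next_scan_in are the names used above.

```python
import time


class ScanSchedule:
    """Tracks when the next scan fires so every client sees the same countdown."""

    def __init__(self, interval, clock=time.time):
        self.interval = interval
        self.clock = clock
        self.next_scan_ts = clock() + interval

    def mark_scan_done(self):
        # Set the next target BEFORE the thread goes to sleep.
        self.next_scan_ts = self.clock() + self.interval

    def status(self):
        now = self.clock()
        return {
            "server_time": now,  # lets the client compute its clock offset
            "next_scan_ts": self.next_scan_ts,
            "next_scan_in": max(0.0, self.next_scan_ts - now),
        }
```

Because the countdown is derived from a stored timestamp, a page refresh shows the same remaining time instead of restarting from the full interval.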

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:07:15 -05:00
root
357918013d Compact sentinel card: single-line with mini ring + collapsible verdicts
- Entire sentinel status fits in one header row now
- Mini 28px countdown ring (was 64px) inline with title
- Scans/bans counts inline as text, not grid boxes
- Verdicts collapsed by default — click to expand
- Card padding reduced (8px vs 14px)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:03:57 -05:00
root
3cdfc01835 Sentinel countdown ring timer with live stats
- SVG progress ring shows time until next scan (magenta arc)
- Countdown ticks every second: "245s → 244s → ... → scanning..."
- Ring fills as time progresses, resets on scan
- Turns green and shows "scanning..." when timer hits 0
- Stats grid: Scans count, AI Bans count, Last Run time, Interval
- Backend API returns elapsed_since_scan and next_scan_in

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:01:09 -05:00
root
418da99fa7 Wall of Shame: persistent threat intel database with drill-down table
Database:
- threat_intel table with full enrichment data per IP
- UPSERT on IP — re-enriching updates existing record
- Stores: geo, AI analysis, web-check results, indicators, raw JSON
- Indexed on IP (unique), threat_level, enriched_at

Auto-save:
- Every enrichment auto-saves to DB (step 5 in enrichment pipeline)
- "Saved to Wall of Shame database" indicator in enrichment panel
- No duplicate scans — re-enrich updates the existing record

Wall of Shame tab (/logs):
- Stats bar: Total Profiled, Critical, High, Proxies, Automated
- Sortable table: IP, Threat, Type, Summary, Country, Ports
- Click any row to expand full detail:
  ISP, Org, ASN, City, Proxy/Hosting flags, Confidence,
  Blocklist count, Pattern, Recommendation, Indicators
- All data persists across restarts — no re-scanning needed

API:
- /api/admin/wall-of-shame — list all enriched IPs with sorting/filtering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:52:34 -05:00
root
e7f12a6d93 Tighten AI security prompts — aggressive stance for private server
Enrichment AI prompt:
- Explicitly states this is a PRIVATE application
- Strict threat level rules: 10+ blocklists = always critical,
  exploit scans = always critical, SSH-only = suspicious
- Added "compromised_host" classification option
- Recommendation options: ban permanently, ban 24h, monitor, ignore

Sentinel batch prompt:
- "Err on the side of banning" directive
- .env.production/.env.local probing = targeted recon, instant ban
- When in doubt, BAN — private server has no public scanning excuse
- Tighter rules for automated UA detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:49:17 -05:00
root
3c4846d52c Expand web-check enrichment: traceroute, headers, status, full rendering
Now queries 6 web-check endpoints per IP:
- ports — open port scan
- dns — reverse DNS / PTR records
- block-lists — DNS blocklist check (AdGuard, CloudFlare, etc.)
- trace-route — full network path with per-hop latency
- headers — HTTP response headers (server, powered-by, etc.)
- status — HTTP status code and response time

Frontend rendering:
- Traceroute displayed as hop chips with latency: IP (45ms) → IP (56ms)
- HTTP status with response time
- Server headers inline
- Errors silently skipped (many endpoints fail on raw IPs)

AI analysis now includes:
- Blocklist count and names in prompt
- Traceroute hops in prompt for network path analysis

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:46:43 -05:00
root
51ffd2b82c Fix enrichment: run web-check before AI analysis so data is available
Web-check (ports, DNS, blocklists) now runs as step 3, AI analysis
as step 4. AI prompt includes open ports and blocklist status for
richer threat verdicts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:42:23 -05:00
root
e816e81820 Integrate web-check Docker for deep IP enrichment
Setup:
- lissy93/web-check running in Docker on port 3000
- Queries ports, DNS, and blocklist endpoints per IP

Enrichment now includes 4 layers:
1. Geolocation (ip-api.com) — country, ISP, proxy/hosting flags
2. Web-Check deep scan — open ports, DNS/PTR, blocklist status
3. Security log aggregation — all activity for that IP
4. AI analysis (qwen2.5) — gets ALL above data as context

Frontend rendering:
- Open ports displayed in red (security risk indicators)
- Blocklist status: "3/8 blocked (AdGuard, AdGuard Family, ...)"
- Reverse DNS (PTR records)
- All data feeds into AI analysis prompt for richer verdicts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:39:42 -05:00
root
472a5d0917 IP threat intel: sorting, mass ban, enrichment with geo + AI analysis
Sorting:
- Sort by: hits, threat level, recent activity, banned status
- Active sort button highlighted in amber

Mass operations:
- Checkbox per IP for multi-select
- "Ban Selected" / "Unban Selected" buttons with confirmation
- /api/admin/security/mass-ban endpoint handles batch operations
- Selection counter shows "N selected"

IP Enrichment (click "Enrich" button per IP):
- Geolocation via ip-api.com (country, city, ISP, org, AS number)
- Proxy/hosting/mobile detection flags (red for proxy/hosting)
- AI threat analysis via local qwen2.5:
  - Threat level, classification, confidence score
  - Attack pattern description
  - Specific indicators list
  - Automated detection flag
  - Actionable recommendation
- Enrichment panel expands inline below the IP card (toggle)

Per-IP drill-down:
- Expandable raw log lines per IP (click to show/hide)
- User agent listing with count
- First seen / last seen timestamps
- HTTP method breakdown (GET:5 POST:2)
- AI sentinel verdicts shown inline
- Jail information for banned IPs

Enhanced backend:
- Security API returns per-IP log lines, first_seen, methods, event_types
- AI verdicts attached to IP records
- Multiple UA detection (fingerprint: rotating scanner)
- Sort parameter support (?sort=threat|hits|recent|banned)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:24:32 -05:00
root
de4ca533dd AI Security Sentinel: local LLM scans logs every 5 minutes
Background thread runs qwen2.5 to analyze new security log entries:
- Aggregates new entries by IP since last scan
- Sends batch to local LLM with security analysis prompt
- LLM classifies each IP: threat level, action, attack type, reason
- Auto-bans IPs the AI recommends banning (via fail2ban)
- Logs all verdicts and bans to /var/log/llm-team-sentinel.log
- Logs AI bans to security log as AI_BAN events

API:
- /api/admin/sentinel — sentinel status, stats, recent verdicts

Threat Intel tab enhancement:
- Sentinel status card with magenta accent (distinct from threat cards)
- Shows: model, scan count, ban count, last run, interval
- Recent AI verdicts table: action, IP, attack type, reason
- Errors displayed inline

Security prompt tuning:
- Explicit rules for common attack patterns
- Low temperature (0.1) for consistent classification
- JSON-only response format for reliable parsing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:08:02 -05:00
root
f1bb2a92e7 Interactive threat intelligence dashboard with one-click ban
Security API:
- /api/admin/security — aggregates security log into per-IP threat intel
  (hit count, exploit scans, login fails, paths probed, threat level)
- /api/admin/security/ban — manual ban/unban via fail2ban
  (logs MANUAL_BAN/MANUAL_UNBAN to security log)

Threat Intel tab in /logs:
- Summary stats: Critical IPs, High Threat, Currently Banned
- Per-IP cards showing: threat level, hit count, scan count, paths probed
- Critical IPs have red border, high threat amber
- One-click "Ban 24h" button per IP (calls fail2ban-client banip)
- One-click "Unban" for currently banned IPs
- Banned IPs shown at reduced opacity
- LAN IPs (192.168.*) filtered out

fail2ban tuning:
- llm-team-exploit findtime: 600s → 3600s (catch slow scanners)
- llm-team-exploit maxretry: 3 → 2 (more aggressive)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:05:01 -05:00
root
21c8c2a3e5 Monitor: drill-down pipeline view with step timeline
Highlander pattern — one view at each level, clean transitions:

Level 1 - Run List:
- Active runs (live, with progress bars)
- Recent runs (in-memory session runs)
- History from DB (all saved runs, click to drill down)

Level 2 - Pipeline Detail (click any DB run):
- Breadcrumb nav: Monitor → mode #id
- Header card with mode, models, timestamp, full prompt
- Step timeline with dot indicators on a vertical line
- Each step shows: model, role tag, character count, token estimate
- Green dots for completed, red for errors

Level 3 - Response Text (click any step):
- Accordion expand/collapse on click
- Full response text in monospace scrollable container
- Smooth max-height transition

Architecture ready for Level 4 (future AI comparison):
- Responses are individually addressable by step index
- Role-based grouping visible in timeline
- Side-by-side view can be added per-step

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 02:58:34 -05:00
root
9af071df6c Retheme admin page, improve save feedback, add monitor nav link
Admin UI:
- Full retro-brutalist theme matching main UI
- JetBrains Mono headings, amber accent, 2px borders
- Animated dot-grid background + scanlines
- Square toggles (was rounded)
- Backdrop-filter blur on cards
- Nav bar with links to Team, Lab, Logs, Monitor

Save feedback:
- Every save now verifies the API response (checks d.ok)
- Toast shows what was saved: "ollama provider saved / Enabled"
- Toast shows details: "Cloud models saved / 3 models configured"
- Toast shows timeout details: "Timeouts saved / Global: 300s, 2 overrides"
- Failed saves show red toast with error message
- Toast fade-out animation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 02:44:58 -05:00
root
344e11f4b2 Replace GoAccess with built-in log viewer, clickable error links
New /logs page with 5 tabs:
- App Log (journalctl for llm-team-ui service)
- Run History (all completed runs with errors inline)
- Nginx Errors (with red highlighting)
- Nginx Access (with color-coded status codes)
- Security Log (fail2ban/exploit detection)

Features:
- Live text filter (grep-style)
- Configurable line limit (50-500)
- Auto-refresh every 10s
- Run history shows mode, user, duration, response count, errors
- Error lines highlighted red, warnings amber
- Status codes color-coded (2xx green, 3xx blue, 4xx amber, 5xx red)

Error linking:
- Stream errors in main UI link to /admin/monitor
- Error response cards have "View error details" link
- Error cards styled with red border and monospace body

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 02:35:17 -05:00
root
59379c624d Fix Ollama timeout: set num_ctx dynamically, truncate oversized prompts
Root cause: query_ollama() sent no num_ctx option, so Ollama defaulted
to 2048 tokens. Research mode with 15 questions builds prompts that
exceed model context windows, causing Ollama to hang until the 300s
timeout.

Fix:
- Calculate num_ctx from prompt size + 1024 token response buffer
- Cap at model's actual context limit
- Truncate prompts that exceed context window minus 512 response tokens
- Uses smart_truncate() to preserve start + end of prompt
- Updated MODEL_CONTEXT map with accurate limits for all local models
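
A minimal sketch of the fix steps listed above, assuming a rough 4-characters-per-token estimate. The `MODEL_CONTEXT` entry, the `build_options()` helper, and the body of `smart_truncate()` are illustrative guesses; the commit names `MODEL_CONTEXT` and `smart_truncate()` but does not show them.

```python
# Hypothetical context limit; the real MODEL_CONTEXT values are not shown here.
MODEL_CONTEXT = {"example-model": 8192}
DEFAULT_CONTEXT = 2048      # the Ollama default named in the commit
RESPONSE_BUFFER = 1024      # tokens added to the prompt size for num_ctx
MIN_RESPONSE_TOKENS = 512   # tokens reserved when truncating the prompt
CHARS_PER_TOKEN = 4         # rough estimate, an assumption


def estimate_tokens(text: str) -> int:
    """Approximate token count, rounded up."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def smart_truncate(text: str, max_chars: int) -> str:
    """Keep the start and end of the text, dropping the middle."""
    if len(text) <= max_chars:
        return text
    marker = "\n[... truncated ...]\n"
    keep = max(max_chars - len(marker), 0)
    head = keep - keep // 2
    tail = keep // 2
    return text[:head] + marker + (text[-tail:] if tail else "")


def build_options(model: str, prompt: str) -> tuple[str, dict]:
    """Return the (possibly truncated) prompt and the Ollama options dict."""
    limit = MODEL_CONTEXT.get(model, DEFAULT_CONTEXT)
    max_prompt_tokens = limit - MIN_RESPONSE_TOKENS
    if estimate_tokens(prompt) > max_prompt_tokens:
        prompt = smart_truncate(prompt, max_prompt_tokens * CHARS_PER_TOKEN)
    # Size the context to the prompt plus a response buffer, capped at the limit.
    num_ctx = min(estimate_tokens(prompt) + RESPONSE_BUFFER, limit)
    return prompt, {"num_ctx": num_ctx}
```

The returned dict would be passed as the `options` field of the Ollama request, so short prompts get a small context and oversized ones are trimmed before they can stall the model.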

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 02:29:11 -05:00
root
1ac7a436e6 Add live metrics dashboard to progress panel
8 real-time metrics in the progress panel:
- Elapsed time (updates every 500ms)
- Models active/total (tracks unique models as they respond)
- Responses received (count)
- Estimated tokens (~chars/4)
- Data received (formatted KB)
- SSE events (total protocol events)
- Errors (turns red if > 0)
- Heartbeats (keepalive count)

Metrics update every 500ms during run. On completion, all
metric values turn green. Magenta/purple theme for metric
values, micro labels underneath.
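
The counters behind these metrics can be sketched as follows. This is written in Python for illustration only (the actual panel runs in the browser), and the `RunMetrics` class and its event kinds are hypothetical names; the chars/4 token estimate and KB formatting follow the list above.

```python
from dataclasses import dataclass, field


@dataclass
class RunMetrics:
    models_total: int
    models_seen: set = field(default_factory=set)
    responses: int = 0
    chars: int = 0
    bytes_received: int = 0
    events: int = 0
    errors: int = 0
    heartbeats: int = 0

    def on_event(self, kind, model=None, text=""):
        """Update counters for one SSE event."""
        self.events += 1
        self.bytes_received += len(text.encode("utf-8"))
        if kind == "heartbeat":
            self.heartbeats += 1
        elif kind == "error":
            self.errors += 1
        elif kind == "response":
            self.responses += 1
            self.chars += len(text)
            if model:
                self.models_seen.add(model)

    def snapshot(self):
        """Values as they would be displayed in the panel."""
        return {
            "models": f"{len(self.models_seen)}/{self.models_total}",
            "responses": self.responses,
            "est_tokens": self.chars // 4,  # ~chars/4
            "data": f"{self.bytes_received / 1024:.1f} KB",
            "events": self.events,
            "errors": self.errors,
            "heartbeats": self.heartbeats,
        }
```

Elapsed time is left out because it only needs a start timestamp and the 500ms redraw timer.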

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:55:29 -05:00
root
c507ba1016 Progress bar: magenta→cyan gradient with green completion
- Border: magenta (#d946ef) with purple glow
- Fill: gradient from magenta → purple → cyan
- Shimmer animation sweeps across the fill
- Step indicators: magenta active pulse with glow
- Completed steps: magenta→green gradient
- Phase labels: bright green with gradient fade line
- Completion: green→cyan gradient with green glow
- 8px height track (was 6px) for better visibility
- All text in progress panel uses purple/pink tones
- Clearly distinct from the amber UI elements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:35:00 -05:00
root
9eaac813df Sticky progress bar, phase labels, auto-scroll
- Progress panel is now position:sticky at top of output — always visible
- Phase labels (─── scouting ───, ─── researching ───, etc.) appear
  between response cards when the pipeline role changes
- Auto-scroll to latest response card as they arrive
- Completion state shows response count and fades after 5s
- Clear previous errors: all 'input stream' errors were caused by
  service restarts during in-flight runs, not code bugs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:30:53 -05:00
root
c124b01681 Fix SSE stream reliability: threaded server, async keepalive, streaming responses
- Enable Flask threaded=True for concurrent request handling
- Refactor generate() to use producer-consumer queue pattern:
  - Runner executes in background thread, pushes events to queue
  - Heartbeat thread sends keepalive every 10s independently
  - Generator reads from queue — stream never goes silent
- Brainstorm mode: stream responses as they arrive (was waiting for all)
- Prevents nginx/browser timeout during long model queries
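
The producer-consumer pattern above can be sketched with the standard library alone. This is an illustration under assumptions, not the actual `generate()`: the `sse_stream()` name, the `runner` callable, and the single-line payloads are hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def sse_stream(runner, heartbeat_interval=10.0):
    """Yield SSE-formatted strings from `runner`, with keepalives in between.

    `runner` is assumed to be a callable returning an iterable of
    single-line payloads (for example JSON strings).
    """
    events = queue.Queue()
    finished = threading.Event()

    def produce():
        try:
            for payload in runner():
                events.put(f"data: {payload}\n\n")
        except Exception as exc:
            events.put(f"event: error\ndata: {exc}\n\n")
        finally:
            finished.set()
            events.put(_DONE)

    def heartbeat():
        # Event.wait returns False on timeout, True once the runner is done.
        while not finished.wait(heartbeat_interval):
            events.put(": keepalive\n\n")  # SSE comment line, ignored by clients

    threading.Thread(target=produce, daemon=True).start()
    threading.Thread(target=heartbeat, daemon=True).start()

    # The generator only reads from the queue, so it never blocks on a model.
    while True:
        item = events.get()
        if item is _DONE:
            break
        yield item
```

In a Flask view this generator would typically be wrapped as `Response(sse_stream(run), mimetype="text/event-stream")`, where `run` stands in for the pipeline runner.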

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:27:42 -05:00
root
242dec7509 Add progress tracking, admin monitor, SSE keepalive, research hardening
Backend:
- Active run tracking with step/substep/error state
- SSE keepalive heartbeat every 15s to prevent nginx timeout
- Run log (last 100 completed runs with timing/errors)
- Research mode: per-question progress, context caps, graceful failures
- Hard cap on research questions (15), answer truncation (8K chars)
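
One simple way to keep a bounded run log like the one above is a fixed-size deque. A sketch, assuming hypothetical `record_run()` and `run_stats()` helpers and an invented record layout:

```python
from collections import deque

RUN_LOG = deque(maxlen=100)  # oldest entries drop off automatically


def record_run(run_id, mode, duration_s, error=None):
    """Append one completed run with its timing and optional error."""
    RUN_LOG.append({
        "id": run_id,
        "mode": mode,
        "duration_s": duration_s,
        "error": error,
    })


def run_stats():
    """Summary figures of the kind a monitor page might show."""
    total = len(RUN_LOG)
    errors = sum(1 for r in RUN_LOG if r["error"])
    avg = sum(r["duration_s"] for r in RUN_LOG) / total if total else 0.0
    return {"completed": total, "errors": errors, "avg_duration_s": avg}
```

Because the log lives in memory, it resets on service restart; anything that must survive restarts would need the database instead.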

Frontend:
- Real progress bar with step segments, elapsed time, event counter
- Progress shimmer animation, step completion indicators
- Improved error display with timing context
- Green completion state with fade

Admin:
- /admin/monitor — live process dashboard
- Stats: active runs, completed, errors, avg duration
- Active run cards with live progress, substep detail, errors
- Recent run history with error traces
- Auto-polls every 3 seconds
- Full retro-brutalist theme matching main UI

Nginx:
- proxy_read_timeout 600s, proxy_send_timeout 600s
- proxy_buffering off for SSE streaming

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:22:36 -05:00
root
8cbc2bec84 Redesign UI: neo-brutalist retro-futuristic aesthetic
- New color palette: amber/gold accent, deep black backgrounds
- JetBrains Mono for headings, labels, and system text
- 2px borders, 2px border-radius (brutalist)
- Animated dot-grid background canvas with random scanline artifacts
- CRT scanline overlay + vignette effect
- Backdrop-filter blur on panels for glass depth
- Pulsing status dot, amber glow effects
- Login page: full retro treatment with sys-tag footer
- All functional elements preserved

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:09:40 -05:00
root
d651c52a59 Add sample prompt chips for all 20 modes
Three demo prompts per mode (basic/mid/advanced) that showcase each
orchestration pattern's unique value. Clickable chips below the prompt
textarea auto-fill on click with green flash feedback. Prompts swap
dynamically when switching modes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 00:56:55 -05:00
root
a0ee901f66 Add security hardening: logging, email alerts, exploit detection
- Security logging to /var/log/llm-team-security.log for fail2ban
- Email alerts for security events via SMTP
- Exploit pattern detection (scanner probes, SQL injection, path traversal)
- Use X-Real-IP header for accurate client IP behind nginx
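
A hedged sketch of the detection and client-IP steps above. The regex patterns, category names, and function names are illustrative assumptions, not the app's actual rule set.

```python
import re

# Illustrative patterns only; the real detection rules are not shown in this log.
EXPLOIT_PATTERNS = {
    "scanner_probe": re.compile(r"/(wp-login\.php|\.env|phpmyadmin|\.git/)", re.I),
    "sql_injection": re.compile(r"(union\s+select|\bor\s+1=1|sleep\()", re.I),
    "path_traversal": re.compile(r"(\.\./|%2e%2e)", re.I),
}


def classify_request(path_and_query):
    """Return the names of every pattern category the request matches."""
    return [name for name, pattern in EXPLOIT_PATTERNS.items()
            if pattern.search(path_and_query)]


def client_ip(headers, remote_addr):
    """Prefer X-Real-IP, falling back to the socket address.

    Only safe when the app is reachable solely through the proxy that
    sets the header; otherwise a client could spoof it.
    """
    return headers.get("X-Real-IP") or remote_addr
```

Each match would then be written to the security log with the resolved client IP so fail2ban can act on it.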

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 00:46:25 -05:00
root
2bb910b72c Add triage, backup, and disaster recovery system
- brain-backup: daily borg + pg_dump, 7d/4w/3m retention, cron at 3AM
- brain-triage: full system health check (services, ports, firewall,
  headers, kernel, app, DB, disk, backups, security scan)
- brain-recover: restore from backup (full/db/configs/app) + emergency
  lockdown mode that blocks all external access except LAN SSH

All accessible via /usr/local/bin/brain-{backup,triage,recover}

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 04:52:48 -05:00
root
6ea457d01d Add server security configs and setup script
- Nginx configs with security headers (X-Frame-Options, CSP, etc.)
- fail2ban jails for nginx (botsearch, bad-request, forbidden)
- Kernel hardening via sysctl (rp_filter, no redirects, log martians)
- SSH hardening (no root, max 3 attempts, no X11)
- UFW rules export
- Idempotent setup.sh to restore all configs on fresh install
- Flask bound to 127.0.0.1 (nginx-only access)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 04:47:54 -05:00
root
0d00ced622 Mobile-optimized layout: output-first, collapsible mode selector
- Output panel renders first on mobile (CSS order swap)
- Prompt + Run button immediately below output
- Mode/config hidden behind "Mode: Brainstorm" collapsible toggle
- Tapping toggle expands full mode grid + model config
- Compact header nav with smaller text
- 3-column mode grid on mobile (was 4)
- Larger run button (16px font, 14px padding) for touch
- Full-width repipe modal and history panel on mobile
- Desktop layout unchanged (toggle hidden, collapse always open)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 04:01:36 -05:00
root
e3207b9c8e Make /logs strictly admin-only, never accessible in demo mode
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 03:50:49 -05:00