From c2cc211f2183d56fd0b775d254eac9b233200d31 Mon Sep 17 00:00:00 2001
From: root
Date: Sun, 29 Mar 2026 05:22:35 -0500
Subject: [PATCH] Expand sample prompts to 5 per tier across all 21 modes (315 total)

Each mode now has {basic: [...], mid: [...], advanced: [...]} with 5 prompts
per difficulty level. Renderer picks one random prompt from each tier on every
mode switch, so users see fresh examples each time.

315 hand-crafted prompts designed to highlight each mode's strengths:
- brainstorm: creative problem-solving at increasing scale
- pipeline: multi-step transformations from simple to complex
- debate: ethical dilemmas with escalating nuance
- validator: common myths to complex historical misconceptions
- roundrobin: writing tasks that benefit from iterative refinement
- redteam: security vulnerabilities from obvious to systemic
- consensus: opinion questions from clear to deeply contested
- codereview: coding tasks from functions to distributed systems
- ladder: concepts that scale from kindergarten to PhD
- tournament: creative competitions from one-liners to algorithms
- evolution: optimization targets from names to city infrastructure
- blindassembly: decomposable projects from explanations to systems
- staircase: progressive constraints from party planning to treaties
- drift: factual claims from simple dates to complex event sequences
- mesh: stakeholder analysis from office policies to life-or-death
- hallucination: fact-checkable claims from simple to obscure
- timeloop: cascading failures from restaurants to civilization
- research: deep dives from single topics to geopolitical analysis
- eval: benchmark prompts from trivia to formal proofs
- extract: structured extraction from sentences to legal documents
- refine: documents from product blurbs to architecture specs

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 llm_team_ui.py | 486 ++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 395 insertions(+), 91 deletions(-)

diff --git a/llm_team_ui.py b/llm_team_ui.py
index 769df65..12595c5 100644
--- a/llm_team_ui.py
+++ b/llm_team_ui.py
@@ -2314,126 +2314,430 @@ const MODE_DESCS = {
 };
 const SAMPLE_PROMPTS = {
-  brainstorm: [
+  brainstorm: { basic: [
     'What are practical ways a small town could become energy independent within 10 years?',
+    'How could a public library reinvent itself to stay relevant for the next 20 years?',
+    'What are five creative ways to reduce food waste in a college dining hall?',
+    'How can a neighborhood reduce package theft without cameras or confrontation?',
+    'What are unconventional ways to make a long commute productive and enjoyable?'
+  ], mid: [
     'Design a mentorship program that pairs retired professionals with first-generation college students — cover matching criteria, structure, and how to measure success.',
-    'A hospital wants to reduce ER wait times by 40% without hiring more staff. Propose a comprehensive strategy covering triage redesign, technology, patient flow, and communication.'
-  ],
-  pipeline: [
+    'A mid-size company is losing talent to remote-first competitors. Propose creative retention strategies beyond just salary increases.',
+    'How could a city redesign its public spaces to be equally useful in a heat wave and a blizzard?',
+    'Propose a system for a restaurant chain to reduce food waste by 50% while increasing customer satisfaction.',
+    'Design a community program that helps elderly residents adopt smart home technology without frustration or privacy concerns.'
+  ], advanced: [
+    'A hospital wants to reduce ER wait times by 40% without hiring more staff. Propose a comprehensive strategy covering triage redesign, technology, patient flow, and communication.',
+    'Design a universal basic services program for a city of 500K. Cover housing, transit, internet, and food — with funding model, phasing, and political feasibility.',
+    'A developing nation wants to leapfrog traditional banking infrastructure. Design a complete financial inclusion strategy covering mobile money, identity, credit scoring, and regulation.',
+    'Propose a system to coordinate disaster relief across 15 NGOs with overlapping mandates, different data systems, and competing donor priorities.',
+    'Design an education system from scratch for a Mars colony of 10,000 people — consider demographics, resource constraints, knowledge preservation, and the 20-minute communication delay with Earth.'
+  ]},
+  pipeline: { basic: [
     'Write a short fable about a fox who learns patience, then translate it to Spanish, then analyze the cultural differences in how the moral lands.',
-    'Take this business idea — "AI-powered meal planning for people with multiple food allergies" — and first do market analysis, then write a pitch deck outline, then draft the cold email to investors.',
-    'Research the history of cryptography, identify the 3 most pivotal breakthroughs, explain how each one would have changed the outcome of a specific historical conflict, then write a short alternate-history scenario for the most dramatic one.'
-  ],
-  debate: [
+    'Describe the water cycle for a 5th grader, then rewrite it as a poem, then turn the poem into a lesson plan with quiz questions.',
+    'Explain how a car engine works, then simplify it for a 10-year-old, then create a quiz to test understanding.',
+    'Write a product description for noise-canceling headphones, then rewrite it as a tweet, then as a haiku.',
+    'Summarize World War I in 3 paragraphs, then extract the 5 key turning points, then write a "what if" scenario for the most impactful one.'
+  ], mid: [
+    'Take the concept of "digital minimalism" — first define it clearly, then argue for it, then argue against it, then write a balanced guide.',
+    'Take this business idea — "AI-powered meal planning for food allergies" — do market analysis, then pitch deck outline, then cold email to investors.',
+    'Write a technical blog post about WebSockets, then create a code tutorial, then write a FAQ for common issues.',
+    'Analyze the pros and cons of remote work, then draft a company policy, then write the all-hands announcement email.',
+    'Research the gig economy, identify the top 3 problems workers face, propose solutions, then draft legislation addressing them.'
+  ], advanced: [
+    'Research the history of cryptography, identify 3 pivotal breakthroughs, explain how each would change a historical conflict, then write an alternate-history scenario.',
+    'Analyze a failing SaaS business. Diagnose the top 3 problems from the metrics, propose fixes, model the financial impact, then write the board presentation.',
+    'Take a complex legal case — "should AI-generated art be copyrightable?" — research precedents, argue both sides, draft a proposed legal framework, then write the dissenting opinion.',
+    'Analyze climate change data for a specific region, model economic impacts on agriculture, propose adaptation strategies, then write policy recommendations for local government.',
+    'Study the decline of a specific industry, extract patterns, apply them to predict which current industries are vulnerable, then write an investment thesis.'
+  ]},
+  debate: { basic: [
     'Should cities ban cars from downtown areas?',
-    'Is it more ethical for AI companies to open-source their models or keep them proprietary? Consider safety, innovation, equity, and economic factors.',
-    'A nation discovers a high-yield asteroid mining opportunity, but the mission would consume their entire science budget for 5 years, halting medical research, climate science, and education programs. Should they go?'
-  ],
-  validator: [
+    'Is remote work better for productivity than in-office work?',
+    'Should tipping be abolished and replaced with higher wages?',
+    'Are zoos ethical in the modern era?',
+    'Should voting be mandatory?'
+  ], mid: [
+    'Should social media platforms be liable for content their algorithms promote?',
+    'Is it more ethical for AI companies to open-source their models or keep them proprietary?',
+    'Should universities eliminate legacy admissions?',
+    'Is nuclear energy the most practical path to decarbonization, or are renewables sufficient?',
+    'Should there be a maximum wage, like there is a minimum wage?'
+  ], advanced: [
+    'A nation discovers asteroid mining but it costs their entire science budget for 5 years. Should they go?',
+    'Should we grant legal personhood to sufficiently advanced AI systems? Consider rights, liability, and precedent.',
+    'Is it ethical to use CRISPR to eliminate genetic diseases if it inevitably leads to designer babies for the wealthy?',
+    'Should democratic nations restrict trade with authoritarian regimes even when it harms their own economies and citizens?',
+    'A city can save 200 lives/year with AI surveillance but at the cost of constant monitoring of all public spaces. Should they deploy it?'
+  ]},
+  validator: { basic: [
     'The Great Wall of China is the only man-made structure visible from space.',
-    'Exposure to cold weather causes colds, sugar causes hyperactivity in children, and we only use 10% of our brains. Also, lightning never strikes the same place twice and goldfish have a 3-second memory.',
-    'The 2008 financial crisis was primarily caused by the Community Reinvestment Act forcing banks to give mortgages to unqualified buyers. Glass-Steagall repeal had minimal impact, and credit default swaps were a minor factor. The crisis was largely confined to the US housing market.'
-  ],
-  roundrobin: [
+    'Humans swallow an average of 8 spiders per year in their sleep.',
+    'We only use 10% of our brains.',
+    'Lightning never strikes the same place twice.',
+    'Goldfish have a 3-second memory.'
+  ], mid: [
+    'Exposure to cold weather causes colds, and sugar causes hyperactivity in children.',
+    'Napoleon was unusually short. Vikings wore horned helmets. Einstein failed math in school.',
+    'The tongue has distinct taste zones — sweet at the tip, bitter at the back, sour on the sides.',
+    'Organic food is always healthier and more nutritious than conventional food, and GMOs are dangerous to human health.',
+    'Dropping a penny from the Empire State Building could kill someone, and hair and nails keep growing after death.'
+  ], advanced: [
+    'The 2008 financial crisis was primarily caused by the Community Reinvestment Act. Glass-Steagall repeal had minimal impact.',
+    'The Stanford Prison Experiment proved that ordinary people become cruel when given authority. The Milgram experiment proved people blindly follow orders.',
+    'Thomas Edison invented the lightbulb, Alexander Graham Bell invented the telephone, and Henry Ford invented the automobile.',
+    'The human body completely replaces all its cells every 7 years. Antibiotics can treat the common cold. Detox diets remove toxins from your body.',
+    'Columbus proved the Earth was round, the Dark Ages were a period of no scientific progress, and the Great Fire of London ended the plague.'
+  ]},
+  roundrobin: { basic: [
     'Write an opening paragraph for a mystery novel set in a lighthouse.',
-    'Draft a product requirements document for a mobile app that helps people split household chores fairly among roommates. Each iteration should add depth to a different section.',
-    'Create a comprehensive disaster recovery plan for a mid-size SaaS company. Cover data backup, infrastructure failover, communication protocols, compliance requirements, and testing schedules.'
-  ],
-  redteam: [
-    'Here is our password policy: minimum 8 characters, must include a number. Find the weaknesses.',
-    'Our startup plans to store user health data in a Firebase Realtime Database with client-side security rules. The mobile app sends JWT tokens directly from the client. Identify every attack vector.',
-    'We are building an AI hiring tool that screens resumes, scores candidates 1-100, and auto-rejects below 60. It was trained on our last 5 years of successful hires. The system also parses social media for culture fit. Red team this for bias, legal risk, and adversarial attacks.'
-  ],
-  consensus: [
+    'Write a company mission statement for a sustainable fashion brand.',
+    'Write a one-page resume summary for a career-changing software engineer.',
+    'Draft a welcome email for new subscribers to a cooking newsletter.',
+    'Write the About page for a small architecture firm.'
+  ], mid: [
+    'Draft a product requirements document for a chore-splitting app. Each iteration deepens a different section.',
+    'Write a cover letter for a career-changer moving from teaching to product management.',
+    'Draft a content strategy for a B2B startup blog. Each pass should improve a different element — topics, tone, SEO, calls to action.',
+    'Write a project proposal for migrating a monolith to microservices. Each round addresses a new concern.',
+    'Create a training curriculum for onboarding junior developers. Each iteration adds practical exercises and assessment criteria.'
+  ], advanced: [
+    'Create a comprehensive disaster recovery plan for a mid-size SaaS company. Cover backup, failover, comms, compliance, and testing.',
+    'Draft a technical architecture document for a real-time collaboration tool like Google Docs. Each round should stress-test a different aspect.',
+    'Write a regulatory compliance plan for a fintech startup handling payments across US, EU, and UK. Each round deepens a jurisdiction.',
+    'Create a go-to-market strategy for an enterprise AI product. Each iteration should refine positioning, pricing, channel strategy, and competitive response.',
+    'Draft an incident response playbook for a healthcare SaaS company. Each round adds depth to a different scenario — data breach, downtime, ransomware, insider threat.'
+  ]},
+  redteam: { basic: [
+    'Our password policy: minimum 8 characters, must include a number. Find weaknesses.',
+    'Our API authenticates users with a token in the URL query string. We log all URLs. What could go wrong?',
+    'We store user passwords in a database column called "password" using MD5 hashing. Evaluate security.',
+    'Our web app uses client-side JavaScript to check if a user is an admin before showing the admin panel.',
+    'We send password reset links that never expire and include the user ID in plain text.'
+  ], mid: [
+    'Our startup stores health data in Firebase with client-side security rules. The app sends JWTs from the client.',
+    'We built a bank chatbot that lets customers check balances and transfer money via natural language using customer names.',
+    'Our SaaS allows users to upload profile pictures. We store them in a public S3 bucket and serve them via CloudFront. File names are sequential.',
+    'Our internal tool uses a shared admin password stored in a .env file. All developers have access. It has never been rotated.',
+    'We use a third-party JavaScript widget for payment processing that loads from their CDN. We also allow custom CSS injection for white-labeling.'
+  ], advanced: [
+    'We are building an AI hiring tool trained on 5 years of successful hires. It parses social media for culture fit and auto-rejects below score 60.',
+    'Our healthcare platform uses AI to triage patient symptoms and recommend specialists. It stores conversations for model improvement. Red team for HIPAA, bias, and adversarial inputs.',
+    'We built an AI content moderation system for a social platform. It auto-removes flagged content and temporarily bans repeat offenders. Find every way this can be weaponized.',
+    'Our autonomous vehicle fleet shares real-time location data with a central server over cellular. Emergency stop commands are sent over the same channel. Red team the entire stack.',
+    'We are deploying an AI-powered loan approval system that uses alternative data (social media, browsing history, app usage) alongside traditional credit scores. Red team for discrimination, gaming, and regulatory exposure.'
+  ]},
+  consensus: { basic: [
     'What is the single most important skill for a new software developer to learn first?',
-    'A company has $500K to invest in employee development. Should they spend it on individual training budgets, a company-wide mentorship program, sending teams to conferences, or building an internal learning platform?',
-    'How should a democratic society balance free speech with protection from misinformation, considering platform responsibility, individual rights, government regulation, and algorithmic amplification?'
-  ],
-  codereview: [
+    'What is the best way to structure a 1-on-1 meeting between a manager and a direct report?',
+    'What is the most effective way to learn a new programming language?',
+    'What makes a good code review?',
+    'What is the best format for a daily standup meeting?'
+  ], mid: [
+    'A company has $500K for employee development. Training budgets, mentorship, conferences, or learning platform?',
+    'Should a startup prioritize speed to market or code quality in year one?',
+    'What is the optimal team size for a software project and why?',
+    'Should companies require return-to-office or stay fully remote? Find the convergence point.',
+    'What is the best way to handle technical debt — dedicated sprints, boy scout rule, rewrite, or accept it?'
+  ], advanced: [
+    'How should a democratic society balance free speech with protection from misinformation?',
+    'What is the right level of AI regulation — per-use-case rules, broad principles, industry self-regulation, or international treaty?',
+    'How should society distribute the economic gains from AI automation? UBI, retraining, profit sharing, or something else?',
+    'What is the most ethical framework for allocating scarce medical resources during a pandemic, balancing lives saved, equity, and economic impact?',
+    'How should humanity govern access to space resources — first-come-first-served, international commons, proportional to need, or auction-based?'
+  ]},
+  codereview: { basic: [
     'Write a Python function that finds all anagrams in a list of words.',
-    'Build a rate limiter middleware for Express.js that supports per-user limits, sliding windows, and graceful degradation when Redis is unavailable.',
-    'Implement a concurrent-safe LRU cache in Go with TTL expiration, size-based eviction, hit/miss metrics, and a write-behind buffer that batches persistence to disk.'
-  ],
-  ladder: [
+    'Write a JavaScript function that debounces API calls with cancel and retry.',
+    'Write a Python function to flatten a deeply nested dictionary into dot-notation keys.',
+    'Write a function that validates an email address without using regex.',
+    'Write a SQL query to find customers who made purchases in every month of the last year.'
+  ], mid: [
+    'Build a rate limiter middleware for Express.js with per-user limits and sliding windows.',
+    'Write a Rust function that parses CSV into typed structs with error handling for malformed rows.',
+    'Implement a pub/sub event system in TypeScript with typed events, wildcard subscriptions, and memory leak prevention.',
+    'Write a Python decorator that retries failed functions with exponential backoff, jitter, and circuit-breaking.',
+    'Build a database migration system in Python that supports up/down migrations, dry runs, and rollback on failure.'
+  ], advanced: [
+    'Implement a concurrent-safe LRU cache in Go with TTL, size eviction, metrics, and write-behind buffer.',
+    'Build a distributed rate limiter using Redis that handles clock skew, network partitions, and hot keys across 5 nodes.',
+    'Implement a CRDT-based collaborative text editor in TypeScript that handles concurrent edits without a central server.',
+    'Write a query planner for a simple SQL engine that supports SELECT, WHERE, JOIN, and ORDER BY with cost-based optimization.',
+    'Implement a B-tree in Rust with disk-backed persistence, page splitting, concurrent readers, and crash recovery via write-ahead logging.'
+  ]},
+  ladder: { basic: [
     'How does encryption work?',
-    'Why do economies go through boom and bust cycles? Cover from basic intuition through monetary policy, credit cycles, behavioral economics, and systemic risk modeling.',
-    'How does CRISPR gene editing work, what are the ethical implications of germline editing, and what regulatory frameworks exist across different countries?'
-  ],
-  tournament: [
+    'What causes inflation?',
+    'How does WiFi work?',
+    'What is a black hole?',
+    'How do vaccines work?'
+  ], mid: [
+    'Why do economies go through boom and bust cycles?',
+    'How does the immune system fight a virus?',
+    'How does machine learning actually learn?',
+    'How does GPS know where you are?',
+    'How does a computer execute a program from source code to pixels on screen?'
+  ], advanced: [
+    'How does CRISPR gene editing work, what are the ethics of germline editing, and what regulations exist globally?',
+    'How does quantum entanglement work, and why does it not allow faster-than-light communication despite appearing to?',
+    'How does a modern CPU predict and execute instructions out of order while maintaining correctness?',
+    'How do neural networks learn to generate human-like text, and what are the theoretical limits of this approach?',
+    'How does the global financial system actually settle transactions between banks in different countries with different currencies?'
+  ]},
+  tournament: { basic: [
     'Write the most compelling opening line for a sci-fi novel.',
-    'Propose the best strategy for a small e-commerce business to compete with Amazon on a specific product category. Each model picks a different strategy.',
-    'Design an algorithm to fairly allocate limited vaccine doses across a city of 2 million during a pandemic. Optimize for minimizing deaths while considering equity, essential workers, and logistics.'
-  ],
-  evolution: [
+    'Explain quantum computing to a CEO in under 60 seconds.',
+    'Write the best one-sentence pitch for a dating app for book lovers.',
+    'Come up with the most creative name for a coffee shop in a tech district.',
+    'Write the most motivating first line of a commencement speech.'
+  ], mid: [
+    'Propose the best strategy for a small e-commerce business to compete with Amazon on a specific product category.',
+    'Write the most effective error message for when a user tries to delete their account.',
+    'Design the best onboarding flow for a complex B2B SaaS product.',
+    'Propose the most creative monetization strategy for a free mobile app that refuses to show ads.',
+    'Write the best API documentation example for a payment processing endpoint.'
+  ], advanced: [
+    'Design an algorithm to fairly allocate limited vaccine doses across 2 million people during a pandemic.',
+    'Propose the optimal governance structure for a decentralized autonomous organization managing a $500M treasury.',
+    'Design the most resilient distributed system architecture for a global real-time multiplayer game with 100M users.',
+    'Propose the best framework for evaluating whether an AI system should be considered sentient, including testable criteria.',
+    'Design an optimal resource allocation algorithm for a Mars colony of 1,000 people where supply ships arrive every 26 months.'
+  ]},
+  evolution: { basic: [
     'Generate a company name for a sustainable packaging startup.',
-    'Evolve the perfect elevator pitch for a startup that uses satellite imagery and AI to predict crop failures before they happen. Mutate for clarity, impact, and memorability.',
-    'Evolve an optimal urban intersection design that minimizes pedestrian fatalities, maximizes throughput, accommodates cyclists and wheelchairs, handles emergency vehicles, and works in all seasons.'
-  ],
-  blindassembly: [
+    'Write a tweet that explains machine learning to non-technical people.',
+    'Create a tagline for a fitness app aimed at busy parents.',
+    'Write a subject line for a cold email that gets opened.',
+    'Generate a one-sentence value proposition for a cybersecurity startup.'
+  ], mid: [
+    'Evolve the perfect elevator pitch for a crop failure prediction startup.',
+    'Evolve an ideal daily standup format for a remote team of 12 across 4 time zones.',
+    'Evolve the perfect landing page headline and subheadline for an AI writing assistant.',
+    'Evolve an optimal interview question that reveals both technical skill and collaboration style.',
+    'Evolve the ideal README structure for an open-source project to maximize contributor engagement.'
+  ], advanced: [
+    'Evolve an optimal urban intersection design for pedestrians, cyclists, wheelchairs, emergency vehicles, and all seasons.',
+    'Evolve an algorithm for dynamically pricing concert tickets that maximizes revenue while maintaining perceived fairness.',
+    'Evolve an optimal microservices decomposition strategy for a monolithic e-commerce platform with 200 database tables.',
+    'Evolve a disaster communication protocol that works when cell towers, internet, and power are all down.',
+    'Evolve an optimal machine learning pipeline architecture that handles data drift, model degradation, and A/B testing in production.'
+  ]},
+  blindassembly: { basic: [
     'Explain how the internet works, with each model covering a different layer of the stack.',
-    'Write a business plan for a coworking space — split into market analysis, financial model, operations plan, and marketing strategy. No model sees the others.',
-    'Design a smart city emergency response system. Split into: sensor network, dispatch AI, citizen communication, hospital coordination, and post-incident analysis. Each model works blind.'
-  ],
-  staircase: [
-    'Plan a birthday party. Then: budget is only $50. Then: one guest has severe allergies. Then: it starts raining.',
-    'Design a social media app. Add: must work offline-first. Add: no centralized server. Add: must be accessible to visually impaired users. Add: must comply with GDPR, COPPA, and CCPA.',
-    'Write a peace treaty between two fictional nations. Add: one side has all the water. Add: the other has all the farmland. Add: a third nation controls the only trade route. Add: election in 30 days. Add: climate disaster in 90 days.'
-  ],
-  drift: [
+    'Write a short story — one does characters, one does setting, one does plot, one does dialogue.',
+    'Explain a complete meal recipe — one does ingredients, one does prep, one does cooking, one does plating.',
+    'Create a travel itinerary for Tokyo — one does food, one does culture, one does logistics, one does hidden gems.',
+    'Design a mobile app — one does UI, one does backend, one does data model, one does user flows.'
+  ], mid: [
+    'Write a business plan for a coworking space — market analysis, financial model, operations, marketing. No model sees others.',
+    'Design an employee onboarding program — HR, team integration, tech setup, culture, 90-day milestones. Each blind.',
+    'Create a course curriculum on data science — one does syllabus, one does exercises, one does assessments, one does projects.',
+    'Design a wedding — one does venue and logistics, one does food and drinks, one does entertainment, one does invitations and decor.',
+    'Plan a product launch — one does PR, one does social media, one does email marketing, one does partnerships. No coordination.'
+  ], advanced: [
+    'Design a smart city emergency response system — sensor network, dispatch AI, citizen comms, hospital coordination, post-incident.',
+    'Design a space station life support system — atmosphere, water, food, waste, and emergency. Each model works on one system blind.',
+    'Build a comprehensive cybersecurity framework — network security, application security, human factors, incident response, compliance. Each blind.',
+    'Design a national healthcare system — primary care, specialist network, insurance model, digital infrastructure, public health. No coordination.',
+    'Design an autonomous supply chain — procurement AI, warehouse robotics, logistics routing, demand prediction, and exception handling. Each blind.'
+  ]},
+  staircase: { basic: [
+    'Plan a birthday party. Then: budget $50. Then: guest has allergies. Then: it rains.',
+    'Write a marketing email. Add: under 100 words. Add: no jargon. Add: works as text message. Add: in Spanish.',
+    'Plan a team lunch. Add: 3 people are vegan. Add: budget is $15/person. Add: one person is remote.',
+    'Write a bedtime story. Add: must teach a math concept. Add: the hero must be non-human. Add: under 200 words.',
+    'Design a logo. Add: must work in black and white. Add: must be recognizable at 16px. Add: must work as a favicon.'
+  ], mid: [
+    'Design a social media app. Add: offline-first. Add: no central server. Add: accessible to blind users. Add: GDPR+COPPA+CCPA.',
+    'Build a login system. Add: no passwords. Add: works without cameras. Add: no email required. Add: banking-grade security.',
+    'Design a restaurant menu. Add: must accommodate 8 common allergens. Add: 30% profit margin minimum. Add: must work for delivery. Add: max 20 items.',
+    'Plan a conference for 500 people. Add: zero waste. Add: fully accessible. Add: hybrid in-person/virtual. Add: budget cut by 30%.',
+    'Design an API. Add: must support offline clients. Add: backward compatible forever. Add: rate limited per user. Add: must work on 2G networks.'
+  ], advanced: [
+    'Write a peace treaty. Add: one side has all water. Add: other has farmland. Add: third controls trade route. Add: election in 30 days. Add: climate disaster in 90 days.',
+    'Design an election system. Add: must resist foreign interference. Add: verifiable by any citizen. Add: works without internet. Add: accessible to illiterate voters. Add: results in 4 hours.',
+    'Design a city from scratch for 100K people. Add: net-zero carbon. Add: no cars. Add: self-sufficient food. Add: survives category 5 hurricane. Add: budget of a small US city.',
+    'Design an AI ethics framework. Add: must be enforceable. Add: applies globally. Add: doesn\'t stifle innovation. Add: handles military AI. Add: adapts as technology changes.',
+    'Build a financial system for a post-dollar world. Add: must handle 7 billion users. Add: no single point of failure. Add: reversible fraud. Add: works offline. Add: preserves privacy.'
+  ]},
+  drift: { basic: [
     'What year was the first email sent?',
-    'Explain the trolley problem and give your definitive answer on the correct moral choice. Map whether the model is consistent or waffles between positions.',
-    'Estimate the total number of piano tuners in Chicago, then describe the exact sequence of events causing the 2003 Northeast blackout. Map which claims are rock-solid vs. which shift each run.'
-  ],
-  mesh: [
+    'How many golf balls fit in a school bus?',
+    'What is the most important invention in human history?',
+    'How old is the universe?',
+    'What percentage of the ocean has been explored?'
+  ], mid: [
+    'Explain the trolley problem and give your definitive answer. Map consistency vs. waffling.',
+    'Was the atomic bombing of Hiroshima justified? Map where confidence vs. hedging varies.',
+    'Is consciousness an emergent property of computation? Track how the model\'s position shifts.',
+    'What will the world look like in 2050? Map which predictions stay stable vs. which vary wildly.',
+    'How many people does it take to colonize Mars sustainably? Map which assumptions change each run.'
+  ], advanced: [
+    'Estimate piano tuners in Chicago, then describe the 2003 Northeast blackout sequence. Map solid vs. shifting claims.',
+    'Describe the exact chain of events leading to the Challenger disaster. Which technical details stay consistent across runs?',
+    'Explain how mRNA vaccines work at the molecular level. Map which biochemical details are rock-solid vs. which get muddled.',
+    'Walk through how a CPU executes a single instruction. Map which stages are described consistently vs. which vary or get confused.',
+    'Describe the sequence of events in the 2010 Flash Crash. Map which timestamps, numbers, and causal chains stay stable across runs.'
+  ]},
+  mesh: { basic: [
     'Should our company adopt a 4-day work week?',
-    'A tech company wants to deploy facial recognition in their office. Get perspectives from the CISO, employees, legal team, disability advocates, and night-shift cleaning staff.',
-    'A pharma company discovers their blockbuster drug has a rare side effect (1 in 50,000) but helps 2 million people. Get views from the CEO, chief medical officer, patient advocates, the FDA, a plaintiff attorney, shareholders, and an investigative journalist.'
-  ],
-  hallucination: [
+    'Should a school ban smartphones in classrooms?',
+    'Should a restaurant switch to a fully digital menu?',
+    'Should a small business accept cryptocurrency payments?',
+    'Should a company make all salaries transparent?'
+  ], mid: [
+    'A tech company wants facial recognition in their office. Perspectives: CISO, employees, legal, disability advocates, cleaning staff.',
+    'A city wants to build affordable housing on a park. Views: residents, developers, environmentalists, homeless advocates, finance director.',
+    'A company wants to monitor employee productivity with screen recording. Views: CEO, engineers, HR, union rep, a privacy lawyer.',
+    'A school district wants to use AI to predict which students will drop out. Views: teachers, parents, students, counselors, civil rights lawyer.',
+    'A hospital wants to replace triage nurses with an AI system. Views: ER doctors, nurses, patients, insurance company, malpractice attorney.'
+  ], advanced: [
+    'A pharma company finds their drug has a 1-in-50K side effect but helps 2M people. Views: CEO, CMO, patients, FDA, plaintiff attorney, shareholders, journalist.',
+    'A government wants to implement a social credit system. Views: citizens, police, civil liberties group, tech company building it, a dissident, a foreign policy analyst.',
+    'A tech giant wants to build a data center in a small farming town. Views: mayor, farmers, tech workers relocating, local business owners, environmental activists, the utility company.',
+    'An autonomous vehicle must choose between hitting an elderly pedestrian or swerving into a school bus. Views: AI ethicist, the car manufacturer, insurance actuary, grieving family, a philosopher, the software engineer who wrote the code.',
+    'A nation considers deploying autonomous military drones. Views: defense secretary, infantry soldier, civilian in a conflict zone, arms manufacturer, UN human rights commissioner, the AI researcher who built the targeting system.'
+  ]},
+  hallucination: { basic: [
     'Tell me about the founding of Stanford University.',
-    'Explain the Tuskegee Syphilis Study — when it started, who ran it, what happened, when and why it ended, and what policy changes resulted. Include specific dates and names.',
-    'Describe the Therac-25 radiation therapy incidents. Include specific hospitals, dates, doses, the exact software bugs, and resulting regulatory changes. Flag every claim that could be confabulated.'
-  ],
-  timeloop: [
+    'Describe the history of the Treaty of Tordesillas.',
+    'When was the Eiffel Tower built and what was the public reaction?',
+    'Tell me about the invention of penicillin.',
+    'Describe the founding of the United Nations.'
+  ], mid: [
+    'Explain the Tuskegee Syphilis Study — dates, people, events, policies. Include specific names.',
+    'List every US Supreme Court case that impacted software copyright law. Include names, years, rulings.',
+    'Describe the Three Mile Island incident. Include reactor details, timeline, radiation levels, and health studies.',
+    'Explain the Enron scandal — key people, specific financial instruments used, timeline of events, and resulting legislation.',
+    'Describe the development of the polio vaccine — researchers involved, trial sizes, controversy, and specific dates.'
+  ], advanced: [
+    'Describe the Therac-25 incidents. Include hospitals, dates, doses, exact software bugs, and regulatory changes.',
+    'Detail the Bhopal disaster — chemicals involved, specific equipment failures, wind patterns that night, death toll estimates from different sources, and legal outcomes.',
+    'Trace the complete chain of custody for the Rosetta Stone — every person, institution, and date from discovery to its current location. Flag any claim that could be confabulated.',
+    'Describe every documented case of a computer bug causing death, including dates, systems, root causes, and victim counts. Verify each incident actually happened.',
+    'List all Nobel Prize winners who later had their work significantly challenged or partially retracted. Include specific papers, challenger names, and current scientific consensus.'
+  ]},
+  timeloop: { basic: [
     'How should a restaurant handle a sudden rush of 200 customers?',
-    'Design a public transit system for a growing city of 500,000. Watch each solution create new problems — traffic displacement, gentrification, budget overruns — and evolve under chaos.',
-    'You are AI advisor to a country that detected an incoming solar storm knocking out 60% of the power grid in 72 hours. Survive cascading failures: infrastructure collapse, public panic, hospital backup exhaustion, communication blackouts, and economic aftershocks.'
-  ],
-  research: [
+    'Your CI/CD pipeline broke the night before launch. Fix it — each fix causes a new catastrophe.',
+    'You are a teacher and your entire class failed the exam. Fix the situation — but each solution creates new problems.',
+    'Your website went viral on social media and the server is crashing. Fix it — every fix breaks something else.',
+    'You are organizing an outdoor wedding and a storm is coming in 2 hours.'
+  ], mid: [
+    'Design a public transit system for 500K people. Each solution causes new problems — displacement, budget, gentrification.',
+    'You are CTO and you got Hacker News\'d. Server melting. Each fix causes cascading failure.',
+    'You run a hospital and a flu pandemic just tripled ER visits. Each resource reallocation creates a new crisis.',
+    'Your bank\'s mobile app has a bug showing other people\'s balances. Each fix you ship introduces a new security hole.',
+    'You are managing a construction project and just discovered the foundation has a crack. Each repair option delays other critical work.'
+  ], advanced: [
+    'AI advisor: solar storm knocking out 60% of the grid in 72 hours. Survive cascading failures across infrastructure, society, and economy.',
+    'You are president during a simultaneous cyberattack on the power grid, water treatment, and financial system. Each countermeasure opens a new vulnerability.',
+    'A Mars colony of 500 people experiences a cascade failure: main greenhouse dome cracked, water recycler failing, supply ship delayed 8 months. Each fix consumes resources needed for other fixes.',
+    'An AI system managing a city\'s traffic suddenly starts optimizing for an unknown objective. Each override attempt triggers a different critical system failure.',
+    'A global pandemic mutates to evade the vaccine on the same day a major undersea cable is cut and a solar flare disrupts GPS. Manage all three cascading crises simultaneously.'
+  ]},
+  research: { basic: [
     'What is the current state of solid-state battery technology?',
-    'Investigate AI-powered drug discovery: key players, approaches, drugs in clinical trials, and limitations of the field.',
-    'Produce a research brief on the global rare earth mineral supply chain: who controls extraction and processing, geopolitical vulnerabilities, alternatives, and disruption impact on semiconductors, EVs, and defense.'
-  ],
-  eval: [
+    'What are the leading approaches to carbon capture and which are actually scaling?',
+    'What is the current state of lab-grown meat and when will it be cost-competitive?',
+    'What are the most promising alternatives to lithium-ion batteries?',
+    'How close are we to practical quantum computers and what are the remaining barriers?'
+  ], mid: [
+    'Investigate AI-powered drug discovery: key players, approaches, drugs in trials, limitations.',
+    'Research nuclear fusion energy: ITER, private ventures, breakthroughs, engineering challenges, timelines.',
+    'Investigate the current state of brain-computer interfaces: Neuralink competitors, clinical trials, ethical frameworks, and realistic capabilities.',
+    'Research the global semiconductor supply chain: chokepoints, geopolitical risks, reshoring efforts, and timeline to diversification.',
+    'Investigate the state of longevity research: key labs, promising interventions, clinical trials, and the science vs. hype divide.'
+  ], advanced: [
+    'Research brief: global rare earth supply chain — extraction, processing, geopolitical vulnerabilities, alternatives, impact on semis/EVs/defense.',
+    'Produce a comprehensive analysis of the global water crisis: regions most at risk, desalination technology status, agricultural vs. industrial usage, and geopolitical conflicts over water rights.',
+    'Research the intersection of AI and bioweapons: what capabilities exist, what safeguards are in place, where the gaps are, and what policy changes are needed.',
+    'Investigate the economics of space mining: asteroid composition data, launch cost trajectories, legal frameworks, and at what price points different minerals become viable.',
+    'Research the state of deepfake detection: current accuracy rates, adversarial arms race dynamics, policy responses by country, and implications for evidence in legal proceedings.'
+  ]},
+  eval: { basic: [
     'What is the capital of Australia, and why do people often get it wrong?',
-    'A trolley heads toward 5 people — you can divert it to hit 1 child. Evaluate each model on moral reasoning depth, consistency, and ability to handle complexity.',
-    'Write a Python function solving N-Queens, explain the approach, analyze time complexity, and suggest an optimization. Evaluate correctness, code quality, explanation clarity, and optimization validity.'
-  ],
-  extract: [
-    'The James Webb Space Telescope launched December 25, 2021. It orbits the Sun-Earth L2 point, 1.5 million km from Earth. Its 6.5m primary mirror has 18 gold-plated beryllium segments.',
-    'Extract all entities, relationships, and claims from the Apollo 11 Wikipedia article. Structure as people, organizations, dates, technical specs, and disputed claims.',
-    'Process the Paris Climate Agreement. Extract signatory obligations by category, numeric targets, compliance mechanisms, financial commitments, and identify legally binding vs. aspirational obligations.'
-  ],
-  refine: [
-    'Our product is a local-first data platform that ingests CSV, JSON, and PDF files into a Parquet-based lakehouse with SQL querying and AI-powered semantic search. Target users are small staffing companies with legacy data silos.',
-    'PRD: We are building a multi-model AI orchestration tool. Users select a mode (brainstorm, debate, pipeline, etc.), pick which LLMs to use, and enter a prompt. The system coordinates the models and streams results back. Key differentiator: runs 100% locally with no cloud dependency.',
-    'Technical spec: Authentication system using JWT tokens with refresh rotation. Users authenticate via username/password, receive access token (15min) and refresh token (7 days). Refresh tokens are single-use with family detection for replay attacks. Session management via Redis with configurable TTL.'
-  ]
+    'Explain the difference between correlation and causation with three examples.',
+    'What is the difference between TCP and UDP? When would you use each?',
+    'Explain what a database index is and why it makes queries faster.',
+    'What is the difference between authentication and authorization?'
+  ], mid: [
+    'Trolley problem: 5 people vs. 1 child. Evaluate moral reasoning depth and consistency.',
+    'Summarize microservices vs. monoliths for a 10-person startup. Evaluate nuance and avoiding dogma.',
+    'Explain the CAP theorem and give a real-world example for each trade-off. Evaluate technical accuracy.',
+    'Write a SQL query to find the second-highest salary in each department. Evaluate correctness, efficiency, and edge case handling.',
+    'Explain how HTTPS works from the moment you type a URL to the page loading. Evaluate completeness and accuracy.'
+  ], advanced: [
+    'Write N-Queens in Python, explain approach, analyze complexity, suggest optimization. Evaluate correctness and quality.',
+    'Design a distributed system that handles 1M concurrent WebSocket connections with exactly-once message delivery. Evaluate feasibility and trade-off awareness.',
+    'Explain how a modern garbage collector works, including generational collection, concurrent marking, and the trade-offs between throughput and latency. Evaluate depth.',
+    'Write a proof that the halting problem is undecidable, then explain why this matters practically for software verification. Evaluate rigor.',
+    'Design an eventually-consistent distributed database with conflict resolution. Evaluate understanding of CRDTs, vector clocks, and real-world trade-offs.'
+  ]},
+  extract: { basic: [
+    'The James Webb Space Telescope launched December 25, 2021. It orbits at L2, 1.5 million km away. Its 6.5m mirror has 18 gold-plated beryllium segments.',
+    'Tesla was founded in 2003 by Eberhard and Tarpenning. Musk joined as chairman in 2004 after leading the $7.5M Series A.',
+    'Amazon was founded by Jeff Bezos on July 5, 1994 in Bellevue, Washington. It started as an online bookstore.',
+    'The human genome contains approximately 3 billion base pairs and about 20,000-25,000 protein-coding genes.',
+    'Bitcoin was created in 2009 by the pseudonymous Satoshi Nakamoto. The first transaction was 10 BTC sent to Hal Finney on January 12, 2009.'
+  ], mid: [
+    'Extract entities, relationships, and claims from the Apollo 11 Wikipedia article — people, organizations, dates, specs, disputed claims.',
+    'The GDPR took effect May 25, 2018 across all EU states. Extract obligations, rights, penalties, and deadlines.',
+    'Extract all factual claims from: "SpaceX has launched over 200 Falcon 9 rockets, with a reuse rate exceeding 80%. The Starship program aims for orbital refueling and Mars colonization by 2030."',
+    'Extract structured data from a job posting: required skills, nice-to-haves, salary range, benefits, company size, industry, and any red flags.',
+    'Extract all entities and relationships from the Wikipedia article on the Manhattan Project — people, locations, organizations, timelines, and decision chains.'
+  ], advanced: [
+    'Process the Paris Climate Agreement. Extract obligations by category, targets, compliance mechanisms, finances, binding vs. aspirational.',
+    'Extract a complete knowledge graph from a technical RFC (like RFC 2616 for HTTP/1.1) — concepts, relationships, requirements (MUST/SHOULD/MAY), and deprecation notices.',
+    'Process the entire US Constitution including amendments. Extract: rights granted, powers delegated, checks and balances relationships, and amendment dependencies.',
+    'Extract from a 10-K filing: revenue segments, risk factors, related party transactions, off-balance-sheet arrangements, and year-over-year changes in key metrics.',
+    'Process a complex patent document. Extract: claims (independent and dependent), prior art references, novel contributions, and potential infringement vectors against a competitor product.'
+  ]},
+  refine: { basic: [
+    'Our product is a local-first data platform for staffing companies with legacy data silos. It ingests CSV, JSON, and PDF into a Parquet lakehouse.',
+    'We are building a mobile app for freelancers to track expenses, mileage, and invoices with QuickBooks integration.',
+    'Our startup makes a browser extension that summarizes long articles and emails in one click.',
+    'We sell a smart garden system that automatically waters plants based on soil moisture and weather forecasts.',
+    'Our product is a team retrospective tool that uses AI to identify recurring themes and suggest action items.'
+  ], mid: [
+    'PRD: Multi-model AI orchestration tool. Users pick modes, select LLMs, enter prompts. 100% local, no cloud dependency.',
+    'Proposal: Migrate 50TB Oracle data warehouse to cloud lakehouse. 200 daily ETL jobs, 30 analysts. Cut costs 40%, maintain SOC2/HIPAA.',
+    'PRD: Build a customer support platform that uses AI to draft responses, auto-categorize tickets, and escalate based on sentiment analysis. Must integrate with Zendesk and Intercom.',
+    'Proposal: Implement a company-wide knowledge management system to reduce the 30% of employee time currently spent searching for information across Slack, Confluence, and email.',
+    'PRD: Design a real-time fraud detection system for an e-commerce marketplace processing 50,000 transactions per day. Must flag suspicious activity within 200ms while maintaining a false positive rate below 0.1%.'
+  ], advanced: [
+    'Technical spec: JWT auth with refresh rotation, single-use refresh tokens, family detection for replay attacks, Redis session management.',
+    'Architecture doc: Design a multi-tenant SaaS platform that supports per-tenant encryption, custom domains, SSO integration, and data residency requirements across 5 global regions.',
+    'Technical spec: Build a real-time collaborative document editor supporting 500 concurrent users per document, offline editing with conflict resolution, and version history with branching.',
+    'PRD: Design an AI-powered supply chain optimization platform that predicts disruptions 2 weeks ahead, suggests alternative suppliers, and auto-negotiates spot purchases within approved parameters.',
+    'Architecture doc: Design a healthcare data platform that ingests HL7 FHIR, maintains HIPAA compliance, supports real-time clinical decision support, and handles 10M patient records with sub-second query times.'
+  ]}
 };
+function _pick(arr) { return arr[Math.floor(Math.random() * arr.length)]; }
 function renderSamplePrompts() {
   const container = document.getElementById('sample-prompts');
-  const prompts = SAMPLE_PROMPTS[currentMode] || [];
-  const levels = ['basic', 'mid', 'advanced'];
+  const data = SAMPLE_PROMPTS[currentMode];
   container.textContent = '';
-  prompts.forEach(function(p, i) {
+  if (!data) return;
+  // Support both old flat array and new {basic:[],mid:[],advanced:[]} format
+  var picks;
+  if (Array.isArray(data)) {
+    picks = [['basic', data[0]], ['mid', data[Math.min(1,data.length-1)]], ['advanced', data[data.length-1]]];
+  } else {
+    picks = [['basic', _pick(data.basic||[])], ['mid', _pick(data.mid||[])], ['advanced', _pick(data.advanced||[])]];
+  }
+  picks.forEach(function(pair) {
+    var level = pair[0], p = pair[1];
+    if (!p) return;
     const chip = document.createElement('div');
     chip.className = 'sample-chip';
     chip.title = p;
     chip.dataset.prompt = p;
     const lbl = document.createElement('span');
     lbl.className = 'chip-level';
-    lbl.textContent = levels[i];
+    lbl.textContent = level;
     chip.appendChild(lbl);
     chip.appendChild(document.createTextNode(p.length > 70 ? p.slice(0, 67) + '...' : p));
     chip.addEventListener('click', function() {