Why I Built Stunt Double

The Problem I Kept Hitting

I kept noticing the same thing: AI agents such as OpenAI's Operator and Anthropic's Claude with computer use are already using products in the wild, and many products break silently when the user is not human.

The sign-up flow would vanish. A button would stop responding. An error message would confuse the automation. Then you'd find out weeks later through a GitHub issue, or worse, someone's frustrated tweet.

I needed a better way to catch this friction before real users (and bots) hit it. Not flaky end-to-end tests. Not expensive QA teams. Instead, realistic AI personas: actual agents that behave like real users, with their own goals, preferences, and levels of technical skill.

So I built Stunt Double.

The Idea

The concept is straightforward: create an actor (give it a name, a background, and a goal), define a checklist ("Can a new user sign up?" or "Can I pay?"), run the checklist as that actor, and get results in seconds. If something breaks, Stunt Double automatically creates an issue in Linear or GitHub.

Think of it as a QA team that is always available and works from user intent rather than a fixed script.
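To make the actor-and-checklist workflow concrete, here is a minimal sketch of the data involved and of the "if something breaks, open an issue" decision. The type names (`Actor`, `Checklist`, `ItemResult`), the `draftIssues` helper, and the rule that only outright failures (not warnings) produce issues are illustrative assumptions, not Stunt Double's actual schema. The call to the Linear or GitHub API is deliberately left out.

```typescript
// Hypothetical shapes; not Stunt Double's real schema.
interface Actor {
  name: string;
  background: string;
  goal: string;
}

interface Checklist {
  title: string;
  items: string[]; // e.g. "Can a new user sign up?"
}

type Outcome = "pass" | "warning" | "fail";

interface ItemResult {
  item: string;
  outcome: Outcome;
  detail?: string;
}

type Tracker = "linear" | "github";

interface IssueDraft {
  tracker: Tracker;
  title: string;
  body: string;
}

// Assumption: only outright failures become issues; warnings stay in the report.
// Sending the draft to Linear or GitHub is out of scope for this sketch.
function draftIssues(
  actor: Actor,
  checklist: Checklist,
  results: ItemResult[],
  tracker: Tracker,
): IssueDraft[] {
  return results
    .filter((r) => r.outcome === "fail")
    .map((r) => ({
      tracker,
      title: `[${checklist.title}] ${r.item} failed for ${actor.name}`,
      body: [
        `Actor: ${actor.name} (${actor.background})`,
        `Goal: ${actor.goal}`,
        `Detail: ${r.detail ?? "none recorded"}`,
      ].join("\n"),
    }));
}

// Usage with made-up results.
const sarah: Actor = {
  name: "Sarah",
  background: "Product manager, 10 years of experience",
  goal: "Sign up and reach the dashboard",
};
const signUp: Checklist = { title: "Sign-up", items: ["Create account", "Verify email"] };
const drafts = draftIssues(
  sarah,
  signUp,
  [
    { item: "Create account", outcome: "pass" },
    { item: "Verify email", outcome: "fail", detail: "Confirmation link never arrived" },
  ],
  "github",
);
console.log(drafts.length, drafts[0]?.title); // 1 [Sign-up] Verify email failed for Sarah
```

Keeping issue creation a pure function of the run results makes it easy to test, and to dry-run before anything is filed in a real tracker.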

Each actor is a realistic AI persona. Sarah is a product manager with 10 years of experience. Thomas is a junior developer who gets frustrated with confusing interfaces. Priya is an accessibility-focused user who navigates with a screen reader. They each interact with your product differently, and they each find different problems.
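One plausible way to turn personas like these into different behavior is to fold each one into the system prompt the agent runs with. The sketch below assumes a hypothetical `Persona` shape and `buildPersonaPrompt` helper; the skill levels assigned to Sarah, Thomas, and Priya and the prompt wording are illustrative guesses, not the product's real prompts.

```typescript
// Hypothetical persona shape and prompt builder; field names and wording are illustrative.
type Skill = "novice" | "intermediate" | "expert";

interface Persona {
  name: string;
  role: string;
  skill: Skill;
  traits: string[]; // extra behavioral instructions, one per line
}

const SKILL_HINTS: Record<Skill, string> = {
  novice: "You avoid jargon and read every label carefully before acting.",
  intermediate: "You know common web conventions but rely on clear labels.",
  expert: "You move quickly and expect standard patterns to just work.",
};

// Fold the persona into a system prompt for one checklist task.
function buildPersonaPrompt(p: Persona, task: string): string {
  return [
    `You are ${p.name}, ${p.role}.`,
    SKILL_HINTS[p.skill],
    ...p.traits,
    `Your task: ${task}`,
    "Report every point where you get stuck, confused, or blocked.",
  ].join("\n");
}

// The three personas from the text; skill levels are guesses for illustration.
const personas: Persona[] = [
  { name: "Sarah", role: "a product manager with 10 years of experience", skill: "expert", traits: [] },
  {
    name: "Thomas",
    role: "a junior developer",
    skill: "intermediate",
    traits: ["You get frustrated quickly by confusing interfaces."],
  },
  {
    name: "Priya",
    role: "an accessibility-focused user",
    skill: "intermediate",
    traits: ["You navigate with a screen reader, relying on labels and headings rather than visual layout."],
  },
];

for (const p of personas) {
  console.log(`${buildPersonaPrompt(p, "Sign up for a new account")}\n`);
}
```

The same checklist then runs once per persona, and differences in where each one gets stuck are the interesting signal.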

What Makes It Different

What I'm most proud of is the MCP (Model Context Protocol) integration. You can trigger a test from Claude, Cursor, or Windsurf without leaving the tool you are already working in. No API keys. No extra dashboard. Just a natural conversation with your AI assistant.

You: "Run the sign-up checklist for the Product workspace"

Claude:
Checklist run initiated.
Status: Complete

Results:
Pass - Create account
Pass - Verify email
Warning - Navigate to dashboard (2.4s, expected under 2s)
Pass - View workspace settings

That's it. You asked, it tested, you got results. All without switching context.
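The report in that exchange can be reproduced with a small classifier: a step that fails is a Fail, a step that succeeds but exceeds its latency budget is a Warning, and everything else is a Pass. That rule is inferred from the example output rather than taken from Stunt Double's code, and `StepRun`, `budgetMs`, and `formatReport` are hypothetical names. The durations for the passing steps below are made up; only the 2.4s / 2s pair comes from the transcript.

```typescript
// Hypothetical step record; durations are in milliseconds.
interface StepRun {
  name: string;
  succeeded: boolean;
  durationMs: number;
  budgetMs?: number; // optional latency budget for this step
}

type Status = "Pass" | "Warning" | "Fail";

// Failure beats slowness; slowness only matters when a budget is set.
function classify(step: StepRun): Status {
  if (!step.succeeded) return "Fail";
  if (step.budgetMs !== undefined && step.durationMs > step.budgetMs) return "Warning";
  return "Pass";
}

// 2400 -> "2.4s", 2000 -> "2s"
function formatSeconds(ms: number): string {
  return `${Number((ms / 1000).toFixed(1))}s`;
}

function formatReport(steps: StepRun[]): string {
  return steps
    .map((s) => {
      const status = classify(s);
      const line = `${status} - ${s.name}`;
      if (status === "Warning" && s.budgetMs !== undefined) {
        return `${line} (${formatSeconds(s.durationMs)}, expected under ${formatSeconds(s.budgetMs)})`;
      }
      return line;
    })
    .join("\n");
}

// Usage: made-up durations except the dashboard step.
const report = formatReport([
  { name: "Create account", succeeded: true, durationMs: 1100 },
  { name: "Verify email", succeeded: true, durationMs: 900 },
  { name: "Navigate to dashboard", succeeded: true, durationMs: 2400, budgetMs: 2000 },
  { name: "View workspace settings", succeeded: true, durationMs: 700 },
]);
console.log(report);
```

Running it prints the same four statuses as the transcript, with the dashboard step flagged as a Warning.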

Under the Hood

The implementation uses Browserbase for managed browser sessions, Stagehand (driven by Claude) for natural-language browser automation, and Trigger.dev to run the agentic task loop. Results stream back through Supabase Realtime.

The agentic loop works like this: we send a screenshot and the instructions to Claude; the model returns tool-use blocks (click, type, navigate, extract); we execute each action in the browser, take a new screenshot, and repeat until the task is complete or something goes wrong. Every step is logged with its screenshot for debugging.
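The loop is easier to reason about in code. Below is a minimal sketch, assuming two hypothetical interfaces: `Model`, which receives a screenshot plus instructions and returns the tool-use actions for that turn (an empty list meaning "done"), and `Browser`, which takes screenshots and executes actions. In the stack described above, Claude would sit behind `Model`, a Browserbase session driven by Stagehand behind `Browser`, and the loop would run inside a Trigger.dev task; none of those SDKs are called here, and the `maxSteps` cap is an added assumption to keep a confused agent from looping forever.

```typescript
// Hypothetical action set mirroring the tool-use blocks named in the text.
type Action =
  | { kind: "click"; target: string }
  | { kind: "type"; target: string; text: string }
  | { kind: "navigate"; url: string }
  | { kind: "extract"; query: string };

interface Model {
  // Returns this turn's tool-use actions; an empty array means the task is complete.
  next(screenshot: string, instructions: string): Promise<Action[]>;
}

interface Browser {
  screenshot(): Promise<string>; // e.g. a base64-encoded PNG
  execute(action: Action): Promise<void>;
}

interface StepLog {
  step: number;
  screenshot: string; // what the model saw at this step
  actions: Action[];
  error?: string;
}

type LoopStatus = "complete" | "error" | "step_limit";

async function runAgentLoop(
  model: Model,
  browser: Browser,
  instructions: string,
  maxSteps = 20, // assumption: a hard cap so a confused agent cannot loop forever
): Promise<{ status: LoopStatus; log: StepLog[] }> {
  const log: StepLog[] = [];
  let shot = await browser.screenshot();
  for (let step = 1; step <= maxSteps; step++) {
    const actions = await model.next(shot, instructions);
    const entry: StepLog = { step, screenshot: shot, actions };
    log.push(entry);
    if (actions.length === 0) return { status: "complete", log };
    try {
      for (const action of actions) await browser.execute(action);
    } catch (err) {
      // Stop on the first failed action and keep the error next to its screenshot.
      entry.error = err instanceof Error ? err.message : String(err);
      return { status: "error", log };
    }
    shot = await browser.screenshot();
  }
  return { status: "step_limit", log };
}

// Usage with fakes standing in for Claude and a Browserbase/Stagehand session.
const scripted: Action[][] = [
  [{ kind: "navigate", url: "https://example.com/signup" }],
  [
    { kind: "type", target: "Email field", text: "sarah@example.com" },
    { kind: "click", target: "Sign up" },
  ],
  [], // the model reports that the task is done
];
let turn = 0;
const fakeModel: Model = { next: async () => scripted[turn++] ?? [] };
let shots = 0;
const fakeBrowser: Browser = {
  screenshot: async () => `screenshot-${shots++}`,
  execute: async () => {},
};

runAgentLoop(fakeModel, fakeBrowser, "Sign up as a new user").then((r) =>
  console.log(r.status, r.log.length), // complete 3
);
```

The fakes let the control flow (screenshot, act, re-screenshot, log) be exercised without a live browser session or model call.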

Browser automation is not magic. Some interactions are genuinely ambiguous: is that button actually clickable? Should the agent use the keyboard or the mouse? Stagehand handles most cases well, but edge cases remain, so we lean into augmenting humans rather than claiming full autonomy.
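Here is one way the "keyboard or mouse?" ambiguity could be handled without pretending to full autonomy: try each interaction strategy in order, and if none works, mark the step for human review instead of guessing. `attemptOrEscalate` and the `needs_review` status are hypothetical names for illustration, not part of Stagehand or Stunt Double.

```typescript
// Hypothetical helper: each strategy is one way to perform the same intent,
// e.g. clicking a button with the mouse vs. focusing it and pressing Enter.
type Strategy = { label: string; run: () => Promise<void> };

type StepOutcome =
  | { status: "done"; via: string }
  | { status: "needs_review"; errors: string[] };

// Try strategies in order; if none succeeds, escalate to a human instead of guessing.
async function attemptOrEscalate(strategies: Strategy[]): Promise<StepOutcome> {
  const errors: string[] = [];
  for (const s of strategies) {
    try {
      await s.run();
      return { status: "done", via: s.label };
    } catch (err) {
      errors.push(`${s.label}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  return { status: "needs_review", errors };
}

// Usage: the mouse click fails, the keyboard path succeeds.
attemptOrEscalate([
  {
    label: "mouse",
    run: async () => {
      throw new Error("element not clickable");
    },
  },
  { label: "keyboard", run: async () => {} },
]).then((o) => console.log(o.status)); // done
```

Collecting every strategy's error means the person reviewing a `needs_review` step sees exactly what was tried.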

The Trade-Offs

Browserbase session costs mean this is not free, but it is cheaper than hiring a QA team. The MCP integration also makes it useful for solo developers: ship fast, test with AI, and catch friction before users do.

I want to be honest about what Stunt Double can and cannot do. It is not a load-testing tool. It is not a replacement for a human who can tell you whether your copy feels right. It is a tool for catching the friction that matters, quickly and at scale, before your users run into it.

What I Have Learned

Building this has taught me a few things.

First, AI agents are a real and growing user category. Products that work well for agents will have an edge, because the user base is expanding beyond humans.

Second, the best testing strategy uses multiple approaches. Scripted automation for regression. AI-powered testing for exploration. Manual testing for nuance. They are complementary, not competing.

Third, developer experience matters enormously. The MCP integration is the reason people actually use Stunt Double regularly. Removing context switching is not a nice-to-have; it is the core value proposition.

What Comes Next

I am building in public and shipping fast. The roadmap includes deeper integrations with GitHub and Linear, self-hosted workers for enterprise teams, and a native macOS app. If you have ideas for what would make this more useful, I genuinely want to hear them.

Try it at stuntdouble.io and send feedback. I am reading everything.