How We Built a Self-Healing Product Loop

Routines, cloud containers, and triage agents that turn overnight error signals and feedback into reviewable fixes by morning.

Michael Parker · 8 min read

Illustrative figures in this post are drawn from our internal dashboards over a two-week window and rounded for readability. They are directional, not benchmarks.

For most of last year, Stunt Double's engineering rhythm looked like this. A user hits an error. Sentry pings the channel. I (or whoever's on it) drop the feature work in flight, open the trace, try to reproduce, file a Linear issue, decide whether it needs fixing now or in the next cycle, and try to remember where I was before the ping arrived.

Every individual step was cheap. The aggregate was brutal. We tracked it for a fortnight and found we were spending roughly forty per cent of our engineering time on reactive triage, not because the bugs were hard, but because the context switches were constant.

A few weeks ago we shipped a different shape of system. The error reports and feedback signals still arrive at the same rate. We just don't react to them anymore. They flow into a loop that triages, dedupes, and proposes fixes overnight, and we review the queue with our morning coffee.

This post is how we built it.

The four pieces

We didn't set out to build a "self-healing" anything. We set out to stop being interrupted. The system that emerged has four moving parts.

  1. Error boundaries in the product that emit structured, agent-readable signals (not just stack traces).
  2. Claude Code routines as the orchestration layer that listens for those signals and decides what to dispatch.
  3. Cloud containers so every agent run gets a clean, isolated checkout of the repo.
  4. Triage and coding agents that turn raw signals into Linear issues and draft pull requests.

A fifth piece, easy to miss, is the human review pattern that sits on top. The whole point of the loop is to deliver reviewable work, not autonomous merges. We'll come back to that.

The flow

  Error boundary / feedback signal
              │
              ▼
       Routine trigger
              │
              ▼
   Triage agent (cloud container)
              │
       ┌──────┴───────────────────────────────────┐
   duplicate?                                      new
       │                                            │
       ▼                                            ▼
 update Linear issue,                 create Linear issue (scope + repro)
 increment signal count                             │
                                                    ▼
                                        well-scoped and safe?
                                                    │
                                  ┌─────────────────┴─────────────────┐
                                 no                                  yes
                                  │                                   │
                                  ▼                                   ▼
                        hold for human triage          Coding agent (cloud container)
                                                                      │
                                                                      ▼
                                                        Draft PR (fix + reasoning)
                                                                      │
                                                                      ▼
                                                          Morning review queue
                                                                      │
                                 ┌────────────────────────────────────┼───────────────────────┐
                              merge                                redirect                   close
                                 │                                     │                        │
                                 ▼                                     ▼                        ▼
                               ship                       re-dispatch with context       mark wrong-tree

It looks more complicated drawn out than it feels in practice. From the human seat it's just two surfaces. Linear, where issues appear, get deduped, and accumulate context. And GitHub, where draft PRs queue up overnight.

Error boundaries as signals, not just safety nets

The first thing we changed wasn't the agents. It was the data they had to work with.

Our error boundaries used to do what error boundaries usually do. Catch, render a fallback, log to Sentry. Useful for users, mediocre for agents. A stack trace tells you where something blew up. It tells you almost nothing about what the user was trying to do.

We rewrote our boundaries to emit a structured event that includes the route, the actor and workspace context, the last few user actions in the session, the relevant feature flag state, and a compact summary of what the user was likely attempting. The boundary still catches and renders the fallback. It just also speaks a language a triage agent can reason about.
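To make the idea concrete, here is a minimal sketch of what such an envelope could look like. It is written in Python for brevity, and every field name and example value is an illustrative assumption rather than the schema we actually ship.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SignalEnvelope:
    """Hypothetical shape for an agent-readable error or feedback signal."""

    source: str                     # "error_boundary" or "feedback"
    route: str                      # where the user was when it happened
    actor_id: str                   # who hit the problem
    workspace_id: str               # which workspace they were in
    recent_actions: list[str]       # last few user actions in the session
    feature_flags: dict[str, bool]  # relevant flag state at the time
    intent_summary: str             # compact guess at what the user was attempting
    stack_fingerprint: Optional[str] = None  # may be absent, e.g. for feedback


def to_json(envelope: SignalEnvelope) -> str:
    """Serialise with a stable key order so payloads are easy to compare."""
    return json.dumps(asdict(envelope), sort_keys=True)


# Illustrative values only.
example = SignalEnvelope(
    source="error_boundary",
    route="/dashboard",
    actor_id="user_123",
    workspace_id="ws_456",
    recent_actions=["open_dashboard", "click_refresh"],
    feature_flags={"new_charts": True},
    intent_summary="Refresh the dashboard data",
    stack_fingerprint="a1b2c3",
)
```

The point of the sketch is the shape, not the names: the trace fingerprint is one optional field among several, and the intent summary travels with it.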

Feedback signals get the same treatment. Customer messages that hit our inbox are normalised into the same envelope. From the routine's perspective, "the dashboard crashed when I clicked refresh" and "I tried to refresh the dashboard and it spun forever" arrive as comparable inputs.

The lesson here, and we wish we'd internalised it sooner, is that agents are only as good as the structure of what they're handed. The clearer the signal, the less the agent has to guess.

Routines as the orchestration layer

The new routine triggers in Claude Code are the spine. We have one routine subscribed to error events, one to customer feedback, and one to a deduped-issue queue inside Linear. Each routine is short. It reads the signal, decides which agent profile to dispatch, and hands off.

Routines aren't agents. That distinction matters. They're the policy layer that says "this kind of signal goes to this kind of worker in this kind of container with these guardrails." When we want to change behaviour, we change a routine, not an agent prompt. That separation has been the single biggest reason the system stays maintainable.
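The policy-versus-worker split can be pictured as a small lookup table. This is not Claude Code's routine syntax. It is a plain-Python illustration of the idea, and the profile names, container labels, and guardrail strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dispatch:
    agent_profile: str           # which kind of worker to run
    container: str               # which kind of container to run it in
    guardrails: tuple[str, ...]  # constraints the worker must respect


# Hypothetical policy table: one entry per kind of signal a routine listens for.
POLICY = {
    "error": Dispatch("triage", "repo-readonly", ("no_writes",)),
    "feedback": Dispatch("triage", "repo-readonly", ("no_writes",)),
    "deduped_issue": Dispatch(
        "coding", "repo-checkout", ("draft_pr_only", "declared_scope_only")
    ),
}


def route(signal_kind: str) -> Dispatch:
    """Pick a dispatch for a signal kind; unknown kinds are rejected, not guessed."""
    try:
        return POLICY[signal_kind]
    except KeyError:
        raise ValueError(f"no routine policy for signal kind {signal_kind!r}") from None
```

Changing behaviour then means editing a row in the table, which is the property the separation is meant to buy.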

Cloud containers as the substrate

Every dispatch runs in its own cloud container. Clean checkout, scoped credentials, a fresh state. We use containers rather than long-lived workers for three reasons.

First, isolation. A coding agent that's exploring a fix should not pollute the workspace another agent is reasoning about.

Second, reproducibility. When we review a proposal, we can rerun the exact container if we want to see what the agent saw.

Third, the cost shape suits the workload. Most of these runs are short, bursty, and parallel. Long-running workers are the wrong primitive.

The boring practical detail is that we cache the dependency layer aggressively. A cold checkout to "ready to run tests" is around twenty seconds for our monorepo, which is fast enough that no one notices.

Triage agents and the deduplication problem

This is where the system earns its keep, and it's the piece we underestimated most.

The triage agent's job sounds simple. Read the signal. Search Linear for similar existing issues. If there's a match, update it and increment the signal count. If not, create a new issue with a clean scope, repro steps where possible, and a confidence rating.

The reason this matters is that most "new" errors aren't new. In our two-week measurement window, roughly seventy per cent of inbound error signals were duplicates of an existing issue, or duplicates of each other arriving in the same hour. Before the loop, those duplicates landed as fresh Sentry alerts and got reacted to individually. After the loop, they aggregate against a single issue with a rising count, and we get to make one decision instead of twenty.

Deduplication turns out to be a precondition for everything downstream. If the triage agent gets dedup wrong, the coding agent gets dispatched repeatedly against the same root cause, and the review queue fills up with redundant proposals. We spent more time tuning the dedup heuristics than any other part of the system.

The agent uses three signals to match. Structural similarity in the captured event envelope, semantic similarity against the issue title and description, and a stack-trace fingerprint where one exists. A match on any two (or all three) is treated as a duplicate. A match on exactly one flags the signal for human confirmation. This has held up well.
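The two-of-three rule is simple enough to write down. The sketch below assumes each signal has already been reduced to a yes/no match (how that is thresholded is out of scope here), and that no match at all means a new issue. The names are illustrative.

```python
from enum import Enum


class Verdict(Enum):
    DUPLICATE = "duplicate"
    NEEDS_HUMAN = "needs_human_confirmation"
    NEW = "new"


def dedup_verdict(structural: bool, semantic: bool, fingerprint: bool) -> Verdict:
    """Apply the two-of-three rule to three already-computed match flags."""
    matches = sum([structural, semantic, fingerprint])
    if matches >= 2:
        return Verdict.DUPLICATE    # two or three signals agree
    if matches == 1:
        return Verdict.NEEDS_HUMAN  # one signal alone is not enough
    return Verdict.NEW              # nothing matched: file a fresh issue


# A signal with no stack trace can still be a duplicate on the other two.
print(dedup_verdict(structural=True, semantic=True, fingerprint=False))
```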

Coding agents and the proposal contract

Only a subset of issues get dispatched to a coding agent automatically. The routine applies a safety filter. Scope clear, blast radius bounded, no production data dependencies, no auth or billing surfaces. Anything that fails the filter sits in a "hold for human triage" lane.
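As a sketch, the filter is an all-or-nothing check over the four criteria. The field names and lane labels below are assumptions for illustration, and how each flag gets assessed is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class IssueAssessment:
    """Hypothetical per-issue flags produced during triage."""

    scope_clear: bool
    blast_radius_bounded: bool
    touches_production_data: bool
    touches_auth_or_billing: bool


def failed_checks(a: IssueAssessment) -> list[str]:
    """List every reason the issue fails the filter (empty list means it passes)."""
    reasons = []
    if not a.scope_clear:
        reasons.append("scope unclear")
    if not a.blast_radius_bounded:
        reasons.append("blast radius unbounded")
    if a.touches_production_data:
        reasons.append("depends on production data")
    if a.touches_auth_or_billing:
        reasons.append("touches auth or billing")
    return reasons


def lane(a: IssueAssessment) -> str:
    """Any single failure is enough to hold the issue for a human."""
    return "hold_for_human_triage" if failed_checks(a) else "auto_dispatch"
```

Returning the reasons rather than a bare boolean keeps the hold lane reviewable: a human can see why an issue was held without rerunning anything.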

For issues that pass, the coding agent gets the Linear ticket, a clean container with the repo checked out, and a contract. Reproduce the fault, propose a fix, open a draft PR, and write up the reasoning in the PR description. If you can't reproduce it, say so and stop. If the fix touches more than the change you set out to make, stop.

That last rule was hard-won. Early versions of the agent would notice tangentially related smells and start refactoring. The PRs were technically correct and impossible to review. Constraining the agent to its declared scope made the review queue tractable.
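One way to make "declared scope" checkable is to compare the files a fix touches against the directories the issue named up front. Treating scope as a list of directories is an assumption made for this sketch, and the paths are made up.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_files: list[str], declared_scope: list[str]) -> list[str]:
    """Return the changed files that fall outside every declared scope directory."""
    scopes = [PurePosixPath(s) for s in declared_scope]
    offending = []
    for name in changed_files:
        path = PurePosixPath(name)
        # In scope if the file is a declared path itself or lives beneath one.
        if not any(path == s or s in path.parents for s in scopes):
            offending.append(name)
    return offending


# Hypothetical diff: one file in scope, one tangential refactor.
changed = ["src/dashboard/refresh.ts", "src/billing/invoice.ts"]
print(out_of_scope(changed, ["src/dashboard"]))
```

A non-empty result is the "stop" condition: the agent reports the overreach instead of opening the PR.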

The morning review pattern

The human side of the loop is the part I want to dwell on, because it's the part I didn't expect to enjoy.

Each morning there's a queue. Usually somewhere between five and twelve proposals, plus a handful of issues that were filed but didn't progress to a fix. I work through it in one sitting, which takes roughly thirty to forty minutes.

For each proposal I'm making one of three calls. Merge, redirect, or close. Merge is self-explanatory. Redirect means the fix is in the wrong direction and the agent needs more context, which I provide in a PR comment and re-dispatch. Close means the agent took a wrong-tree path and the issue needs human attention or shouldn't be fixed at all.

In our window, the split has settled at roughly a third, a third, a third. Slightly better than that on small bugs, slightly worse on anything that touches state or background jobs. The "wrong-tree" rate is the one I watch most carefully, because it's the canary for whether the upstream triage is degrading.
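Watching the wrong-tree rate amounts to a small calculation over the review outcomes. In the sketch below the outcome labels mirror the three calls above, while the 0.4 alert threshold is a made-up number for illustration, not one we have tuned.

```python
from collections import Counter


def close_rate(outcomes: list[str]) -> float:
    """Share of reviewed proposals that were closed as wrong-tree."""
    if not outcomes:
        return 0.0
    return Counter(outcomes)["close"] / len(outcomes)


def triage_degrading(outcomes: list[str], threshold: float = 0.4) -> bool:
    """Flag when the wrong-tree share exceeds a (hypothetical) threshold."""
    return close_rate(outcomes) > threshold


# A roughly even split stays under the illustrative threshold.
recent = ["merge"] * 17 + ["redirect"] * 14 + ["close"] * 17
print(f"{close_rate(recent):.0%}", triage_degrading(recent))
```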

What's changed in my day isn't that bugs get fixed faster, although they do. It's that I no longer carry the open loop. Bugs that used to nibble at attention all day now sit in a single review surface I visit once. The cognitive load drop has been the real payoff.

What the numbers look like

A snapshot from the most recent two-week window, with the caveat at the top of this post.

  • Inbound error and feedback signals: ~340
  • Unique issues after dedup: ~95
  • Auto-dispatched to a coding agent: ~52
  • Draft PRs produced overnight: ~48
  • Merged after review: ~17
  • Redirected (re-dispatched with context): ~14
  • Closed as wrong-tree: ~17
  • Mean time from signal to merged fix for the merged set: ~14 hours, most of it overnight
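The ratios implied by those counts are worth spelling out. The snippet below only re-derives them from the rounded figures above, so treat the percentages as approximate.

```python
# Rounded counts from the two-week snapshot above.
funnel = {
    "signals": 340,
    "unique_issues": 95,
    "auto_dispatched": 52,
    "draft_prs": 48,
    "merged": 17,
    "redirected": 14,
    "closed": 17,
}

# Share of inbound signals that collapsed into an existing issue.
duplicate_share = 1 - funnel["unique_issues"] / funnel["signals"]
# Share of unique issues that passed the safety filter.
dispatch_share = funnel["auto_dispatched"] / funnel["unique_issues"]
# Share of dispatched issues that produced a draft PR.
pr_yield = funnel["draft_prs"] / funnel["auto_dispatched"]
# Share of draft PRs merged after review.
merge_share = funnel["merged"] / funnel["draft_prs"]

# The three review outcomes account for every draft PR.
reviewed = funnel["merged"] + funnel["redirected"] + funnel["closed"]
assert reviewed == funnel["draft_prs"]

print(f"duplicates: {duplicate_share:.0%}")   # 72%
print(f"dispatched: {dispatch_share:.0%}")    # 55%
print(f"PR yield:   {pr_yield:.0%}")          # 92%
print(f"merged:     {merge_share:.0%}")       # 35%
```

The duplicate share lines up with the roughly seventy per cent figure from the dedup section, and the merged share with the rough thirds of the review split.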

The signal-to-merged-fix number is the one that surprised us. It used to live in the three-to-seven-day range for non-critical bugs, because the bottleneck wasn't fix complexity, it was attention.

What we got wrong

A few things, in case it saves you time.

We over-trusted the coding agent early. The first week we let it open non-draft PRs and CI noticed. Switching to draft-only and gating merges behind explicit human review was the right call.

We under-invested in the safety filter. Our first version of "well-scoped and safe" let through changes to our background job runner, and the resulting PRs were genuinely scary. The filter is now conservative and we're glad of it.

We assumed the triage agent could work from stack traces alone. It can't. The work we put into structured error boundaries was the highest-leverage investment in the whole system.

Where this is going

The loop currently handles errors and inbound feedback. The obvious next surface is regressions caught by our own test suite during scheduled runs, which fit the same envelope. Beyond that, the pattern generalises. Anything that emits a structured signal and lands in a tracker is candidate work for the same triage-and-propose loop.

The bigger shift, the one I keep thinking about, is that the unit of engineering work has quietly stopped being "find and fix the bug." For a class of bugs, the unit is now "review and accept the proposal." The skill that matters has moved one rung up.

If you're running a small team and feeling permanently behind on triage, this pattern is worth an afternoon of your time. The pieces are all available. The hard part is letting go of doing the triage yourself for long enough to see whether the loop closes.

If you want to chat about the setup, or you're stuck on the error-boundary structuring step, you can reach me on LinkedIn or via the Stunt Double site. Happy to compare notes.