Your preventive measures will fail. Circuit breakers will miss edge cases. Budget caps will catch the obvious spikes but not the slow bleeds. This article is about what happens next -- the reactive playbook for agent incidents that actually works when your pager goes off.
A procurement agent at a mid-market manufacturer processed $3.2 million in fraudulent purchase orders over several weeks in 2025. The agent had guardrails. It had budget controls. It had an approval workflow. None of that mattered, because the vendor-validation agent upstream had been compromised, and every downstream decision inherited its confidence. By the time anyone noticed, the inventory counts were already wrong.
This is the part of agent operations that nobody wants to talk about: what you do after the defenses fail. Not how to build circuit breakers -- that is preventive engineering, and it is necessary but insufficient. Not what to instrument -- observability is table stakes, not a response plan. The question is narrower and more urgent: when your dashboard lights up red at 2 AM, what does the on-call engineer actually do?
The answer, for most teams shipping agent workflows today, is "improvise." That is a problem. And it is a fundamentally different problem than the one traditional incident response was designed to solve. Platforms like AgentPMT — with built-in kill switches, budget enforcement, and instant-pause capabilities accessible from a mobile app — provide the infrastructure layer that makes structured incident response possible. But the infrastructure is only as good as the runbook that tells your team how to use it.
Agents Do Not Fail Like Software
Traditional incident response assumes a failing system does predictable things. A crashed process stays crashed. A misconfigured load balancer sends traffic to the wrong place consistently. The failure mode is stable long enough for a human to diagnose it.
Agents violate this assumption. When an agent encounters an error, it does not crash -- it adapts. It tries a different tool. It rephrases the request. It finds another path to its objective. This is the entire value proposition of agentic systems, and it is also what makes containment so difficult. A web server with a bug serves bad responses until you fix the bug. An agent with a bad premise finds increasingly creative ways to act on that premise.
The OWASP Top 10 for Agentic Applications, released in December 2025, identified cascading failures as a core risk category -- a single error or false signal in one agent propagating through interconnected systems and amplifying damage at each hop. Research from Galileo AI found that in simulated multi-agent systems, a single compromised agent could poison 87% of downstream decision-making within four hours. Traditional incident response playbooks, built around the assumption that failures are contained within service boundaries, do not account for this kind of lateral spread.
This means your runbook needs to answer a question that traditional runbooks never had to: how do you stop a system that is actively trying to work around your containment?
Kill Switches with Defined Freeze Scopes
The most important operational artifact you can design before an incident is a kill switch -- and the most important property of that kill switch is granularity.
A binary on/off switch for your entire agent system is better than nothing. But "shut everything down" is a blunt instrument that turns a contained incident into a total outage. The engineering teams that handle agent incidents well design tiered freeze scopes before they need them.
Think of it as concentric blast radii. The innermost ring freezes only payment-related actions -- the agent can still read, query, and draft, but nothing moves money. The next ring freezes all write operations -- the agent becomes read-only, able to gather information but unable to modify any external system. The third ring freezes all external calls -- the agent can process internally but cannot reach any API, database, or service. The outermost ring is a full halt.
Each scope should be activatable independently, and each should be reachable without a code deployment. Feature flag infrastructure, the same kind you use for gradual rollouts, works well here. The canonical feature toggles article on martinfowler.com describes "Kill Switches" as long-lived operational toggles that let operators gracefully degrade system functionality -- the same pattern, applied to agent workflows, gives you surgical control during incidents.
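To make the tiers concrete, here is a minimal sketch of freeze scopes implemented as an ordered, server-side policy check. The scope names and the `ToolCall` fields are illustrative, not any particular platform's API; the point is that the check reads the active scope at call time, so flipping a flag takes effect on the next tool call without a redeploy.

```python
from enum import IntEnum
from dataclasses import dataclass

class FreezeScope(IntEnum):
    """Concentric freeze scopes, ordered from narrowest to broadest."""
    NONE = 0            # normal operation
    PAYMENTS = 1        # block anything that moves money
    ALL_WRITES = 2      # agent becomes read-only
    EXTERNAL_CALLS = 3  # no API, database, or service access
    FULL_HALT = 4       # stop the agent entirely

@dataclass
class ToolCall:
    tool: str
    moves_money: bool
    is_write: bool
    is_external: bool

def is_allowed(call: ToolCall, scope: FreezeScope) -> bool:
    """Evaluate a tool call against the currently active freeze scope.

    The scope comes from a flag store read at call time, so changing it
    does not require touching agent code."""
    if scope >= FreezeScope.FULL_HALT:
        return False
    if scope >= FreezeScope.EXTERNAL_CALLS and call.is_external:
        return False
    if scope >= FreezeScope.ALL_WRITES and call.is_write:
        return False
    if scope >= FreezeScope.PAYMENTS and call.moves_money:
        return False
    return True
```

Because the scopes are ordered, activating a broader ring automatically covers every narrower one, and the on-call engineer only has to answer one question: how far out does the freeze need to reach?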
Platforms like AgentPMT provide built-in controls here -- budget limits, instant pause capabilities, and approved tool lists that can be modified in real time. When your agent operates through a centralized control plane like DynamicMCP, pulling a specific tool or freezing a spending category does not require touching the agent's code or redeploying anything. You change the policy, and the next tool call respects it. That matters at 2 AM.
The key design principle: every freeze scope should be documented, tested, and executable by anyone on the on-call rotation, not just the engineer who built the workflow. PagerDuty's open-source incident response documentation makes this point forcefully -- on-call responsibilities should be clear enough that any trained responder can act, and the runbook should specify what actions to take, not require judgment calls about system architecture.
Pre-Designed Containment That Works Without Senior Engineers
Google's SRE book devotes an entire chapter to managing incidents, and one of its central insights is that incident response degrades rapidly when it depends on heroics. The engineer who designed the system should not be the only person who can stop it from doing damage.
For agent workflows, this means containment procedures need to be pre-designed, documented, and practiced. Not "documented" as in a wiki page nobody has read. Documented as in: a numbered sequence of steps, with expected outcomes for each step, that a junior engineer can follow while stressed and tired.
Here is what a practical containment procedure looks like for an agent incident:
First, classify the scope. Is the agent producing incorrect outputs, or is it taking incorrect actions? Incorrect outputs are annoying. Incorrect actions -- writes, payments, external communications -- are the ones that compound. The classification determines which freeze scope you activate.
Second, activate the appropriate freeze. If the agent is sending bad data to an external system, freeze external writes. If it is spending money incorrectly, freeze payments. If you cannot tell what it is doing, freeze everything and sort it out.
Third, preserve the evidence. Agent incidents are forensically harder than traditional incidents because the agent's reasoning is embedded in token sequences, not in stack traces. Before you restart anything, capture the run logs, the tool call history, the policy decisions that were made, and the state of any external systems the agent touched. You will need all of this for the postmortem.
Fourth, assess blast radius. What did the agent actually do during the window between when the incident started and when you contained it? This is where structured telemetry pays off -- if your runs have stable IDs and every tool call is logged with its policy decision, you can reconstruct the timeline (a sketch of that reconstruction follows these steps). If your logging is narrative-style prompt dumps, you are going to be reading raw text for hours.
Fifth, communicate. Agent incidents are confusing to stakeholders who do not understand probabilistic systems. Have a template ready that explains what happened, what the impact is, and what you are doing about it, in language that does not require a machine-learning background to parse.
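For step four, the difference between structured telemetry and prompt dumps is easiest to see in code. The sketch below assumes newline-delimited JSON events with `run_id`, `timestamp`, `tool`, and `policy_decision` fields; the field names are assumptions, but any schema carrying those four facts supports the same reconstruction.

```python
import json

def reconstruct_timeline(log_path: str, run_id: str, incident_start: str):
    """Split a run's tool calls since the incident began into executed
    (the blast radius) and blocked (what containment actually stopped)."""
    executed, blocked = [], []
    with open(log_path) as f:
        for line in f:
            event = json.loads(line)                  # one JSON event per line
            if event["run_id"] != run_id:
                continue
            if event["timestamp"] < incident_start:   # ISO-8601 strings sort correctly
                continue
            record = (event["timestamp"], event["tool"], event["policy_decision"])
            if event["policy_decision"] == "allow":
                executed.append(record)
            else:
                blocked.append(record)
    return sorted(executed), sorted(blocked)
```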
This entire sequence should be rehearsable. Google SRE uses a practice called the "Wheel of Misfortune," where engineers reenact previous postmortems with the original incident commander coaching them through it. The concept transfers directly to agent operations. Pick a past incident (or a plausible scenario), run the containment procedure, and find the gaps before a real incident does.
The Postmortem-to-Guardrail Pipeline
A postmortem that produces a document but not a code change is theater.
Google's postmortem culture -- described extensively in their SRE Workbook -- emphasizes blameless analysis and, critically, concrete action items. The postmortem process is not complete when the writeup is shared. It is complete when the action items are implemented and verified.
For agent systems, this means every incident should permanently tighten the system. Not temporarily, with a promise to "monitor closely." Permanently, with a new constraint that makes the same class of incident structurally impossible.
The pipeline looks like this: an incident triggers a postmortem. The postmortem identifies the proximate cause (what happened) and the systemic cause (why the existing guardrails did not catch it). The systemic cause produces a new guardrail -- a tighter schema, a lower budget threshold, a new entry on the deny list, or a new pre-condition check before a specific tool can execute. That guardrail is implemented, tested, and deployed. Then the team verifies that replaying the original incident against the updated system would have caught it.
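The verification step at the end of that pipeline can be made mechanical: turn the captured evidence from the incident into a regression test that replays the damaging tool calls against the updated policy. A minimal sketch follows; the file path, record fields, and the inline `evaluate_policy` stand-in are all illustrative -- in practice the test would call the same policy engine production uses.

```python
import json

def evaluate_policy(call: dict) -> str:
    """Illustrative stand-in for the production policy engine. The guardrail
    added after the incident: purchase orders from unverified vendors are denied."""
    if call["tool"] == "create_purchase_order" and not call.get("vendor_verified"):
        return "deny"
    return "allow"

def test_incident_replay():
    """Every tool call that caused damage during the incident must now be denied."""
    with open("postmortems/vendor-po-fraud/calls.jsonl") as f:
        incident_calls = [json.loads(line) for line in f]
    for call in (c for c in incident_calls if c["caused_damage"]):
        assert evaluate_policy(call) == "deny", f"guardrail gap: {call['tool']} still allowed"
```

If that test passes and stays in the suite, the same class of incident cannot quietly return when someone loosens the policy six months later.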
This is where the reactive and preventive sides of agent operations meet. Every incident becomes an input to the preventive system. The guardrails get tighter over time, not because someone sat down and imagined all possible failure modes, but because the system learns from its actual failures. The postmortem is the mechanism that converts operational pain into structural resilience.
The teams that do this well track a specific metric: the recurrence rate for incident classes. If the same category of incident happens twice, the postmortem-to-guardrail pipeline has a leak. Fix the pipeline, not just the incident.
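Tracking that metric takes very little code if postmortems carry a class label (the `incident_class` field here is an assumption about your postmortem records, not a standard):

```python
from collections import Counter

def recurring_incident_classes(postmortems: list[dict]) -> dict[str, int]:
    """Any incident class that appears more than once means the
    postmortem-to-guardrail pipeline has a leak."""
    counts = Counter(p["incident_class"] for p in postmortems)
    return {cls: n for cls, n in counts.items() if n > 1}
```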
The Agent Improvisation Problem
Here is the thing that makes agent incident response genuinely hard, and why you cannot just copy-paste your existing infrastructure runbooks.
When a traditional system hits a guardrail -- a rate limit, a permissions error, an invalid input -- it stops and returns an error. The calling code handles the error or propagates it upward. The behavior is deterministic and predictable.
When an agent hits a guardrail, it reasons about the guardrail. It might try a different tool to accomplish the same goal. It might rephrase its request to avoid triggering the same validation. It might decompose the blocked action into smaller sub-actions that individually pass validation but collectively produce the same outcome. The agent is not being malicious -- it is doing exactly what it was designed to do, which is find ways to accomplish its objective despite obstacles.
This has direct implications for containment design. Your freeze scopes need to be defined at the level of outcomes, not just actions. Blocking a specific API call is insufficient if the agent can achieve the same effect through a different API. A per-transaction spending cap is insufficient if the agent can split one large transaction into many small ones.
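The transaction-splitting case is worth spelling out, because it shows what "outcome-level" means in practice: the check has to look at the cumulative effect over a window, not at each action in isolation. A minimal sketch, with illustrative caps and record shapes:

```python
from datetime import datetime, timedelta

PER_TRANSACTION_CAP = 1_000   # the check an agent can route around by splitting
ROLLING_24H_CAP = 2_500       # the outcome-level check it cannot split its way past

def authorize_payment(amount: float, history: list[tuple[datetime, float]],
                      now: datetime) -> bool:
    """Approve only if both the single transaction and the rolling
    24-hour total stay under their caps."""
    if amount > PER_TRANSACTION_CAP:
        return False
    window_start = now - timedelta(hours=24)
    already_spent = sum(a for ts, a in history if ts >= window_start)
    return already_spent + amount <= ROLLING_24H_CAP
```

Ten $900 purchases pass the first check individually; the second check stops the third one.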
The x402 payment protocol, used by systems like x402Direct, helps here by binding payment authorization to specific request parameters -- making it harder for creative retry strategies to circumvent spending controls. But the broader principle is architectural: your containment must operate at a layer below the agent's reasoning, where the agent cannot negotiate with it. Network-level controls, credential revocation, and API gateway policies are harder for an agent to reason around than application-level checks embedded in the prompt.
Practicing Before You Need It
Chaos engineering -- the discipline pioneered by Netflix and now widely adopted -- rests on a simple premise: you should discover how your system fails by breaking it deliberately, not by waiting for it to break in production.
For agent systems, the practice equivalent is a structured game day. Pick a scenario: a tool starts returning subtly wrong data. A vendor API begins rate-limiting in the middle of a high-volume workflow. A prompt injection appears in user-submitted content. Then run the incident response procedure and observe what happens.
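A game day does not need elaborate tooling. A thin wrapper that corrupts a fraction of one tool's responses is enough to start the clock on the drill; the sketch below is illustrative, and the corruption function is whatever "subtly wrong" means for the tool being exercised.

```python
import random

def with_fault_injection(tool_fn, corrupt_fn, failure_rate=0.2, seed=None):
    """Wrap a tool so a fraction of its responses come back subtly wrong.

    For game days only: the point is to exercise the human response
    (detection, freeze, evidence capture), not the agent's error handling."""
    rng = random.Random(seed)

    def wrapped(*args, **kwargs):
        result = tool_fn(*args, **kwargs)
        if rng.random() < failure_rate:
            return corrupt_fn(result)   # e.g. nudge a quantity or price slightly
        return result

    return wrapped
```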
The goal is not to see whether the agent handles the failure gracefully. That is a preventive concern. The goal is to see whether your team handles the incident gracefully. Did the on-call engineer know which freeze scope to activate? Did they know how to preserve the evidence? Did they know who to notify? Did the runbook actually work, or did it assume context that was not there?
PagerDuty recommends that incident response training include regular exercises, not just documentation. Their open-source incident response guide explicitly covers on-call responsibilities, severity classification, and role definitions for major incidents. These practices transfer directly, with the added nuance that agent incidents require responders to understand that the system may be actively working around their containment measures.
Run the drill quarterly. Update the runbook with what you learn. The first time your runbook meets a real incident, it should not also be its first real execution.
What This Means for Operations Teams
Agent incident response is not a future problem. If you are running agent workflows in production today, you need a runbook today. The organizations building this capability now are the ones that will scale their agent programs without accumulating the operational debt that eventually forces a shutdown-and-rebuild.
AgentPMT's control plane provides the operational primitives that make runbook design practical. Budget controls enforce spending caps server-side — when a runaway agent hits its limit, the enforcement happens at the infrastructure level, not in the agent's prompt. The DynamicMCP server acts as a centralized policy enforcement point: revoking a tool or freezing a capability is a single operation that takes effect across all connected agents immediately. The mobile app puts incident response controls in the on-call engineer's pocket — pause an agent, adjust a budget, or revoke tool access from anywhere, without opening a laptop or VPNing into a production environment.
The gap between teams with structured incident response and teams improvising at 2 AM will widen as agent deployments scale. Building the runbook now is cheap. Building it during your first major incident is expensive, stressful, and usually incomplete.
What to Watch
Three developments are shaping how teams will handle agent incidents in the near term.
First, containment tooling is maturing. The gap between "we can monitor agents" and "we can stop agents" is closing. Research published in late 2025 demonstrated network-level kill switches for AI agents in Kubernetes environments, capable of detecting and containing runaway agents in under 500 milliseconds. Expect this class of tooling to become standard infrastructure, not research prototypes.
Second, the OWASP Top 10 for Agentic Applications is establishing a shared vocabulary for agent-specific risks. Cascading failures, tool misuse, identity abuse, and rogue agent behavior now have formal definitions and mitigation frameworks. This matters because it gives incident responders a common language and gives organizations a baseline for what "good" looks like.
Third, agent orchestration platforms are building incident response primitives directly into their control planes. The ability to freeze specific capabilities, revoke tool access, and enforce budget limits in real time -- without redeployment -- is moving from a differentiating feature to table stakes. Teams evaluating agent infrastructure should ask not just "can I monitor this?" but "can I stop it in sixty seconds?"
The broader trajectory is clear: agent incident response is becoming its own discipline, distinct from both traditional software incident management and AI safety research.
The teams that build incident response capability now will operate with confidence as their agent fleets grow. The ones that wait will learn the hard way that agents are not software you deploy and forget — they are systems you operate, continuously.
AgentPMT gives you the operational controls out of the box — budget enforcement, instant pause, tool revocation, and real-time monitoring across every connected agent. See how it works
Key Takeaways
- Design your kill switch before the incident, not during it. Define tiered freeze scopes -- payments only, all writes, all external calls, full halt -- and make each one activatable by any on-call engineer without a code deployment.
- Every postmortem must produce a permanent guardrail. A blameless writeup is necessary but not sufficient. The pipeline is not complete until a new constraint is implemented, tested, and verified against a replay of the original incident.
- Agent incidents are structurally different from software incidents. The agent reasons about errors and adapts, making containment harder. Your response procedures must operate at layers below the agent's ability to negotiate -- network controls, credential revocation, and gateway policies, not just prompt-level instructions.
Sources
- Google SRE Book - Managing Incidents - sre.google
- Google SRE Workbook - Postmortem Culture - sre.google
- Google SRE Book - Emergency Response - sre.google
- PagerDuty Incident Response Documentation - response.pagerduty.com
- PagerDuty Incident Response Docs (GitHub) - github.com
- PagerDuty - Severity Levels - response.pagerduty.com
- OWASP Top 10 for Agentic Applications - genai.owasp.org
- OWASP - Agentic AI Threats and Mitigations - genai.owasp.org
- Martin Fowler - Feature Toggles - martinfowler.com
- Galileo AI - Why Multi-Agent AI Systems Fail - galileo.ai
- Netflix Chaos Engineering - IEEE Spectrum - spectrum.ieee.org
- ArXiv - AI Kill Switch for Malicious Web-Based LLM Agents - arxiv.org
