AI DevOps Agents Handle 35,000 Incidents a Month. Most IT Teams Can't Deploy Even One.

Microsoft's Azure SRE Agent went generally available on March 10. Internally, Microsoft runs a large fleet of these agents across its own services, where they mitigate 35,000 incidents every month. Each agent can restart services, correlate alerts, write diagnostic scripts, and generate post-incident reports — work that previously required a significant share of Microsoft's site reliability engineering capacity. Ecolab, an early external customer, reported that the agents cut daily performance alerts dramatically, reducing noise to a level that human teams could actually manage. Autonomous incident response is running in production at hyperscaler scale.

Within the same quarter, AWS shipped its DevOps Agent, which correlates signals across CloudWatch, Datadog, New Relic, and Splunk to pinpoint root causes during outages. AWS also open-sourced Agent Plugins for deployment automation, enabling infrastructure provisioning in under ten minutes through tools like Claude Code and Cursor. On March 16, Azure's Foundry Agent Service reached general availability with end-to-end private networking, durable orchestration, and human-in-the-loop approval workflows.

When three hyperscalers ship the same category of product within weeks of each other, the technology is no longer speculative. AI-driven DevOps and IT automation have moved from internal tooling to generally available cloud infrastructure. The capability question is answered.

The deployment question is not.

The Gap Between What Works and What Ships

Deloitte's 2026 Tech Trends report found that only 11% of enterprises actively run AI agents in production. A meaningful share are piloting, but the majority are still developing strategy roadmaps or have no formal strategy at all. The gap is not about interest — it is about readiness.

Microsoft's agents work. Amazon's agents work. The tooling is available, documented, and commercially supported. The obstacle is not capability — it is the operational infrastructure required to let autonomous software make decisions inside production environments.

The barriers are specific and architectural. Legacy systems built on batch ETL processes lack the real-time APIs that agents need to observe and act on system state. Data architectures designed for human querying — searchable by analysts, not parseable by autonomous systems — leave agents unable to access the information they need. Nearly half of respondents in the Deloitte survey cited data searchability and reusability as direct obstacles.

Then there is the governance question. An agent that can restart a failing service can also restart the wrong one. An agent with access to production telemetry has access to production data. The autonomy that makes these tools useful is the same autonomy that makes them dangerous without proper controls. And most organizations have no framework for managing what an autonomous system is allowed to do, how much it can spend, or how to audit what it did after the fact.
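One way to make "what an autonomous system is allowed to do" concrete is an explicit allowlist that is checked before every action. The sketch below is illustrative only. The `AgentPolicy` class, the action names, and the service names are hypothetical and do not come from any vendor's product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical per-agent policy: which actions on which targets are allowed."""
    allowed_actions: frozenset = field(default_factory=frozenset)
    allowed_targets: frozenset = field(default_factory=frozenset)
    # Actions that are permitted only after a human signs off.
    needs_approval: frozenset = field(default_factory=frozenset)

    def decide(self, action: str, target: str) -> str:
        """Return 'deny', 'ask_human', or 'allow' for a proposed action."""
        if action not in self.allowed_actions or target not in self.allowed_targets:
            return "deny"
        if action in self.needs_approval:
            return "ask_human"
        return "allow"


policy = AgentPolicy(
    allowed_actions=frozenset({"read_logs", "restart_service"}),
    allowed_targets=frozenset({"checkout-api"}),
    needs_approval=frozenset({"restart_service"}),
)

print(policy.decide("read_logs", "checkout-api"))        # allow
print(policy.decide("restart_service", "checkout-api"))  # ask_human
print(policy.decide("restart_service", "billing-db"))    # deny
```

Denying by default means an agent that proposes restarting the wrong service is stopped by policy rather than by luck.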

Gartner predicts that more than 40% of agentic AI projects will be canceled by the end of 2027, specifically because of legacy system incompatibility. The models will perform fine. The environments they are dropped into will not.

Toyota offers a counterexample. The company deployed an agentic system that replaced dozens of mainframe screens, handling processes from pre-manufacturing through delivery. But Toyota had the architectural preconditions: systems that could expose real-time state to autonomous actors. Most enterprises don't. As Intel's VP of AI Strategy Brent Collins put it: "Don't simply pave the cow path. Instead, take advantage of this AI evolution to reimagine how agents can best collaborate."

Telecom's Parallel Bet

The pattern extends beyond cloud IT. Telecommunications operators are rebuilding their networks around AI agents with even larger physical infrastructure at stake.

At Mobile World Congress in Barcelona, NVIDIA released three agentic AI blueprints for autonomous network management, including an open-source reasoning model fine-tuned on telecom operational data. Cassava Technologies is using the blueprints to build an autonomous network platform for Africa. Telenor Group became the first adopter of NVIDIA's BubbleRAN-enhanced configuration blueprint. NTT DATA is applying them to traffic regulation in Japan.

Google launched three new telecom AI agents — a Data Steward Agent for automated governance, Autonomous Network Agents for voice core management, and Operational Support Systems agents for network orchestration — with Deutsche Telekom and Vodafone as integration partners. Nokia joined a separate initiative for network-as-code natural language access.

The NVIDIA State of AI in Telecommunications survey found that network automation has overtaken customer experience as the top AI use case in telecom. Operators are not bolting AI onto existing network operations. They are designing network architectures where autonomous agents are the primary operators — what the industry calls "AI-native" networking. Sebastian Barros, Managing Director at Circles, described the shift as telecom evolving "beyond moving bits toward moving intelligence."

The convergence matters because it shows AI DevOps agents are not a cloud-specific phenomenon. Whether the infrastructure is virtual machines or cell towers, the operational logic is the same: autonomous systems observing state, diagnosing problems, and taking corrective action without waiting for a human to intervene.
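That shared operational logic can be reduced to a single cycle. The following is a toy sketch, not a description of any vendor's agent: `control_step`, the node name, and the 20% error threshold are all assumptions chosen for illustration.

```python
from typing import Callable, Optional


def control_step(
    observe: Callable[[], dict],
    diagnose: Callable[[dict], Optional[str]],
    act: Callable[[str], None],
) -> Optional[str]:
    """Run one observe-diagnose-act cycle; return the action taken, if any."""
    state = observe()
    action = diagnose(state)
    if action is not None:
        act(action)
    return action


# Hypothetical stand-ins for real telemetry and a real remediation call.
taken = []


def observe() -> dict:
    return {"node": "cell-site-17", "error_rate": 0.31}


def diagnose(state: dict) -> Optional[str]:
    # Toy rule: propose a restart when more than 20% of requests fail.
    if state["error_rate"] > 0.20:
        return f"restart:{state['node']}"
    return None


action = control_step(observe, diagnose, taken.append)
print(action)
```

The same function would serve a virtual machine or a cell site; only the `observe` and `act` callables change.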

The Security Dimension Nobody Staffed For

Every AI agent deployment is also a security deployment, and most organizations are treating it as an afterthought.

On March 20, Microsoft published an end-to-end security framework specifically for agentic AI, spanning identity governance through Entra, threat detection through Defender, and data protection across the agent lifecycle. Microsoft building an entire security architecture for AI agents — rather than extending existing frameworks — signals that the threat model is genuinely new. Agents that take autonomous actions create attack surfaces that traditional security tools were not designed to monitor.

Protos Labs launched Protos AI the same week, deploying specialized AI agents for cyber threat intelligence. The platform is vendor-agnostic, supporting Azure OpenAI, Anthropic, and Google Gemini, and builds a persistent "organizational intelligence memory" that carries over from one investigation to the next. CEO Joel Lee argued that the next phase of AI cybersecurity depends on how effectively threat intelligence compounds across investigations, and that headcount alone cannot close the gap. Given a persistent global shortage of qualified cybersecurity professionals, the argument is hard to dismiss.

The security challenge compounds the governance problem. An enterprise deploying AI DevOps agents needs not only operational guardrails — what the agent can do, what it can spend, what gets logged — but also security controls that treat every agent as a potential attack vector. Credential management becomes critical: agents need access to production systems without directly holding sensitive secrets. Audit trails need to capture full request-and-response context, not just success-or-failure flags.
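A minimal sketch of those two controls follows, assuming a trusted server-side function that sits between the agent and the production API. Every name here (`SECRET_STORE`, `execute_for_agent`, the `secret://` reference format, `fake_backend`) is hypothetical. The agent sends only a reference to a secret, and the audit record stores the full request and response without the secret itself.

```python
import json
from datetime import datetime, timezone

# Hypothetical server-side secret store; the agent never reads this mapping.
SECRET_STORE = {"secret://prod/deploy-token": "s3cr3t-value"}

AUDIT_LOG = []  # a real system would use append-only storage


def execute_for_agent(agent_id: str, request: dict, backend) -> dict:
    """Resolve the secret reference server-side, call the backend, audit both sides."""
    # Swap the reference for the real secret only inside this trusted function.
    resolved = dict(request)
    resolved["token"] = SECRET_STORE[resolved.pop("token_ref")]

    response = backend(resolved)

    # Log the request as the agent sent it (reference, not secret) and the full response.
    AUDIT_LOG.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "request": request,
        "response": response,
    })
    return response


def fake_backend(call: dict) -> dict:
    """Stand-in for a production API: checks the token and echoes the action."""
    ok = call["token"] == "s3cr3t-value"
    return {"status": "ok" if ok else "denied", "action": call["action"]}


result = execute_for_agent(
    "sre-agent-1",
    {"action": "restart_service", "target": "checkout-api",
     "token_ref": "secret://prod/deploy-token"},
    fake_backend,
)
print(json.dumps(AUDIT_LOG[-1]["request"]))
```

Because the log keeps the whole exchange, a reviewer can reconstruct what the agent asked for and what came back, not merely whether the call succeeded.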

This is where the gap between vendor-specific tools and cross-platform reality becomes consequential. Azure's SRE Agent works on Azure. AWS's DevOps Agent works on AWS. Microsoft's new security framework secures Microsoft's agents. But most enterprises operate across multiple clouds, and the governance, spending controls, and audit infrastructure they need cannot be locked to a single vendor's ecosystem. The organizations that solve governance across their full environment, not just within one provider's walled garden, will be the ones that actually deploy at scale.

AgentPMT's architecture addresses this gap directly. The platform's audit system captures full request-and-response context for every agent action across any model provider. Budget controls enforce per-agent spending limits with hard caps. Credential management ensures agents never see sensitive secrets — payment information and access credentials are injected server-side, never entering the agent's context. These are the governance primitives that Deloitte's research identifies as missing from most organizations' agent strategies.
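To illustrate what a hard spending cap means in practice, here is a generic sketch. It is not AgentPMT's implementation or API. The `AgentBudget` class and the dollar amounts are assumptions made for the example.

```python
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised when a charge would push an agent past its hard cap."""


class AgentBudget:
    """Hypothetical per-agent spending ledger with a hard cap."""

    def __init__(self, cap: Decimal) -> None:
        self.cap = cap
        self.spent = Decimal("0")

    def charge(self, amount: Decimal) -> Decimal:
        """Record a charge, or refuse it outright if it would exceed the cap."""
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if self.spent + amount > self.cap:
            raise BudgetExceeded(f"cap {self.cap} would be exceeded")
        self.spent += amount
        return self.cap - self.spent  # remaining budget


budget = AgentBudget(cap=Decimal("10.00"))
remaining = budget.charge(Decimal("7.50"))

try:
    budget.charge(Decimal("5.00"))  # would bring the total to 12.50
    blocked = False
except BudgetExceeded:
    blocked = True

print(remaining, blocked)
```

The cap is "hard" because the check runs before the charge is recorded: a refused charge leaves the ledger unchanged instead of producing an overage to reconcile later.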

What Readiness Actually Requires

The technology shipped this quarter. Microsoft already runs agents across its own services, and AWS and Google have shipped agents of their own. Analyst firms expect autonomous decision-making by software agents to become routine in enterprise operations within two to three years.

The organizations that deploy will not be the ones with the most sophisticated models. They will be the ones that treated agent governance as AI infrastructure — real-time APIs for agent access, structured audit trails, cross-platform spending controls, and security architectures that account for autonomous actors. The prerequisite is not better AI. It is the operational scaffolding that lets AI operate under meaningful oversight.

The hyperscalers built their agents for their own environments. The rest of the market needs governance infrastructure that works across all of them. That difference — between capability and deployability — is where the next phase of AI infrastructure will be decided.

Sources

"Announcing general availability for the Azure SRE Agent" — Microsoft Tech Community
"AWS DevOps Agent helps you accelerate incident response" — AWS News Blog
"AWS Launches Agent Plugins to Automate Cloud Deployment" — InfoQ
"NVIDIA Advances Autonomous Networks With Agentic AI Blueprints" — NVIDIA Blog
"Survey Reveals AI Advances in Telecom" — NVIDIA Blog
"Google's newest AI agents bring telcos a step closer to autonomous network operations" — SiliconANGLE
"The Agentic Reality Check: Preparing for a Silicon-Based Workforce" — Deloitte Insights
"Secure agentic AI end-to-end" — Microsoft Security Blog
"Protos AI delivers agent-driven threat intelligence without vendor lock-in" — Help Net Security
"How Agentic AI Will Reshape Engineering Workflows in 2026" — CIO
"Azure Foundry Agent Service Hits GA" — Dev Journal
