The Agent Infrastructure Conversation

Why your cloud architecture needs to change before your agents go to production

The Shift

From AI that answers → AI that acts

AI Assistants (Yesterday)
  • Answer questions
  • Summarize documents
  • Augment human decisions
  • Human remains in the loop
AI Agents (Now)
  • Take actions: write code, call APIs, trigger workflows
  • Operate autonomously across long time horizons
  • Orchestrate other agents
  • Human is out of the loop — by design
The infrastructure that worked for an AI assistant — a stateless API call — doesn't work for an agent that runs for hours, spawns subagents, and touches your production systems.

The Bottleneck

The problem isn't the model. It's the infrastructure around it.

Engineering leaders tell us the same things. The model works. The demo looks great. Then production hits three walls:

Security

  • Agents execute code and call APIs — who authorized that?
  • Non-human identities have no standard auth model
  • Prompt injection and tool misuse are unsolved at scale

Scale

  • Agents are long-running, stateful, concurrent
  • Traditional serverless is ephemeral — no memory, no persistence
  • 10 developers running 5 agents each = 50 concurrent workflows, all day

Governance

  • Which agents can access which systems?
  • How do you audit what an agent did and why?
  • MCP and other tool-calling protocols have no enterprise controls
These aren't AI problems. They're infrastructure problems. And they need infrastructure answers.

The New Architecture

What agent-native infrastructure actually requires

Compute
  • Persistent execution environments
  • Isolated sandboxes per agent / task
  • Version-controlled code artifacts
  • Concurrent workflow orchestration at scale
Security: Zero Trust for Agents
  • Identity for non-human workloads
  • Private networking between agents and services
  • Policy enforcement at every tool call
  • Auditable execution trails
Tooling
  • Persistent memory across agent sessions
  • Multi-model access (avoid vendor lock-in)
  • Voice / email / communication primitives
  • SDK that abstracts agent complexity
The architecture diagram looks simple. The integration is where teams get stuck. You need these three layers to talk to each other from day one — not stitched together later.

Cloudflare's Approach

Cloud 2.0 — Infrastructure designed for the agentic era

We call this Cloud 2.0. Not because the old cloud is going away — but because the old cloud wasn't designed for agents.

Edge-native by design
Agents run closest to where work happens — low latency, global distribution, no cold-start penalty for persistent workloads.
Zero-trust by default
Every agent action is authenticated and policy-governed. Non-human identities are first-class. Security isn't a layer you add — it's the substrate.
Integrated, not assembled
Compute, security, and tooling from one platform. No glue code. No three-vendor blame-shifting when something breaks.

The same properties that made Cloudflare the right choice for securing the web make it the right shape for agent infrastructure: global, secure, programmable at the edge.

Key Capabilities

Four pillars — and what they mean for your teams

Persistent, Isolated Compute

Agents run in sandboxed Linux environments with versioned code storage, and the platform is capable of 50,000 concurrent workflows.

Your agents can run production workloads — not just demos. Scale is not a ceiling you'll hit on day one.
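
To make "persistent" concrete, here is a minimal sketch of a resumable workflow, assuming a hypothetical `run_workflow` helper that checkpoints to a local JSON file. It is an illustration of the idea only, not Cloudflare's API: a real platform would keep this state in durable storage, but the pattern is the same. Each completed step is recorded, so a restarted agent picks up where it stopped instead of starting over.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# A step is a (name, function) pair. The function receives the results of
# earlier steps and returns its own result. All names here are hypothetical.
Step = Tuple[str, Callable[[Dict[str, str]], str]]


def run_workflow(steps: List[Step], state_path: Path) -> Dict:
    """Run named steps in order, checkpointing after each one."""
    if state_path.exists():
        # Resume: load what an earlier (possibly interrupted) run finished.
        state = json.loads(state_path.read_text())
    else:
        state = {"done": [], "results": {}}

    for name, fn in steps:
        if name in state["done"]:
            continue  # finished in an earlier run, so skip it
        state["results"][name] = fn(state["results"])
        state["done"].append(name)
        # Checkpoint after every step so a crash loses at most one step.
        state_path.write_text(json.dumps(state))
    return state
```

If a step raises partway through, calling `run_workflow` again with the same `state_path` skips the steps already recorded and retries only the one that failed.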

Zero Trust for Non-Human Identities

Machine-to-machine auth built on open standards (RFC 9728, OAuth 2.0 Protected Resource Metadata). Private networking connects agents to internal services without public exposure.

Your security team can approve agents for production. No more "we don't have an auth model for this."
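
For teams unfamiliar with RFC 9728, the sketch below shows the discovery step it standardizes: a client derives a well-known metadata URL from a protected resource's identifier, then checks that the returned metadata names that same resource before trusting the authorization servers it lists. The function names are hypothetical, the example hosts are placeholders, and fetching the document over HTTPS is left out. Requiring at least one authorization server is a choice of this sketch, since the RFC makes that field optional.

```python
from typing import Dict, List
from urllib.parse import urlsplit, urlunsplit

# Well-known path suffix registered by RFC 9728.
WELL_KNOWN = "/.well-known/oauth-protected-resource"


def metadata_url(resource: str) -> str:
    """Insert the well-known path between the host and the resource path."""
    parts = urlsplit(resource)
    path = "" if parts.path == "/" else parts.path  # drop a bare trailing slash
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN + path, parts.query, ""))


def authorization_servers(resource: str, metadata: Dict) -> List[str]:
    """Validate fetched metadata and return the authorization servers it lists."""
    # The "resource" value must match the identifier the client asked about.
    if metadata.get("resource") != resource:
        raise ValueError("metadata 'resource' does not match the requested resource")
    servers = metadata.get("authorization_servers", [])
    if not servers:
        # Sketch-level policy: refuse resources that advertise no issuer.
        raise ValueError("no authorization servers advertised")
    return servers
```

An agent would then request a token from one of the returned servers and present it to the resource, which is what lets a security team reason about agent access with the same OAuth machinery it already reviews.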

Governed Tool Access (MCP)

Enterprise governance layer for Model Context Protocol. Define which agents can call which tools, with full audit logs.

When your compliance team asks "what did the agent do?" — you can answer.
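
A governance layer of this kind can be pictured as a gateway that sits between agents and tools. The sketch below is a minimal illustration under assumed names (`ToolGateway`, the policy shape, and the log fields are all hypothetical and not part of MCP or any Cloudflare product). It checks an allowlist on every call and records the attempt whether or not it was permitted.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


@dataclass
class ToolGateway:
    # agent id -> names of the tools that agent may call (hypothetical shape)
    policy: Dict[str, Set[str]]
    # tool name -> callable that implements it
    tools: Dict[str, Callable[..., Any]]
    audit_log: List[dict] = field(default_factory=list)

    def call(self, agent_id: str, tool: str, **args: Any) -> Any:
        allowed = tool in self.policy.get(agent_id, set())
        # Log before executing, so denied attempts are auditable too.
        self.audit_log.append({
            "ts": time.time(),
            "agent": agent_id,
            "tool": tool,
            "args": args,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{agent_id} may not call {tool}")
        return self.tools[tool](**args)
```

Because the log entry is written before the permission check raises, the answer to "what did the agent do?" also covers what it tried to do and was refused.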

Unified AI Platform + Memory

Access to 14+ model providers through a single interface, plus persistent memory that survives across agent sessions.

You're not locked into one model vendor. And your agents remember context — they get smarter over time, not just faster.
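
The two ideas in this pillar can be sketched in a few lines, assuming hypothetical `ModelRouter` and `FileMemory` classes: a single call site that falls back across providers, and notes that persist between sessions. Providers are modeled as plain functions and memory as a local JSON file purely for illustration. A real platform would call vendor APIs and use a durable store.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# A provider is any function from prompt to completion (an assumption of
# this sketch; real providers would wrap a vendor SDK or HTTP call).
Provider = Callable[[str], str]


class ModelRouter:
    """Send a prompt to the first provider in `order` that succeeds."""

    def __init__(self, providers: Dict[str, Provider], order: List[str]) -> None:
        self.providers = providers
        self.order = order

    def complete(self, prompt: str) -> Tuple[str, str]:
        errors = []
        for name in self.order:
            try:
                return name, self.providers[name](prompt)
            except Exception as exc:  # fall through to the next provider
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all providers failed: " + "; ".join(errors))


class FileMemory:
    """Notes that survive across sessions by living in a JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.notes: List[str] = json.loads(path.read_text()) if path.exists() else []

    def remember(self, note: str) -> None:
        self.notes.append(note)
        self.path.write_text(json.dumps(self.notes))
```

The agent code only ever calls `complete` and `remember`, so swapping or reordering providers is a configuration change, and a new session that opens the same memory path starts with the earlier notes already loaded.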

Enterprise Readiness

The questions your security and compliance teams will ask — answered

How do we control which agents can access which internal systems?
Private mesh networking keeps agent traffic off the public internet. Policy enforcement at the tool layer means access is granted per capability, not broadly per agent.
What happens when an agent is compromised or acts unexpectedly?
Isolated execution environments mean blast radius is contained by default. Every action is logged. You can audit, replay, and revoke.
Our compliance team will never approve a third-party system calling our internal APIs.
Managed OAuth built on RFC 9728 gives agents verifiable, revocable identities, following the same model your compliance team uses for human SSO. Non-human identities are governed, not assumed.
This is what enterprise-grade agent infrastructure looks like.
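
The "revoke" part of those answers is worth seeing in miniature. The sketch below assumes a hypothetical in-memory `CredentialRegistry`. It is not how any particular OAuth server stores tokens, but it shows the property a security team cares about: credentials are scoped when issued, checked on every use, and can be withdrawn for a whole agent in one step.

```python
import secrets
from typing import Dict, FrozenSet, Iterable, Tuple


class CredentialRegistry:
    """Issue scoped tokens to agents and revoke them on demand."""

    def __init__(self) -> None:
        # token -> (agent id, scopes granted to that token)
        self._active: Dict[str, Tuple[str, FrozenSet[str]]] = {}

    def issue(self, agent_id: str, scopes: Iterable[str]) -> str:
        token = secrets.token_urlsafe(32)  # unguessable random token
        self._active[token] = (agent_id, frozenset(scopes))
        return token

    def is_allowed(self, token: str, scope: str) -> bool:
        entry = self._active.get(token)
        return entry is not None and scope in entry[1]

    def revoke_agent(self, agent_id: str) -> int:
        """Remove every token held by one agent; return how many were removed."""
        doomed = [t for t, (owner, _) in self._active.items() if owner == agent_id]
        for token in doomed:
            del self._active[token]
        return len(doomed)
```

After `revoke_agent` runs, every later `is_allowed` check for that agent's tokens returns `False`, which is the containment behavior described above when an agent acts unexpectedly.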

Getting Started

A 90-day pathway from exploration to production

01 Identify
  • Map existing AI usage and pain points
  • Pick one use case with clear success criteria
  • Engage security and compliance early
Clear scope — not "all agents"
02 Pilot
  • Stand up a sandboxed compute environment
  • Implement machine identity and audit logging from day one
  • Run the agent against a non-production system
Security team has signed off
03 Scale
  • Expand to production with governance controls in place
  • Connect persistent memory and multi-model access
  • Define runbooks for agent failures and escalations
Repeatable pattern for future agents
Most teams start with internal tooling — code review, documentation, ticket triage. High value, lower risk, and a natural on-ramp to the governance model you'll need for customer-facing agents.

Discussion

Three questions worth bringing back to your team

Where are your agents in the journey — demo, pilot, or heading toward production? What's the blocker?
Does your current cloud architecture have an answer for non-human identity and agent governance?
Which internal use case would deliver the most value in the next 90 days — and what would it take to get your security team to yes?