Akamai rolls out NVIDIA-powered AI Grid at the edge
Akamai has launched AI Grid, an intelligent orchestration system for its Akamai Inference Cloud. It is positioning the service as the first global-scale implementation of the NVIDIA AI Grid reference architecture, spanning more than 4,400 edge locations.
The launch expands Akamai Inference Cloud, introduced late last year. The platform is designed to run AI inference across edge, regional, and core infrastructure, with routing decisions made in real time.
Akamai has integrated NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs into its global network, creating a distributed inference grid aimed at workloads that require low latency and consistent response times.
Orchestration Layer
The centrepiece is an orchestrator that brokers AI requests. Akamai describes it as workload-aware, routing requests across compute tiers based on demand and location.
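Akamai has not published the orchestrator's internals, but the routing decision it describes can be sketched in a few lines. The following Python is a hypothetical illustration, assuming a latency-budget heuristic and made-up latency, cost, and load figures; it is not Akamai's implementation.

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    rtt_ms: float         # estimated round-trip latency to the requester
    cost_per_mtok: float  # estimated cost per million tokens served
    utilisation: float    # current load, 0.0-1.0

def pick_tier(tiers, latency_budget_ms):
    """Choose the cheapest tier with spare capacity that meets the latency budget."""
    candidates = [t for t in tiers
                  if t.rtt_ms <= latency_budget_ms and t.utilisation < 0.9]
    if not candidates:
        # Nothing fits the budget: fall back to the lowest-latency tier.
        return min(tiers, key=lambda t: t.rtt_ms)
    return min(candidates, key=lambda t: t.cost_per_mtok)

tiers = [
    Tier("edge",     rtt_ms=8,   cost_per_mtok=6.0, utilisation=0.4),
    Tier("regional", rtt_ms=35,  cost_per_mtok=3.5, utilisation=0.6),
    Tier("core",     rtt_ms=120, cost_per_mtok=1.8, utilisation=0.7),
]
# Edge and regional both fit a 50 ms budget; regional is cheaper, so it wins.
print(pick_tier(tiers, latency_budget_ms=50).name)  # -> regional
```

A real broker would also weigh model availability, cache state, and queue depth per location, but the cost-versus-latency trade-off is the core of the decision Akamai describes.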
Akamai also linked the system to the economics of inference, using the term "tokenomics" to cover measures such as cost per token, time-to-first-token, and throughput. It says the orchestration layer uses techniques including semantic caching and intelligent routing.
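Those measures are straightforward to compute per request. As a worked example, with entirely hypothetical numbers that are not Akamai or NVIDIA pricing:

```python
# Illustrative "tokenomics" arithmetic for one inference request.
# All figures are made up for the example.
prompt_tokens = 512
output_tokens = 256
first_token_s = 0.18          # time-to-first-token, in seconds
total_time_s = 2.1            # wall-clock time for the full response
hardware_cost_per_s = 0.0009  # assumed blended serving cost per second

decode_tps = output_tokens / (total_time_s - first_token_s)
cost_per_token = (hardware_cost_per_s * total_time_s) / (prompt_tokens + output_tokens)

print(f"time-to-first-token: {first_token_s * 1000:.0f} ms")  # 180 ms
print(f"decode throughput:   {decode_tps:.1f} tokens/s")      # ~133 tokens/s
print(f"cost per token:      ${cost_per_token:.7f}")          # ~$0.0000025
```

Routing a request to a cheaper tier lowers cost per token; routing it closer to the user lowers time-to-first-token. Balancing the two is the job Akamai assigns to the orchestration layer.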
Akamai's edge footprint is central to the architecture. Customers can access fine-tuned or sparsified models through the edge network, rather than relying on centralised infrastructure for every request.
Beyond the edge, Akamai is rolling out multi-thousand-GPU clusters built on the same NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, positioning them for larger, sustained workloads such as multimodal inference and continuous post-training.
Edge To Core
The platform is built on NVIDIA AI Enterprise software and the NVIDIA Blackwell architecture, with NVIDIA BlueField DPUs handling networking and security.
At the edge, the system uses semantic caching and serverless services, including Akamai Functions (WebAssembly-based compute) and EdgeWorkers. Akamai says these components are designed to keep performance stable close to where users connect.
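Akamai has not detailed how its semantic cache works, but the usual approach is to match an incoming prompt against previously answered ones by embedding similarity rather than exact text. A minimal sketch, using a toy stand-in for the embedding model:

```python
import math

def embed(text):
    # Toy stand-in: a normalised character-frequency vector.
    # A real deployment would call an embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

class SemanticCache:
    """Serve a cached answer when a new prompt is close enough to an old one."""
    def __init__(self, threshold=0.95):
        self.entries = []  # list of (embedding, response) pairs
        self.threshold = threshold

    def get(self, prompt):
        emb = embed(prompt)
        for cached_emb, response in self.entries:
            if cosine(emb, cached_emb) >= self.threshold:
                return response  # near-duplicate prompt: skip the GPU entirely
        return None  # miss: caller runs inference, then calls put()

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))

cache = SemanticCache()
cache.put("What is Akamai Inference Cloud?", "(model answer)")
print(cache.get("what is akamai inference cloud"))  # hit: "(model answer)"
```

Running a lookup like this at the edge means a repeated question never leaves the point of presence, which is presumably how the caching and serverless pieces fit together, though Akamai has not published the details.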
In core cloud infrastructure, Akamai says its IaaS environment provides a foundation for large-scale workloads. It also offers dedicated GPU-cluster "pods" for more compute-heavy requirements.
"AI factories have been purpose-built for training and frontier model workloads - and centralised infrastructure will continue to deliver the best tokenomics for those use cases," said Adam Karon, Chief Operating Officer and General Manager, Cloud Technology Group, Akamai.
"But real-time video, physical AI, and highly concurrent personalised experiences demand inference at the point of contact, not a round trip to a centralised cluster. Our AI Grid intelligent orchestration gives AI factories a way to scale inference outward - leveraging the same distributed architecture that revolutionised content delivery to route AI workloads across 4,400 locations, at the right cost, at the right time," Karon said.
Use Cases
Akamai linked the distributed approach to emerging applications, including real-time AI agents, latency-sensitive gaming scenarios, fraud detection, and personalised banking services.
It also highlighted live video processing, translation and dubbing, and in-store retail AI and commerce applications, all workloads where delay and congestion can undermine user experience.
Akamai said early adoption has come from gaming, financial services, media and video, and retail. Examples include sub-50-millisecond inference for non-player character interactions in games and real-time dubbing for broadcasters.
Akamai also pointed to commercial validation from technology providers, citing a $200 million, four-year service agreement covering a multi-thousand-GPU cluster in a data centre built for enterprise AI infrastructure at the metro edge.
NVIDIA Partnership
NVIDIA framed the move as part of a broader shift in where AI workloads run, as inference volumes rise and more applications require predictable latency.
"New AI-native applications demand predictable latency and better cost efficiency at planetary scale," said Chris Penrose, Global VP - Business Development - Telco at NVIDIA. "By operationalising the NVIDIA AI Grid, Akamai is building the connective tissue for generative, agentic, and physical AI, moving intelligence directly to the data to unlock the next wave of real-time applications," Penrose said.
Akamai cast the announcement as a response to the limits of a model built around a small number of large, centralised GPU clusters, arguing that earlier waves of internet services faced similar scaling pressures as traffic and user expectations increased.
Akamai said it will continue expanding its distributed inference footprint across edge, regional, and core locations, using orchestration to route workloads based on latency requirements and available resources.