Akamai unveils Inference Cloud built on Nvidia AI Grid

Tue, 17th Mar 2026

Akamai has introduced Akamai Inference Cloud, which it describes as the first global-scale implementation of Nvidia's AI Grid reference architecture for distributing AI inference across data centres and edge infrastructure.

The announcement outlines a model for running inference workloads in a more distributed way than traditional centralised deployments. Nvidia's AI Grid reference architecture is designed to spread inference work across multiple locations, including edge sites, to reduce latency and improve cost efficiency.

Akamai's approach combines GPU capacity with orchestration software that routes requests across its footprint. The platform integrates thousands of GPUs and uses a network of more than 4,400 edge locations alongside regional cloud sites and core data centres.

Inference is the stage of AI where a trained model generates outputs in response to live prompts or data. It has different infrastructure demands from training, which often concentrates large amounts of compute in one place for extended periods. Inference workloads can vary widely in size and urgency, and many applications depend on response times measured in milliseconds.
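
To make the distinction concrete, here is a minimal Python sketch of timing a single inference call against a millisecond budget. The endpoint URL, payload shape, and budget figure are illustrative assumptions, not a published Akamai API.

```python
import json
import time
import urllib.request

# Hypothetical endpoint and payload; Akamai has not published an API
# for Akamai Inference Cloud, so these names are placeholders.
ENDPOINT = "https://inference.example.com/v1/generate"
LATENCY_BUDGET_MS = 100  # interactive apps often target double-digit milliseconds

def infer(prompt: str) -> tuple[str, float]:
    """Send one live prompt to a deployed model and time the round trip."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        ENDPOINT, data=payload, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=2) as response:
        body = json.loads(response.read())
    elapsed_ms = (time.perf_counter() - start) * 1000
    return body.get("output", ""), elapsed_ms

output, elapsed_ms = infer("Categorise this support ticket.")
print(f"{elapsed_ms:.1f} ms{' (over budget)' if elapsed_ms > LATENCY_BUDGET_MS else ''}")
```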

Akamai framed the release as a shift away from "isolated AI factories" towards a unified inference grid. The message aligns with a broader industry push to place compute closer to end users and data sources, particularly for interactive and time-sensitive services.

Use cases

The distributed design targets real-time applications where delays can harm user experience or reduce accuracy. Akamai cited examples such as non-player character interactions in games, fraud detection, and live media processing.

These workloads require fast decision-making at scale. Gaming interactions often need consistent low latency across regions. Fraud detection depends on rapid scoring of transactions and behavioural signals. Live media processing can involve real-time analysis or transformation of video streams, which may be sensitive to network conditions and capacity constraints.
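
As a sketch of what rapid scoring implies in practice, the Python fragment below applies a hard deadline to a fraud-scoring call and falls back to a simple rule when the model answers too slowly. The deadline, threshold, and score_remotely stub are invented for illustration rather than drawn from any Akamai or Nvidia material.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

SCORE_DEADLINE_S = 0.05   # assumed 50 ms budget before the checkout flow stalls
BLOCK_THRESHOLD = 0.9     # assumed risk score above which a transaction is held

def score_remotely(transaction: dict) -> float:
    """Stand-in for a model call on nearby inference capacity."""
    time.sleep(random.uniform(0.005, 0.08))  # simulated network plus model time
    return random.random()

def decide(transaction: dict, executor: ThreadPoolExecutor) -> str:
    future = executor.submit(score_remotely, transaction)
    try:
        risk = future.result(timeout=SCORE_DEADLINE_S)
    except FutureTimeout:
        # A slow answer is as bad as no answer: fall back to a simple
        # rule instead of holding up the payment.
        return "review" if transaction["amount"] > 1000 else "approve"
    return "block" if risk >= BLOCK_THRESHOLD else "approve"

with ThreadPoolExecutor(max_workers=4) as pool:
    print(decide({"amount": 250, "card": "4242"}, pool))
```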

The routing layer is central to the pitch. It directs inference requests across the network based on factors such as proximity and available resources. A distributed approach can also improve resilience, allowing traffic to shift when a location faces congestion or reduced capacity.
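
Akamai has not described how the routing layer is implemented, but the behaviour outlined above, preferring the nearest site and skipping congested ones, can be sketched in a few lines of Python. The site list, fields, and utilisation threshold are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    rtt_ms: float        # measured network proximity to the client
    utilisation: float   # fraction of GPU capacity in use, 0.0 to 1.0

# Invented footprint: one edge node, one regional cloud, one core data centre.
SITES = [
    Site("edge-dublin", rtt_ms=4, utilisation=0.95),
    Site("regional-frankfurt", rtt_ms=18, utilisation=0.60),
    Site("core-ashburn", rtt_ms=80, utilisation=0.30),
]

def route(sites: list[Site], max_utilisation: float = 0.85) -> Site:
    """Pick the nearest site with spare capacity; congested sites are
    skipped, so traffic shifts away from an overloaded edge location."""
    eligible = [s for s in sites if s.utilisation < max_utilisation]
    if not eligible:
        # Last resort: everything is busy, so take the least-loaded site.
        return min(sites, key=lambda s: s.utilisation)
    return min(eligible, key=lambda s: s.rtt_ms)

print(route(SITES).name)  # edge-dublin is congested, so regional-frankfurt wins
```

A production scheduler would weigh more signals, such as queue depth, model placement, and cost, but the trade-off between proximity and spare capacity is the core of the idea.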

Recent moves

The positioning of Akamai Inference Cloud as an AI Grid implementation follows a series of announcements tied to the company's AI inference strategy. Akamai first launched the platform last October and has since outlined plans to expand GPU resources and build a globally distributed compute grid for inference workloads.

Akamai has also highlighted a four-year service agreement valued at USD 200 million with a major US technology company. Under the deal, it will deploy Nvidia Blackwell GPU clusters. Separately, it has said the Inference Cloud platform is powered by Nvidia's Blackwell AI architecture.

Together, these steps underscore the growing focus on inference infrastructure as AI adoption moves beyond pilots and into production. Many organisations are now weighing where inference should run, balancing response time, cost, data movement, and operational complexity.

Akamai is best known for content delivery and edge services, and it operates a large distributed network designed to place compute and storage close to internet users. That footprint is becoming a strategic asset as more applications demand local processing. It also puts Akamai in more direct competition with hyperscale cloud providers and specialist AI infrastructure firms that are building GPU capacity and edge-adjacent offerings.

Nvidia's AI Grid reference architecture provides a framework that can map onto such networks. The core idea is that inference does not always need to run in one central region; workloads can shift between core data centres and edge locations depending on performance requirements and resource availability.
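
A toy version of that placement decision might look like the following, where a tight latency target pulls a workload towards the edge only if the model fits on the smaller GPUs typically deployed there. The thresholds and the edge memory figure are assumptions chosen for illustration, not part of Nvidia's published architecture.

```python
def place(latency_slo_ms: float, model_gpu_memory_gb: float,
          edge_gpu_memory_gb: float = 48.0) -> str:
    """Toy placement rule for the core-versus-edge decision: tight latency
    targets favour the edge, but only when the model fits on the smaller
    GPUs assumed to be deployed there; all thresholds are illustrative."""
    fits_at_edge = model_gpu_memory_gb <= edge_gpu_memory_gb
    if latency_slo_ms <= 50 and fits_at_edge:
        return "edge"
    if latency_slo_ms <= 200:
        return "regional"
    return "core"

print(place(latency_slo_ms=20, model_gpu_memory_gb=16))    # edge
print(place(latency_slo_ms=20, model_gpu_memory_gb=160))   # regional
print(place(latency_slo_ms=500, model_gpu_memory_gb=160))  # core
```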

Akamai has not disclosed exactly how many GPUs are integrated into the service or how capacity is distributed between edge locations, regional clouds, and core sites. It has also not detailed pricing, customer availability, or service-level terms.

As demand for real-time AI grows across media, gaming, and financial services, Akamai is expected to provide more information on deployment patterns and customer adoption of Akamai Inference Cloud.