
AMD pitches EPYC CPUs as orchestration engines for AI

Wed, 18th Mar 2026

AMD is positioning central processing units as an increasingly important part of AI data centres, arguing that newer "agentic AI" systems put more responsibility on CPUs to manage multi-step inference workflows and keep accelerators fully utilised.

AMD described agentic AI as a shift from single-response inference to sequences of actions, in which software agents repeatedly call tools, query memory, invoke APIs, and loop back to models for updated output. Those patterns can add orchestration work to a cluster, work that often runs on the host CPU alongside operating system and data-handling tasks.
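
As a rough illustration of that control flow (a sketch, not drawn from AMD's materials), the loop looks something like the Python below, where call_model() and run_tool() are hypothetical stubs standing in for a GPU-served model endpoint and a tool layer:

```python
# Illustrative sketch of an agentic loop; call_model() and run_tool()
# are hypothetical stubs, not a real model endpoint or tool layer.

def call_model(context: str) -> dict:
    # Stub: a production system would send `context` to a model server,
    # typically running on GPUs, and parse its response.
    if "Observation:" in context:
        return {"type": "final_answer", "content": "done"}
    return {"type": "tool_call", "tool": "lookup", "arguments": "status"}

def run_tool(name: str, arguments: str) -> str:
    # Stub: a production system would query an API, database, or memory store.
    return f"{name}({arguments}) -> ok"

def run_agent(task: str, max_steps: int = 10) -> str:
    context = task
    for _ in range(max_steps):
        action = call_model(context)           # loop back to the model
        if action["type"] == "final_answer":
            return action["content"]
        result = run_tool(action["tool"], action["arguments"])
        context += f"\nObservation: {result}"  # accumulate working memory
    return "step budget exhausted"

print(run_agent("check service health"))  # -> "done" after one tool call
```

Every pass through that loop is CPU-side work: parsing the model's output, dispatching the tool call, and rebuilding the context, with the accelerator touched only inside the model call itself.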

At AMD's Advancing AI event last June, CEO and Chair Dr. Lisa Su described agentic AI as "a new class of user: systems that are always active, continuously accessing data, applications, and services to make decisions and complete complex tasks."

AMD argues that this always-on behaviour increases the need for balanced system design, with CPUs, GPUs, networking, and software each handling distinct parts of an AI deployment. GPUs remain the primary engines for training and many inference operations, but production deployments also depend on scheduling, data preparation, memory and input-output management, and control flow.

CPU orchestration

Modern AI clusters typically pair accelerators with server CPUs that handle host-side processing. That includes feeding data to GPUs, coordinating data movement between memory and devices, and managing the software stack that dispatches work to accelerators. AMD also highlighted the need to run conventional enterprise applications alongside AI models in production, adding further demands on the CPU layer.
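
As a minimal sketch of that host-side feeding, assuming a PyTorch-based stack (the article names no framework), CPU worker processes prepare batches in parallel while pinned memory allows copies to overlap with accelerator compute; the data and model here are synthetic stand-ins:

```python
# Minimal sketch of host-side data feeding, assuming a PyTorch stack
# (the article names no framework). Data and model are synthetic stand-ins.
import torch
from torch.utils.data import DataLoader, TensorDataset

def main() -> None:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Synthetic batches; a real pipeline would decode and transform data here.
    dataset = TensorDataset(torch.randn(1024, 128),
                            torch.randint(0, 10, (1024,)))
    loader = DataLoader(
        dataset,
        batch_size=64,
        num_workers=4,     # CPU worker processes prepare batches in parallel
        pin_memory=True,   # page-locked host memory speeds host-to-GPU copies
    )

    model = torch.nn.Linear(128, 10).to(device)

    for inputs, labels in loader:
        # non_blocking=True lets the copy overlap with GPU compute
        # when the source tensors sit in pinned memory.
        inputs = inputs.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        outputs = model(inputs)  # the accelerator does the matrix math
        loss = torch.nn.functional.cross_entropy(outputs, labels)

if __name__ == "__main__":  # guard required when num_workers > 0
    main()
```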

AMD compared the CPU-GPU relationship to coaching: the CPU directs the overall play while GPUs execute parallel compute tasks. The framing reflects common data centre architectures, where GPUs run matrix-heavy neural network operations and CPUs handle serial tasks, systems management, and integration with external services.

Training and inference

AMD distinguished between training and inference workloads. Training relies heavily on GPU throughput and typically runs on large batches of data with tight, repeatable computational patterns. In that setting, CPUs run the operating system, manage memory, schedule tasks, and feed data to GPUs.

Inference can look different, particularly when systems combine retrieval, tool use, and decision logic with model output. AMD argued that this trend elevates the CPU's role in collecting data, routing information, interpreting results, and deciding next actions. Agentic workflows can add more loops, increasing the amount of control and coordination required alongside core model compute.
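
A toy example gives a sense of the CPU-side work that brackets each model call in such a pipeline; here, keyword overlap stands in for a real vector search and the model call is a stub rather than a GPU inference request:

```python
# Toy sketch of the CPU-side work around one retrieval-augmented model
# call; keyword overlap stands in for vector search, and the model call
# is a stub rather than a real GPU inference request.

CORPUS = {
    "doc1": "quarterly revenue figures for the EMEA region",
    "doc2": "incident runbook for the payments service",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    # Collect data: score documents on the CPU before any model call.
    scores = {
        doc_id: len(set(query.split()) & set(text.split()))
        for doc_id, text in CORPUS.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

def answer(query: str) -> str:
    docs = [CORPUS[d] for d in retrieve(query)]     # route information
    prompt = f"Context: {docs}\nQuestion: {query}"  # assemble the request
    reply = f"model_stub({len(prompt)} chars)"      # stand-in GPU call
    # Interpret the result and decide the next action (here: just return).
    return reply

print(answer("payments incident runbook"))
```

Retrieval, prompt assembly, and result handling all run on the host; an agentic workflow repeats that bracket on every loop iteration.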

AMD positioned its EPYC server CPUs as a key component of what it calls "balanced, open AI infrastructure," alongside Instinct GPUs, Pensando networking products, and the ROCm software stack. Together, the components span the compute, accelerator, network, and software layers of a cluster.

Benchmark claims

AMD highlighted performance and efficiency comparisons with Nvidia's Grace CPU Superchip. It estimated that a 5th Gen EPYC CPU-based system delivers up to 2.1 times higher performance per core than comparable Grace-based systems, and up to 2.26 times the score on SPECpower, a benchmark that measures server-side operations per watt.

The figures drew on a mix of published and estimated results: SPEC CPU and SPECpower submissions for EPYC systems, set against an Nvidia claim for Grace performance. AMD did not provide a like-for-like bill of materials for the systems in its summary, and results can vary with configuration choices and software settings.

x86 and ecosystems

AMD also emphasised that EPYC uses the x86 architecture, which it said gives customers access to a broad software ecosystem, with many enterprise workloads already running across on-premises and cloud environments. It contrasted that with the work sometimes required to introduce Arm-based systems, such as refactoring or maintaining multiple code bases.

AMD also pointed to its chiplet approach as a way to tune compute, I/O, and memory bandwidth. It said the modular design provides flexibility across use cases, from traditional enterprise applications and virtualisation to GPU orchestration and multi-step AI workflows.

Next platforms

AMD said its next-generation EPYC CPUs, codenamed "Venice," are positioned to run its upcoming "Helios" rack-scale AI architecture. It expects "Venice" to improve performance, density, and energy efficiency for AI and general-purpose workloads.

AMD linked these plans to rising compute demand and a broader server refresh cycle. It framed agentic AI as one driver, arguing that it increases the amount of non-model work around AI systems and, in turn, the importance of CPU resources and overall system design.

"Dr. Lisa Su described agentic AI as a new class of user: systems that are always active, continuously accessing data, applications, and services to make decisions and complete complex tasks."

The next EPYC generation and the Helios rack design would sit alongside Instinct GPUs and the ROCm software stack as AMD develops its approach to AI infrastructure.