Virtana launches system-wide observability for AI chaos
Virtana has launched an application observability product after research it commissioned found widespread instability in enterprise AI workloads and a growing gap between executive confidence and engineering experience.
A survey of 351 senior IT and technology leaders found 75% of enterprises reported AI job failure rates in the double digits, while a third said failures exceeded 25%. The study linked those rates to retries, wasted compute and production delays.
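To put the link between failure rates, retries and wasted compute in concrete terms, the following is a minimal back-of-the-envelope sketch, not a calculation from the Virtana study. It assumes each attempt fails independently at a fixed rate and that failed jobs are retried up to a fixed limit. The function names, the daily job volume and the retry limit are hypothetical, and only the 25% failure rate echoes the survey.

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Expected attempts per job when each attempt fails independently.

    Attempt k+1 only runs if the first k attempts all failed, so the
    expectation is 1 + p + p^2 + ... + p^max_retries.
    """
    return sum(failure_rate ** k for k in range(max_retries + 1))


def daily_failure_load(jobs_per_day: int, failure_rate: float, max_retries: int) -> dict:
    """Estimate attempts, failed attempts and unrecovered jobs per day."""
    attempts = jobs_per_day * expected_attempts(failure_rate, max_retries)
    return {
        # Total executions, including retries.
        "attempts": attempts,
        # Every attempt fails with the same probability (assumption).
        "failed_attempts": attempts * failure_rate,
        # Jobs that still fail after the final retry.
        "unrecovered_jobs": jobs_per_day * failure_rate ** (max_retries + 1),
    }


if __name__ == "__main__":
    # Hypothetical volume: 10,000 jobs a day, one retry allowed.
    print(daily_failure_load(jobs_per_day=10_000, failure_rate=0.25, max_retries=1))
```

Under these assumptions, a 25% failure rate with a single retry turns 10,000 daily jobs into 12,500 executions, of which 3,125 fail and 625 jobs remain unrecovered. Real workloads rarely fail independently, so the figures are illustrative only.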
The research also showed differing views inside organisations on readiness for AI-scale operations. It found 59% of executives considered their organisations AI-ready, but 62% of practitioners reported fragmented observability and visibility gaps.
Fewer than half of practitioners said they were confident current observability tools could handle AI-scale workloads. Cost governance emerged as another area of misalignment, with a 16-point gap in confidence between executives (67%) and practitioners (47%).
Virtana said the results suggest existing operational approaches are struggling as AI workloads expand across hybrid and multi-cloud systems. "The data is unambiguous. While executive confidence is rising, operational fragility is rising faster," said Paul Appleby, Virtana's chief executive.
"When three-quarters of enterprises report double-digit AI job failure rates and one-third exceed 25%, the operating model is clearly outdated. At enterprise scale, these rates translate into thousands of failed executions per day, driving retries, wasted compute capacity, cascading delays, and escalating operational risk. As AI workloads expand and agentic systems begin operating autonomously, modest failure percentages compound into systemic volatility," Appleby said.
Operational strain
The findings also highlighted infrastructure and platform constraints that practitioners said were limiting AI deployments. In the survey, 56% cited storage and networking bottlenecks as their top AI constraint, and 41% reported GPU inefficiency and contention.
Container environments added to the operational burden, according to the research. It found 76% of practitioners experienced multiple container-related failures, and more than half reported three or more simultaneous failures.
Virtana's report described modern applications as distributed systems spanning services, clusters and cloud services, with storage and networks spread across hybrid and multi-cloud footprints. It argued that observability architectures have not kept pace.
"Practitioners are confronting unprecedented complexity as the definition of 'application' has fundamentally shifted from discrete code to distributed delivery systems," said Amitkumar Rathi, Virtana's chief product officer.
"Modern applications now span infrastructure, cloud platforms, Kubernetes, storage, networks, data pipelines, and AI workloads operating simultaneously. As organizations race toward AI adoption, these systems are scaling faster than operational models can support, exposing the limits of fragmented observability. At machine scale, teams cannot manage what they cannot see end-to-end, making continuous, real-time system context essential for reliable AI operations," Rathi said.
New product
Against that backdrop, Virtana introduced a new Application Observability offering. It said the product automatically correlates performance issues across applications, infrastructure, networks, storage and AI workloads, tracing failures from application code to underlying dependencies and identifying root cause without manual correlation.
Virtana positioned the launch as a shift away from code-centric application performance monitoring, arguing that legacy APM tools may flag slow transactions but often fail to pinpoint constraints when the cause sits in storage behaviour, network paths, Kubernetes resource pressure, noisy neighbours or GPU contention.
The product includes request flow visibility, service interaction monitoring, and correlation across downstream dependencies. Virtana also described a System Dependency Graph that maps relationships across applications, services, Kubernetes workloads, infrastructure, networks, storage and AI platforms.
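Virtana has not published how its System Dependency Graph works, so the following is only a generic sketch of how a dependency graph can narrow a symptom down to likely root causes. It assumes a simple mapping from each component to the components it depends on, plus a set of components currently flagged unhealthy. All component names, the `DEPENDS_ON` mapping and the `root_cause_candidates` function are hypothetical.

```python
from collections import deque

# Hypothetical topology: each component maps to the components it depends on.
DEPENDS_ON = {
    "checkout-api": ["payments-svc", "k8s-node-7"],
    "payments-svc": ["postgres", "k8s-node-7"],
    "postgres": ["san-volume-3"],
    "k8s-node-7": [],
    "san-volume-3": [],
}


def root_cause_candidates(graph, symptom, unhealthy):
    """Walk downstream from a symptomatic component (breadth-first).

    A component is a candidate when it is unhealthy but none of its own
    dependencies are, meaning the trouble cannot be pushed any deeper.
    """
    seen = {symptom}
    queue = deque([symptom])
    candidates = []
    while queue:
        node = queue.popleft()
        deps = graph.get(node, [])
        if node in unhealthy and not any(dep in unhealthy for dep in deps):
            candidates.append(node)
        for dep in deps:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return candidates


if __name__ == "__main__":
    unhealthy = {"checkout-api", "payments-svc", "postgres", "san-volume-3"}
    print(root_cause_candidates(DEPENDS_ON, "checkout-api", unhealthy))
```

In this toy topology a slow checkout service traces back to the storage volume rather than to application code, which is the kind of cross-layer cause the company says code-centric tools tend to miss. A production system would also need to weigh timing, metrics and partial degradation, none of which this sketch models.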
Virtana said the product supports natural language analysis via its MCP Server and works with AI assistants including ChatGPT, Claude, Gemini and Copilot. It also includes transaction tracing, log correlation, synthetic monitoring and Kubernetes-aware observability features.
Appleby said the shift in application architecture has changed what IT teams need from observability tools.
"Mission-critical applications such as airline reservation systems, payment processing systems, health care delivery systems, and emergency dispatch are no longer just code, but complex systems spanning software, services, infrastructure, and AI workloads," he said.
"At this scale and complexity, legacy APM focused on code and human-only operations is no longer a credible way to understand how applications behave. Our research shows that this trajectory will accelerate as AI workloads, new dependencies, greater infrastructure strain, and failure modes that legacy tools cannot explain continue to multiply. The only viable path forward is open, agentic, system-level observability," Appleby said.
Customer view
NWN, which describes itself as an AI-powered technology solutions provider, backed the focus on full-stack visibility. "As a leading AI-powered technology solutions provider supporting more than 6,000 CIOs across public sector and enterprise organizations, we cannot operate with visibility that stops at the code," said Doug Syer, chief engineer for AI monitoring and observability at NWN.
"Modern applications are distributed systems, and performance constraints frequently originate in infrastructure, network, or platform layers that traditional APM was never designed to see. Virtana Application Observability offers true system-level visibility, correlating signals across the full stack, enabling the immediate transition from symptoms to evidence-backed root cause," Syer said.
Virtana said the new Application Observability offering is available immediately.