
Gartner sees explainable AI boosting model oversight

Mon, 30th Mar 2026

Gartner expects explainable AI to drive investment in large language model observability across 50% of generative AI deployments by 2028, up from 15% today.

The forecast signals a shift in how companies evaluate generative AI systems as adoption spreads across business functions. Gartner defines explainable AI as tools and methods that describe how a model works, reveal its strengths and weaknesses, predict likely behaviour, and identify potential bias.

Large language model observability refers to monitoring and analysing model behaviour once systems are in use. These tools track quality issues such as hallucinations and bias, along with token use, extending oversight beyond standard IT measures like response times.
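At its simplest, that kind of oversight can be a thin wrapper around each model call. The Python sketch below is only an illustration under stated assumptions: model_fn, the whitespace token counts and the example checker are stand-ins rather than any vendor's API, and a production platform would rely on the provider's own token accounting and dedicated hallucination and bias detectors.

import time
from dataclasses import dataclass, field


@dataclass
class CallRecord:
    # One observed model call: quality flags alongside token use and latency.
    prompt: str
    response: str
    latency_s: float
    prompt_tokens: int
    completion_tokens: int
    flags: list[str] = field(default_factory=list)


def observe_call(model_fn, prompt, checkers=()):
    # Wrap any text-in/text-out model function and record its behaviour.
    start = time.perf_counter()
    response = model_fn(prompt)
    latency = time.perf_counter() - start
    return CallRecord(
        prompt=prompt,
        response=response,
        latency_s=latency,
        prompt_tokens=len(prompt.split()),        # crude stand-in for a tokenizer
        completion_tokens=len(response.split()),
        flags=[name for name, check in checkers if check(prompt, response)],
    )


if __name__ == "__main__":
    fake_model = lambda p: "Dublin is the capital of Ireland."
    checkers = [("possible_hallucination",
                 lambda p, r: "citation needed" in r.lower())]
    print(observe_call(fake_model, "What is the capital of Ireland?", checkers))

The point of the wrapper is that every call leaves a record that quality checks can be attached to later, which is what separates LLM observability from plain response-time monitoring.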

The area is becoming more relevant not only for teams building AI systems, but also for IT operations and site reliability engineering teams responsible for resilience and performance in production environments.

Gartner also expects global spending on generative AI models to rise sharply, forecasting the market will exceed $25 billion in 2026 and reach $75 billion by 2029.

That growth is increasing pressure on companies to show that AI-generated outputs can be checked and trusted. Hallucinations, factual errors and biased reasoning remain major concerns for organisations moving beyond trials and limited internal use.

Pankaj Prasad, Senior Principal Analyst at Gartner, said stronger controls will be necessary if businesses want to expand generative AI into more important processes.

"As enterprises scale GenAI, the trust requirement grows faster than the technology itself," Prasad said. "XAI provides visibility into why a model responded a certain way, while LLM observability validates how that response was generated and whether it can be relied on.

"Without robust XAI and observability foundations, GenAI initiatives will be restricted to low-risk, internal or noncritical tasks where output verification is easily managed or inconsequential, severely limiting the potential return on investment."

Trust layers

According to Gartner, explainability and observability are emerging as two linked layers of oversight. One focuses on why a model produced a response; the other tracks whether it continues to perform predictably over time.

That distinction matters as organisations move from experiments to systems used in customer service, knowledge work and decision support. In those settings, reliability and traceability are likely to matter more than raw model speed alone.

Prasad said monitoring priorities are also changing.

"Traditional observability is focused on speed and cost, but the priority is now moving toward deeper quality measures such as factual accuracy, logical correctness and sycophancy. This shift requires new governance-focused metrics and evaluation methods, such as human-in-the-loop validation of the generated content's narrative and citation accuracy," he said.

He added that explainability and observability together are central to wider deployment.

"Explainability turns a GenAI output into a defensible, auditable insight. LLM observability ensures the model behaves as expected over time. Without both, GenAI cannot mature beyond controlled lab environments," Prasad said.

Operational steps

Gartner recommends applying explainable AI tracing to high-impact use cases so the reasoning steps and source data behind outputs can be documented. It also calls for broader observability platforms that track latency, drift, token use, costs, error rates and output quality.
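A minimal version of that tracing, for a retrieval-augmented call, might record the source passages and the exact prompt behind each answer. In the sketch below, retrieve and generate are hypothetical callables standing in for a real retriever and model, and the JSON trace is printed rather than shipped to an observability store.

import json
import uuid
from datetime import datetime, timezone


def traced_answer(question, retrieve, generate):
    # Explainability trace for a retrieval-augmented call: ties the
    # output to the source passages and the prompt that produced it.
    sources = retrieve(question)                      # e.g. top-k passages
    prompt = question + "\n\nContext:\n" + "\n".join(sources)
    answer = generate(prompt)
    trace = {
        "trace_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "sources": sources,   # the source data behind the output
        "prompt": prompt,     # the exact input the model saw
        "answer": answer,
    }
    print(json.dumps(trace, indent=2))   # ship to a trace store in practice
    return trace


if __name__ == "__main__":
    traced_answer(
        "Who founded Gartner?",
        retrieve=lambda q: ["Gartner was founded by Gideon Gartner in 1979."],
        generate=lambda p: "Gideon Gartner founded the firm in 1979.",
    )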

Businesses should also embed model evaluation metrics into continuous integration and continuous delivery pipelines, so that factual accuracy and safety checks run before deployment rather than after problems surface in live systems.
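A minimal sketch of such a gate, written as a pytest check a CI job could run before promoting a model: the golden set, the stub model and the 0.9 threshold are illustrative assumptions rather than recommended values.

# test_model_gate.py - a pytest check a CI pipeline could run pre-deployment.

GOLDEN_SET = [
    ("What year was Gartner founded?", "1979"),
    ("What does LLM stand for?", "large language model"),
]

ACCURACY_THRESHOLD = 0.9  # assumed gate; tuned per use case in practice


def accuracy(model_fn):
    # Fraction of golden-set answers the model gets right, using a
    # simple substring match as a stand-in for a proper grader.
    hits = sum(1 for question, expected in GOLDEN_SET
               if expected.lower() in model_fn(question).lower())
    return hits / len(GOLDEN_SET)


def test_factual_accuracy():
    # Stub model so the example runs; CI would call the real deployment.
    model_fn = lambda q: "A large language model; the firm dates to 1979."
    assert accuracy(model_fn) >= ACCURACY_THRESHOLD

Failing the build on a quality regression keeps verification ahead of deployment, which is the intent of Gartner's recommendation.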

Another recommendation is to educate legal, compliance and other internal stakeholders on explainability requirements. Alignment on governance expectations and implementation challenges will be necessary as companies formalise the use of generative AI in regulated or higher-risk settings.

The forecast reflects a wider industry debate over whether current generative AI systems can be deployed at scale without stronger technical and governance controls. Gartner expects spending on observability to rise as companies seek evidence that outputs are accurate, traceable and suitable for business use.