All 12 leading AI models fail EU law checks, study says

Mon, 1st Jun 2026 (Today)

Aithos has published test results showing that all 12 leading AI models it assessed failed core checks for compliance with European law. Some systems broke the law in up to 93% of the scenarios tested, the non-profit said.

The research used a public testing platform called LARA, short for Legal Assessment for Real-world Agents, to examine how models responded in 10 scenarios drawn from the General Data Protection Regulation and the EU AI Act. The tests covered unlawful manipulation, psychological profiling, emotion inference, misuse of personal data, and failures of human oversight.

The findings add to scrutiny of AI systems already used in customer service, workplace software, and other consumer-facing settings. Even the best-performing model in the study chose an unlawful course of action in 46% of cases, while the weakest did so in 93%.

Among the models assessed, Claude Opus 4.7 recorded the highest legal compliance score at about 54%, according to the published results. GPT-5.5 scored about 38%, while Gemini 3.1 Pro scored about 10%.

Legal exposure

The results matter not only for model developers but also for companies that build AI agents on top of those models and offer them on the market. Under European rules, businesses that place such systems on the market bear primary responsibility for compliance, while organisations that deploy them are also accountable.

Failures in these areas can expose businesses to sanctions under both major EU regimes. Aithos put the potential penalties at up to €20 million or 4% of annual turnover under GDPR, and up to €35 million or 7% of global turnover under the EU AI Act.

The study also underlined the broad reach of those laws. The rules can apply to companies based outside Europe if they process the data of EU residents or deploy AI systems that affect people in or from the bloc.

The methodology was designed to test how models behave in realistic work tasks rather than static benchmark exercises. LARA placed systems in simulated environments where they could read emails, use software tools, send messages, and interact with customer records or social media while facing a request that would breach a legal requirement.

Across the study, Aithos ran more than 3,000 evaluations of 12 models against 10 legal-risk scenarios. Independent AI judges assessed each interaction against the wording of the law, and the findings were later checked through more than 50 hours of human review by lawyers and outside experts.

Risk scenarios

Some examples in the results focused on vulnerable users. In one category, models repeatedly encouraged people toward long-term financial commitments after emotional prompting, including scenarios in which a terminally ill user was steered toward a 30-year financial product despite signs of vulnerability.

Other tests examined conduct prohibited under Article 5 of the EU AI Act, including emotion inference and psychological profiling. Every legal provision in the study was violated by a majority of the frontier models in the sample, according to Aithos.

According to the organisation, the outcome points to a gap between public assumptions about AI safety and how systems behave in practice. It also raises questions for companies that may assume a widely used model is broadly fit for lawful deployment in Europe without carrying out their own checks.

"These are not abstract legal violations and the results should concern anyone interacting with an AI system, not just the businesses deploying them," said Nadia Kadhim, executive director of Aithos.

"These laws are in place because AI can cause real harm to real people. Our autonomy, privacy, and other fundamental human rights are at play. What LARA has been able to show is that the systems that people rely on every day are not yet built to protect those rights."

Aithos said it built LARA as a free tool to help individuals and organisations assess AI systems against real legal requirements. The platform is intended to test models in situations that resemble day-to-day use rather than laboratory-style tasks.

Daan Henselmans described that approach in explaining how the system works. "We place the model in an adaptive simulation, where it can read emails, use tools, or talk to customers. LARA tests how AI systems really act, rather than performance on a fixed benchmark," Henselmans said.

Aithos is a European non-profit research foundation focused on AI alignment, governance, and human autonomy. It publishes evaluation frameworks, testing methods, and technical research intended to support transparency and oversight of advanced AI systems.

The organisation said all transcripts and evaluation data from the study have been made public for scrutiny and reproducibility, alongside the model rankings and underlying methodology.

ChatGPT

Key takeaways Explain why it matters Create action plan Future watch

Claude

Key takeaways Explain why it matters Create action plan Future watch

Perplexity

Key takeaways Explain why it matters Create action plan Future watch

Grok

Key takeaways Explain why it matters Create action plan Future watch

Share Share

Add us as a preferred source on Google

Image: Daan Henselmans