Cloud Security Application Security SOCs

Claude Code flaw leaves deny rules vulnerable in long workflows

Wed, 8th Apr 2026

Anthropic is facing fresh scrutiny from security researchers and vendors after a leaked version of its Claude Code agent revealed a flaw that let deny rules degrade under complex workloads.

The vulnerability affected how Claude Code enforced its own safety controls when handling long chains of subcommands in automated workflows.

Researchers who examined the leaked source code found a hard limit on the number of security checks the agent performed. After 50 subcommands, Claude Code stopped enforcing deny rules directly and instead shifted to asking the user for approval rather than blocking risky actions.

This raised concerns among security specialists focused on AI-assisted development and CI/CD pipelines. Many production environments already rely on auto-approvals and non-interactive modes, reducing the value of user confirmation as a safeguard.

The issue also emerged against broader concern over the Claude Code leak itself. The exposed files described internal logic for permission handling, prompt-injection defences, and workflow orchestration inside one of the most widely deployed AI coding agents.

Vendors in adjacent security markets said the incident highlighted structural weaknesses in how organisations govern agentic AI systems, rather than a single implementation error.

Deny Rules Under Strain

Experts who analysed the leaked code and the vulnerability said the incident was a warning against relying on agent-level configuration alone. Complex workflows, they argued, can exhaust or bypass internal safety routines, especially when agents execute long-running task chains.

Some security leaders said enterprises should not depend solely on built-in safety prompts and deny lists exposed through configuration screens. Instead, they called for controls that treat AI agents as untrusted components interacting with sensitive data and production infrastructure.

Gidi Cohen, Chief Executive Officer and Co-Founder of Bonfy.AI, said the incident showed the limits of inline safety checks when they sit too close to the agent's own logic.

"The Claude Code deny-rule bypass is a clear illustration of why enterprises need independent, data-centric controls around AI agents, not just configuration toggles and inline "safety" prompts. When a hard cap on security subcommands causes deny rules to degrade into "just ask the user," any environment with auto-approval patterns or alert fatigue effectively converts a supposed block into a soft suggestion that attackers can route around via prompt injection."

"Thoughtful AI governance should then treat agent-level permissions and deny lists as one layer of defense, and prioritize content-aware guardrails that monitor what agents read, transform, and emit across tools, pipelines, and channels, regardless of which framework, version, or built-in safety parser is in play."

The flaw also intersected with how organisations structure their CI/CD systems. Many use non-interactive builds in which no human operator reviews an AI agent's proposed actions before execution.

Yagub Rahimov, Chief Executive Officer of Polygraf AI, said CI/CD environments created a particularly exposed attack surface.

"The risk in this vulnerability lies at the CI/CD pipelines running in non-interactive mode with no human approval layer, where a poisoned config file could instruct the agent to exfiltrate credentials via curl or pivot to internal services inside what looks like a normal build. None of it trips an alert. The problem is that Anthropic already has the fix internally. But attackers exploit the gap between a working mitigation and a shipped public build."

"The defence has to sit at the input/output layer, inspecting what goes into the model and what comes out before any command reaches execution. SLM-powered guardrails that flag suspicious input patterns, including prompt injection attempts embedded in config files, can catch this class of attack before the subcommand chain ever runs."

Blueprint for Attackers

The leaked Claude Code repository contained internal system prompts and logic for handling prompt-injection attempts and permission checks. Security analysts said this gave adversaries a reference point for testing and refining attack techniques.

Enterprise security vendors stressed that the leak did not expose model weights, proprietary training data, or direct customer information. It did, however, document how Anthropic structured some of the tool's safety logic and workflow orchestration.

Melissa Bischoping, Senior Director of Security & Product Design Research at Tanium, described the leak as more of an operational map of Claude Code than a direct breach of core intellectual property.

"Source code leaks often evoke fear of proprietary information and crown jewels being exposed. While the Claude Code leak does present real risk, it is not the same as model weights, training data or customer data being compromised. What was exposed is something more like an operational blueprint of how the current version of Claude Code is designed to work."

"Although the leaked files may have been taken offline from their original source, the code has persisted across the internet in repositories with over 1,900 forks. Security researchers, adversaries, and curious tinkerers have been diving into the contents, some of which are humorous and relatable to anyone who's ever written a line of code. Other pieces, however, create a blueprint for understanding Claude Code's design logic. It is not a foolproof roadmap to exploitation, but it is meaningful insight into how the tool handles inputs, enforces permissions and resists abuse."

"Real technical controls to include sandbox isolation, policy enforcement and other traditional protections do exist, but a meaningful portion of the defences rely on model judgment rather than deterministic controls. The actual system prompt text for handling prompt injection begins: "If you suspect ... " The pattern shows up throughout: 'be careful,' 'use judgment,' 'flag to the user,' which is a shift away from how we'd typically block or validate. In my analysis, I prompted Claude to assess its security posture in light of the leaked code. It characterised this approach as "defence-by-vibes." Maybe Claude is onto something, for better or worse, and we need to recognise that model judgment will be a factor in how we secure and defend systems going forward."

"Another layer of risk from this leak is that adversaries may use the blueprint to build lookalikes that appear and behave like Claude Code on the surface but install malware or harvest credentials and data."

"For enterprises, the exposure is fundamentally about visibility. Your adversaries now have detailed knowledge of one of the most widely deployed AI tools on the market and will be probing it for weaknesses. The defensive response is straightforward to describe but harder to execute: know where these tools are running, what data they can access, and which identities and tokens they are associated with, and establish governance that improves iteratively. These technologies are advancing faster than security programs can keep pace with. We don't have the luxury of waiting for the dust to settle before getting a handle on them," said Bischoping.

Agentic Security Shift

The debate around Claude Code's permissions logic is unfolding alongside broader moves in the security industry toward autonomous or semi-autonomous AI agents in security operations centres.

Anthropic's Claude Mythos offering has become one focal point in that discussion. Security practitioners are examining how agents that navigate complex environments and decide on action sequences alter both defensive and offensive dynamics.

Hanah-Marie Darley, Chief AI Officer and Co-Founder of Geordie AI, said security automation is moving away from fixed playbooks toward systems that interpret and act with greater independence.

"It looks similar to existing SOC analysts and triage agents at first glance, but the impact on the market is different in a few important ways. Most security automation today is built around structured workflows. You define how alerts are handled, how enrichment occurs, and where decisions are made. Even when AI is involved, it tends to operate inside that structure. What's being described here moves decision-making out of predefined flows and into systems that can explore, interpret, and decide as they go. That changes how these systems are built and how much you can rely on them behaving consistently."

"There's also a difference in what they can actually see. Traditional SOC tooling works off logs, alerts, and telemetry. This kind of system works directly on the environment itself. It can read code, interpret how systems are put together, and reason across components that don't naturally produce signals. You're no longer limited to what's already surfaced."

"Risk starts to look different as well. In structured workflows, mistakes tend to remain contained within a step or decision point. In systems like this, behaviour builds across actions. Context is carried forward, reused, and adapted. A decision in one place can influence what happens later in ways that aren't always obvious. That makes outcomes harder to reason about and harder to contain."

"It also changes how vendors compete. Detection quality, signal coverage, and response speed have long been the focus. Those still matter, but they don't fully describe what's happening here. As systems like this do more of the work, the question becomes how well you can understand and manage behavior across a workflow, not just how quickly you can detect an event."

"Most existing controls are built around individual actions: allow, block, alert. That works when actions are clearly defined and bounded. It becomes less effective when behaviour emerges across a sequence of decisions. You're no longer just asking whether an action was allowed, but whether the overall behaviour makes sense. This is the challenge with AI being a dual-use technology in that the advancements for defenders naturally translate to advancements for attackers as well, especially in accessible tools like Claude/frontier models," said Darley.