Upwind finds prompt detection can run under millisecond
Upwind has published research on a system for detecting malicious prompts aimed at large language models. It reported about 95% precision with sub-millisecond inference using Nvidia technology.
The work addresses a growing security issue as companies put generative AI tools into production and expose systems to attacks written in natural language rather than code. Upwind argues that existing security controls are poorly suited to prompt injection, jailbreak attempts, data exfiltration and social engineering carried out through model inputs.
The research outlines a three-stage architecture that screens traffic first, then applies deeper analysis only where needed. The goal is to limit the cost and delay of sending every request to a large reasoning model.
In the first stage, a lightweight classifier determines whether a request is actually heading to a large language model. This filtering step ran in under a millisecond and reached 99.88% accuracy in Upwind's testing, allowing the wider system to avoid semantic inspection of traffic unrelated to AI models.
The second stage focuses on semantic threat detection. Requests identified as LLM-bound are analysed using Nvidia's nv-embedcode-7b-v1 model through NVIDIA NIM microservices, which performed best in Upwind's tests at separating benign prompts from malicious ones, including indirect jailbreaks and prompt injection attempts.
According to Upwind, this second layer reached 94.53% detection accuracy while keeping inference times below 0.1 milliseconds. Those figures underpin its argument that prompt screening can be carried out on live production traffic without creating an operational bottleneck.
A final validation stage is used only for cases classed as high-risk or ambiguous. Those prompts are escalated to Nvidia's Nemotron-3-Nano-30B model, combined with NVIDIA NeMo Guardrails, to verify findings, reduce false positives and generate explanations tied to security frameworks.
Language threat
The broader argument behind the research is that AI systems change the nature of application security because the attack surface becomes language itself. Rather than exploiting software flaws directly, attackers can try to manipulate a model's interpretation of intent, persuade it to ignore rules, or coax it into disclosing data.
That shift matters because many corporate AI deployments now sit inside customer support, internal search, coding assistance and workflow automation. In those settings, a malicious prompt may not appear suspicious to conventional network or application controls, even though it could trigger harmful actions or reveal sensitive information.
Upwind's aim is to treat a malicious prompt not as an isolated model event but as part of a wider cloud security picture. By embedding detection into its runtime and cloud visibility platform, flagged prompts can be linked to broader operational context, such as workload behaviour and environment activity.
Selective escalation is central to the design. Instead of relying on one large model for every decision, the system reserves more intensive reasoning for a small portion of requests, which helps balance accuracy with throughput.
That routing approach reflects a broader industry effort to make AI safeguards practical in production settings. Security teams have often struggled to apply heavy inspection methods in high-volume environments because even small delays can affect user-facing systems and increase compute spending.
Upwind's findings also highlight the role of infrastructure providers in the emerging AI security market. Nvidia's models and deployment tools are positioned here not just for building AI applications, but also for inspecting and validating the traffic those applications receive.
For cloud security vendors, the commercial opportunity lies in translating prompt-level analysis into standard security workflows. Enterprises typically want alerts to feed into existing investigation and response processes rather than sit in a separate AI monitoring console.
"LLMs don't just process input, they interpret intent," said Mose Hassan, VP Research & Innovation at Upwind. "That changes the security model entirely. Organisations aren't just trying to block bad code anymore, they have to stop attempts that twist language and manipulate systems. Our research with NVIDIA shows you can do that effectively in live production environments without slowing things down or driving up costs."