← Back to Attack Research

The lethal trifecta is an architecture problem: reading OWASP's State of Agentic AI v2.01

This week OWASP published the State of Agentic AI Security and Governance, version 2.01. The first version, a year ago, treated autonomous agents as an emerging exposure worth watching. The new one reads a full year of field evidence and lands on a blunter message: agentic AI is no longer hypothetical, there are now production incidents, vendor advisories, and CVEs for nearly every class of agentic risk the project tracks, and the single hardest problem, prompt injection, remains unsolved. The report's coverage notes that prompt injection maps to six of the ten categories in the Top 10 for Agentic Applications. That is not ten separate problems. It is one problem wearing ten hats.

For defenders the useful takeaway is not a longer checklist. It is a reframing. The most dangerous agentic risks all reduce to the same small set of architectural properties, and once you see them that way, you stop trying to filter your way to safety and start designing the danger out. This is a read of what the lethal trifecta and the Agents Rule of Two mean in practice, and how to audit an agent deployment with evidence.

Why prompt injection cannot be filtered away

The instinct, when you first meet prompt injection, is to treat it like cross-site scripting: build a good enough input filter, sanitize the untrusted text, and the problem goes away. It does not, and the reason is structural. A language model has no reliable separation between instructions and data. Everything it receives, the developer's system prompt, the user's question, a web page it fetched, a row it read from a database, an email it summarized, arrives as one stream of tokens, and the model decides what to obey based on meaning, not on a trust label. There is no equivalent of a parameterized query that keeps the developer's intent and the attacker's text in separate channels.

That means any content the agent ingests is potentially an instruction. An attacker who can get text in front of the model, directly by typing it or indirectly by planting it somewhere the agent will later read, can attempt to redirect the agent's behavior. A filter raises the cost of the obvious payloads and does nothing for the clever ones, because the space of ways to phrase an instruction is unbounded. The report's stance, echoed by the researchers behind it, is the honest one: treat prompt injection as unsolved and design as if the model will sometimes be successfully steered.

The lethal trifecta: three properties, not ten risks

The clarifying idea, which the report builds on, is the lethal trifecta. An agent becomes capable of being turned into an exfiltration or sabotage tool by a single injected instruction when it combines three properties:

Any one of these alone is harmless. Any two are usually manageable. All three together are the trap, because now a single piece of attacker-controlled content can instruct the agent to read the private data and send it somewhere the attacker controls, and the agent has every capability it needs to comply. The report maps its three highest-priority risks, goal hijack, tool misuse, and identity-and-privilege abuse, directly onto these conditions. That mapping is the gift. It tells you that auditing an agent is not about scoring ten abstract risks. It is about finding the deployments that hold all three properties at once.

The Agents Rule of Two: spend the budget, keep a human at the limit

The complementary design rule, which the report discusses, treats the three properties as a budget. An agent operating without a human in the loop is allowed to satisfy at most two of the three within a session. Combining all three requires a human approval step. The framing is deliberately simple so that engineers can apply it without a threat-modeling PhD: untrusted input plus private data is fine if the agent cannot send anything out; private data plus external communication is fine if everything it touches is trusted; untrusted input plus external communication is fine if it holds no secrets. Want all three? Put a person at the gate.

This is the same defensive instinct that runs through every good security architecture: when you cannot make a component trustworthy, constrain what it is allowed to do so that being wrong is not catastrophic. You cannot make the model immune to being talked into something. You can absolutely ensure that the agent which reads the attacker's web page is not the same agent, in the same session, that holds the database credentials and can reach the open internet.

How to audit an agent deployment, evidence-first

The trifecta turns an abstract worry into a concrete inventory question. For each agent in your estate, you establish three facts, and the finding is the intersection. None of this requires firing a malicious prompt.

# For each deployed agent, inventory the three properties (config review):
#   PRIVATE   - which data sources / tools can it read? any sensitive?
#   UNTRUSTED - which inputs can carry content you do not control?
#               (web fetch, email, tickets, user text, third-party data)
#   EGRESS    - can it make outbound calls / write externally / message out?
#
# FINDING posture: an agent that holds ALL THREE in one session with no
#   human approval at the boundary. Two-of-three is a PASS to record.

The second axis is identity. The report elevates identity-and-privilege abuse to a top-three risk for good reason: an agent acts with whatever credentials you handed it, and over-broad agent identity turns a successful injection into a much larger blast radius. So for each agent you also confirm what it can do as itself.

# Agent identity / privilege review (config + grants):
#   - Does the agent run with least privilege, or with a broad token?
#   - Are its tool permissions scoped to the task, or wide open?
#   - Is there a human-approval gate before any high-impact tool call?
# A broad-token agent inside the lethal trifecta is the high-severity case.

The third axis is the indirect surface, which is the one teams forget. Map every channel through which untrusted content can reach the agent's context, including the quiet ones: a document the agent retrieves, a database row a low-trust form can write, a calendar invite, a code comment. The indirect path is where the dangerous injections live, because it is invisible to anyone watching only the chat box.

How we approach it, and why the boundary is the product

We treat agent security as a property of the architecture, not the prompt. When we assess an agentic deployment we inventory the three trifecta properties per agent, we read the agent's identity and tool scopes, and we map the indirect-injection channels, all from configuration and served artifacts. The finding we raise is structural: this agent reads untrusted content, holds private data, and can reach the outside, in one session, with no human at the limit, and here is the credential scope that sets the blast radius. The remediation is architectural too: split the agent, narrow the identity, gate the third capability behind approval. When a deployment already keeps to two of three, or already puts a human at the boundary, we record a PASS, because an agent designed inside the rule is not a finding. The lesson the report drives home is the one we keep returning to across every domain: you do not secure an untrusted component by asking it to behave. You secure it by constraining what its misbehavior can reach.

What this report says about the year ahead

The most quoted line from the coverage is that prompt injection still drives most agentic failures in production, and only a minority of organizations can even detect the unsanctioned agents already running in their environment. Both halves of that matter. The first says the core problem is not getting solved at the model layer, so the defense has to live at the architecture layer. The second says many of the riskiest agents are deployed without anyone in security knowing, which means step zero is discovery: find the agents, then apply the trifecta test to each one. The technology will keep moving and the next framework will ship the same mistakes, because the mistakes are structural. The defensible pattern is stable and it predates language models by decades: keep untrusted input, sensitive data, and external reach from meeting in one unconstrained place. We audit for exactly that meeting, and we prove which agents keep the three apart.

Sources

Get your exposure check: full report in 4-24 hours

Real assessment on production-grade infrastructure. We prove what is exploitable and attach the fix. Paying customers get priority capacity.

Queue My Assessment