What Agentjacking Is (A New Attack That Hijacks AI Agents)
In June 2026 a new attack technique aimed at AI coding agents was disclosed. Let's lay out what happened and how widespread it is.
[Figure: Scale confirmed in the Agentjacking research]
What Agentjacking Is (The New Attack Tenet Security Disclosed)
Agentjacking is a new attack technique that hijacks an AI coding agent with a single fake error report and runs attacker-controlled code on the developer's machine. The security firm Tenet Security disclosed it in June 2026. The targets are coding assistants like Claude Code, Cursor, and Codex that work autonomously while connecting to external tools.
At the root is a weakness: an AI coding agent cannot tell the difference between data it reads and an instruction telling it to act. The agent reads the text returned by an error-monitoring service much as a developer would, then acts on it without checking whether it describes a real bug or was planted by an attacker.
a new class of attack 'Agentjacking' that hijacks AI coding agents into running attacker-controlled code on a developer's machine, triggered by a single fake error report and invisible to every security control. / AI coding agents cannot tell the difference between the data they read and an instruction to act. — from the definition of Agentjacking
The Scale of the Attack and the Agents It Targeted
The numbers show how serious this attack is. Tenet Security's research found 2,388 organizations exposing a valid DSN (the public key used to send errors) that an attacker could inject into. When the team actually sent fake errors, the exploitation success rate was 85% across the most widely used agents, including Claude Code, Cursor, and Codex. In those cases the agent did exactly what the attacker told it to.
What is frightening is that the attack needs no real hacking. Nobody broke into a system or cracked a password. It works simply by sending a fake error to a public API that anyone can use, exactly as the API is meant to be used. The cause is not a bug in one particular piece of software but the very design in which an AI agent trusts and runs whatever text it receives from an external tool. That is why fixing any single product does not end it.
2,388 organizations exposed with valid injectable DSNs / an 85% exploitation success rate against injected errors, across the most widely-used agents on the market — from the research findings
How Agentjacking Works (Indirect Prompt Injection via a Fake Sentry Error)
Agentjacking works by using the public key of the error-monitoring service Sentry as its entry point. Let's walk through the steps in order.
[Figure: How Agentjacking comes together]
Why Sentry's Public Key (DSN) Gets Abused
The entry point is the public key Sentry calls a DSN. A DSN is a public, write-only credential that Sentry officially says is safe to embed in frontend JavaScript. It was meant only as a safe way to send errors.
But an attacker who has that public DSN can send a fake error event to Sentry's intake endpoint without any authentication. The contents of the event they send are entirely under the attacker's control and are indistinguishable from a real error. A key designed to be safely public becomes a foothold for the attack.
a public, write-only credential that Sentry intentionally documents as safe to embed in frontend JavaScript. / No authentication beyond the DSN is required. The attacker controls the entire event payload. — from the attack steps (Discovery / Event Creation)
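To make the anatomy of a DSN concrete, here is a minimal Python sketch that splits a DSN-shaped URL into its parts. It assumes the commonly documented layout of public key, host, and project ID. The names `DsnParts` and `parse_dsn` and the example host are hypothetical and exist only for illustration; they are not part of any Sentry library.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class DsnParts:
    """The three pieces a DSN-shaped URL carries (illustrative type)."""
    public_key: str
    host: str
    project_id: str


def parse_dsn(dsn: str) -> DsnParts:
    """Split a DSN of the assumed form scheme://PUBLIC_KEY@HOST/PROJECT_ID."""
    parsed = urlparse(dsn)
    if not parsed.username or not parsed.hostname:
        raise ValueError("not a DSN-shaped URL: missing key or host")
    # The project ID is assumed to be the last path segment.
    project_id = parsed.path.rstrip("/").rsplit("/", 1)[-1]
    if not project_id:
        raise ValueError("not a DSN-shaped URL: missing project ID")
    return DsnParts(parsed.username, parsed.hostname, project_id)


if __name__ == "__main__":
    # Placeholder value, not a real DSN.
    print(parse_dsn("https://abc123@o0.ingest.example.invalid/42"))
```

None of the three parts is a secret. Anyone who can read the frontend bundle can read the same values, which is why the DSN works as a write-only key and also why it cannot prove who wrote an event.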
How an "Error" Turns Into an "Instruction"
The planted fake error carries carefully formatted markdown in its message field and context key names. When an AI agent queries Sentry to investigate an unresolved error, the Sentry MCP server returns this event to the agent. On the agent's side the markdown renders as structured content, indistinguishable from a real diagnostic guide.
As a result, the agent runs the attacker's commands believing they are legitimate steps to fix the bug. This is the technique known as indirect prompt injection—manipulating an agent with instructions slipped into the output of an external tool. The more you connect an AI to outside services, the more each trusted service becomes a possible entry point for instructions.
The injected event contains carefully formatted markdown in the message field and context key names. When the Sentry MCP server returns this event to an AI agent, the markdown renders as structured content. — from the attack steps (Markdown Injection)
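One way to picture the problem is a tripwire that looks for instruction-like structure in the places the research names: the message field and context key names. The Python sketch below is a rough heuristic under stated assumptions. The event shape (a dict with `message` and `contexts`), the pattern list, and the function names are all hypothetical. Pattern matching of this kind is reactive and easy to evade, so treat it as a signal for human review, not as a fix.

```python
import re
from typing import Any, Iterator

# Hypothetical, non-exhaustive patterns for instruction-like content.
PATTERNS = {
    "markdown_heading": re.compile(r"^\s{0,3}#{1,6}\s", re.MULTILINE),
    "code_fence": re.compile(r"`{3,}"),
    "shell_command": re.compile(
        r"\b(curl|wget|npm\s+install|pip\s+install)\b", re.IGNORECASE
    ),
    "imperative_phrase": re.compile(
        r"\b(run|execute|install)\b.{0,40}\b(command|script|package)\b",
        re.IGNORECASE,
    ),
}


def iter_text_fields(event: dict[str, Any]) -> Iterator[tuple[str, str]]:
    """Yield (location, text) for the message and for context names and keys."""
    message = event.get("message")
    if isinstance(message, str):
        yield "message", message
    contexts = event.get("contexts")
    if isinstance(contexts, dict):
        for name, body in contexts.items():
            yield "context name", str(name)
            if isinstance(body, dict):
                for key in body:
                    yield "context key", str(key)


def flag_event(event: dict[str, Any]) -> list[str]:
    """Return findings such as 'message: markdown_heading'; empty if none."""
    findings = []
    for location, text in iter_text_fields(event):
        for label, pattern in PATTERNS.items():
            if pattern.search(text):
                findings.append(f"{location}: {label}")
    return findings
```

An ordinary stack-trace message produces no findings, while a message that opens with a markdown heading and tells the reader to run a command produces several. An attacker who knows the patterns can rephrase around them, which is the same limitation a payload-string filter has.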
Why a Prompt Cannot Stop It
It seems like a system prompt that says "ignore untrusted data" should prevent this. But testing showed the opposite. Tenet Security reports that even with explicit instructions through system prompts and skills, agents still ran the attacker's code.
In other words, writing a better prompt will not stop this attack. The weakness is not in any individual product but in the very way agents handle tool output. That is why defenses are needed at a layer other than written instructions.
Prompt-layer defenses failed. Agents executed the payload even when explicitly instructed – through detailed system prompts and skills – to ignore untrusted data. You cannot fix this with a better prompt. — from the point that prompt-layer defenses failed
Defending Against Agentjacking, and a Summary
Since a prompt cannot stop it, defenses have to be built into the mechanics. Let's lay out the steps developers and teams can take now, along with the tooling that is available.
[Figure: Key defenses against Agentjacking]
Defenses Developers and Teams Can Take
The most effective step is to require explicit human confirmation before installing a package or running a shell command. That alone breaks the fully automated execution Agentjacking depends on. Not leaving everything to the agent to run automatically is itself a defense.
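The confirmation step can be sketched as a small gate that sits between the agent and the shell. This is a minimal Python illustration, not how any particular agent implements approvals. The names `run_with_confirmation` and `ask_on_terminal` are hypothetical, and the sketch assumes every command goes through the gate.

```python
import shlex
import subprocess
from typing import Callable, Optional


def ask_on_terminal(argv: list[str]) -> bool:
    """Show the exact command to a person and wait for an explicit 'y'."""
    answer = input(f"Agent wants to run: {shlex.join(argv)}  Allow? [y/N] ")
    return answer.strip().lower() == "y"


def run_with_confirmation(
    command: str,
    confirm: Callable[[list[str]], bool] = ask_on_terminal,
) -> Optional[subprocess.CompletedProcess]:
    """Run the command only if confirm() approves; return None when denied."""
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty command")
    if not confirm(argv):
        return None  # denied: nothing is executed
    # argv is passed as a list, so no shell re-interprets the string.
    return subprocess.run(argv, capture_output=True, text=True, check=False)
```

The default is denial: anything other than an explicit "y" returns `None` without executing. Passing `confirm` as a parameter keeps the gate testable and lets a team swap in its own approval flow.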
Running the agent in a permission-restricted sandbox also limits the information that can be stolen even if code does run. Beyond that, limit connected MCP servers to verified ones and treat a tool's output as "potentially adversarial input" rather than a "trusted instruction." That shift in how tool output is handled is the fundamental safeguard.
Requiring explicit confirmation before package installation or shell command execution removes the fully automated execution step on which agentjacking depends. / Most current agent implementations treat MCP-sourced content as authoritative data rather than as potentially adversarial input. — from the recommended defenses
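Treating tool output as adversarial input can be enforced in code rather than in a prompt. The Python sketch below shows one conservative rule, assuming the agent runtime can record where each piece of context came from: once any tool-sourced text is in context, every proposed action needs human approval. `Origin`, `ContextItem`, and `requires_human_approval` are hypothetical names, and real agents may not expose provenance this way.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Origin(Enum):
    USER = "user"  # typed by the developer
    TOOL = "tool"  # returned by an external tool, such as an MCP server


@dataclass(frozen=True)
class ContextItem:
    """A piece of text in the agent's context, tagged with where it came from."""
    text: str
    origin: Origin


def requires_human_approval(context: Sequence[ContextItem]) -> bool:
    """Conservative rule: any tool-sourced text taints the whole context."""
    return any(item.origin is Origin.TOOL for item in context)


if __name__ == "__main__":
    context = [
        ContextItem("Investigate the unresolved error", Origin.USER),
        ContextItem("text returned by an error-monitoring tool", Origin.TOOL),
    ]
    print(requires_human_approval(context))
```

The rule is deliberately blunt. It does not try to judge whether the tool text is malicious, because the research above suggests the agent cannot make that call reliably. It only decides whether a person must be in the loop.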
Sentry's Response and the "agent-jackstop" Tool
Sentry, the provider, acknowledged the issue on June 3, 2026, the same day it was reported. However, it declined a root-cause fix at the platform level, calling the issue "technically not defensible," and only added a filter that blocks a specific payload string. That is a reactive measure against one known attack string, and the path that allows the injection itself remains open.
Meanwhile, Tenet Security, which disclosed the attack, released an open-source set of config files called "agent-jackstop" that hardens Cursor and Claude Code against it. It is offered as a drop-in setting that lowers the risk of ingesting untrusted logs and telemetry. When you read official sources or advisories, converting the page to markdown keeps the structure of steps and quotes intact and easier to follow.
Wrapping Up This Attack
Agentjacking is a prime example of indirect prompt injection that hijacks an AI coding agent with a single fake error report. The exploitation success rate against widely used agents like Claude Code, Cursor, and Codex was 85%, and prompt instructions were shown to be no defense. The more you bring AI agents into your work, the more essential it is to design around not trusting external tool output unconditionally.
For anyone who builds AI agents into daily development, where to draw the line between the convenience of automatic execution and safety is a question worth keeping in mind. Starting with the single step of "have a person confirm before it runs" is a practical defense with outsized impact. The situation will keep moving, so check the primary sources cited in the body for accurate, up-to-date information.