sakutto
Generative AI

What Is Agentjacking? The Attack That Hijacks AI Coding Agents

AgentjackingAI SecurityPrompt Injection

What Agentjacking Is (A New Attack That Hijacks AI Agents)

In June 2026 a new attack technique aimed at AI coding agents was disclosed. Let's lay out what happened and how widespread it is.

Scale confirmed in the Agentjacking research

2,388 orgs Organizations holding a valid, injectable DSN (public key) and exposed to the attack
85% Success rate against Claude Code, Cursor, and Codex
100+ cases Confirmed cases of an agent actually reaching execution (from large enterprises to solo developers)

What Agentjacking Is (The New Attack Tenet Security Disclosed)

Agentjacking is a new attack technique that hijacks an AI coding agent with a single fake error report and runs attacker-controlled code on the developer's machine. The security firm Tenet Security disclosed it in June 2026. The targets are coding assistants like Claude Code, Cursor, and Codex that work autonomously while connecting to external tools.

At the root is a weakness: an AI coding agent cannot tell the difference between data it reads and an instruction telling it to act. It receives the text returned by an error-monitoring service the same way a developer would, then runs it without checking whether it is a real bug or a fake planted by an attacker.

View official source →
a new class of attack 'Agentjacking' that hijacks AI coding agents into running attacker-controlled code on a developer's machine, triggered by a single fake error report and invisible to every security control. / AI coding agents cannot tell the difference between the data they read and an instruction to act. — from the definition of Agentjacking

The Scale of the Attack and the Agents It Targeted

How serious this attack is shows up clearly in the numbers. Tenet Security's research found 2,388 organizations with a DSN (the public key used to send errors) that an attacker could exploit. When the team actually sent fake errors, 85% of the widely used major agents—Claude Code, Cursor, and Codex—did exactly what the attacker told them to.

What is frightening is that the attack needs no real hacking. Nobody broke into a system or cracked a password. It works simply by sending a fake error to a public API that anyone can use, exactly as the API is meant to be used. The cause is not a bug in one particular piece of software but the very design in which an AI agent trusts and runs whatever text it receives from an external tool. That is why fixing any single product does not end it.

View official source →
2,388 organizations exposed with valid injectable DSNs / an 85% exploitation success rate against injected errors, across the most widely-used agents on the market — from the research findings

How Agentjacking Works (Indirect Prompt Injection via a Fake Sentry Error)

Agentjacking works by using the public key of the error-monitoring service Sentry as its entry point. Let's walk through the steps in order.

How Agentjacking comes together

1. Find a public DSN Locate the public, write-only key embedded in the frontend
2. Send a fake error POST an attacker-crafted error event to Sentry's intake with no authentication
3. Inject instructions as markdown Plant formatted markdown instructions in the error body
4. The agent executes The agent inspects the error and mistakes the instructions for legitimate fixes
5. Steal credentials Probe cloud and auth config files and exfiltrate them to the attacker

Why Sentry's Public Key (DSN) Gets Abused

The entry point is the public key Sentry calls a DSN. A DSN is a public, write-only credential that Sentry officially says is safe to embed in frontend JavaScript. It was meant only as a safe way to send errors.

But an attacker who has that public DSN can send a fake error event to Sentry's intake endpoint without any authentication. The contents of the event they send are entirely under the attacker's control and are indistinguishable from a real error. A key designed to be safely public becomes a foothold for the attack.

View official source →
a public, write-only credential that Sentry intentionally documents as safe to embed in frontend JavaScript. / No authentication beyond the DSN is required. The attacker controls the entire event payload. — from the attack steps (Discovery / Event Creation)

How an "Error" Turns Into an "Instruction"

The planted fake error contains formatted markdown. When an AI agent queries Sentry to investigate an unresolved error, this event comes back through the integration layer. On the agent's side the markdown renders as structured content, indistinguishable from a real diagnostic guide.

As a result, the agent runs the attacker's commands believing they are legitimate steps to fix the bug. This is the technique known as indirect prompt injection—manipulating an agent with instructions slipped into the output of an external tool. The more you connect an AI to outside services, the more each trusted service becomes a possible entry point for instructions.

View official source →
The injected event contains carefully formatted markdown in the message field and context key names. When the Sentry MCP server returns this event to an AI agent, the markdown renders as structured content. — from the attack steps (Markdown Injection)

Why a Prompt Cannot Stop It

It seems like a system prompt that says "ignore untrusted data" should prevent this. But testing showed the opposite. Tenet Security reports that even with explicit instructions through system prompts and skills, agents still ran the attacker's code.

In other words, writing a better prompt will not stop this attack. The weakness is not in any individual product but in the very way agents handle tool output. That is why defenses are needed at a layer other than written instructions.

View official source →
Prompt-layer defenses failed. Agents executed the payload even when explicitly instructed – through detailed system prompts and skills – to ignore untrusted data. You cannot fix this with a better prompt. — from the point that prompt-layer defenses failed

Defending Against Agentjacking, and a Summary

Since a prompt cannot stop it, defenses have to be built into the mechanics. Let's lay out the steps developers and teams can take now, along with the tooling that is available.

Key defenses against Agentjacking

Require human approval Make explicit confirmation mandatory before installing a package or running a command (most important)
Limit privileges Run the agent in a sandbox or isolated environment with minimal reach
Restrict connections Limit connected MCP servers to verified ones and do not over-trust their output
Add a hardening config Adopt settings that blunt the attack, such as Tenet's open-source "agent-jackstop"

Defenses Developers and Teams Can Take

The most effective step is to require explicit human confirmation before installing a package or running a shell command. That alone breaks the fully automated execution Agentjacking depends on. Not leaving everything to the agent to run automatically is itself a defense.

On top of that, running the agent in a permission-restricted sandbox limits the information that can be stolen even if code does run. Limiting connected MCP servers to verified ones, and treating a tool's output as "potentially adversarial input" rather than a "trusted instruction," is the fundamental safeguard.

View official source →
Requiring explicit confirmation before package installation or shell command execution removes the fully automated execution step on which agentjacking depends. / Most current agent implementations treat MCP-sourced content as authoritative data rather than as potentially adversarial input. — from the recommended defenses

Sentry's Response and the "agent-jackstop" Tool

Sentry, the provider, acknowledged the issue the same day it was reported on June 3, 2026. However, it declined a root-cause fix at the platform level, calling it "technically not defensible," and only added a filter that blocks a specific payload string. That is a reactive measure against a known attack string, and the path that allows the injection itself remains.

Meanwhile, Tenet Security, which disclosed the attack, released an open-source set of config files called "agent-jackstop" that hardens Cursor and Claude Code against it. It is offered as a drop-in setting that lowers the risk of ingesting untrusted logs and telemetry. When you read official sources or advisories, converting the page to markdown keeps the structure of steps and quotes intact and easier to follow.

Free ToolURL to Markdown ConverterConvert any public web page URL to Markdown. Preserves headings, tables, lists, and links — perfect for LLM and RAG preprocessing, research notes, and archiving web articles.Try it now →

Wrapping Up This Attack

Agentjacking is a prime example of indirect prompt injection that hijacks an AI coding agent with a single fake error report. Widely used agents like Claude Code, Cursor, and Codex reached execution at a high 85% rate, and prompt instructions were shown to be no defense. The more you bring AI agents into your work, the more essential it is to design around not trusting external tool output unconditionally.

From the standpoint of someone who builds AI agents into daily development, where to draw the line between the convenience of automatic execution and safety is a question worth keeping in mind. Starting with the single step of "have a person confirm before it runs" is a practical defense with outsized impact. The situation will keep moving, so check each source (the primary sources) in the body for accurate, up-to-date information.

Free ToolURL to Markdown ConverterConvert any public web page URL to Markdown. Preserves headings, tables, lists, and links — perfect for LLM and RAG preprocessing, research notes, and archiving web articles.Try it now →

FAQ

Q. What is Agentjacking?
Agentjacking is a new class of attack that hijacks an AI coding agent with a single fake error report and runs attacker-controlled code on the developer's machine. The security firm Tenet Security disclosed it in June 2026. It exploits the fact that an agent cannot tell the difference between data it reads and an instruction to act.
a new class of attack 'Agentjacking' that hijacks AI coding agents into running attacker-controlled code on a developer's machine, triggered by a single fake error report and invisible to every security control. Tenet Security blog
Q. Can a system prompt that says 'do not trust this' stop the attack?
No. In testing, agents ran the attacker's code even when they were explicitly told—through detailed system prompts and skills—to ignore untrusted data. Prompt-layer defenses do not stop it, and the researchers concluded you cannot fix this with a better prompt.
Prompt-layer defenses failed. Agents executed the payload even when explicitly instructed – through detailed system prompts and skills – to ignore untrusted data. You cannot fix this with a better prompt. Tenet Security blog
Q. How should developers and teams defend against it?
The most effective step is to require explicit human confirmation before installing a package or running a shell command. That breaks the fully automated execution Agentjacking depends on. Layer it with running the agent in a permission-restricted sandbox and limiting connected MCP servers to verified ones.
Requiring explicit confirmation before package installation or shell command execution removes the fully automated execution step on which agentjacking depends. Cloud Security Alliance research note

Related Tools

Related Tool Categories

Articles