What Is Project Fetch — An AI Autonomous Robot Control Experiment
[Figure: Project Fetch, progress from Phase 1 to Phase 2]
Project Fetch is an experiment conducted by Anthropic's Frontier Red Team (safety research team) to measure AI's ability to autonomously control robots. Using an off-the-shelf robotic quadruped (robot dog), the experiment tests whether an AI model can operate the robot without human assistance, in a staged evaluation across multiple phases.
The experiment was designed not as a marketing exercise, but as a red team effort to assess AI capabilities and limitations from a safety perspective. The goal is to map what AI can and cannot do, including its failures, rather than to showcase success.
Phase 1 (2025) — AI Served Only as an Assistant
Phase 1 was conducted in August 2025 using Claude Opus 4.1. The result was clear: Opus 4.1 could not complete tasks independently. It got stuck at the very first step — figuring out how to connect to the robot.
Unquestionably, it could not. Much like our team without Claude, it got hung up on the preliminary task of figuring out how to connect to the robot. — From the Phase 1 results section
However, when human teams used Claude Opus 4.1 as a coding assistant, they significantly outperformed the team working without AI. At the Phase 1 stage, Claude was an effective assistant but not an independent operator.
Phase 2 (2026) — Opus 4.7 Achieved Fully Autonomous Robot Control
Phase 2, published on June 18, 2026, showed a dramatically different picture. Claude Opus 4.7, operating through Claude Code, autonomously controlled the robot dog and completed most of the 7 tasks without human intervention.
The robot used was a commercially available quadruped equipped with the manufacturer's controller, a video camera, and lidar sensors. The researcher's role was strictly limited.
The role of our researcher was limited to plugging a laptop running Claude Code into the robodog, entering the initial prompt, approving commands, and approving the model to go to the next task. — From the experimental conditions section
A model that couldn't even connect to the robot in Phase 1 was, just 10 months later, autonomously handling sensor connections, writing control programs, and detecting objects. The pace of generational model improvement is striking.
Experimental Results — Speed, Code Efficiency, and Outcomes
[Figure: 4-task comparison of completion time in minutes. Bar length is proportional to time; shorter is faster. Data from Anthropic Research.]
Phase 2 set 7 tasks: operating the robot via the controller, connecting to video and lidar sensors, writing and running a manual control program, monitoring the robot's path, detecting a beach ball, and autonomously retrieving the beach ball. Below, we examine the 4-task comparison across all participants, code efficiency, and what failed.
About 20x Faster Than Human Teams (9 min 35 sec vs. 181 min)
The core data from this experiment comes from the 4 tasks that all participants completed. The team without AI took 361 minutes, the AI-assisted team took 181 minutes, and Opus 4.7 completed the same tasks in 9 minutes 35 seconds.
Claude Opus 4.7—operating without human assistance—was about 20 times faster than the fastest human team at all tasks completed by participants less than a year ago. — From the speed comparison conclusion
In ratio terms, that's about 37.7x faster than the team without AI and about 18.9x faster than the AI-assisted team. Expanding to 5 tasks, the AI-assisted team took 264 minutes versus Opus 4.7's 12 minutes 7 seconds, a gap of about 21.8x.
Across all tasks, Opus 4.7 was at least 10 times faster than any human team. This wasn't a case of excelling at one particular task — the speed advantage was consistently an order of magnitude across the board.
| Participant | 4-Task Completion Time | Speed Ratio vs. Opus 4.7 |
|---|---|---|
| Team Claude-less | 361 min | ~37.7x slower |
| Team Claude | 181 min | ~18.9x slower |
| Opus 4.7 (autonomous) | 9 min 35 sec | — |
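The ratios in the table can be reproduced directly from the reported times. The following is a minimal sketch of that arithmetic; the helper names `to_minutes` and `speed_ratio` are illustrative and not taken from the experiment.

```python
def to_minutes(minutes: int, seconds: int = 0) -> float:
    """Convert a minutes-and-seconds duration to fractional minutes."""
    return minutes + seconds / 60


def speed_ratio(human_minutes: float, model_minutes: float) -> float:
    """How many times longer the human team took than the model."""
    return human_minutes / model_minutes


# Times reported for the 4 tasks that all participants completed.
opus_4_tasks = to_minutes(9, 35)
human_4_tasks = {"Team Claude-less": 361, "Team Claude": 181}

for team, minutes in human_4_tasks.items():
    print(f"{team}: {speed_ratio(minutes, opus_4_tasks):.1f}x slower")

# Times reported for the 5-task comparison (AI-assisted team only).
five_task_ratio = speed_ratio(264, to_minutes(12, 7))
print(f"Team Claude, 5 tasks: {five_task_ratio:.1f}x slower")
```

Rounded to one decimal place, this gives 37.7x and 18.9x for the 4-task comparison and 21.8x for the 5-task comparison.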
Across three trials, Opus 4.7 showed low variance between runs, and most of its code worked correctly on the first attempt.
Code Volume Was One-Tenth of the AI-Assisted Team's (1,045 Lines vs. 10,309 Lines)
The difference extended beyond speed to the amount of code produced.
it was as or more successful than both human teams while producing almost ten times less code than Team Claude. — From the code efficiency section
| Team | Lines of Code |
|---|---|
| Team Claude (AI-assisted) | 10,309 |
| Team Claude-less | 1,136 |
| Opus 4.7 | 1,045 |
Notably, the code volumes of the team without AI (1,136 lines) and Opus 4.7 (1,045 lines) are nearly identical. The AI-assisted team's codebase was roughly ten times larger, plausibly because it grew as the team incorporated Claude's suggestions, while Opus 4.7 working alone wrote only what was necessary. While less code doesn't automatically mean better code, achieving equal or better outcomes without accumulating redundant code demonstrates practical efficiency in AI code generation.
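To put the line counts on a common scale, the short sketch below expresses each team's code volume as a multiple of Opus 4.7's. The figures come from the table above; the variable names are illustrative.

```python
# Lines of code reported for each participant.
lines_of_code = {
    "Team Claude (AI-assisted)": 10309,
    "Team Claude-less": 1136,
    "Opus 4.7": 1045,
}

baseline = lines_of_code["Opus 4.7"]

# Ratio of each participant's code volume to Opus 4.7's.
volume_ratio = {team: count / baseline for team, count in lines_of_code.items()}

for team, ratio in volume_ratio.items():
    print(f"{team}: {ratio:.2f}x the code of Opus 4.7")
```

The AI-assisted team wrote about 9.9 times as much code as Opus 4.7, while the team without AI wrote about 1.09 times as much.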
Beach Ball Retrieval Failed — The Closed-Loop Control Barrier
The final task was to detect a beach ball and autonomously retrieve it to the starting turf. This task was not fully achieved.
Opus 4.7 handled sensor connections, ball detection, and positioning (maneuvering behind the ball). However, its attempts to move the ball precisely were unstable.
But the efforts to do so were poorly controlled and (again, like our human participants) not successful. — From the beach ball retrieval results
The root cause lies in the difficulty of closed-loop control, a control approach that requires continuously adjusting movements based on real-time visual feedback. While LLMs excel at discrete, step-by-step tasks like reading sensor data and generating code, continuously fine-tuning physical actions in the real world remains an unsolved challenge for current large language models.
The researchers noted that a human with robotics experience could achieve this task by adding scaffolding. This means the challenge isn't impossible: it's at a level where human-AI collaboration can bridge the gap.
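The difference between executing a fixed plan and continuously correcting against feedback can be illustrated with a toy model. The sketch below is not code from the experiment: it assumes a one-dimensional robot whose wheels slip by an unknown 20%, and every name and parameter (`slip`, `gain`, `dt`) is hypothetical.

```python
def open_loop(target=1.0, steps=50, dt=0.1, slip=0.2):
    """Plan all commands up front assuming no slip, then execute blindly."""
    planned_velocity = target / (steps * dt)
    position = 0.0
    for _ in range(steps):
        # The robot only achieves (1 - slip) of each commanded move,
        # and nothing in the loop notices the shortfall.
        position += planned_velocity * dt * (1.0 - slip)
    return position


def closed_loop(target=1.0, steps=50, dt=0.1, slip=0.2, gain=2.0):
    """Re-measure the remaining error at every step and correct for it."""
    position = 0.0
    for _ in range(steps):
        error = target - position      # feedback from a (simulated) sensor
        velocity = gain * error        # proportional controller
        position += velocity * dt * (1.0 - slip)
    return position


print(f"open-loop final position:   {open_loop():.4f}")
print(f"closed-loop final position: {closed_loop():.4f}")
```

In this toy setup the open-loop run stops about 20% short of the target, while the closed-loop run ends within a fraction of a percent of it. The closed-loop version only works because a fresh measurement is available at every step, which is the requirement the passage above describes.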
Possibilities and Limitations of AI Robotics
[Figure: What Opus 4.7 could and could not do in robot control]
Project Fetch Phase 2 maps out where AI stands today with measured data. Software-side tasks (connections, program creation, data processing) showed overwhelming speed, while continuous physical-world control hit clear limits. Here's what the results tell us about what's possible and what isn't.
What It Means That No Robotics-Specific Training Was Needed
A key detail of this experiment: Opus 4.7 received no robotics-specific fine-tuning. It wasn't trained on robot control datasets. General improvements in model capability transferred directly to physical tasks.
It is worth underscoring (as we did in our previous post) that this progress is not the result of a concerted effort to improve the robotics capabilities of our models. These improvements, like so many others in the history of LLM development, have emerged from much more general scaling. — From the generalization section
This result supports the idea that once an AI model's general capabilities cross a certain threshold, it can handle new domains without specialized training. The same principle could apply beyond robotics to manufacturing equipment operation, IoT device management, and remote physical tool control.
Opus 4.7 quickly handled decisions that humans found tricky (such as selecting the right sensor interface approach), and most of its code worked on the first try. Programming's inherent fast feedback loop — write, run, observe, fix — aligns well with what AI models excel at.
The Challenge of Real-Time Feedback Control
The beach ball retrieval failure revealed the contours of what current LLMs struggle with. Low-level robot control — specifically, formulating actuation policies (detailed motor movement plans) — remains outside LLM capabilities.
In programming, the cycle of writing code, checking results, and making corrections is clearly separated into discrete steps. Robot control, however, requires continuously adjusting motors while simultaneously processing visual input. This "adjust while watching" real-time feedback processing is structurally difficult for LLMs, which think sequentially through text.
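One way to read this point, offered as an interpretation and not as a finding of the experiment, is that a controller that takes time to decide is always acting on stale observations. The toy sketch below assumes a one-dimensional robot and a simple proportional controller; `delay_steps` stands in for decision latency, and all names and values are hypothetical.

```python
from collections import deque


def run_loop(delay_steps, gain=0.8, steps=40, target=1.0):
    """Simulate a proportional controller that sees positions delay_steps old.

    Returns the absolute error after each step.
    """
    position = 0.0
    # Buffer of recent positions; index 0 is the oldest one still "in flight",
    # which is the only observation the controller gets to act on.
    recent = deque([position] * (delay_steps + 1), maxlen=delay_steps + 1)
    errors = []
    for _ in range(steps):
        observed = recent[0]
        position += gain * (target - observed)
        recent.append(position)
        errors.append(abs(target - position))
    return errors


fresh = run_loop(delay_steps=0)
stale = run_loop(delay_steps=3)

print(f"no delay, final error:          {fresh[-1]:.6f}")
print(f"3-step delay, worst late error: {max(stale[-10:]):.2f}")
```

With fresh observations the error shrinks at every step. With the same gain and a three-step delay, the controller keeps pushing after it has already passed the target, and the oscillation grows instead of settling. Lowering the gain would restore stability at the cost of slower motion, which is the kind of tuning that scaffolding written by someone with robotics experience might supply.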
The experiment concludes neither that LLMs have "solved robotics" nor that they cannot do it. Since the researchers judged the beach ball retrieval achievable with scaffolding added by a human with robotics experience, gradual automation through human-AI collaboration emerges as the realistic path forward.
Practical Implications and Future Outlook
Looking at Project Fetch's results from a practical standpoint, two points stand out.
we now seem much closer to a world where models will be able to use off-the-shelf physical tools with relative ease — From the overall discussion
First, the pace of generational model evolution. A model that couldn't even connect to the robot in August 2025 was, by June 2026, completing the same tasks autonomously about 20x faster than the fastest human team. Since this progress came from general model scaling rather than robotics-specific training, next-generation models may handle even more complex physical tasks.
Second, the ease of pairing with off-the-shelf hardware. The experiment used a commercially available robot dog with no custom drivers or middleware — and the AI could still operate it. This lowers the barrier to adoption in industrial applications.
From the perspective of someone who uses Claude Code extensively in daily work, the speed and first-try accuracy of code generation match practical experience. The finding that this capability is extending into physical-world control suggests that AI's role in business is beginning to move beyond the boundaries of software.
Summary — Where AI Robotics Stands Today
Project Fetch Phase 2 is an experiment that maps AI's autonomous control capabilities with hard numbers. Claude Opus 4.7 completed the shared tasks on an off-the-shelf robot dog about 20 times faster than the fastest human team, using about one-tenth the code of the AI-assisted team. However, it did not achieve precise physical control for beach ball retrieval.
What this experiment shows is not that "AI can fully replace human robot operators," but that "AI can dramatically accelerate software-side tasks." Continuous physical-world control remains beyond what the model could do on its own, though where that boundary moves with the next generation of models is impossible to predict.
Detailed methodology and data are available on Anthropic's Research page.
When researching the latest AI developments, converting official pages to Markdown format before reading preserves heading and table structure for more efficient analysis.