A multi-agent system released by Sakana AI on June 22, 2026 that bundles multiple high-performance AI models so they can be used like a single model.

How does Sakana Fugu work?

It automatically combines multiple frontier models internally, and you only call a single API.

Which models does Sakana Fugu outperform?

It is said to consistently outperform the generally available Gemini 3.1 Pro, Opus 4.8, and GPT 5.5.

Is Sakana Fugu more capable than Claude Fable 5?

Sakana AI does not claim to surpass Fable 5. It describes Fugu Ultra as standing 'shoulder-to-shoulder' with the top-tier Fable 5 and Mythos Preview, and only clearly outperforming the generally available models.

Why isn't Claude Fable 5 in the comparison pool?

Fable 5 and Mythos Preview are not publicly accessible, so they are explicitly noted as not being in Fugu's agent pool.

How many versions of Sakana Fugu are there?

Two: the standard Fugu and the higher-tier Fugu Ultra, so you can match the system to your workload.

Can I see which model was used for each response?

No. Which models Fugu selects and how it coordinates them is proprietary, and the routing information is not exposed by design.

How much does Sakana Fugu cost?

There are subscription tiers for everyday use (USD 20 to 200 per month) and a pay-as-you-go plan for heavier and enterprise workloads.

Why bundle multiple models in the first place?

Because access to a specific model can shift or disappear overnight due to export controls and similar factors, the design aims to reduce that dependency risk.

What Is Sakana Fugu? Performance vs Fable 5, Pricing & AI Comparison

Article summary

A multi-agent system that bundles multiple frontier AIs behind a single API and orchestrates them automatically. It comes in two versions: the standard Fugu and the higher-tier Fugu Ultra.
Fugu Ultra outperforms Opus 4.8, GPT 5.5, and Gemini 3.1 Pro on many benchmarks, scoring 73.7 on SWE-Bench Pro, for example.
However, it stands "shoulder-to-shoulder" with Anthropic's top model Fable 5. On SWE-Bench Pro (86.0) and Humanity's Last Exam (53.3), Fable 5 is ahead, and Sakana AI never claims to "beat Fable 5."
Pricing is a USD 20–200/month subscription plus pay-as-you-go, roughly the same level as major AIs.
Its biggest selling point is reduced dependency risk: if a specific model becomes unavailable due to export controls, it switches to another and keeps running.
Which model is used for each response is not disclosed, and it is currently unavailable in the EU/EEA.
It suits individual developers and small businesses who want to cut the effort of model selection and switching; if your use case and model are already fixed, calling a single model's API directly may be cheaper.

What Is Sakana Fugu (How It Works)

Architecture overview of Sakana Fugu (orchestration that bundles multiple models) — Sakana Fugu architecture overview (source: Sakana AI)

Sakana Fugu is a multi-agent system released on June 22, 2026 by Tokyo-based Sakana AI that lets you treat multiple high-performance AI models as if they were a single model. First, let's go through what it actually is, how it works, how it is offered, and the constraints worth knowing before you use it.

How Sakana Fugu Works (One API That Orchestrates Multiple Models)

Sakana Fugu is not a single large language model (LLM — an AI that understands and generates text). It groups multiple high-performance models into an "agent pool," automatically picks the most suitable models for the input, and combines them to produce an answer. Sakana Fugu bundles multiple frontier AI models behind a single API and behaves like one model. From the user's side you only call one endpoint, while internally several models split into roles such as "thinker," "worker," and "verifier" and cooperate.

Sakana AI releaseView official source →

"Sakana Fugu is a multi-agent system that behaves like a single model." / "accessible through a single model API." — from the Sakana Fugu release

This orchestration is notable because, rather than a human assembling a fixed workflow, it learns to compose specialized agents per task. Behind it are two research works referenced at ICLR 2026: "TRINITY," in which a lightweight coordinator manages multiple LLMs over multiple turns, and "Conductor," which uses reinforcement learning to discover natural-language coordination strategies.

A concrete flow makes it easier to picture. If you send a single request like "write test code from this spec," Fugu internally breaks the task down, has a model good at design draft the plan, has another model write the implementation, and has a verifier model check for mistakes. The user does not see this process, and only the final answer is returned from one API. Where a single model would handle the whole pipeline with one "head," Fugu splits it across multiple heads with different strengths to raise the average quality. For an overview of generative AI and how other models compare, see also our guide to Claude (Anthropic's generative AI).

Two Versions: Fugu and Fugu Ultra

Sakana Fugu has two models: the standard "Fugu" and the higher-tier "Fugu Ultra." Both are available via an OpenAI-compatible API, so you can choose based on your use case and load. The basic choice is Fugu for cost-conscious everyday use, and Fugu Ultra when you want maximum accuracy on hard tasks.

Sakana AI releaseView official source →

"At launch, Sakana Fugu comes in two models, so you can match the system to your workload." — from the Sakana Fugu release

The OpenAI-compatible API is a practical advantage: you can try it by just swapping the endpoint of code already written for OpenAI, keeping migration cost low.

A Downside of Sakana Fugu (Model Opacity)

For all its convenience, Sakana Fugu has an easy-to-miss constraint: which model was actually selected for each answer, and how they were coordinated, is not disclosed to the user. Because you cannot trace which model was used, it can be awkward for work where you need to strictly manage the basis of the output.

Sakana AI product pageView official source →

"The specific models Fugu selects and how it coordinates them are proprietary, so this routing information is not exposed by design." — from the Fugu product page

For domains where audit logs and reproducibility matter (medical, finance, legal, etc.), explicitly calling a single model may be a better fit. Conversely, for general work where you want to keep quality at a certain level while reducing effort, not having to worry about the internal model selection is itself a benefit.

Sakana Fugu Benchmark Performance

Fugu Ultra (Sakana) Fable 5 / Mythos (Anthropic top tier) Major AIs (Opus 4.8, GPT 5.5, Gemini 3.1 Pro)

Bar length is on a unified 0–100 score (higher is better). Figures are benchmark values published by Sakana AI.

SWE-Bench Pro

Fugu Ultra73.7

Fable 586.0

Opus 4.869.2

GPT 5.558.6

Gemini 3.1 Pro54.2

Terminal Bench 2.1

Fugu Ultra82.1

Fable 580.4

GPT 5.578.2

Opus 4.874.6

Gemini 3.1 Pro70.3

LiveCodeBench

Fugu Ultra93.2

Fable 589.8

Gemini 3.1 Pro88.5

Opus 4.887.8

GPT 5.585.3

CharXiv Reasoning

Fugu Ultra86.6

Mythos Preview86.1

Opus 4.884.2

GPT 5.584.1

Gemini 3.1 Pro83.3

GPQA-Diamond

Fugu Ultra95.5

Gemini 3.1 Pro94.3

GPT 5.593.6

Opus 4.892.0

Humanity's Last Exam (text)

Fugu Ultra50.0

Fable 553.3

Opus 4.849.8

Gemini 3.1 Pro44.4

GPT 5.541.4

Benchmark comparison of Sakana Fugu Ultra with major AI models and Anthropic's top tier (Fable 5 / Mythos) (figures source: Sakana AI)

Sakana Fugu's capability can be checked against the benchmarks Sakana AI published. Here we look at the comparison with major AIs, the relationship with Anthropic's top-tier models, and the cautions when reading the numbers.

Comparison With Major AI Models (Opus 4.8, GPT 5.5, Gemini 3.1 Pro)

According to Sakana AI, the Fugu models consistently outperform the generally available frontier models. Fugu Ultra scored 73.7 on the software-engineering benchmark SWE-Bench Pro, beating Opus 4.8's 69.2 and GPT 5.5's 58.6.

Sakana AI releaseView official source →

"Fugu Models consistently outperform frontier models Gemini 3.1 Pro (high), Opus 4.8 (max), and GPT 5.5 (xhigh)" — from the Sakana Fugu release

On the code-generation benchmark LiveCodeBench, the standard Fugu scored 92.9 and Fugu Ultra 93.2, beating Gemini 3.1 Pro's 88.5. On the graduate-level science benchmark GPQA-Diamond it reached 95.5, and on Terminal Bench 2.1 it reached 82.1 — outperforming the three major models in each case. On the hard coding, science, and reasoning benchmarks, Fugu Ultra generally outperforms Opus 4.8, GPT 5.5, and Gemini 3.1 Pro.

Why It Is Not "Beating Fable 5" (A Shoulder-to-Shoulder Relationship)

What matters here is that Sakana AI itself does not claim to "beat Fable 5." The official wording is "shoulder-to-shoulder," and it only clearly claims to outperform the generally available models. "Outperforms the major AIs, on par with Anthropic's top tier" is the precise position the company presents.

Sakana AI releaseView official source →

"Our Fugu Ultra model stands shoulder-to-shoulder with leading models like Fable 5 and Mythos Preview" / "Neither of them is in Fugu's agent pool as they are not publicly accessible." — from the Sakana Fugu release

In fact, Fable 5 and Mythos Preview are not publicly available, so they are not in Fugu's agent pool. In other words, Fugu reaches a level comparable to top-tier models that are hard to access, using only the models that are available. Some overseas and domestic write-ups put it more strongly as "beats Fable 5," but going to the primary source, the claim is more measured. Reading it together with our release explainer for Claude Opus 4.8, another top-tier model, gives a fuller picture of where both stand.

How to Read the Benchmarks, and Cautions

Benchmarks are not an all-purpose metric. Looking at the chart above, on SWE-Bench Pro, Fable 5's 86.0 beats Fugu Ultra's 73.7, and on Humanity's Last Exam, Fable 5's 53.3 beats Fugu Ultra's 50.0. On Terminal Bench, LiveCodeBench, and CharXiv, on the other hand, Fugu Ultra is ahead, so the wins and losses swap by benchmark. This is what "shoulder-to-shoulder" actually looks like.

When reading the numbers, checking three things helps avoid misunderstanding: (1) which models are in the comparison, (2) whether it is a single run or an average of multiple runs, and (3) whether the benchmark is close to your own use case. If you mainly write code, weight the code benchmarks (SWE-Bench, LiveCodeBench); if you mainly do research and reasoning, weight GPQA and Humanity's Last Exam.

Free ToolURL to Markdown ConverterConvert any public web page URL to Markdown. Preserves headings, tables, lists, and links — perfect for LLM and RAG preprocessing, research notes, and archiving web articles.Try it now →

Sakana Fugu Pricing & Comparison

Standard-plan monthly price of major AI services (USD)

Sakana Fugu (blue) is the subscription floor. Bar length is the monthly price (USD). Source: each provider's official pricing (as of June 2026).

Sakana Fugu$20+

ChatGPT Plus$20

Claude Pro$20

Gemini AI Pro$19.99

Perplexity Pro$20

Grok SuperGrok$30

Comparison of the standard-plan monthly price of major AI services (Sakana Fugu in blue / as of June 2026, each provider's official pricing)

Alongside performance, pricing and Sakana Fugu's unique value are the next concerns. Here we cover the pricing plans, the comparison with major AI tools, and the design idea of "not stopping even if a specific model disappears."

Pricing Plans (Subscription + Pay-as-you-go)

Sakana Fugu's pricing has two parts: a subscription for everyday use and pay-as-you-go for heavier and enterprise workloads. With a USD 20–200/month subscription plus pay-as-you-go, it lands at roughly the same level as the standard plans of major AI tools (around USD 20/month).

Sakana AI releaseView official source →

"subscription tiers for everyday use and a pay-as-you-go plan for heavier and enterprise workloads." — from the Sakana Fugu release

Sakana Fugu plan	Price	Main use
Subscription	USD 20–200/month (multiple tiers)	Everyday use
Pay-as-you-go	Fugu Ultra: input $5 / output $30 (per 1M tokens)	Heavy / enterprise workloads

Even when multiple agents run at once, billing does not stack — only the top model's rate is charged. Note that within the EU/EEA it is currently unavailable while data-protection compliance is in progress.

Comparison With Major AI Tools

Placing Sakana Fugu's price band next to major AI services makes its position clearer. Sakana Fugu's pricing is about the same as the standard plans of ChatGPT, Claude, and Gemini, so price alone makes it neither a standout bargain nor expensive.

Tool (provider)	Standard plan (monthly)	Higher plan (monthly)
Sakana Fugu (Sakana AI)	$20+ (subscription) + pay-as-you-go	$200 + pay-as-you-go
ChatGPT (OpenAI)	Plus $20	Pro $200
Claude (Anthropic)	Pro $20	Max $200
Gemini (Google)	AI Pro $19.99	AI Ultra $250
Grok (xAI)	SuperGrok $30	Heavy $300
Perplexity	Pro $20	—

*Representative monthly prices as of June 2026 based on each provider's official pricing pages. They change with exchange rates and revisions, so check each provider's official page for the latest. For details on individual tools, see our ChatGPT (GPT-5) guide. Since pricing is at the same level, the deciding factor shifts from "price" to "whether you want to avoid depending on a single model" and "whether you want to know which model was used."

A Benefit of Sakana Fugu (Reduced Dependency Risk)

Sakana Fugu's biggest differentiator is that it can lower the risk of depending on a specific model. If a specific model suddenly becomes unavailable due to export controls and the like, it automatically switches to another model and keeps processing — that is the core aim of this design.

Sakana AI releaseView official source →

"access can shift or disappear overnight due to changing regulatory boundaries, export controls, and foreign policies." — from the Sakana Fugu release

Using the export controls imposed on Anthropic's Fable and Mythos as an example, Sakana AI says access can shift or disappear overnight due to changes in regulations and foreign policy. Because Fugu dynamically combines multiple models, even if one provider or model becomes unavailable, it can flexibly switch to another and route around it. If you want to avoid betting your business on a single model, this "not stopping" can be a more important factor than price or a momentary benchmark.

How to Use Sakana Fugu (Summary)

Sakana Fugu (source: Sakana AI)

Finally, here is how users can position Sakana Fugu, organized by audience, pre-adoption cautions, and sources.

How to Use Sakana Fugu / Use Cases

That Tokyo-born Sakana AI shipped an orchestration-type service is not a small thing for individual developers and small businesses. If you want to avoid depending on a single foreign model and not let price revisions, service discontinuation, or regulatory changes swing your business, Sakana Fugu is a strong option. Concretely, it shines when you want to embed AI into the backend of an internal tool or SaaS but can't decide which model is best, or want to cut the effort of keeping up with future model changes.

For example, consider automating inquiry handling or summarizing internal documents. For a small team run by a single person, just comparing each model's strengths and reviewing a switch every time prices change or a service ends is a burden. A mechanism like Fugu, which uses multiple models behind one API, lets you hand off the very decision of "which model to use." Conversely, if your use case and model are already fixed and you only want to minimize unit cost to the last cent, calling the target model's API directly may be cheaper. In short: Fugu when you want to cut the effort of selection and keeping up, and a single model directly once the optimal answer is fixed and you're squeezing cost.

Cautions Before Adoption

Checking three points before deciding to adopt helps you avoid mistakes. The three main caveats of Sakana Fugu are: "you can't trace which model was used," "it's currently unavailable in the EU/EEA," and "depending on the use case, calling a single model directly is cheaper." For work that requires audit logs and reproducibility, businesses with EU users, or use cases where you squeeze cost to the last cent, these constraints will bite. Conversely, for general work where you want to keep quality steady while reducing operational effort and the headache of model selection, these constraints often stay within an acceptable range. If you want to start by feeding your documents to an AI, our practical guide to local LLMs, which covers preparing materials, is also useful.

Sakana Fugu Summary & Official Source

This article is based on the following primary source (official announcement). Pricing, models, and specs are updated, so always check the latest official information before adopting.

Sakana Fugu: One Model to Command Them All (Sakana AI)View official source →

When feeding long documents to a large language model like Fugu, converting them to Markdown beforehand helps preserve heading structure and tables and tends to improve accuracy. To clean up a web page as-is, the following tool helps.

What Is Sakana Fugu? Performance vs Fable 5, Pricing & AI Comparison

What Is Sakana Fugu (How It Works)

How Sakana Fugu Works (One API That Orchestrates Multiple Models)

Two Versions: Fugu and Fugu Ultra

A Downside of Sakana Fugu (Model Opacity)

Sakana Fugu Benchmark Performance

Comparison With Major AI Models (Opus 4.8, GPT 5.5, Gemini 3.1 Pro)

Why It Is Not "Beating Fable 5" (A Shoulder-to-Shoulder Relationship)

How to Read the Benchmarks, and Cautions

Sakana Fugu Pricing & Comparison

Pricing Plans (Subscription + Pay-as-you-go)

Comparison With Major AI Tools

A Benefit of Sakana Fugu (Reduced Dependency Risk)

How to Use Sakana Fugu (Summary)

How to Use Sakana Fugu / Use Cases

Cautions Before Adoption

Sakana Fugu Summary & Official Source

FAQ

Related Tools

Related Tool Categories

Articles

What Is GPT-5 in ChatGPT? Free Tier, Pricing, and Choosing a Model

What Is Claude? Pricing, Choosing a Model, and Claude vs ChatGPT

Claude Opus 4.8, 4.7, 4.6 Pricing Compared: Cost Differences and How to Choose