The Basics of Claude Opus 4.8, 4.7, and 4.6 Pricing
Claude, from Anthropic, offers Opus 4.8, 4.7, and 4.6 in parallel. Pricing is easiest to understand by splitting it into two ways of using it.
Two Ways to Pay: Chat and the API
There is chat, used on a monthly plan, and the API, where you pay for what you use—and the pricing logic differs for each. The chat monthly fee is the same for every model, and the per-token rate varies by model only on the API. On chat, though, how fast you consume your usage limit does differ by model.
| Chat (monthly plan) | Using claude.ai or the Claude app for a fixed monthly price. The monthly fee is the same for every model, but usage-limit consumption varies by model. |
|---|---|
| API (pay-as-you-go) | Building it into your own app or program and paying for what you use. Cost varies by model. |
| What drives cost | On chat it's your plan; on the API it's the token count (how text is counted and how much the model thinks). |
The Difference Between the Three Models and How to Choose
The difference between the three models comes down to one point: the bigger the number, the newer the model. Newer models tend to be more capable.
| Model | Positioning | Best suited for |
|---|---|---|
| Opus 4.8 | Newest, top-tier model | The default when starting out. Strong on complex requests and fast mode |
| Opus 4.7 | One generation before 4.8 | Stable; also handles long-running autonomous work |
| Opus 4.6 | Older generation | Still available. Tends to be cheaper for heavy text input via the API |
For chat, since the monthly fee is the same, picking the newest and most capable 4.8 is the sensible default. For API use the best model depends on the task, which we cover below. For using Claude itself, see our guide to using Claude (claude.ai).
Pricing and Usage Limits on Chat
Most people use chat (a monthly plan). Here, keep "the fee" and "the usage limit" separate.
The Monthly Fee Is the Same for Every Model
A monthly chat plan is a fixed price you can use up to your limit. Whichever model you pick, the plan's monthly fee itself doesn't change—4.8 and 4.6 are billed the same. That is the part people mean by "all three are the same on chat."
For the record, the base rate when used via the API is also level across the three, as the official pricing table confirms.
| Claude Opus 4.8 | $5 / MTok | $6.25 / MTok | $10 / MTok | $0.50 / MTok | $25 / MTok | | Claude Opus 4.7 | $5 / MTok | $6.25 / MTok | $10 / MTok | $0.50 / MTok | $25 / MTok | | Claude Opus 4.6 | $5 / MTok | $6.25 / MTok | $10 / MTok | $0.50 / MTok | $25 / MTok | — from the "Model pricing" table (columns: Base Input / 5m Cache Writes / 1h Cache Writes / Cache Hits & Refreshes / Output Tokens)
But 4.7/4.8 Reach the Usage Limit Sooner
Even at the same price, how much you can actually do for that price can differ. Pro and Max plans have usage limits (caps on how much you can use within a time window and per week), and according to Anthropic, how fast you consume them depends on which model you're chatting with and the effort level you've selected.
Your usage is affected by several factors, including the length and complexity of your conversations, the features you use, which Claude model you're chatting with, and the effort level you've selected. — from the explanation of what affects your usage limit
And Opus 4.7/4.8 use a new tokenizer, so the same text is counted as more tokens than on 4.6 (officially, the same input maps to roughly 1.0–1.35×).
First, Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens—roughly 1.0–1.35× depending on the content type. — from the explanation of Opus 4.7's tokenizer
So even though the monthly fee is the same, on 4.7/4.8 the same content tends to reach the usage limit sooner than on 4.6, meaning the amount you can actually get done for the same price tends to shrink a little. In our own hands-on chat use, the editor found that newer models reached the limit sooner than before for the same kind of work. Note that per-plan limits can change, so check the official pricing/plan page for the latest.
To Cut Cost, Reconsider Your Plan
What cuts cost on chat is your plan, not the model. Claude has a Free plan to try it, Pro for everyday use, and Max for heavy users, and higher-limit plans let you do more. You don't need a pricier plan to use the capable models, so a practical approach is to start on an affordable plan and step up only if you keep hitting the limit. Plan prices and contents can change, so check the latest on the official pricing page before subscribing.
Why API (Pay-as-You-Go) Cost Varies, and How to Cut It
From here it's about using Claude via the API. If you only use chat, you can skip this part.
How API Cost Is Determined (Rate × Tokens)
API cost is rate × tokens used. A token is the small unit text is split into for billing, and the rate is identical across the three models. So what moves cost on the API is the number of tokens used. The context window (1M tokens) and max output (128K tokens) are also the same across the three, and feeding in long text alone does not add a premium.
Claude Fable 5, ... Claude Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1M token context window at standard pricing. (A 900k-token request is billed at the same per-token rate as a 9k-token request.) — from the "Long context pricing" section
Input Side: 4.7+ Uses More Tokens for the Same Text
One reason the same work can use different token counts is the input side. Opus 4.7 and later use a new tokenizer (the way text is split into tokens), so the same text can be counted as up to ~35% more tokens than on 4.6.
Opus 4.7 and later use a new tokenizer compared to previous models, contributing to their improved performance on a wide range of tasks. This new tokenizer may use up to 35% more tokens for the same fixed text. — from a Note on the Pricing page
Same rate, but more tokens means higher input cost. So for heavy text input via the API, the older-tokenizer 4.6 tends to be cheaper on input.
Output Side: It Varies with effort
The other reason is the output side: token volume depends on how much the model thinks and writes. The setting for "how hard it thinks" is called effort, shown as effort in the API and docs. On Opus 4.8 it defaults to high, so output tends to be heavier even for light requests.
On Claude Opus 4.8, the `effort` parameter defaults to `high` on all surfaces, including the Claude API and Claude Code. Set `effort` explicitly to use a different level. — from a Note in Models overview
Lowering effort to match the task trims output cost while keeping quality.
Tips to Cut Cost: effort, Caching, Batch, and Fast Mode
You can lower API cost in several ways. In order of impact:
- Lower the effort: the easiest lever on output cost. For light work, setting medium or low cuts token use. The xhigh level for long-running autonomous work is available on Opus 4.8 and 4.7, but not on 4.6.
- Prompt caching: instead of resending a long prefix or document each time, reading from cache cuts input cost—a cache read costs about 10% of the standard input rate.
- Batch API: for high-volume work that isn't time-sensitive, you get 50% off both input and output (it can't be combined with fast mode).
- Fast mode is cheaper on 4.8: standard rates are identical across the three, but fast mode (faster output at a premium) is the one exception.
`xhigh` ... Available on Claude Fable 5, Claude Mythos 5, Claude Opus 4.8, and Claude Opus 4.7. — from the Effort levels table, xhigh row
| 5-minute cache write | 1.25x base input price | Cache valid for 5 minutes | | 1-hour cache write | 2x base input price | Cache valid for 1 hour | | Cache read (hit) | 0.1x base input price | Same duration as the preceding write | — from the Prompt caching multiplier table
The Batch API allows asynchronous processing of large volumes of requests with a 50% discount on both input and output tokens. — from the "Batch processing" section
Fast-mode rates differ by model:
| Model | Input ($/MTok) | Output ($/MTok) |
|---|---|---|
| Opus 4.6 / 4.7 | $30 | $150 |
| Opus 4.8 | $10 | $50 |
| Claude Opus 4.6 / Claude Opus 4.7 | $30 / MTok | $150 / MTok | | Claude Opus 4.8 | $10 / MTok | $50 / MTok | — from the Fast mode pricing table
In fast mode, Opus 4.8 is about one-third the rate of 4.6/4.7. When cutting cost, check whether you can lower effort before changing models.
Getting the Most Out of Claude Opus
Finally, here are the practices that improve accuracy while keeping cost down, plus the conclusion on which model to choose.
Tips for Passing Long Documents
When feeding long materials into Claude for summaries or research, preparing them in Markdown format first improves accuracy. Heading hierarchy and table structure are preserved, making the content easier to read correctly. It also trims wasted tokens, which in turn lowers the cost of the same task.
Pasting web pages as-is can mix in styling and layout that get in the way. Converting the source to Markdown before handing it over is the reliable path to stable results.
Text and files you enter are processed within your own environment and are not sent to sakutto's servers. You can format confidential materials with peace of mind.
Which Model to Choose: Conclusion and Cautions
To summarize: on chat, picking the newest 4.8 puts you at no disadvantage on the monthly fee, and the thing to look at for savings is your plan. On the API the best model depends on the task—4.6 tends to be cheaper for heavy text input, while 4.8 is cheaper in fast mode. For details on the newest model, see our Claude Opus 4.8 release overview.
Official Sources on Claude Opus Pricing
This article is based on the following primary sources (Anthropic official documentation). Pricing and specs can change, so always check the official sources for the latest, accurate information.