sakutto

Claude Opus 4.8, 4.7, 4.6 Pricing Compared: Cost Differences and How to Choose

Source: Anthropic Official Docs (Pricing)
Tags: Claude, Anthropic, Pricing, LLM

The Basics of Claude Opus 4.8, 4.7, and 4.6 Pricing

Anthropic currently offers Claude Opus 4.8, 4.7, and 4.6 side by side. Pricing is easiest to understand if you split it by the two ways you can use Claude.

Two Ways to Pay: Chat and the API

There are two ways to use Claude: chat on a monthly plan, and the API, where you pay for what you use. The pricing logic differs between them. On chat, the monthly fee is the same for every model, although how fast you use up your usage limit does differ by model. On the API you are billed per token, so what a task costs depends on how many tokens it uses, and that count can vary by model.

| Item | Details |
|---|---|
| Chat (monthly plan) | Using claude.ai or the Claude app for a fixed monthly price. The monthly fee is the same for every model, but usage-limit consumption varies by model. |
| API (pay-as-you-go) | Building Claude into your own app or program and paying for what you use. Standard per-token rates are the same for every model, but the cost of a task varies with the tokens it uses. |
| What drives cost | On chat, your plan; on the API, the token count (how the text is split into tokens and how much the model thinks and writes). |

The Difference Between the Three Models and How to Choose

The difference between the three models comes down to one point: the bigger the number, the newer the model. Newer models tend to be more capable.

| Model | Positioning | Best suited for |
|---|---|---|
| Opus 4.8 | Newest, top-tier model | The default when starting out; strong on complex requests and in fast mode |
| Opus 4.7 | One generation before 4.8 | Stable; also handles long-running autonomous work |
| Opus 4.6 | Older generation | Still available; tends to be cheaper for heavy text input via the API |

For chat, since the monthly fee is the same, the newest and most capable 4.8 is the sensible default. For API use, the best model depends on the task, as covered below. For how to use Claude itself, see our guide to using Claude (claude.ai).

Pricing and Usage Limits on Chat

Most people use chat (a monthly plan). Here it helps to think about "the fee" and "the usage limit" separately.

The Monthly Fee Is the Same for Every Model

A monthly chat plan charges a fixed price, and you can use Claude up to your plan's limit. Whichever model you pick, the plan's monthly fee itself doesn't change: 4.8 and 4.6 are billed the same. This is what people mean when they say "all three are the same on chat."

The standard API rates are also identical across the three models, as the official pricing table confirms.

| Model | Base Input | 5m Cache Writes | 1h Cache Writes | Cache Hits & Refreshes | Output Tokens |
|---|---|---|---|---|---|
| Claude Opus 4.8 | $5 / MTok | $6.25 / MTok | $10 / MTok | $0.50 / MTok | $25 / MTok |
| Claude Opus 4.7 | $5 / MTok | $6.25 / MTok | $10 / MTok | $0.50 / MTok | $25 / MTok |
| Claude Opus 4.6 | $5 / MTok | $6.25 / MTok | $10 / MTok | $0.50 / MTok | $25 / MTok |
(from the "Model pricing" table)

But 4.7/4.8 Reach the Usage Limit Sooner

Even at the same price, how much you can actually do for that price can differ. Pro and Max plans have usage limits (caps on how much you can use within a time window and per week), and according to Anthropic, how fast you consume them depends on which model you're chatting with and the effort level you've selected.

Your usage is affected by several factors, including the length and complexity of your conversations, the features you use, which Claude model you're chatting with, and the effort level you've selected. — from the explanation of what affects your usage limit

On top of that, Opus 4.7/4.8 use a new tokenizer, so the same text can be counted as more tokens than on 4.6 (officially, the same input maps to roughly 1.0 to 1.35× as many tokens).

First, Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens—roughly 1.0–1.35× depending on the content type. — from the explanation of Opus 4.7's tokenizer

So even though the monthly fee is the same, the same content tends to hit the usage limit sooner on 4.7/4.8 than on 4.6, which means you get a little less done for the same price. In our own hands-on chat use, we also found that newer models reached the limit sooner for the same kind of work. Per-plan limits can change, so check the official pricing/plan page for the latest.

To Cut Cost, Reconsider Your Plan

What cuts cost on chat is your plan, not the model. Claude has a Free plan to try it, Pro for everyday use, and Max for heavy users, and higher-limit plans let you do more. You don't need a pricier plan to use the capable models, so a practical approach is to start on an affordable plan and step up only if you keep hitting the limit. Plan prices and contents can change, so check the latest on the official pricing page before subscribing.

Why API (Pay-as-You-Go) Cost Varies, and How to Cut It

From here it's about using Claude via the API. If you only use chat, you can skip this part.

How API Cost Is Determined (Rate × Tokens)

API cost is rate × tokens used. A token is the small unit that text is split into for billing, and the standard rate is identical across the three models, so what moves cost on the API is the number of tokens used. The context window (1M tokens) and maximum output (128K tokens) are also the same across the three, and long inputs are not charged a premium.

Claude Fable 5, ... Claude Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1M token context window at standard pricing. (A 900k-token request is billed at the same per-token rate as a 9k-token request.) — from the "Long context pricing" section
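To make "rate × tokens" concrete, here is a minimal Python sketch that estimates the cost of a single request at the standard Opus rates from the pricing table ($5 per million input tokens, $25 per million output tokens). The token counts in the usage lines are hypothetical, and the sketch deliberately ignores caching, batch discounts, and fast mode.

```python
# Standard Opus API rates in USD per million tokens (MTok), from the
# "Model pricing" table; the same for Opus 4.8, 4.7, and 4.6.
INPUT_RATE_PER_MTOK = 5.00
OUTPUT_RATE_PER_MTOK = 25.00


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost as rate x tokens (standard pricing only)."""
    input_cost = input_tokens * INPUT_RATE_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_RATE_PER_MTOK / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 2,000 output tokens.
    print(f"${api_cost_usd(10_000, 2_000):.4f}")  # prints $0.1000
    # No long-context premium: cost scales linearly with input length.
    print(f"900k input: ${api_cost_usd(900_000, 0):.2f}")
    print(f"9k input:   ${api_cost_usd(9_000, 0):.3f}")
```

Because the formula is linear, a 900k-token input comes out at exactly 100 times the cost of a 9k-token input, which matches the long-context note quoted above.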

Input Side: 4.7+ Uses More Tokens for the Same Text

One reason the same work can use different token counts is the input side. Opus 4.7 and later use a new tokenizer (the way text is split into tokens), so the same text can be counted as up to ~35% more tokens than on 4.6.

Opus 4.7 and later use a new tokenizer compared to previous models, contributing to their improved performance on a wide range of tasks. This new tokenizer may use up to 35% more tokens for the same fixed text. — from a Note on the Pricing page

The per-token rate is the same, so more tokens simply means a higher input bill. For input-heavy API workloads, 4.6, which uses the older tokenizer, therefore tends to be cheaper on the input side.
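The tokenizer effect on input cost can be sketched the same way. This hypothetical helper takes the token count a text would have on Opus 4.6 and scales it by a tokenizer multiplier. The official range for 4.7 and later is roughly 1.0 to 1.35 depending on content, so the sketch shows both ends of that range rather than a single predicted figure; the 2M-token workload is made up for illustration.

```python
STANDARD_INPUT_RATE_PER_MTOK = 5.00  # USD per MTok; same on Opus 4.8, 4.7, 4.6


def input_cost_usd(tokens_on_4_6: int, tokenizer_multiplier: float = 1.0) -> float:
    """Input cost of a text that counts as `tokens_on_4_6` tokens on Opus 4.6.

    `tokenizer_multiplier` models how many more tokens the 4.7+ tokenizer
    produces for the same text (officially roughly 1.0 to 1.35).
    """
    tokens = tokens_on_4_6 * tokenizer_multiplier
    return tokens * STANDARD_INPUT_RATE_PER_MTOK / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: text that counts as 2M tokens on Opus 4.6.
    workload_tokens_on_4_6 = 2_000_000
    scenarios = [
        ("Opus 4.6", 1.0),
        ("Opus 4.7/4.8, low end", 1.0),
        ("Opus 4.7/4.8, high end", 1.35),
    ]
    for label, multiplier in scenarios:
        cost = input_cost_usd(workload_tokens_on_4_6, multiplier)
        print(f"{label}: ${cost:.2f}")
```

At the high end of the range, the same text costs 35% more to input on 4.7/4.8 even though the listed rate is identical.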

Output Side: It Varies with effort

The other reason is the output side: token volume depends on how much the model thinks and writes. The setting that controls how hard it thinks is called effort (the effort parameter in the API and docs). On Opus 4.8 it defaults to high, so output tends to be heavier even for light requests.

On Claude Opus 4.8, the `effort` parameter defaults to `high` on all surfaces, including the Claude API and Claude Code. Set `effort` explicitly to use a different level. — from a Note in Models overview

Lowering effort to match the task trims output cost while keeping quality adequate for that task.
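Because the output rate ($25/MTok) is five times the input rate, trimming output tokens pays off quickly. The sketch below only does the arithmetic: the per-effort token counts are hypothetical placeholders, not measurements, since real counts depend on the task and are best read from the usage figures your own API responses report.

```python
OUTPUT_RATE_PER_MTOK = 25.00  # USD per MTok, standard Opus output rate


def output_cost_usd(output_tokens: int) -> float:
    """Output cost is output tokens x the output rate."""
    return output_tokens * OUTPUT_RATE_PER_MTOK / 1_000_000


# HYPOTHETICAL output token counts for the same light request at two
# effort levels; placeholders for illustration only, not measured values.
HYPOTHETICAL_OUTPUT_TOKENS = {"high": 4_000, "low": 1_000}

if __name__ == "__main__":
    for effort, tokens in HYPOTHETICAL_OUTPUT_TOKENS.items():
        print(f"effort={effort}: {tokens:,} tokens -> ${output_cost_usd(tokens):.3f}")
    high = output_cost_usd(HYPOTHETICAL_OUTPUT_TOKENS["high"])
    low = output_cost_usd(HYPOTHETICAL_OUTPUT_TOKENS["low"])
    print(f"saving per request: {1 - low / high:.0%}")
```

The saving is proportional to the drop in output tokens, so whatever reduction a lower effort level actually produces on your task translates directly into the output bill.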

Tips to Cut Cost: effort, Caching, Batch, and Fast Mode

You can lower API cost in several ways, starting with the easiest:

  • Lower the effort: the easiest lever on output cost. For light work, setting medium or low cuts token use. The xhigh level for long-running autonomous work is available on Opus 4.8 and 4.7, but not on 4.6.
  • Prompt caching: instead of resending a long prefix or document at full price each time, read it from the cache. A cache read costs 10% of the standard input rate, while the initial cache write costs 1.25× (5-minute cache) or 2× (1-hour cache) the input rate.
  • Batch API: for high-volume work that isn't time-sensitive, you get 50% off both input and output (it can't be combined with fast mode).
  • Fast mode is cheaper on 4.8: standard rates are identical across the three, but fast mode (faster output at a premium) is the one exception.
`xhigh` ... Available on Claude Fable 5, Claude Mythos 5, Claude Opus 4.8, and Claude Opus 4.7. — from the Effort levels table, xhigh row
| Cache operation | Price multiplier | Duration |
|---|---|---|
| 5-minute cache write | 1.25x base input price | Cache valid for 5 minutes |
| 1-hour cache write | 2x base input price | Cache valid for 1 hour |
| Cache read (hit) | 0.1x base input price | Same duration as the preceding write |
(from the Prompt caching multiplier table)
The Batch API allows asynchronous processing of large volumes of requests with a 50% discount on both input and output tokens. — from the "Batch processing" section
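The caching and batch figures above can be combined into a rough estimate. This sketch assumes a hypothetical workload that reuses the same 50,000-token document prefix across 10 requests, all within the 5-minute cache window, and it counts only the prefix cost, leaving out the rest of each prompt and the output. The batch helper applies the 50% discount to standard rates only; the function names are illustrative, not part of any SDK.

```python
BASE_INPUT_PER_MTOK = 5.00    # USD per MTok, standard Opus input
BASE_OUTPUT_PER_MTOK = 25.00  # USD per MTok, standard Opus output
CACHE_WRITE_5M = 1.25         # multiplier on base input (5-minute write)
CACHE_READ = 0.10             # multiplier on base input (cache hit)
BATCH_DISCOUNT = 0.50         # Batch API: 50% off input and output


def prefix_cost_no_cache(prefix_tokens: int, requests: int) -> float:
    """Resend the full prefix at the base input rate on every request."""
    return requests * prefix_tokens * BASE_INPUT_PER_MTOK / 1_000_000


def prefix_cost_with_cache(prefix_tokens: int, requests: int) -> float:
    """First request writes the cache; later ones read it (assumes no expiry)."""
    if requests < 1:
        return 0.0
    per_token = BASE_INPUT_PER_MTOK / 1_000_000
    write = prefix_tokens * per_token * CACHE_WRITE_5M
    reads = (requests - 1) * prefix_tokens * per_token * CACHE_READ
    return write + reads


def batch_cost(input_tokens: int, output_tokens: int) -> float:
    """Standard-rate cost with the Batch API discount applied to both sides."""
    full = (input_tokens * BASE_INPUT_PER_MTOK
            + output_tokens * BASE_OUTPUT_PER_MTOK) / 1_000_000
    return full * (1 - BATCH_DISCOUNT)


if __name__ == "__main__":
    print(f"no cache:   ${prefix_cost_no_cache(50_000, 10):.4f}")
    print(f"with cache: ${prefix_cost_with_cache(50_000, 10):.4f}")
    print(f"batch (10k in, 2k out): ${batch_cost(10_000, 2_000):.4f}")
```

Under these assumptions the cached prefix costs $0.5375 instead of $2.50. The first request is slightly more expensive than an uncached one because of the 1.25× write, so caching only pays off once the prefix is actually reused.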

Fast-mode rates differ by model:

| Model | Input ($/MTok) | Output ($/MTok) |
|---|---|---|
| Opus 4.6 / 4.7 | $30 | $150 |
| Opus 4.8 | $10 | $50 |
| Model | Input | Output |
|---|---|---|
| Claude Opus 4.6 / Claude Opus 4.7 | $30 / MTok | $150 / MTok |
| Claude Opus 4.8 | $10 / MTok | $50 / MTok |
(from the Fast mode pricing table)

In fast mode, Opus 4.8 is about one-third the rate of 4.6/4.7. When cutting cost, check whether you can lower effort before changing models.
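Plugging the fast-mode rates into the same rate × tokens formula shows the one-third ratio directly. The dictionary keys are hypothetical labels for this sketch, not official API model IDs, and the request size is made up for illustration.

```python
# Fast-mode rates in USD per MTok as (input, output), from the fast mode table.
# The keys are illustrative labels, not official API model IDs.
FAST_MODE_RATES = {
    "opus-4.6": (30.00, 150.00),
    "opus-4.7": (30.00, 150.00),
    "opus-4.8": (10.00, 50.00),
}


def fast_mode_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one fast-mode request at the given model's fast-mode rates."""
    input_rate, output_rate = FAST_MODE_RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens, 2,000 output tokens.
    for model in FAST_MODE_RATES:
        print(f"{model}: ${fast_mode_cost_usd(model, 10_000, 2_000):.2f}")
```

Since both the input and output rates on 4.8 are one-third of those on 4.6/4.7, the ratio holds for any mix of input and output tokens.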

Getting the Most Out of Claude Opus

Finally, here are the practices that improve accuracy while keeping cost down, plus the conclusion on which model to choose.

Tips for Passing Long Documents

When feeding long materials into Claude for summaries or research, converting them to Markdown first tends to improve accuracy. Heading hierarchy and table structure are preserved, which makes the content easier for the model to read correctly. It also trims wasted tokens, which lowers the cost of the same task.

Pasting web pages as-is can pull in styling and layout residue that gets in the way. Converting the source to Markdown before handing it over is a reliable way to get stable results.

Free tool: URL to Markdown Converter. Convert any public web page URL to Markdown. It preserves headings, tables, lists, and links, which makes it well suited to LLM and RAG preprocessing, research notes, and archiving web articles.

Text and files you enter are processed within your own environment and are not sent to sakutto's servers. You can format confidential materials with peace of mind.

Which Model to Choose: Conclusion and Cautions

To summarize: on chat, picking the newest 4.8 puts you at no disadvantage on the monthly fee, and the place to look for savings is your plan. On the API the best model depends on the task: 4.6 tends to be cheaper for heavy text input, while 4.8 is cheaper in fast mode. For details on the newest model, see our Claude Opus 4.8 release overview.

Official Sources on Claude Opus Pricing

This article is based on the following primary sources (Anthropic official documentation). Pricing and specs can change, so always check the official sources for the latest, accurate information.

Anthropic Official Docs: Pricing
Anthropic Official Docs: Models overview
Anthropic Official Docs: Effort

FAQ

Q. On chat, does Claude Opus 4.8, 4.7, or 4.6 cost different amounts?
The plan's monthly fee itself doesn't change whichever model you pick. The thing to watch is your usage limit. Pro and Max plans have usage caps, and how fast you consume them depends on which model you're chatting with. Because Opus 4.7/4.8 use a new tokenizer, the same content can use more tokens than on 4.6, so you tend to reach the usage limit sooner. Per-token billing applies only when you use the API (pay-as-you-go), and even there the standard rates are the same for all three models; only the fast-mode rates differ.
Q. Which of Opus 4.8, 4.7, and 4.6 should I choose?
For chat, the newest 4.8 is the standard choice, since all three cost the same and 4.8 is the most capable. For API use it depends on the task: 4.6 tends to be cheaper for heavy text input, while 4.8 is cheaper in fast mode.
Q. Is it true that the older 4.6 is cheaper?
On a monthly chat plan, 4.6 and 4.8 have the same monthly fee (though usage-limit consumption varies by model: 4.7/4.8 reach the cap sooner for the same content). On the API, 4.6 tends to be cheaper when you input a lot of text, because from Opus 4.7 onward the tokenizer changed and the same text can be counted as up to ~35% more tokens than on 4.6.
