What Mistral OCR 4 Is (Bounding-Box Document AI)
Mistral OCR 4 is a document-reading (OCR) AI model that the French AI company Mistral AI released on June 23, 2026. OCR (Optical Character Recognition) is the technology that converts the characters inside an image or PDF into text a computer can work with. Let's start with what OCR 4 returns, what a bounding box is, and how it differs from traditional OCR.
What Mistral OCR 4 Is and the Data It Returns
Mistral OCR 4 pulls the contents of a document out as structured data, not as a flat run of characters. Along with the extracted text, it returns bounding boxes, typed blocks, and confidence scores that show how reliable each reading is.
returns bounding boxes, typed-block classification (titles, tables, equations, signatures, and more), and inline confidence scores — from the Mistral OCR 4 announcement
In other words, when you feed it a paper invoice, a contract, or a research PDF, the prose comes back as Markdown text, tables come back as tables, and equations come back as equations, each tagged with its block type and its position on the page. Because the output is already structured, you spend far less effort splitting and labeling the data before handing it to the next step.
What a Bounding Box Is (a Box That Marks Where Text Sits)
A bounding box is a rectangle that marks where a piece of extracted text or a figure sits on the page. In OCR 4, each block comes back with its top-left and bottom-right coordinates (top_left_x, top_left_y, bottom_right_x, bottom_right_y). With this position information, you can highlight exactly which part of the source an AI answer is based on.
Bounding boxes, our most-requested capability, localize text for in-context highlighting and reliable data pipelines. — from the Mistral OCR 4 announcement
Mistral calls bounding boxes its most-requested capability. If you ask an AI "where is the termination clause in this contract?" and it can show the spot in the original with a box rather than just an answer, the content is far easier for a person to verify. In practice, when you run invoices and contracts through an AI, it can read the characters but often cannot trace back to "where it was written," which slows down review. Bounding boxes are what cut that rework. The less room for error in automated document processing, the more this ability to trace "what was read, and where" pays off.
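As a minimal sketch of the highlighting idea, the Python below turns a block's corner coordinates into the x/y/width/height rectangle most viewers draw, rescaled to the size at which the page is displayed. Only the four coordinate names come from the description above; `BBox` and `to_highlight` are hypothetical names, and the sketch assumes the coordinates share units with the page size you pass in (for example, pixels of the rendered page).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    # Corner coordinates, named as in the OCR 4 output.
    top_left_x: float
    top_left_y: float
    bottom_right_x: float
    bottom_right_y: float


def to_highlight(box: BBox, page_w: float, page_h: float,
                 view_w: float, view_h: float) -> dict:
    """Convert corner coordinates to an x/y/width/height rectangle.

    Assumes the box uses the same units as page_w/page_h and rescales
    it to a viewer that shows the page at view_w x view_h.
    """
    sx = view_w / page_w
    sy = view_h / page_h
    return {
        "x": box.top_left_x * sx,
        "y": box.top_left_y * sy,
        "width": (box.bottom_right_x - box.top_left_x) * sx,
        "height": (box.bottom_right_y - box.top_left_y) * sy,
    }


# Hypothetical "termination clause" block on a 1000 x 1400 page,
# drawn in a 500 x 700 preview pane.
clause = BBox(100, 200, 600, 260)
print(to_highlight(clause, 1000, 1400, 500, 700))
# {'x': 50.0, 'y': 100.0, 'width': 250.0, 'height': 30.0}
```

Converting to width and height at this step keeps the viewer code simple, since most drawing APIs take an origin plus a size rather than two corners.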
How It Differs from Traditional OCR (Block Classification and Confidence Scores)
Traditional OCR mainly aimed to turn characters and tables into text. OCR 4 takes a step further: it splits what it reads into units called blocks and classifies each one by type. Each block is localized with a bounding box and returned as structured data classified into types such as title, table, equation, and signature.
Each block is localized with a bounding box, classified by type, and inline confidence scores are generated per-page and per-word. — from the Mistral OCR 4 announcement
The table below summarizes how OCR 4 differs from traditional OCR.
| Aspect | Traditional OCR | Mistral OCR 4 |
|---|---|---|
| Main output | Text from characters and tables | Markdown text plus structured data |
| Position of text | Often not provided | Returns coordinates as bounding boxes |
| Block classification | Limited | Sorts into titles, tables, equations, signatures, and more |
| Reading reliability | Often not shown | Confidence scores per page and per word |
| Handling downstream | Needs custom cleanup and splitting | Easy to pass straight to search and data work |
*The traditional-OCR column is a general characterization; coverage varies by product.
Because confidence scores come per page and per word, you can also focus later checks on the spots where the reading looks shaky. Where the old approach only transcribed characters, OCR 4 returns "what is where, and how certain" in one pass, which is what makes it easy to work with in practice.
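To make that triage concrete, here is a small sketch that picks out the pages whose page-level confidence falls below a threshold, worst first. The `{page_index: score}` input, the 0 to 1 score scale, and the 0.9 cutoff are all assumptions for illustration, not documented OCR 4 output.

```python
def pages_to_review(page_confidence: dict, threshold: float = 0.9) -> list:
    """Return page indices whose confidence is below threshold, lowest first."""
    shaky = sorted(
        (score, page) for page, score in page_confidence.items() if score < threshold
    )
    return [page for _, page in shaky]


# Hypothetical per-page scores on a 0-1 scale.
scores = {0: 0.98, 1: 0.72, 2: 0.95, 3: 0.85}
print(pages_to_review(scores))  # [1, 3]
```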
What Mistral OCR 4 Can Do and Its Benchmark Performance
Figure: Mistral OCR 4 public benchmarks and evaluation. Bars are scaled 0–100. OlmOCRBench and OmniDocBench are benchmark scores; the win rate is how often human reviewers preferred OCR 4 over leading OCR and document-AI systems. Because the metrics differ, the bars are not directly comparable with one another. Figures are Mistral's official values (as of June 2026).
You can gauge what OCR 4 can do from both its breadth of coverage and its public benchmarks. This section looks at multilingual support and block classification, how confidence scores are put to use, and how it scores on benchmarks.
170-Language Support and Block Classification
OCR 4 covers a wide range of languages: 170 of them, across 10 language groups. That makes it practical for processing multilingual documents, including Japanese ones, in the same pipeline.
170 languages across 10 language groups — from the Mistral OCR 4 announcement
Block classification helps in practice too. Because titles, tables, equations, and signatures come back separately, it is easier to build later steps that, say, pull out only the tables to load into a spreadsheet, or just check whether a signature field is present. Rather than treating a whole document as one lump of text, you can target and extract the parts you need.
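A sketch of that kind of downstream step: keep only `table` blocks, convert a Markdown pipe table into CSV rows for a spreadsheet, and check whether any `signature` block is present. The block dictionaries (`type` and `content` keys) and the type labels are hypothetical stand-ins for whatever the actual response contains, and the parser only handles simple pipe tables.

```python
import csv
import io


def markdown_table_to_rows(md: str) -> list:
    """Split a simple pipe table into rows of cells, skipping the |---| row."""
    rows = []
    for line in md.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if all(c and set(c) <= set("-: ") for c in cells):
            continue  # separator row such as |---|:---:|
        rows.append(cells)
    return rows


def split_blocks(blocks: list) -> tuple:
    """Return (table contents, whether any signature block exists)."""
    tables = [b["content"] for b in blocks if b["type"] == "table"]
    has_signature = any(b["type"] == "signature" for b in blocks)
    return tables, has_signature


# Hypothetical typed blocks from one invoice page.
blocks = [
    {"type": "title", "content": "Invoice 1042"},
    {"type": "table", "content": "| Item | Qty |\n|---|---|\n| Paper | 3 |"},
    {"type": "signature", "content": ""},
]
tables, signed = split_blocks(blocks)
buf = io.StringIO()
csv.writer(buf).writerows(markdown_table_to_rows(tables[0]))
print(signed)                # True
print(repr(buf.getvalue()))  # 'Item,Qty\r\nPaper,3\r\n'
```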
Confidence Scores and Source-Grounded Citations (RAG and Grounding)
Block types and confidence scores pair well with systems in which an AI reads documents and answers from them. Mistral describes three uses for them:
block types and confidence scores drive source-grounded citations, redactions, and human-in-the-loop verification. — from the Mistral OCR 4 announcement
Confidence scores come not only per page but per word. So you can have a person check only the spots where a single wrong character would be critical, such as figures or proper nouns. Being able to apply human review to the low-confidence spots rather than eyeballing everything pays off most when you handle documents at scale.
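One way to express "check only where a wrong character would be critical" is a two-threshold review queue: every word below a base threshold goes to a person, and words containing digits (amounts, dates, IDs) must clear a stricter one. The word dictionaries and both thresholds are assumptions for this sketch; proper nouns would need a separate rule, such as a list of names you care about.

```python
import re


def review_queue(words: list, base: float = 0.80, critical: float = 0.95) -> list:
    """Return the words a person should check.

    Words containing digits must reach `critical`; all others only `base`.
    """
    queue = []
    for w in words:
        limit = critical if re.search(r"\d", w["text"]) else base
        if w["confidence"] < limit:
            queue.append(w)
    return queue


# Hypothetical per-word scores on a 0-1 scale.
words = [
    {"text": "Total", "confidence": 0.97},
    {"text": "$1,250.00", "confidence": 0.91},
    {"text": "Acme", "confidence": 0.76},
    {"text": "due", "confidence": 0.88},
]
print([w["text"] for w in review_queue(words)])  # ['$1,250.00', 'Acme']
```

The amount is flagged even though 0.91 would pass the general threshold, which is the point: the stricter bar applies only where a single misread digit would matter.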
These properties are exactly what a preprocessing step for RAG (retrieval-augmented generation, in which an AI searches your own documents and uses them as grounds for its answers) needs. Blocks that are sorted by type and easy to trace back to their source make better "retrieval units" to feed an AI. For the bigger picture of feeding your PDFs to an AI, see our guide on how to load PDFs into ChatGPT.
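As a sketch of that preprocessing, the code below turns typed blocks into retrieval chunks that carry their page, block type, bounding box, and confidence as metadata, so an answer can cite and highlight its source. The `Block` class, the type labels, and the choice to attach the most recent title as a `section` field are assumptions for illustration, not OCR 4's actual schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Block:
    page: int
    type: str  # e.g. "title", "text", "table", "signature"
    text: str
    bbox: Tuple[float, float, float, float]  # top-left x/y, bottom-right x/y
    confidence: float


def to_chunks(blocks: list, keep_types: tuple = ("text", "table")) -> list:
    """Turn typed blocks into retrieval chunks that keep their source location."""
    chunks = []
    section: Optional[str] = None
    for i, b in enumerate(blocks):
        if b.type == "title":
            section = b.text  # remembered for the blocks that follow
            continue
        if b.type not in keep_types or not b.text.strip():
            continue  # skip signatures, empty blocks, etc.
        chunks.append({
            "id": f"p{b.page}-b{i}",
            "text": b.text,
            "metadata": {
                "page": b.page,
                "section": section,
                "type": b.type,
                "bbox": b.bbox,
                "confidence": b.confidence,
            },
        })
    return chunks


blocks = [
    Block(2, "title", "Termination", (80, 90, 400, 120), 0.99),
    Block(2, "text", "Either party may terminate with 30 days' notice.",
          (80, 130, 900, 210), 0.96),
    Block(3, "signature", "", (600, 1200, 900, 1300), 0.90),
    Block(3, "table", "| Fee | Amount |\n|---|---|\n| Exit | 500 |",
          (80, 300, 900, 420), 0.93),
]
for c in to_chunks(blocks):
    print(c["id"], c["metadata"]["section"], c["metadata"]["type"])
```

Keeping the bounding box in each chunk's metadata is what lets a retrieved answer point back to a highlighted region of the original page.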
Benchmark Results (OlmOCRBench 85.20 and a 72% Win Rate)
In published evaluations, OCR 4 posts strong numbers. It scored 85.20 on the public OlmOCRBench, which Mistral reports as the top overall score among the models it tested.
the top overall score amongst the models we tested on the public OlmOCRBench (85.20) — from the Mistral OCR 4 announcement
It also scores high on a broad document-parsing benchmark. On OmniDocBench, OCR 4 records a score of 93.07.
On OmniDocBench, OCR 4 achieves a score of 93.07. — from the Mistral OCR 4 announcement
Human reviewers preferred it too. According to Mistral, independent annotators preferred OCR 4 over every leading OCR and document-AI system it tested, with an average win rate of 72%. That said, benchmark numbers shift with the systems compared and the test conditions. Since each metric measures something different, it is more practical to weight the evaluations closest to your own use.
Mistral OCR 4 Pricing and How to Use the API
Figure: Mistral OCR 4 pricing (per 1,000 pages, USD). Bars show the price in dollars. The Batch API is a discount for processing in bulk; Document AI adds structured annotations. Source: Mistral official (as of June 2026).
Pricing and usage are essential when you weigh adoption. This section covers the per-page price, how the API is called, and self-hosting and the platforms that offer it.
Pricing ($4 per 1,000 Pages, $2 with Batch)
OCR 4 is billed by usage, based on the number of pages processed. Standard OCR is $4 per 1,000 pages, and processing in bulk with the Batch API brings it to $2 per 1,000 pages.
$4 per 1,000 pages, dropping to $2 with the Batch-API discount. — from the Mistral OCR 4 announcement (Pricing)
| Plan | Price (per 1,000 pages) | Main use |
|---|---|---|
| Standard OCR | $4 | Everyday document reading |
| Batch API | $2 | Processing large volumes in bulk |
| Document AI (annotated) | $5 | Adds structured annotations |
If you can process large volumes that are not time-critical in bulk, the Batch API cuts the cost in half. Pricing can be revised, so check the official pricing page before you commit.
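The per-1,000-page rates translate into a quick estimate like this. The sketch assumes billing scales linearly with page count and treats each row of the pricing table as a flat rate; check the official pricing page for how partial thousands and Document AI charges are actually billed.

```python
# USD per 1,000 pages, from the pricing table (Mistral, as of June 2026).
PRICE_PER_1000 = {"standard": 4.00, "batch": 2.00, "document_ai": 5.00}


def estimate_cost(pages: int, plan: str = "standard") -> float:
    """Estimated USD cost, assuming the price scales linearly with pages."""
    return round(pages / 1000 * PRICE_PER_1000[plan], 2)


for plan in PRICE_PER_1000:
    print(f"{plan}: ${estimate_cost(250_000, plan):,.2f}")
# standard: $1,000.00
# batch: $500.00
# document_ai: $1,250.00
```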
How to Use the API (mistral-ocr-latest and Supported Formats)
OCR 4 works with a single API call. From code, you call client.ocr.process() and set the model name to mistral-ocr-latest. To use OCR 4–specific features, set the model name to mistral-ocr-4-0 or newer.
client.ocr.process() … model="mistral-ocr-latest" — from the OCR Processor documentation
You can pass the input as a public URL (document_url), as Base64-encoded data, or as a file uploaded to the cloud. The response is a single object containing the Markdown text, per-page information, images, tables, blocks, and confidence scores.
The range of input formats is broad: documents such as PDF, PowerPoint, and Word, and images such as PNG, JPEG, and AVIF.
pdf, pptx, docx and more... / png, jpeg/jpg, avif and more... — from the OCR Processor documentation (supported formats)
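A minimal sketch of the call, assuming the `mistralai` Python SDK (`pip install mistralai`) and an API key in the `MISTRAL_API_KEY` environment variable. The `build_document` and `run_ocr` helpers are hypothetical: the first passes a public URL through unchanged, or wraps local bytes as a Base64 data URL, choosing `document_url` for documents and `image_url` for images. The MIME table covers only the formats named above.

```python
import base64
import os
from pathlib import Path
from typing import Optional

# MIME types for the formats named in the documentation (not exhaustive).
DOC_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}
IMAGE_TYPES = {".png": "image/png", ".jpg": "image/jpeg",
               ".jpeg": "image/jpeg", ".avif": "image/avif"}


def build_document(source: str, data: Optional[bytes] = None) -> dict:
    """Build the `document` argument for client.ocr.process().

    Without `data`, `source` is treated as a public URL and passed through.
    With `data`, the bytes are sent inline as a Base64 data URL.
    """
    suffix = Path(source).suffix.lower()
    if suffix in IMAGE_TYPES:
        kind, mime = "image_url", IMAGE_TYPES[suffix]
    elif suffix in DOC_TYPES:
        kind, mime = "document_url", DOC_TYPES[suffix]
    else:
        raise ValueError(f"unsupported format: {suffix or source}")
    if data is None:
        return {"type": kind, kind: source}
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": kind, kind: f"data:{mime};base64,{encoded}"}


def run_ocr(document: dict, model: str = "mistral-ocr-latest"):
    """Send one document to the OCR endpoint via the mistralai SDK."""
    from mistralai import Mistral  # lazy import: build_document works without the SDK

    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    return client.ocr.process(model=model, document=document,
                              include_image_base64=True)


print(build_document("https://example.com/report.pdf"))
```

For a local file, `run_ocr(build_document("report.pdf", Path("report.pdf").read_bytes()))` would send it inline. In the current SDK the response exposes `pages`, each with a `markdown` field; where OCR 4's blocks and confidence scores sit in that object is not shown here, so check the OCR Processor documentation, and pass `model="mistral-ocr-4-0"` when you need OCR 4 features.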
When you want descriptions of figures or charts, you can add an annotation step. There are two modes. One describes each extracted image individually (bbox_annotation): after OCR, each box is processed on its own. The other handles the whole document at once (document_annotation): the Markdown text is processed together with the extracted images. Choose per-image annotation or a whole-document summary depending on the task.
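Annotations are requested by describing the output you want as a JSON schema. The sketch below builds one schema for each mode; the field names (`chart_type`, `summary`, and so on) are made up for illustration, and the wrapper shape (`type: json_schema` with a named, strict schema) is an assumption to verify against the Document AI documentation before use.

```python
def json_schema_format(name: str, fields: dict) -> dict:
    """Wrap a flat {field_name: json_type} mapping as a strict JSON-schema format."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {k: {"type": t} for k, t in fields.items()},
                "required": list(fields),
                "additionalProperties": False,
            },
        },
    }


annotation_kwargs = {
    # Per-image mode: each extracted figure or chart box is described on its own.
    "bbox_annotation_format": json_schema_format(
        "figure", {"chart_type": "string", "short_description": "string"}),
    # Whole-document mode: Markdown text and extracted images are processed together.
    "document_annotation_format": json_schema_format(
        "document_summary", {"language": "string", "summary": "string"}),
}
print(sorted(annotation_kwargs))
```

These would be passed alongside the usual arguments, for example `client.ocr.process(model="mistral-ocr-latest", document=doc, **annotation_kwargs)`. If the endpoint expects a different wrapper, only `json_schema_format` needs to change.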
Self-Hosting and the Platforms That Offer It
OCR 4 is flexible in how it is offered. It can be packaged into a single container and run fully self-hosted on your own infrastructure. For organizations that want to process sensitive documents without sending them outside, that self-hosting option is a good fit.
It is also available on several platforms. Beyond Mistral's own Mistral Studio and API, it is offered through Amazon SageMaker and Microsoft Foundry, and support is planned for Snowflake's Parse Document. You can pick the entry point that matches the cloud you already use. For where Mistral sits within generative AI overall, reading it alongside our guide on what Claude is (Anthropic's generative AI) helps put it in context.
Where Mistral OCR 4 Fits, Cautions, and Summary
To close, here is where Mistral OCR 4 delivers the most value, what to check before adopting it, and the sources.
Where It Fits (Document Digitization and RAG Preprocessing)
OCR 4 is built on the premise that you not only "read" a document but "hand it to the next step." Mistral positions it as the part that moves AI agents from merely reading documents to acting on them — form filling, invoice processing, and compliance checks.
agents move from reading documents to acting on them (form filling, invoice processing, compliance checks) — from the Mistral OCR 4 announcement
In concrete terms, it shines at digitizing paper and PDF forms into a core system, shaping papers and reports into a searchable form, and having an AI answer from internal documents with grounds. For the whole flow of digitizing paper documents, see our guide to digitizing business documents as well. If you want to try the flow of feeding documents to an AI, starting by converting PDFs into a clean form makes it easier to get going.
What to Check Before Using It
For all its convenience, a few points are worth keeping in mind before adoption. OCR 4's reading is described as highly accurate, but it is safer to design on the assumption that low-confidence spots and important figures get a final human check. Mistral itself lists human-in-the-loop verification among the intended uses.
Pricing, model names, and the platforms on offer can also change. In particular, if you pin a model name in production, confirm in the official documentation that the model is still offered, and revisit the choice as needed. If cost is a concern, routing non-urgent jobs through the Batch API alone halves the unit price. Once your use case and document volume settle, re-evaluating which option fits best (standard, batch, or self-hosted) keeps costs down.
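A small configuration sketch for both points: the model name is pinned in one place with an optional environment override, and urgency decides whether a job goes to standard processing or the Batch API. The `OCR_MODEL` variable and the `job_settings` helper are hypothetical; `mistral-ocr-4-0` is the OCR 4 model name given in the API section.

```python
import os

# Pinned in one place so every job uses the same, verified model version.
PINNED_MODEL = "mistral-ocr-4-0"


def job_settings(urgent: bool) -> dict:
    """Choose model and processing mode for one OCR job.

    OCR_MODEL (hypothetical env var) overrides the pin without a code change,
    e.g. after confirming in the docs that a newer model is offered.
    Non-urgent jobs go to the Batch API at half the standard price.
    """
    model = os.environ.get("OCR_MODEL", PINNED_MODEL)
    return {"model": model, "mode": "standard" if urgent else "batch"}


print(job_settings(urgent=False))
```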
Mistral OCR 4 Summary
Mistral OCR 4 is a document AI that not only reads characters but returns position, type, and reliability as structured data. Localizing with bounding boxes and classifying blocks is what underpins the quality of downstream work such as RAG, agents, and internal-document search. Pricing runs from $2 to $5 per 1,000 pages, and you can choose batch processing or self-hosting depending on the use. Starting from document digitization, it is a strong option when you want to broaden how you use AI.
Before feeding documents to an AI, converting PDFs to Markdown while keeping the structure of headings and tables tends to make downstream processing more stable.