What Is Mistral OCR 4? Bounding-Box Document AI Explained

What Mistral OCR 4 returns from a single document

Input: PDFs, images, and Office files (Word / PowerPoint, etc.)
↓ Read by Mistral OCR 4
Text … extracted as Markdown, keeping the structure of headings and tables
Bounding boxes … coordinates for where each element sits on the page
Block classification … types such as titles, tables, equations, and signatures
Confidence scores … how reliable the reading is, per page and per word

Mistral OCR 4 is a document-reading (OCR) AI model that the French AI company Mistral AI released on June 23, 2026. OCR (Optical Character Recognition) is the technology that converts the characters inside an image or PDF into text a computer can work with. Let's start with what OCR 4 returns, what a bounding box is, and how it differs from traditional OCR.

What Mistral OCR 4 Is and the Data It Returns

Mistral OCR 4 pulls the contents of a document out as structured data, not as a flat run of characters. Along with the extracted text, it returns bounding boxes, typed blocks, and confidence scores that show how reliable each reading is.

returns bounding boxes, typed-block classification (titles, tables, equations, signatures, and more), and inline confidence scores — from the Mistral OCR 4 announcement

In other words, when you feed it a paper invoice, a contract, or a research PDF, the prose comes back as Markdown text, tables come back as tables, and equations come back as equations — each tagged with what kind of block it is and where it sits on the page. This structured output sharply cuts the effort of handling the data downstream.

What a Bounding Box Is (a Box That Marks Where Text Sits)

A bounding box is a rectangle, given as coordinates, that marks where a piece of extracted text or a figure sits on the page. In OCR 4, each block comes back with its top-left and bottom-right coordinates (top_left_x, top_left_y, bottom_right_x, bottom_right_y). With this position information, you can highlight exactly which part of the source an AI answer is based on.
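To make the coordinates concrete, here is a minimal Python sketch. It turns one block's corners into the (x, y, width, height) rectangle that most drawing tools expect, and then into page-relative fractions so a highlight can be drawn at any zoom level. It assumes the four keys above arrive as numbers in the pixel space of the rendered page, with y growing downward. The `Block` type, the helper names, and the sample values are hypothetical, not part of Mistral's SDK.

```python
from typing import TypedDict


class Block(TypedDict):
    """Hypothetical block shape: only the four corner keys named above are assumed."""
    top_left_x: float
    top_left_y: float
    bottom_right_x: float
    bottom_right_y: float


def to_rect(block: Block) -> tuple[float, float, float, float]:
    """Convert top-left / bottom-right corners into (x, y, width, height)."""
    x, y = block["top_left_x"], block["top_left_y"]
    width = block["bottom_right_x"] - x
    height = block["bottom_right_y"] - y
    if width < 0 or height < 0:
        raise ValueError("bottom-right corner must lie below and to the right of top-left")
    return x, y, width, height


def to_relative(block: Block, page_width: float, page_height: float) -> tuple[float, float, float, float]:
    """Express the rectangle as fractions of the page so it can be drawn at any zoom level."""
    x, y, w, h = to_rect(block)
    return x / page_width, y / page_height, w / page_width, h / page_height


# Hypothetical block for a termination clause on a 1200 x 1600 px page image.
clause = Block(top_left_x=120, top_left_y=640, bottom_right_x=1080, bottom_right_y=760)
print(to_rect(clause))                  # (120, 640, 960, 120)
print(to_relative(clause, 1200, 1600))  # (0.1, 0.4, 0.8, 0.075)
```

Storing relative coordinates keeps a highlight correct whether the viewer shows the page as a thumbnail or at full size.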

Bounding boxes, our most-requested capability, localize text for in-context highlighting and reliable data pipelines. — from the Mistral OCR 4 announcement

Mistral calls bounding boxes its most-requested capability. If you ask an AI "where is the termination clause in this contract?" and it can draw a box around the spot in the original instead of only giving an answer, a person can verify the content far more easily. In practice, when invoices and contracts are run through an AI, it can read the characters but often cannot point back to where they were written, which slows down review. Bounding boxes cut that rework. The less tolerance for error a document-processing workflow has, the more this ability to trace what was read, and where, pays off.

How It Differs from Traditional OCR (Block Classification and Confidence Scores)

Traditional OCR mainly aimed to turn characters and tables into text. OCR 4 takes a step further: it splits what it reads into units called blocks and classifies each one by type. Each block is localized with a bounding box and returned as structured data classified into types such as title, table, equation, and signature.

Each block is localized with a bounding box, classified by type, and inline confidence scores are generated per-page and per-word. — from the Mistral OCR 4 announcement

The main differences from traditional OCR are summarized below.

Aspect | Traditional OCR | Mistral OCR 4
Main output | Text from characters and tables | Markdown text plus structured data
Position of text | Often not provided | Returns coordinates as bounding boxes
Block classification | Limited | Sorts into titles, tables, equations, signatures, and more
Reading reliability | Often not shown | Confidence scores per page and per word
Downstream handling | Needs custom cleanup and splitting | Easy to pass straight to search and data work

*The traditional-OCR column is a general characterization; coverage varies by product.

On top of that, because confidence scores come per page and per word, you can focus later checks on the spots where the reading looks shaky. Where the old approach only transcribed characters, OCR 4 returns "what is where, and how certain" all at once — which is what makes it easy to work with in practice.

What Mistral OCR 4 Can Do and Its Benchmark Performance

Mistral OCR 4 public benchmarks and evaluation

OlmOCRBench and OmniDocBench are benchmark scores on a 0–100 scale; the win rate is the average share of comparisons in which human reviewers preferred OCR 4 over leading OCR and document-AI systems. Because the metrics differ, the three figures are not directly comparable. All values are Mistral's official figures (as of June 2026).

OlmOCRBench score: 85.20
OmniDocBench score: 93.07
Average human-preference win rate: 72%

You can gauge what OCR 4 can do from both its breadth of coverage and its public benchmarks. This section looks at multilingual support and block classification, how confidence scores are put to use, and how it scores on benchmarks.

170-Language Support and Block Classification

OCR 4 covers a wide range of languages: 170 languages across 10 language groups. That makes it practical to process multilingual documents, including Japanese ones, together.

170 languages across 10 language groups — from the Mistral OCR 4 announcement

Block classification helps in practice too. Because titles, tables, equations, and signatures come back separately, it is easier to build later steps that, say, pull out only the tables to load into a spreadsheet, or just check whether a signature field is present. Rather than treating a whole document as one lump of text, you can target and extract the parts you need.
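A minimal sketch of that kind of downstream step follows. It assumes the OCR output has already been flattened into a list of dicts that each carry a `type` label and a `content` string. That shape, the type strings, and the helper functions are illustrative assumptions; check the official response schema for the real field names before adapting it.

```python
from collections import defaultdict


def group_by_type(blocks: list[dict]) -> dict[str, list[dict]]:
    """Bucket blocks by their type label (e.g. "title", "table", "signature")."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for block in blocks:
        groups[block["type"]].append(block)
    return dict(groups)


def tables_only(blocks: list[dict]) -> list[str]:
    """Return the content of every table block, e.g. to hand to a spreadsheet step."""
    return [b["content"] for b in blocks if b["type"] == "table"]


def has_signature(blocks: list[dict]) -> bool:
    """True if at least one block was classified as a signature."""
    return any(b["type"] == "signature" for b in blocks)


# Hypothetical blocks from a one-page invoice.
blocks = [
    {"type": "title", "content": "# Invoice 2026-041"},
    {"type": "table", "content": "| Item | Qty |\n|---|---|\n| Widget | 3 |"},
    {"type": "text", "content": "Payment due in 30 days."},
    {"type": "signature", "content": "[signature]"},
]

print(sorted(group_by_type(blocks)))  # ['signature', 'table', 'text', 'title']
print(len(tables_only(blocks)))       # 1
print(has_signature(blocks))          # True
```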

Confidence Scores and Source-Grounded Citations (RAG and Grounding)

Block types and confidence scores pair well with systems in which an AI reads documents and answers from them. According to Mistral, they drive source-grounded citations, redactions, and human-in-the-loop verification.

block types and confidence scores drive source-grounded citations, redactions, and human-in-the-loop verification. — from the Mistral OCR 4 announcement

Confidence scores come not only per page but per word. So you can have a person check only the spots where a single wrong character would be critical, such as figures or proper nouns. Being able to apply human review to the low-confidence spots rather than eyeballing everything pays off most when you handle documents at scale.
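The following sketch shows that review routing, assuming each word arrives as a dict with its `text` and a `confidence` between 0 and 1. The data shape, the 0.90 threshold, and the rough "contains a digit or starts with a capital" test for critical words are hypothetical choices for illustration, not values or logic from Mistral.

```python
import re

# Hypothetical threshold; tune it against your own documents.
REVIEW_THRESHOLD = 0.90
DIGIT = re.compile(r"\d")


def is_critical(text: str) -> bool:
    """Crude heuristic: anything with a digit (amounts, dates, IDs) or a leading capital."""
    return bool(DIGIT.search(text)) or text[:1].isupper()


def words_to_review(words: list[dict], threshold: float = REVIEW_THRESHOLD) -> list[dict]:
    """Keep only critical words whose confidence falls below the threshold."""
    return [
        w for w in words
        if w["confidence"] < threshold and is_critical(w["text"])
    ]


words = [
    {"text": "Total", "confidence": 0.99},
    {"text": "$1,280.00", "confidence": 0.71},
    {"text": "payable", "confidence": 0.62},  # low confidence, but not critical
    {"text": "Marseille", "confidence": 0.84},
]

for w in words_to_review(words):
    print(w["text"], w["confidence"])
# $1,280.00 0.71
# Marseille 0.84
```

Only words that are both low-confidence and likely to matter reach a reviewer; raising the threshold trades more review work for fewer missed errors.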

This makes OCR 4 useful as a preprocessing step for RAG (retrieval-augmented generation, in which an AI searches your own documents and uses them as grounds for its answers). Blocks that are sorted by type and easy to trace back to their source make higher-quality "retrieval units" to feed an AI. For the bigger picture of feeding your PDFs to an AI, see our guide on how to load PDFs into ChatGPT.
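As a sketch of what such retrieval units could look like, the code below turns classified blocks into chunks that keep the page number, block type, and bounding box, so an answer can cite its source and a viewer can highlight it. The input shape (a page `index` plus a `blocks` list), the choice to skip signature blocks, and the `Chunk` type are assumptions for illustration, not Mistral's schema.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One retrieval unit plus the metadata needed to cite and highlight it."""
    text: str
    page: int  # 1-based page number for display
    block_type: str
    bbox: tuple[float, float, float, float]  # (top_left_x, top_left_y, bottom_right_x, bottom_right_y)

    def citation(self) -> str:
        """Short source label to show next to an AI answer."""
        return f"p.{self.page} [{self.block_type}]"


def blocks_to_chunks(pages: list[dict], skip_types: frozenset[str] = frozenset({"signature"})) -> list[Chunk]:
    """Turn every non-empty block into its own chunk, skipping types not worth indexing."""
    chunks = []
    for page in pages:
        for b in page["blocks"]:
            if b["type"] in skip_types or not b["content"].strip():
                continue
            chunks.append(Chunk(
                text=b["content"],
                page=page["index"] + 1,  # assumes a 0-based page index in the input
                block_type=b["type"],
                bbox=(b["top_left_x"], b["top_left_y"], b["bottom_right_x"], b["bottom_right_y"]),
            ))
    return chunks


# Hypothetical OCR output for the third page (index 2) of a contract.
pages = [{"index": 2, "blocks": [
    {"type": "text", "content": "Either party may terminate with 60 days' notice.",
     "top_left_x": 100, "top_left_y": 900, "bottom_right_x": 1100, "bottom_right_y": 980},
    {"type": "signature", "content": "[signature]",
     "top_left_x": 700, "top_left_y": 1400, "bottom_right_x": 1000, "bottom_right_y": 1500},
]}]

chunks = blocks_to_chunks(pages)
print(len(chunks), chunks[0].citation())  # 1 p.3 [text]
```

Keeping one block per chunk preserves the link between each piece of retrieved text and its exact location; merging small blocks into larger chunks is a common alternative when blocks are very short.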

Benchmark Results (OlmOCRBench 85.20 and a 72% Win Rate)

In published evaluations, OCR 4 posts strong numbers. It scored 85.20 on the public OlmOCRBench, which Mistral reports as the top overall score among the models it tested.

the top overall score amongst the models we tested on the public OlmOCRBench (85.20) — from the Mistral OCR 4 announcement

It also scores high on a broad document-parsing benchmark. On OmniDocBench, OCR 4 records a score of 93.07.

On OmniDocBench, OCR 4 achieves a score of 93.07. — from the Mistral OCR 4 announcement

Human reviewers preferred it too. According to Mistral, independent annotators preferred OCR 4 over every leading OCR and document-AI system it tested, with an average win rate of 72%. That said, benchmark numbers shift with the systems compared and the test conditions. Since each metric measures something different, give the most weight to the evaluations closest to your own use case.

Free ToolPDF to Markdown ConverterConvert PDF content to Markdown format. Auto-detects headings, tables, and lists — ideal for RAG and AI workflows.Try it now →

Mistral OCR 4 Pricing and How to Use the API

Mistral OCR 4 pricing (per 1,000 pages, USD)

Prices in USD per 1,000 pages. The Batch API is a discounted rate for bulk processing; Document AI adds structured annotations. Source: Mistral official (as of June 2026).

Batch API (bulk processing): $2
Standard OCR: $4
Document AI (annotated): $5

Pricing and usage are essential when you weigh adoption. This section covers the per-page price, how the API is called, and self-hosting and the platforms that offer it.

Pricing ($4 per 1,000 Pages, $2 with Batch)

OCR 4 is billed by usage, based on the number of pages processed. Standard OCR is $4 per 1,000 pages, and processing in bulk with the Batch API brings it to $2 per 1,000 pages.

$4 per 1,000 pages, dropping to $2 with the Batch-API discount. — from the Mistral OCR 4 announcement (Pricing)

Plan | Price (per 1,000 pages) | Main use
Standard OCR | $4 | Everyday document reading
Batch API | $2 | Processing large volumes in bulk
Document AI (annotated) | $5 | Adds structured annotations

If you can process large volumes that are not time-critical in bulk, the Batch API cuts the cost in half. Pricing can be revised, so check the official pricing page before you commit.
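Because billing is per page, a monthly estimate is simple arithmetic. This sketch uses the per-1,000-page prices quoted above and assumes strictly linear billing with no minimums or rounding up to whole blocks of 1,000 pages, which the announcement does not spell out; the plan keys are arbitrary labels.

```python
# Per-1,000-page prices quoted above (USD, as of June 2026); verify on the official pricing page.
PRICE_PER_1000 = {
    "standard": 4.0,
    "batch": 2.0,
    "document_ai": 5.0,
}


def estimate_cost(pages: int, plan: str = "standard") -> float:
    """Estimated USD cost, assuming the plan's rate is applied linearly per page."""
    if pages < 0:
        raise ValueError("page count cannot be negative")
    return pages * PRICE_PER_1000[plan] / 1000


monthly_pages = 250_000
for plan in ("standard", "batch", "document_ai"):
    print(plan, estimate_cost(monthly_pages, plan))
# standard 1000.0
# batch 500.0
# document_ai 1250.0
```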

How to Use the API (mistral-ocr-latest and Supported Formats)

OCR 4 works with a single API call. From code, you call client.ocr.process() and set the model name to mistral-ocr-latest. To use OCR 4–specific features, set the model name to mistral-ocr-4-0 or newer.

client.ocr.process() … model="mistral-ocr-latest" — from the OCR Processor documentation
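A minimal Python sketch of that call follows, based on the `client.ocr.process()` pattern from the OCR Processor documentation with a public URL as input. The helper names, the `MISTRAL_API_KEY` environment variable, and the URL are placeholders. The SDK import sits inside `run_ocr` so the request-building helper works without the `mistralai` package installed. Check the current SDK documentation, since import paths and response fields can change between versions.

```python
import os


def build_ocr_request(document_url: str, model: str = "mistral-ocr-latest") -> dict:
    """Keyword arguments for client.ocr.process() when the input is a public URL."""
    return {
        "model": model,
        "document": {"type": "document_url", "document_url": document_url},
    }


def run_ocr(document_url: str, model: str = "mistral-ocr-latest") -> list[str]:
    """Send one document to the OCR API and return one Markdown string per page.

    Needs `pip install mistralai`, network access, and MISTRAL_API_KEY in the environment.
    """
    from mistralai import Mistral  # deferred so build_ocr_request() works without the SDK

    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    response = client.ocr.process(**build_ocr_request(document_url, model))
    return [page.markdown for page in response.pages]
```

Calling `run_ocr("https://example.com/report.pdf")` would return a list with one Markdown string per page. To opt into OCR 4-specific features, the model argument can be set to `mistral-ocr-4-0` instead of the default alias.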

You can pass the input as a public URL (document_url), as Base64-encoded data, or as a file uploaded to the cloud. The response is a single bundle containing the Markdown text, per-page information, images, tables, blocks, and confidence scores.

The range of input formats is broad: documents such as PDF, PowerPoint, and Word, and images such as PNG, JPEG, and AVIF.

pdf, pptx, docx and more... / png, jpeg/jpg, avif and more... — from the OCR Processor documentation (supported formats)
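For local files, the Base64 route means wrapping the file's bytes in a data URI. The sketch below picks the chunk type from the file extension, following the pattern of `document_url` for documents and `image_url` for images. The extension table is a small illustrative subset, `document_chunk` is a hypothetical helper, and sending .docx and .pptx files this way is an assumption to confirm against the supported-formats list. The cloud-upload route is not shown.

```python
import base64
from pathlib import Path

# Illustrative subset of extensions; the official docs list more.
# Sending Office files as base64 data URIs is an assumption to verify.
DOCUMENT_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}
IMAGE_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".avif": "image/avif",
}


def document_chunk(filename: str, data: bytes) -> dict:
    """Build a `document` argument for client.ocr.process() from raw file bytes."""
    ext = Path(filename).suffix.lower()
    encoded = base64.b64encode(data).decode("ascii")
    if ext in DOCUMENT_TYPES:
        return {"type": "document_url",
                "document_url": f"data:{DOCUMENT_TYPES[ext]};base64,{encoded}"}
    if ext in IMAGE_TYPES:
        return {"type": "image_url",
                "image_url": f"data:{IMAGE_TYPES[ext]};base64,{encoded}"}
    raise ValueError(f"unsupported extension: {ext or '(none)'}")


chunk = document_chunk("scan.PNG", b"\x89PNG")  # first bytes of a PNG file, for illustration
print(chunk["type"])       # image_url
print(chunk["image_url"])  # data:image/png;base64,iVBORw==
```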

When you want descriptions of figures or charts, you can add an annotation step, which has two modes. bbox_annotation describes each extracted image individually, processing each box separately after OCR. document_annotation handles the whole document at once, processing the Markdown text together with the extracted images. Choose per-box annotation or whole-document annotation depending on the task.

Self-Hosting and the Platforms That Offer It

OCR 4 is flexible in how it is offered. It can be packaged into a single container and run fully self-hosted on your own infrastructure. For organizations that want to process sensitive documents without sending them outside, that self-hosting option is a good fit.

It is also offered on several platforms. Beyond Mistral's own Mistral Studio and API, it is available through Amazon SageMaker and Microsoft Foundry, and it is planned for Snowflake's Parse Document. You can pick the entry point that matches the cloud you already use. For where Mistral sits within generative AI overall, reading it alongside our guide on what Claude is (Anthropic's generative AI) helps put it in context.

Where Mistral OCR 4 Fits, Cautions, and Summary

Where Mistral OCR 4 fits

Document digitization … turn invoices, contracts, and forms into text and tables while keeping structure
RAG preprocessing … prepare classified, source-traceable blocks as easy "retrieval units" for AI search
Agent input … the reading step that starts form filling, invoice processing, and compliance checks
Enterprise search ingestion … the front end that takes large volumes of internal documents into a searchable form

To close, here is where Mistral OCR 4 delivers the most value, what to check before adopting it, and the sources.

Where It Fits (Document Digitization and RAG Preprocessing)

OCR 4 is built on the premise that you not only "read" a document but "hand it to the next step." Mistral positions it as the part that moves AI agents from merely reading documents to acting on them — form filling, invoice processing, and compliance checks.

agents move from reading documents to acting on them (form filling, invoice processing, compliance checks) — from the Mistral OCR 4 announcement

In concrete terms, it shines at digitizing paper and PDF forms into a core system, shaping papers and reports into a searchable form, and having an AI answer from internal documents with grounds. For the whole flow of digitizing paper documents, see our guide to digitizing business documents as well. If you want to try feeding documents to an AI, converting your PDFs into a clean format first makes it easier to get started.

What to Check Before Using It

For all its convenience, there are a few things worth keeping in mind before adopting it. OCR 4's reading is described as highly accurate, but it is safer to build on the assumption that low-confidence spots and important figures get a final human check. Mistral itself anticipates uses that support human review.

Pricing, model names, and the platforms on offer can also change. In particular, if you pin a model name in production, confirm in the official documentation that the model is still offered and revisit it as needed. If cost is a concern, simply routing non-urgent jobs through the Batch API already lowers the unit price. Once your use cases and document volume settle, revisiting the choice between standard, batch, and self-hosted processing keeps costs lean.

Mistral OCR 4 Summary

Mistral OCR 4 is a document AI that not only reads characters but returns position, type, and reliability as structured data. Localizing with bounding boxes and classifying blocks is what underpins the quality of downstream work such as RAG, agents, and internal-document search. Pricing runs from $2 to $5 per 1,000 pages, and you can choose batch processing or self-hosting depending on the use. Starting from document digitization, it is a strong option when you want to broaden how you use AI.

Before feeding documents to an AI, tidying a PDF into Markdown while keeping the structure of headings and tables tends to make the downstream processing more stable.

FAQ

Q. What is Mistral OCR 4?
It is a document-reading (OCR) AI that Mistral released on June 23, 2026. Beyond pulling out text, it returns bounding boxes that mark where text sits on the page, typed-block classification for titles, tables, and equations, and confidence scores that show how reliable each reading is.
"returns bounding boxes, typed-block classification (titles, tables, equations, signatures, and more), and inline confidence scores" (Mistral AI Official News, Mistral OCR 4)
Q. What is a bounding box, and why does it matter?
It is a coordinate box that marks where extracted text or a figure sits on the page. Because you can highlight exactly which part of the source was used, it helps with source-grounded citations and with building reliable data pipelines.
"Bounding boxes, our most-requested capability, localize text for in-context highlighting and reliable data pipelines." (Mistral AI Official News, Mistral OCR 4)
Q. How much does Mistral OCR 4 cost?
Standard OCR is $4 per 1,000 pages, dropping to $2 per 1,000 pages with the Batch-API discount. Document AI, which adds structured annotations, is $5 per 1,000 pages. Pricing can change, so check the official page for the latest.
"$4 per 1,000 pages, dropping to $2 with the Batch-API discount." (Mistral AI Official News, Mistral OCR 4)
Q. How many languages does Mistral OCR 4 support?
It supports 170 languages across 10 language groups, so it works well even when you need to process multilingual documents together.
"170 languages across 10 language groups" (Mistral AI Official News, Mistral OCR 4)
Q. How is it different from traditional OCR?
Traditional OCR mainly turns characters and tables into text. OCR 4 goes further: it localizes each block with a bounding box and returns it as structured data classified by type, so the output is easy to use downstream.
"Each block is localized with a bounding box, classified by type" (Mistral AI Official News, Mistral OCR 4)
Q. Can I run Mistral OCR 4 on my own servers?
Yes. It can be deployed in a single container and run on your own infrastructure, which suits teams that want to process sensitive documents without sending them outside.
"As a compact model deployable in a single container, it is suited to both cost-sensitive and high-volume deployments." (Mistral AI Official News, Mistral OCR 4)
Q. Which file formats does Mistral OCR 4 support?
Documents such as PDF, PowerPoint, and Word, and images such as PNG, JPEG, and AVIF. The official documentation lists the full set of supported formats.
"pdf, pptx, docx and more... / png, jpeg/jpg, avif and more..." (Mistral Official Docs, OCR Processor)