What is a bounding box, and why does it matter?

It is a coordinate box that marks where extracted text or a figure sits on the page. Because you can highlight exactly which part of the source was used, it helps with source-grounded citations and with building reliable data pipelines.

How much does Mistral OCR 4 cost?

Standard OCR is $4 per 1,000 pages, dropping to $2 per 1,000 pages with the Batch-API discount. Document AI, which adds structured annotations, is $5 per 1,000 pages. Pricing can change, so check the official page for the latest.

How many languages does Mistral OCR 4 support?

It supports 170 languages across 10 language groups, so it works well even when you need to process multilingual documents together.

How is it different from traditional OCR?

Traditional OCR mainly turns characters and tables into text. OCR 4 goes further: it localizes each block with a bounding box and returns it as structured data classified by type, so the output is easy to use downstream.

Can I run Mistral OCR 4 on my own servers?

Yes. It can be deployed in a single container and run on your own infrastructure, which suits teams that want to process sensitive documents without sending them outside.

Which file formats does Mistral OCR 4 support?

Documents such as PDF, PowerPoint, and Word, and images such as PNG, JPEG, and AVIF. The official documentation lists the full set of supported formats.

What Is Mistral OCR 4? Bounding-Box Document AI Explained

Q: What is Mistral OCR 4?

It is a document-reading (OCR) AI that Mistral released on June 23, 2026. Beyond pulling out text, it returns bounding boxes that mark where text sits on the page, typed-block classification for titles, tables, and equations, and confidence scores that show how reliable each reading is.

Article summary

Mistral OCR 4, released on June 23, 2026, is a document AI that returns bounding boxes, block classification, and confidence scores on top of plain text.
A bounding box marks where text or a figure sits, which helps with source-grounded citations and reliable data pipelines.
It supports 170 languages. It scored 85.20 on the public OlmOCRBench, and human reviewers preferred it over leading OCR systems with an average win rate of 72%.
Pricing is $4 per 1,000 pages, or $2 with the Batch API. Document AI, with structured annotations, is $5.
The API is a single call and accepts PDFs, images, and Office files. It can also run self-hosted in one container.
It fits RAG, agents, and the ingestion step of enterprise search. It is also available through platforms such as Microsoft Foundry.

What Mistral OCR 4 Is (Bounding-Box Document AI)

What Mistral OCR 4 returns from a single document

Input: PDFs, images, and Office files (Word, PowerPoint, and so on)

↓ Read by Mistral OCR 4

Text: extracted as Markdown, keeping the structure of headings and tables

Bounding boxes: coordinates for where each element sits on the page

Block classification: types such as titles, tables, equations, and signatures

Confidence scores: how reliable the reading is, per page and per word

Mistral OCR 4 is a document-reading (OCR) AI model that the French AI company Mistral AI released on June 23, 2026. OCR (Optical Character Recognition) is the technology that converts the characters inside an image or PDF into text a computer can work with. Let's start with what OCR 4 returns, what a bounding box is, and how it differs from traditional OCR.

What Mistral OCR 4 Is and the Data It Returns

Mistral OCR 4 pulls the contents of a document out as structured data, not as a flat run of characters. Along with the extracted text, it returns bounding boxes, typed blocks, and confidence scores that show how reliable each reading is.

Source: Mistral AI Official News (Mistral OCR 4)

returns bounding boxes, typed-block classification (titles, tables, equations, signatures, and more), and inline confidence scores — from the Mistral OCR 4 announcement

In other words, when you feed it a paper invoice, a contract, or a research PDF, the prose comes back as Markdown text, tables come back as tables, and equations come back as equations — each tagged with what kind of block it is and where it sits on the page. This structured output sharply cuts the effort of handling the data downstream.

What a Bounding Box Is (a Box That Marks Where Text Sits)

A bounding box is a rectangular coordinate box that marks where extracted text or a figure sits on the page. In OCR 4, each block comes back with top-left and bottom-right coordinates (top_left_x, top_left_y, bottom_right_x, bottom_right_y). With this position information, you can point with a highlight to exactly which part of the source an AI answer is based on.

Bounding boxes, our most-requested capability, localize text for in-context highlighting and reliable data pipelines. — from the Mistral OCR 4 announcement

Mistral calls bounding boxes its most-requested capability. If you ask an AI "where is the termination clause in this contract?" and it shows the spot in the original with a box rather than only giving an answer, a person can verify the content far more easily. Without position data, an AI that reads invoices and contracts can return the characters but often cannot say where on the page they were written, which slows down review. Bounding boxes remove that rework. The less tolerance for error an automated document workflow has, the more this ability to trace what was read, and where, pays off.
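
As a minimal sketch, the four coordinates named above can be turned into a highlight rectangle as follows. The `BBox` class and its `to_rect` and `scale` helpers are hypothetical names introduced for illustration, and the example assumes the coordinates are in the pixel space of the rendered page, which should be checked against the actual response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Corner coordinates as named in the article (units assumed to be pixels)."""
    top_left_x: float
    top_left_y: float
    bottom_right_x: float
    bottom_right_y: float

    def to_rect(self) -> tuple:
        """Return (x, y, width, height), the form many drawing APIs expect."""
        return (
            self.top_left_x,
            self.top_left_y,
            self.bottom_right_x - self.top_left_x,
            self.bottom_right_y - self.top_left_y,
        )

    def scale(self, factor: float) -> "BBox":
        """Rescale the box, for example when the page is shown at another zoom."""
        return BBox(
            self.top_left_x * factor,
            self.top_left_y * factor,
            self.bottom_right_x * factor,
            self.bottom_right_y * factor,
        )


# Hypothetical box around a clause on a contract page.
clause = BBox(top_left_x=100, top_left_y=200, bottom_right_x=500, bottom_right_y=260)
print(clause.to_rect())             # (100, 200, 400, 60)
print(clause.scale(0.5).to_rect())  # (50.0, 100.0, 200.0, 30.0)
```

A viewer would pass the resulting rectangle to whatever drawing layer it uses to overlay the highlight on the page image.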

How It Differs from Traditional OCR (Block Classification and Confidence Scores)

Traditional OCR mainly aimed to turn characters and tables into text. OCR 4 takes a step further: it splits what it reads into units called blocks and classifies each one by type. Each block is localized with a bounding box and returned as structured data classified into types such as title, table, equation, and signature.

Each block is localized with a bounding box, classified by type, and inline confidence scores are generated per-page and per-word. — from the Mistral OCR 4 announcement

The table below summarizes how it differs from traditional OCR.

Aspect	Traditional OCR	Mistral OCR 4
Main output	Text from characters and tables	Markdown text plus structured data
Position of text	Often not provided	Returns coordinates as bounding boxes
Block classification	Limited	Sorts into titles, tables, equations, signatures, and more
Reading reliability	Often not shown	Confidence scores per page and per word
Handling downstream	Needs custom cleanup and splitting	Easy to pass straight to search and data work

*The traditional-OCR column is a general characterization; coverage varies by product.

On top of that, because confidence scores come per page and per word, you can focus later checks on the spots where the reading looks shaky. Where the old approach only transcribed characters, OCR 4 returns "what is where, and how certain" all at once — which is what makes it easy to work with in practice.

What Mistral OCR 4 Can Do and Its Benchmark Performance

Mistral OCR 4 public benchmarks and evaluation

Bars are scaled 0–100. OlmOCRBench and OmniDocBench are benchmark scores; the win rate is the share by which human reviewers preferred it over leading OCR and document-AI systems. The metrics differ, so they are not directly comparable side by side. Figures are Mistral's official values (as of June 2026).

OlmOCRBench score: 85.20

OmniDocBench score: 93.07

Avg. human-preference win rate (%): 72

You can gauge what OCR 4 can do from both its breadth of coverage and its public benchmarks. This section looks at multilingual support and block classification, how confidence scores are put to use, and how it scores on benchmarks.

170-Language Support and Block Classification

OCR 4 covers a wide range of languages: 170 languages across 10 language groups. That makes it practical to process multilingual document sets, including Japanese, in a single pipeline.

170 languages across 10 language groups — from the Mistral OCR 4 announcement

Block classification helps in practice too. Because titles, tables, equations, and signatures come back separately, it is easier to build later steps that, say, pull out only the tables to load into a spreadsheet, or just check whether a signature field is present. Rather than treating a whole document as one lump of text, you can target and extract the parts you need.
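
The kind of later step described above might look like the sketch below, which buckets blocks by type and then picks out tables and checks for a signature. The sample data and the `type` and `text` keys are assumptions made for illustration, not the documented response schema.

```python
from collections import defaultdict

# Hypothetical blocks; the "type" and "text" keys are illustrative only.
blocks = [
    {"type": "title", "text": "Invoice 2026-001"},
    {"type": "table", "text": "| Item | Qty |\n| Paper | 10 |"},
    {"type": "text", "text": "Payment is due in 30 days."},
    {"type": "signature", "text": ""},
]


def group_by_type(items):
    """Bucket blocks by type so later steps can pick only what they need."""
    grouped = defaultdict(list)
    for block in items:
        grouped[block["type"]].append(block)
    return dict(grouped)


grouped = group_by_type(blocks)
tables = grouped.get("table", [])      # candidates to load into a spreadsheet
has_signature = "signature" in grouped  # simple presence check

print(len(tables))    # 1
print(has_signature)  # True
```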

Confidence Scores and Source-Grounded Citations (RAG and Grounding)

Block types and confidence scores pair well with setups in which an AI reads documents and answers from them. According to Mistral, they drive source-grounded citations, redactions, and human-in-the-loop verification.

block types and confidence scores drive source-grounded citations, redactions, and human-in-the-loop verification. — from the Mistral OCR 4 announcement

Confidence scores come not only per page but per word. So you can have a person check only the spots where a single wrong character would be critical, such as figures or proper nouns. Being able to apply human review to the low-confidence spots rather than eyeballing everything pays off most when you handle documents at scale.
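
One way to sketch that targeted review is a simple threshold filter over per-word scores. The `text` and `confidence` keys, the sample values, and the 0.90 cut-off are all assumptions for illustration; a real threshold should be tuned on your own documents.

```python
# Hypothetical per-word results; the key names are assumptions.
words = [
    {"text": "Total", "confidence": 0.99},
    {"text": "1,280.00", "confidence": 0.71},
    {"text": "EUR", "confidence": 0.97},
    {"text": "Acme", "confidence": 0.88},
]

REVIEW_THRESHOLD = 0.90  # illustrative cut-off, not a recommended value


def needs_review(items, threshold=REVIEW_THRESHOLD):
    """Return the words whose confidence falls below the threshold."""
    return [w for w in items if w["confidence"] < threshold]


flagged = needs_review(words)
print([w["text"] for w in flagged])  # ['1,280.00', 'Acme']
```

Only the flagged words go to a reviewer; everything else passes through untouched.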

This is exactly what helps as a preprocessing step for RAG (retrieval-augmented generation: having an AI search your own documents and ground its answers in them). Blocks that are sorted by type and easy to trace back to a source make higher-quality "retrieval units" to feed an AI. For the bigger picture of feeding your PDFs to an AI, see our guide on how to load PDFs into ChatGPT.
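
A rough sketch of building such retrieval units: each chunk keeps the block type, page, and box so an answer can later point back to its source. The `Chunk` class, the `to_chunks` helper, and the shape of the `pages` data are hypothetical and do not mirror the actual response format.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One retrieval unit plus the metadata needed to cite its source."""
    text: str
    block_type: str
    page: int
    bbox: tuple  # (top_left_x, top_left_y, bottom_right_x, bottom_right_y)


def to_chunks(pages):
    """Flatten pages of typed blocks into chunks, skipping empty blocks."""
    chunks = []
    for page in pages:
        for block in page["blocks"]:
            if block["text"].strip():
                chunks.append(
                    Chunk(block["text"], block["type"], page["index"], tuple(block["bbox"]))
                )
    return chunks


# Hypothetical parsed output for a one-page contract.
pages = [
    {"index": 0, "blocks": [
        {"type": "title", "text": "Service Agreement", "bbox": [50, 40, 400, 80]},
        {"type": "text", "text": "Either party may terminate with 30 days notice.",
         "bbox": [50, 120, 550, 160]},
        {"type": "signature", "text": " ", "bbox": [50, 700, 250, 760]},
    ]},
]

chunks = to_chunks(pages)
print(len(chunks))                     # 2
print(chunks[1].page, chunks[1].bbox)  # 0 (50, 120, 550, 160)
```

Storing the page and box next to the text is what lets a RAG answer highlight the passage it relied on.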

Benchmark Results (OlmOCRBench 85.20 and a 72% Win Rate)

In published evaluations, OCR 4 posts strong numbers. It scored 85.20 on the public OlmOCRBench, which Mistral reports as the top overall score among the models it tested.

the top overall score amongst the models we tested on the public OlmOCRBench (85.20) — from the Mistral OCR 4 announcement

It also scores high on a broad document-parsing benchmark. On OmniDocBench, OCR 4 records a score of 93.07.

On OmniDocBench, OCR 4 achieves a score of 93.07. — from the Mistral OCR 4 announcement

Human reviewers preferred it too. According to Mistral, independent annotators preferred OCR 4 over every leading OCR and document-AI system it tested, with an average win rate of 72%. That said, benchmark numbers shift with the systems compared and the test conditions. Since each metric measures something different, it is more practical to weight the evaluations closest to your own use.

Free tool: PDF to Markdown Converter. Convert PDF content to Markdown format. Auto-detects headings, tables, and lists; ideal for RAG and AI workflows.

Mistral OCR 4 Pricing and How to Use the API

Mistral OCR 4 pricing (per 1,000 pages, USD)

Bars are the price in dollars. The Batch API is a discount for processing in bulk. Document AI adds structured annotations. Source: Mistral official (as of June 2026).

Batch API (bulk processing): $2

Standard OCR: $4

Document AI (annotated): $5

Pricing and usage are essential when you weigh adoption. This section covers the per-page price, how the API is called, and self-hosting and the platforms that offer it.

Pricing ($4 per 1,000 Pages, $2 with Batch)

OCR 4 is billed by usage, based on the number of pages processed. Standard OCR is $4 per 1,000 pages, and processing in bulk with the Batch API brings it to $2 per 1,000 pages.

$4 per 1,000 pages, dropping to $2 with the Batch-API discount. — from the Mistral OCR 4 announcement (Pricing)

Plan	Price (per 1,000 pages)	Main use
Standard OCR	$4	Everyday document reading
Batch API	$2	Processing large volumes in bulk
Document AI (annotated)	$5	Adds structured annotations

If you can process large volumes that are not time-critical in bulk, the Batch API cuts the cost in half. Pricing can be revised, so check the official pricing page before you commit.
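
The arithmetic is simple enough to sketch. The prices are the per-1,000-page figures quoted above and may change; the plan keys, the `estimate_cost` helper, and the monthly volume are hypothetical, and the sketch assumes billing scales linearly with page count.

```python
# Prices per 1,000 pages in USD, as quoted in this article; they may change.
PRICE_PER_1000 = {"standard": 4.0, "batch": 2.0, "document_ai": 5.0}


def estimate_cost(pages: int, plan: str) -> float:
    """Estimate the cost in USD for a page count under one plan."""
    return pages / 1000 * PRICE_PER_1000[plan]


monthly_pages = 250_000  # hypothetical volume
print(estimate_cost(monthly_pages, "standard"))     # 1000.0
print(estimate_cost(monthly_pages, "batch"))        # 500.0
print(estimate_cost(monthly_pages, "document_ai"))  # 1250.0
```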

How to Use the API (mistral-ocr-latest and Supported Formats)

OCR 4 works with a single API call. From code, you call client.ocr.process() and set the model name to mistral-ocr-latest. To use OCR 4–specific features, set the model name to mistral-ocr-4-0 or newer.

Source: Mistral Official Docs (OCR Processor)

client.ocr.process() … model="mistral-ocr-latest" — from the OCR Processor documentation

You can pass the input as a public URL (document_url), as Base64-encoded data, or as a file uploaded to the cloud. What comes back is one bundle of data: the Markdown text, per-page information, images, tables, blocks, and confidence scores.
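
A hedged sketch of preparing those inputs in Python follows. It assumes the `mistralai` SDK and a `MISTRAL_API_KEY` environment variable; the helper names are hypothetical, and the payload shapes (a `document_url` entry, with local files sent as a Base64 data URI) are assumptions to verify against the official documentation. `run_ocr` is defined but not called here, since it needs network access and a key.

```python
import base64
import os


def url_document(url: str) -> dict:
    """Payload for a publicly reachable document (assumed shape)."""
    return {"type": "document_url", "document_url": url}


def base64_document(data: bytes, mime: str = "application/pdf") -> dict:
    """Payload for local bytes, sent as a Base64 data URI (assumed shape)."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "document_url", "document_url": f"data:{mime};base64,{encoded}"}


def run_ocr(document: dict):
    """Send one document to the OCR endpoint; needs network access and an API key."""
    from mistralai import Mistral  # lazy import so the helpers work without the SDK

    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    return client.ocr.process(model="mistral-ocr-latest", document=document)


remote = url_document("https://example.com/sample.pdf")  # placeholder URL
local = base64_document(b"%PDF-1.7")                      # placeholder bytes
print(remote["type"], local["document_url"].split(",")[0])
```

Keeping payload construction separate from the network call makes the input handling easy to test offline.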

The range of input formats is broad: documents such as PDF, PowerPoint, and Word, and images such as PNG, JPEG, and AVIF.

pdf, pptx, docx and more... / png, jpeg/jpg, avif and more... — from the OCR Processor documentation (supported formats)

When you want to describe figures or charts, you can add an annotation step. There are two modes. One describes each extracted image individually (bbox_annotation), processing each box separately after OCR. The other handles the whole document together (document_annotation), processing the Markdown text alongside the extracted images. Choose per-image annotation or whole-document annotation depending on the task.

Self-Hosting and the Platforms That Offer It

OCR 4 is flexible in how it is offered. It can be packaged into a single container and run fully self-hosted on your own infrastructure. For organizations that want to process sensitive documents without sending them outside, that self-hosting option is a good fit.

It is also available on several platforms. Beyond Mistral's own Mistral Studio and API, it is offered through Amazon SageMaker and Microsoft Foundry, and support is planned for Snowflake's Parse Document. You can pick the entry point that matches the cloud you already use. For where Mistral sits within generative AI overall, our guide on what Claude is (Anthropic's generative AI) gives useful context.

Where Mistral OCR 4 Fits, Cautions, and Summary

Where Mistral OCR 4 fits

Document digitization: turn invoices, contracts, and forms into text and tables while keeping structure

RAG preprocessing: prepare classified, source-traceable blocks as easy "retrieval units" for AI search

Agent input: the reading step that starts form filling, invoice processing, and compliance checks

Enterprise search ingestion: the front end that takes large volumes of internal documents into a searchable form

To close, here is where Mistral OCR 4 delivers the most value, what to check before adopting it, and the sources.

Where It Fits (Document Digitization and RAG Preprocessing)

OCR 4 is built on the premise that you not only "read" a document but "hand it to the next step." Mistral positions it as the part that moves AI agents from merely reading documents to acting on them — form filling, invoice processing, and compliance checks.

agents move from reading documents to acting on them (form filling, invoice processing, compliance checks) — from the Mistral OCR 4 announcement

In concrete terms, it shines at digitizing paper and PDF forms for entry into a core system, turning papers and reports into a searchable form, and having an AI answer from internal documents with citations to the source. For the whole flow of digitizing paper documents, see our guide to digitizing business documents as well. If you want to try feeding documents to an AI, converting your PDFs into a clean format first makes it easier to get started.

What to Check Before Using It

For all its convenience, there are a few things worth keeping in mind before adopting it. OCR 4's reading is described as highly accurate, but it is safer to build on the assumption that low-confidence spots and important figures get a final human check. Mistral itself anticipates uses that support human review.

Pricing, model names, and the available platforms can also change. In particular, if you pin a model name in production, confirm in the official documentation that the model is still offered, and revisit the choice as needed. If cost is a concern, routing non-urgent jobs through the Batch API alone lowers the unit price. Once your use case and document volume settle, re-evaluate which option fits best (standard, batch, or self-hosted) to keep costs lean.

Mistral OCR 4 Summary

Mistral OCR 4 is a document AI that not only reads characters but returns position, type, and reliability as structured data. Localizing with bounding boxes and classifying blocks is what underpins the quality of downstream work such as RAG, agents, and internal-document search. Pricing runs from $2 to $5 per 1,000 pages, and you can choose batch processing or self-hosting depending on the use. Starting from document digitization, it is a strong option when you want to broaden how you use AI.

Before feeding documents to an AI, tidying a PDF into Markdown while keeping the structure of headings and tables tends to make the downstream processing more stable.
