PP-OCRv6: Accuracy, Speed, and How to Use the 50-Language OCR Model

Tags: PaddleOCR, OCR, Text Recognition

PP-OCRv6 Features and Multilingual OCR

PP-OCRv6's defining traits are a unified architecture that processes 50 languages with one model, and three model sizes matched to your use case. This section covers how it works and what improved over the previous generation.

The three tiers of PP-OCRv6

| Tier | Parameters | Languages | Target | Speed note |
|---|---|---|---|---|
| Tiny | 1.5M | 49 (no Japanese) | Edge / IoT | 0.96s on Apple M4 |
| Small | 7.7M | 50 | Mobile / desktop | Balanced |
| Medium | 34.5M | 50 | Server / top accuracy | 0.29s on A100 |

What PP-OCRv6 Is — PaddleOCR's Latest Text Recognition Model

PP-OCRv6 is the latest-generation model in "PaddleOCR," the open-source OCR framework developed by Baidu. It was released as part of PaddleOCR v3.7.0 on June 11, 2026.

OCR stands for Optical Character Recognition — the technology that extracts text from images and PDFs as machine-readable data. It is used to make scanned paper documents searchable, or to read text from signs captured in photos.

PP-OCRv6 is built on a newly designed unified backbone called PPLCNetV4. Both text detection (finding where the text is) and text recognition (determining what each character is) are handled through this single backbone.

PP-OCRv6 is the latest generation of PaddleOCR's universal OCR model family. The model family scales from 1.5M to 34.5M parameters, with three tiers: tiny, small, and medium. — From the PP-OCRv6 overview

PP-OCRv6's most striking trait is its accuracy relative to its size: according to the official comparison, the 34.5M-parameter Medium model surpasses VLMs (vision-language models) on the scale of 235 billion parameters in OCR accuracy.

Recognizing 50 Languages with One Multilingual Model

The Medium and Small models of PP-OCRv6 recognize 50 languages with a single unified model. Supported languages include Simplified Chinese, Traditional Chinese, English, and Japanese, plus 46 Latin-script languages such as French, German, and Spanish.

Conventional OCR models required downloading a separate model per language and switching between them depending on the target. With PP-OCRv6, you load one model to process multilingual documents. In environments where documents in several languages are mixed, the overhead of switching languages turns directly into processing time and management cost. A design that only needs one model to be loaded removes that switching cost entirely.

50 languages with a single unified model, including Simplified Chinese, Traditional Chinese, English, Japanese, and 46 Latin-script languages (tiny supports 49, excluding Japanese). — From the language support description

One point that often trips people up is the Tiny model's language support. Tiny covers only 49 languages, excluding Japanese. If you handle Japanese documents, you need to choose Small or higher.

The dictionary adds roughly 200 diacritical characters (accented characters), allowing accurate character discrimination in Latin-script languages.
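To illustrate why accented characters need their own dictionary entries, here is a minimal sketch using Python's standard unicodedata module. It is not PaddleOCR's dictionary logic; the function names are hypothetical and exist only for this illustration. It shows that a recognizer whose dictionary lacks diacritics can at best output the unaccented base letter.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining accent marks removed (e.g. e-acute becomes e)."""
    # NFD splits a precomposed character into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def diacritic_chars(text: str) -> set:
    """Collect the characters in text that change when accents are stripped."""
    return {ch for ch in text if strip_diacritics(ch) != ch}


# "caf\u00e9" is "café"; without a dictionary entry for \u00e9 the best
# possible output would be the accent-free form.
print(strip_diacritics("caf\u00e9"))
print(sorted(diacritic_chars("\u00dcber na\u00efve caf\u00e9")))
```

Running a check like diacritic_chars over your own ground-truth text is one way to see which accented characters your documents actually depend on.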

Detection and Recognition Architecture Improved Over PP-OCRv5

In PP-OCRv6, both the detection and recognition modules were redesigned.

For text detection, RepLKFPN, a lightweight large-kernel feature pyramid network, was introduced as the detection neck. With its wide 7×7 receptive field, the neck's parameter count was cut by about 31%, from 172K in v5 to 118K. Even with fewer parameters, accuracy went up: detection Hmean rose 4.6 points, from 81.6% in v5 to 86.2%.

For text recognition, EncoderWithLightSVTR was adopted. It combines local features and a global attention mechanism through additive skip connections. Recognition accuracy improved by +5.1 points, from 78.1% in v5 to 83.2%.
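The percentages quoted above can be re-derived from the raw figures. The short sketch below does only that arithmetic; the helper names are hypothetical, and the inputs are the numbers stated in this article.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from before to after, in percent."""
    return (before - after) / before * 100.0


def point_gain(before: float, after: float) -> float:
    """Absolute gain in percentage points, rounded to one decimal."""
    return round(after - before, 1)


neck_cut = percent_reduction(172_000, 118_000)  # detection neck parameters, v5 vs v6
det_gain = point_gain(81.6, 86.2)               # detection Hmean, v5 vs v6
rec_gain = point_gain(78.1, 83.2)               # recognition accuracy, v5 vs v6

print(f"neck parameters cut by {neck_cut:.1f}%")
print(f"detection +{det_gain} pt, recognition +{rec_gain} pt")
```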

Backbone: PPLCNetV4. Detection Neck: RepLKFPN (7×7 receptive field; 118K parameters vs. PP-OCRv5's 172K). Recognition Neck: EncoderWithLightSVTR (local-global attention with additive skip connections). — From the Architecture section

PP-OCRv6 Accuracy and Speed Benchmarks

Let us look at PP-OCRv6's accuracy and speed using the benchmark data in the official documentation, comparing both with the previous generation v5 and with large VLMs.

PP-OCRv6 vs PP-OCRv5 accuracy (Medium model)

Detection: v5 81.6% → v6 86.2% (+4.6pt)
Recognition: v5 78.1% → v6 83.2% (+5.1pt)

Comparing Detection and Recognition Accuracy with PP-OCRv5

The official documentation publishes a multi-scenario benchmark spanning 16 detection categories and 15 recognition categories. The table below compares the average accuracy of each tier with PP-OCRv5_server.

| Model | Parameters | Detection Hmean (avg) | Recognition (weighted avg) |
|---|---|---|---|
| PP-OCRv6 Medium | 34.5M | 86.2% | 83.2% |
| PP-OCRv6 Small | 7.7M | 84.1% | 81.3% |
| PP-OCRv6 Tiny | 1.5M | 80.6% | 73.5% |
| PP-OCRv5 Server | not listed | 81.6% | 78.1% |
PP-OCRv6_medium: AVG 86.2, PP-OCRv6_small: AVG 84.1, PP-OCRv6_tiny: AVG 80.6, PP-OCRv5_server: AVG 81.6 — From the Text Detection Hmean (%) Multi-Scenario Benchmark
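For readers unfamiliar with the detection metric: Hmean in text-detection benchmarks conventionally means the harmonic mean of precision and recall (the F1 score). The sketch below assumes the official benchmark follows that convention, which this article does not confirm; the example inputs are made-up values, not benchmark data.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical detector: 90% of predicted boxes are correct (precision),
# and 80% of the true text regions are found (recall).
print(f"{hmean(0.90, 0.80):.3f}")
```

Because the harmonic mean is pulled toward the lower of the two values, a detector cannot score well on Hmean by maximizing only precision or only recall.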

What stands out is the Small model (7.7M parameters): it beats v5_server in both detection and recognition. Even with far fewer parameters, Small surpasses the previous generation's server model, so you can expect ample accuracy on desktop too. That is the basis for choosing Small in a desktop environment.

Looking scenario by scenario, some areas improved especially sharply. In recognition, Japanese rose by 16.8 points from 73.7% in v5_server to 90.5%, screen text such as digital displays and screen captures rose by 14.4 points from 68.1% to 82.5%, ancient documents by 12.0 points from 60.4% to 72.4%, and printed English by 9.0 points from 85.1% to 94.1%. On the detection side, rotated text improved by 13.8 points from 80.0% to 93.8%, and industrial text by 9.0 points from 64.3% to 73.3%. By contrast, handwritten Chinese stays at 62.1% (v5_server 58.0%), so handwriting remains a weak spot.

PP-OCRv6_medium recognition: JP 90.5, Screen 82.5, Anc. 72.4, Print-EN 94.1. PP-OCRv5_server: JP 73.7, Screen 68.1, Anc. 60.4, Print-EN 85.1. — From the Text Recognition Accuracy (%) Multi-Scenario Benchmark
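The per-scenario recognition gains can be tabulated and ranked with a few lines of Python. This is only a convenience sketch over the figures quoted in this article; the dictionary layout and function name are hypothetical.

```python
# scenario: (PP-OCRv5_server, PP-OCRv6_medium) recognition accuracy in percent,
# copied from the figures quoted in this article.
RECOGNITION = {
    "Japanese": (73.7, 90.5),
    "Screen text": (68.1, 82.5),
    "Ancient documents": (60.4, 72.4),
    "Printed English": (85.1, 94.1),
    "Handwritten Chinese": (58.0, 62.1),
}


def gains(scores: dict) -> list:
    """Return (scenario, gain in points) pairs, largest gain first."""
    pairs = [(name, round(v6 - v5, 1)) for name, (v5, v6) in scores.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for name, gain in gains(RECOGNITION):
    print(f"{name:<20} +{gain} pt")
```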

Accuracy Comparison with Large VLMs

PP-OCRv6's official benchmark also compares it with large VLMs (vision-language models) such as Qwen3-VL-235B and GPT-5.5.

VLMs are general-purpose image-understanding models that can do more than OCR, including describing image content and answering questions. Their parameter counts are orders of magnitude larger — tens to hundreds of billions — and they need high-end GPUs for inference.

With just 34.5M parameters, the PP-OCRv6 Medium model achieves higher OCR accuracy than these large VLMs. For the specific task of text recognition, a specialized model with roughly 1/6,800 the parameters of Qwen3-VL-235B (34.5M versus 235B) outperforms a general-purpose giant, an effect of task specialization rather than architectural superiority.

PP-OCRv6_medium with 34.5M parameters...surpasses VLMs such as Qwen3-VL-235B and GPT-5.5 in accuracy. — From the comparison with VLMs
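To put the size gap in numbers, the sketch below divides the parameter counts quoted in this article. It assumes "235B" in the model name Qwen3-VL-235B means 235 billion parameters; the helper name is hypothetical.

```python
def size_ratio(large_params: float, small_params: float) -> float:
    """How many times larger the first model is than the second."""
    return large_params / small_params


vlm_params = 235e9      # Qwen3-VL-235B, assumed from the model name
medium_params = 34.5e6  # PP-OCRv6 Medium
tiny_params = 1.5e6     # PP-OCRv6 Tiny

print(f"VLM vs Medium: about {size_ratio(vlm_params, medium_params):,.0f}x")
print(f"Medium vs Tiny: {size_ratio(medium_params, tiny_params):.0f}x")
```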

That said, VLMs offer capabilities beyond OCR, such as layout understanding and grasping the meaning of image content. For pure text extraction use PP-OCRv6, and for understanding the whole image or context-dependent processing use a VLM — choose based on your goal.

Inference Speed Benchmarks by Hardware

Inference speed is compared as the time per image. The table below shows the end-to-end inference speed published in the official documentation.

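| Hardware | v6 Medium | v6 Small | v6 Tiny | v5 Server | v5 Mobile |
|---|---|---|---|---|---|
| NVIDIA A100 | 0.29s | 0.25s | 0.13s | 0.32s | 0.25s |
| NVIDIA V100 | 0.72s | 0.49s | 0.21s | 0.66s | 0.50s |
| Intel Xeon 8350C | 2.05s | 0.79s | 0.32s | 2.04s | 0.80s |
| Apple M4 | 8.82s | 3.07s | 0.96s | >10s | 5.82s |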
NVIDIA A100: v6_medium 0.29s, v6_tiny 0.13s. Apple M4: v6_tiny 0.96s vs v5_mobile 5.82s (6.1× speedup). — From the End-to-End Inference Speed (s/image)

On an Apple M4, the Tiny model runs in 0.96 seconds — under a second — a 6.1× speedup over the v5 Mobile model (5.82 seconds). On a GPU (A100), even the Medium model processes in 0.29 seconds, which is plenty of throughput for batch processing large volumes of documents.

Even on a CPU-only setup, Tiny runs in 0.32 seconds on an Intel Xeon. There are many cases where Tiny is enough even without a GPU.
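Seconds per image translate directly into a rough capacity estimate. The sketch below converts the quoted latencies into images per hour, assuming a single sequential stream with no batching, parallelism, or I/O overhead; real throughput will differ, and the function name is hypothetical.

```python
def images_per_hour(seconds_per_image: float) -> int:
    """Rough single-stream throughput for a given per-image latency."""
    return round(3600 / seconds_per_image)


# Latencies (s/image) quoted in this article.
latencies = {
    "Medium on NVIDIA A100": 0.29,
    "Tiny on Intel Xeon 8350C": 0.32,
    "Tiny on Apple M4": 0.96,
}

for setup, seconds in latencies.items():
    print(f"{setup}: ~{images_per_hour(seconds):,} images/hour")
```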

The choice of inference backend also changes speed. On an Intel Xeon, where the standard Paddle Inference backend takes 2.05 seconds for Medium, switching to the OpenVINO backend brings it down to 1.40 seconds, and Tiny to 0.20 seconds. On an Apple M4, Tiny runs in 0.35 seconds with ONNX Runtime — less than half the standard backend's 0.96 seconds. The official docs cite up to a 2.37× GPU inference speedup, so the combination of hardware and backend can raise effective speed further.

Intel Xeon 8350C OpenVINO: v6_medium 1.40s, v6_tiny 0.20s. Apple M4 ONNX Runtime: v6_tiny 0.35s. "2.37× GPU inference speedup." — From the End-to-End Inference Speed table / performance highlights
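The backend comparisons above reduce to simple latency ratios. The sketch below computes them from the figures quoted in this article, assuming the 0.32s Tiny figure on the Xeon is the Paddle Inference baseline (as the 2.05s Medium figure is). The helper name is hypothetical.

```python
def speedup(baseline_s: float, new_s: float) -> float:
    """How many times faster new_s is than baseline_s."""
    return baseline_s / new_s


xeon_medium = speedup(2.05, 1.40)  # Paddle Inference -> OpenVINO, Medium
xeon_tiny = speedup(0.32, 0.20)    # Paddle Inference -> OpenVINO, Tiny
m4_tiny = speedup(0.96, 0.35)      # Paddle Inference -> ONNX Runtime, Tiny
m4_v5_to_v6 = speedup(5.82, 0.96)  # v5 Mobile -> v6 Tiny on Apple M4

print(f"Xeon Medium, OpenVINO: {xeon_medium:.2f}x")
print(f"Xeon Tiny, OpenVINO:   {xeon_tiny:.2f}x")
print(f"M4 Tiny, ONNX Runtime: {m4_tiny:.2f}x")
print(f"M4 v5 Mobile -> v6 Tiny: {m4_v5_to_v6:.1f}x")
```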

How to Install PP-OCRv6 and Choose a Model

PP-OCRv6 can be installed from Python's package manager, and you can extract text from an image in just a few lines of code. This section explains the setup steps and how to choose among the three model tiers.

The PP-OCRv6 setup flow

1. Install: pip install paddleocr

2. Run code: PaddleOCR() → ocr.predict(image path)

3. Get results: detection boxes + recognized text + confidence

From pip install to Extracting OCR Text

PP-OCRv6 is published on PyPI as the paddleocr package and runs on Python 3.8–3.13. Installation is a single line.

pip install paddleocr

After installing, the following code extracts text from an image.

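from paddleocr import PaddleOCR

# Disable the optional preprocessing stages to keep the pipeline light.
ocr = PaddleOCR(
    use_doc_orientation_classify=False,  # skip whole-document orientation classification
    use_doc_unwarping=False,             # skip dewarping of curved or skewed pages
    use_textline_orientation=False,      # skip per-line orientation classification
)
# Run detection and recognition on one image.
result = ocr.predict("sample.png")

for res in result:
    res.print()                 # print the detected boxes, text, and scores
    res.save_to_json("output")  # save the result as JSON under "output"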
from paddleocr import PaddleOCR

ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
)
result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")
From the Quick Start code example

use_doc_orientation_classify and use_doc_unwarping are options for document orientation and dewarping correction, and use_textline_orientation controls orientation classification of individual text lines. Setting them to False when you do not need them makes processing lighter.
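Once you have a result, you will usually want only the recognized lines and their confidence scores. The sketch below works on a plain dictionary and assumes the result exposes parallel rec_texts and rec_scores lists, as PaddleOCR 3.x results generally do; check the output of res.print() for the exact keys in your version. The function name, the 0.8 threshold, and the sample data are hypothetical.

```python
from typing import Any, List, Mapping, Tuple


def confident_lines(
    result: Mapping[str, Any], min_score: float = 0.8
) -> List[Tuple[str, float]]:
    """Return (text, score) pairs whose confidence is at least min_score."""
    texts = result.get("rec_texts", [])    # assumed key: recognized strings
    scores = result.get("rec_scores", [])  # assumed key: per-line confidence
    return [(t, float(s)) for t, s in zip(texts, scores) if s >= min_score]


# Made-up sample shaped like the assumed result layout.
sample = {"rec_texts": ["Invoice", "T0tal"], "rec_scores": [0.98, 0.41]}
print(confident_lines(sample))  # [('Invoice', 0.98)]
```

Filtering by confidence like this is a simple way to route low-scoring lines to manual review instead of feeding them straight into search or data entry.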

You can choose from several inference backends. Besides the default Paddle Inference, you can use the Hugging Face Transformers backend with engine="transformers" and the ONNX Runtime backend with engine="onnxruntime". Choosing ONNX Runtime removes the need to install PaddlePaddle itself, which is handy when you want to keep the environment simple.
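As a sketch of how backend selection might be wired into your own code, the helper below builds the constructor arguments for a chosen backend. It relies on the engine parameter and its values exactly as described in this article; they are not verified here against the PaddleOCR API, and the helper itself is hypothetical.

```python
# Backend names as given in this article; None means the default Paddle Inference.
SUPPORTED_ENGINES = {None, "transformers", "onnxruntime"}


def build_ocr_kwargs(engine=None) -> dict:
    """Build keyword arguments for PaddleOCR(...) with an optional backend."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    kwargs = {
        "use_doc_orientation_classify": False,
        "use_doc_unwarping": False,
        "use_textline_orientation": False,
    }
    if engine is not None:
        kwargs["engine"] = engine  # parameter name taken from this article
    return kwargs


print(build_ocr_kwargs("onnxruntime"))

# Usage, assuming paddleocr is installed and accepts the engine argument:
# from paddleocr import PaddleOCR
# ocr = PaddleOCR(**build_ocr_kwargs("onnxruntime"))
```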

If you keep your PDFs and images in a text-ready state, it becomes easier to feed OCR results into search or data-entry automation.

Choosing Between Tiny, Small, and Medium

The three tiers are chosen by the trade-off between parameter count and accuracy. The table below summarizes how to choose by use case.

| Use case | Recommended tier | Reason |
|---|---|---|
| Mobile apps / edge devices | Tiny (1.5M) | Lightest and fastest. Ideal if Japanese is not needed |
| Desktop apps / web services | Small (7.7M) | Good balance of accuracy and speed. 50 languages |
| Server-side batch processing | Medium (34.5M) | Highest accuracy. Suited to high-volume GPU processing |
| Jobs that include Japanese | Small or higher | Tiny does not support Japanese |

The accuracy gap shows up most clearly in handwriting and industrial text (digital displays, dot-matrix characters, and so on). The Medium model's handwritten-Chinese recognition is 62.1%, whereas Tiny drops to 40.1%. For handwriting, Medium is the safe choice.

On the other hand, for printed English text, even Tiny achieves 88.4% recognition — enough for digitizing receipts and invoices. When you compress or convert image formats as preprocessing, be careful not to degrade quality too much before OCR.
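The selection guidance in this section can be written down as a small decision function. This is one possible encoding of the article's recommendations, not an official selection rule; the function name, argument names, and target labels are hypothetical.

```python
def recommend_tier(
    target: str, needs_japanese: bool = False, handwriting: bool = False
) -> str:
    """Pick a PP-OCRv6 tier from the deployment target and content needs.

    target is one of "edge", "desktop", or "server".
    """
    base = {"edge": "tiny", "desktop": "small", "server": "medium"}
    if target not in base:
        raise ValueError(f"unknown target: {target!r}")
    tier = base[target]
    if handwriting:
        # The article calls Medium the safe choice for handwriting.
        return "medium"
    if needs_japanese and tier == "tiny":
        # Tiny covers 49 languages and omits Japanese.
        return "small"
    return tier


print(recommend_tier("edge"))
print(recommend_tier("edge", needs_japanese=True))
print(recommend_tier("desktop", handwriting=True))
```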

Benefits and Caveats of PP-OCRv6

Here we summarize PP-OCRv6's strengths and the constraints to understand before adopting it.

The first benefit is simpler operations thanks to the 50-language unified model. You no longer need to manage a model per language, and you can process even mixed-language environments without switching models.

Another major benefit is that it is free, including for commercial use, because it is released under the Apache License 2.0. Cloud OCR services (such as Google Cloud Vision API and Amazon Textract) charge usage-based fees, while PP-OCRv6 is self-hosted, so you pay for your own compute but no per-request fee. The cost difference grows the more documents you process.

Its parameter efficiency also stands out. Because the 34.5M-parameter Medium model delivers accuracy above far larger VLMs, it is easy to adopt even in environments with tight GPU memory.

As for caveats, handwriting recognition is lower than for printed text. Even the Medium model is 62.1% for handwritten Chinese and 67.8% for handwritten English, so if reading handwritten notes is your main goal, verify the accuracy with real data first.
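One way to verify accuracy on your own data is to compare OCR output against hand-typed ground truth with a character-level metric. The sketch below uses edit distance for that; it is a common evaluation approach, not necessarily the metric behind the official benchmark numbers, which this article does not specify. The function names and sample strings are hypothetical.

```python
from typing import Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def character_accuracy(pairs: Sequence[Tuple[str, str]]) -> float:
    """1 - (total edit distance / total ground-truth characters), floored at 0."""
    total_chars = sum(len(truth) for truth, _ in pairs)
    if total_chars == 0:
        return 0.0
    errors = sum(edit_distance(truth, pred) for truth, pred in pairs)
    return max(0.0, 1.0 - errors / total_chars)


# (ground truth, OCR output) pairs; made-up examples.
samples = [("hello", "hallo"), ("world", "world")]
print(f"{character_accuracy(samples):.1%}")
```

A few dozen labeled samples from your real handwriting workload are usually enough to tell whether the accuracy is in a usable range before you commit to a tier.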

Also, PP-OCRv6 specializes in text detection and recognition, so layout analysis (recognizing table structure and reading order) requires separately combining PaddleOCR's layout-analysis module. In a workflow that converts PDFs to images before OCR, building in the layout-analysis combination from the start saves trouble later.

Finally, you cannot use Tiny for Japanese: it covers 49 languages and omits Japanese, so choose Small or higher for documents that include Japanese.

License: Apache-2.0. Current Version: 3.7.0. Supported Python: 3.8, 3.9, 3.10, 3.11, 3.12, 3.13. — From the package information
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. — From the repository description

FAQ

Q. What is PP-OCRv6?
It is the latest OCR (optical character recognition) model from PaddlePaddle, released on June 11, 2026. It recognizes 50 languages with a single model and comes in three tiers with different parameter counts (Tiny, Small, and Medium).
Source (Hugging Face Blog, PP-OCRv6): "PP-OCRv6 is the latest generation of PaddleOCR's universal OCR model family. The model family scales from 1.5M to 34.5M parameters, with three tiers: tiny, small, and medium."

Q. Is PP-OCRv6 free to use?
Yes. It is released under the Apache License 2.0, so it is free for commercial use. You can start using it simply by running pip install paddleocr.
Source (PyPI, paddleocr): "License: Apache-2.0"

Q. How many languages does PP-OCRv6 support?
The Medium and Small tiers support 50 languages, covering Simplified Chinese, Traditional Chinese, English, Japanese, and 46 Latin-script languages with a single model. The Tiny tier supports 49 languages, excluding Japanese.
Source (PaddleOCR Official Documentation, PP-OCRv6 Introduction): "50 languages with a single unified model, including Simplified Chinese, Traditional Chinese, English, Japanese, and 46 Latin-script languages (tiny supports 49, excluding Japanese)."

Q. What is the difference between PP-OCRv6 and PP-OCRv5?
Text detection improves by 4.6 points and recognition by 5.1 points. The backbone was redesigned as PPLCNetV4, and the unified architecture now covers 50 languages in a single model.
Source (Hugging Face Blog, PP-OCRv6): "Compared with PP-OCRv5_server, it improves text detection by +4.6 percentage points and text recognition by +5.1 percentage points."

Q. How do I choose between Tiny, Small, and Medium?
Tiny (1.5M parameters) suits edge devices and mobile apps where speed matters most. Small (7.7M) is a balanced choice for desktop and web apps. Medium (34.5M) is for server environments where accuracy is the priority.
Source (Hugging Face Blog, PP-OCRv6): "PP-OCRv6 offers tiny/small/medium tiers (1.5M–34.5M parameters)."

Q. Can PP-OCRv6 run without a GPU?
Yes. It runs on CPU as well. The Tiny model runs in 0.96 seconds on an Apple M4, and the Medium model in 2.05 seconds on an Intel Xeon, so you get practical speeds even without a GPU.
Source (PaddleOCR Official Documentation, PP-OCRv6 Introduction): "Apple M4: 0.96s (tiny), Intel Xeon 8350C: 2.05s (medium)"

Q. Can PP-OCRv6 recognize handwriting?
Yes, it supports handwriting recognition, but accuracy is lower than for printed text: 62.1% for handwritten Chinese and 67.8% for handwritten English with the Medium model. If handwriting is your main use case, test whether the accuracy fits your needs first.
Source (PaddleOCR Official Documentation, benchmark tables): "HW-CN: 62.1%, HW-EN: 67.8%"

Q. How is PP-OCRv6 different from large VLMs?
PP-OCRv6 is a lightweight OCR-specific model (up to 34.5M parameters) that, according to the official comparison, surpasses large VLMs such as Qwen3-VL-235B and GPT-5.5 in OCR accuracy. VLMs handle general image understanding, but for text recognition accuracy and speed, PP-OCRv6 is stronger.
Source (PaddleOCR Official Documentation, PP-OCRv6 Introduction): "surpasses VLMs such as Qwen3-VL-235B and GPT-5.5 in accuracy."

Q. Can PP-OCRv6 be used commercially?
Yes. It is released under the Apache License 2.0, so commercial use is allowed. You can freely use, modify, and redistribute it within the license terms, with no restrictions on embedding it in your own products.
Source (PyPI, paddleocr): "License: Apache-2.0"
