PP-OCRv6 Features and Multilingual OCR
PP-OCRv6's defining traits are a unified architecture that processes 50 languages with one model, and three model sizes matched to your use case. This section covers how it works and what improved over the previous generation.
The three tiers of PP-OCRv6
| Tier | Parameters | Languages | Target | Note |
|---|---|---|---|---|
| Tiny | 1.5M | 49 (no Japanese) | Edge / IoT | 0.96s on Apple M4 |
| Small | 7.7M | 50 | Mobile / desktop | Balanced |
| Medium | 34.5M | 50 | Server / top accuracy | 0.29s on A100 |
What PP-OCRv6 Is — PaddleOCR's Latest Text Recognition Model
PP-OCRv6 is the latest-generation model in "PaddleOCR," the open-source OCR framework developed by Baidu. It was released as PaddleOCR v3.7.0 on June 11, 2026.
OCR stands for Optical Character Recognition — the technology that extracts text from images and PDFs as machine-readable data. It is used to make scanned paper documents searchable, or to read text from signs captured in photos.
PP-OCRv6 is built on a newly designed unified backbone called PPLCNetV4. Both text detection (finding where the text is) and text recognition (determining what each character is) are handled through this single backbone.
PP-OCRv6 is the latest generation of PaddleOCR's universal OCR model family. The model family scales from 1.5M to 34.5M parameters, with three tiers: tiny, small, and medium. — From the PP-OCRv6 overview
PP-OCRv6's most striking trait is its parameter efficiency. The family spans just 1.5M to 34.5M parameters, yet the 34.5M-parameter Medium model surpasses VLMs (vision-language models) on the scale of 235 billion parameters in recognition accuracy.
Recognizing 50 Languages with One Multilingual Model
The Medium and Small models of PP-OCRv6 recognize 50 languages with a single unified model. Supported languages include Simplified Chinese, Traditional Chinese, English, and Japanese, plus 46 Latin-script languages such as French, German, and Spanish.
Conventional OCR models required downloading a separate model per language and switching between them depending on the target. With PP-OCRv6, you load one model to process multilingual documents. In environments where documents in several languages are mixed, the overhead of switching languages turns directly into processing time and management cost. A design that only needs one model to be loaded removes that switching cost entirely.
50 languages with a single unified model, including Simplified Chinese, Traditional Chinese, English, Japanese, and 46 Latin-script languages (tiny supports 49, excluding Japanese). — From the language support description
One point that often trips people up is the Tiny model's language support. Tiny covers only 49 languages, excluding Japanese. If you handle Japanese documents, you need to choose Small or higher.
The dictionary adds roughly 200 diacritical characters (accented characters), allowing accurate character discrimination in Latin-script languages.
Detection and Recognition Architecture Improved Over PP-OCRv5
In PP-OCRv6, both the detection and recognition modules were redesigned.
For text detection, RepLKFPN, a lightweight large-kernel feature pyramid network, was introduced as the detection neck. With a design that has a wide 7×7 receptive field, the neck's parameter count was cut by about 31%, from 172K in v5 to 118K. Even with fewer parameters, accuracy went up: detection Hmean improved by 4.6 points, from 81.6% in v5 to 86.2%.
For text recognition, EncoderWithLightSVTR was adopted. It combines local features and a global attention mechanism through additive skip connections. Recognition accuracy improved by +5.1 points, from 78.1% in v5 to 83.2%.
Backbone: PPLCNetV4. Detection Neck: RepLKFPN (7×7 receptive field; 118K parameters vs. PP-OCRv5's 172K). Recognition Neck: EncoderWithLightSVTR (local-global attention with additive skip connections). — From the Architecture section
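As a quick check, the sketch below recomputes the parameter reduction and the accuracy gains from the figures quoted in this section. It also includes an `hmean` helper, on the assumption that Hmean here means the usual harmonic mean of precision and recall used in text-detection benchmarks. The precision and recall values passed to it are made-up illustrations, not published results.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of Hmean)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def percent_reduction(old: float, new: float) -> float:
    """Relative reduction from old to new, in percent."""
    return (old - new) / old * 100


# Figures quoted in this section.
neck_params_v5, neck_params_v6 = 172_000, 118_000
det_hmean_v5, det_hmean_v6 = 81.6, 86.2
rec_acc_v5, rec_acc_v6 = 78.1, 83.2

param_cut = percent_reduction(neck_params_v5, neck_params_v6)
det_gain = det_hmean_v6 - det_hmean_v5
rec_gain = rec_acc_v6 - rec_acc_v5

print(f"Detection neck parameters cut by {param_cut:.1f}%")
print(f"Detection Hmean gain: {det_gain:+.1f} points")
print(f"Recognition accuracy gain: {rec_gain:+.1f} points")

# Hypothetical precision/recall pair, only to show how Hmean behaves.
example_hmean = hmean(0.90, 0.80)
print(f"Hmean for P=0.90, R=0.80: {example_hmean:.3f}")
```

Because the harmonic mean is pulled toward the lower of its two inputs, a detector cannot reach a high Hmean by trading recall for precision or the reverse.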
PP-OCRv6 Accuracy and Speed Benchmarks
Let us look at PP-OCRv6's accuracy and speed using the benchmark data in the official documentation, comparing both with the previous generation v5 and with large VLMs.
[Figure: PP-OCRv6 vs PP-OCRv5 accuracy (Medium model)]
Comparing Detection and Recognition Accuracy with PP-OCRv5
The official documentation publishes a multi-scenario benchmark spanning 16 detection categories and 15 recognition categories. The table below compares the average accuracy of each tier with PP-OCRv5_server.
| Model | Parameters | Detection Hmean (avg) | Recognition (weighted avg) |
|---|---|---|---|
| PP-OCRv6 Medium | 34.5M | 86.2% | 83.2% |
| PP-OCRv6 Small | 7.7M | 84.1% | 81.3% |
| PP-OCRv6 Tiny | 1.5M | 80.6% | 73.5% |
| PP-OCRv5 Server | — | 81.6% | 78.1% |
PP-OCRv6_medium: AVG 86.2, PP-OCRv6_small: AVG 84.1, PP-OCRv6_tiny: AVG 80.6, PP-OCRv5_server: AVG 81.6 — From the Text Detection Hmean (%) Multi-Scenario Benchmark
What stands out is the Small model (7.7M parameters): it beats v5_server in both detection and recognition. Even with far fewer parameters, Small surpasses the previous generation's server model, so you can expect ample accuracy on desktop too. That is the basis for choosing Small in a desktop environment.
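The comparison with PP-OCRv5 Server can be made explicit by subtracting the baseline from each tier's averages. The sketch below uses only the numbers in the table; the variable names are illustrative.

```python
# Average scores from the table: (detection Hmean %, recognition %).
scores = {
    "PP-OCRv6 Medium": (86.2, 83.2),
    "PP-OCRv6 Small": (84.1, 81.3),
    "PP-OCRv6 Tiny": (80.6, 73.5),
}
baseline = (81.6, 78.1)  # PP-OCRv5 Server

# Point differences against the baseline, rounded to one decimal place.
deltas = {
    name: (round(det - baseline[0], 1), round(rec - baseline[1], 1))
    for name, (det, rec) in scores.items()
}

for name, (det_delta, rec_delta) in deltas.items():
    print(f"{name}: detection {det_delta:+.1f}, recognition {rec_delta:+.1f}")

# Tiers that beat the v5 server model on both metrics.
beats_v5_on_both = [n for n, (d, r) in deltas.items() if d > 0 and r > 0]
print(beats_v5_on_both)
```

The same subtraction shows that Tiny trails PP-OCRv5 Server on both averages, by 1.0 points in detection and 4.6 points in recognition.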
Looking scenario by scenario, some areas improved especially sharply. In recognition, Japanese rose by 16.8 points from 73.7% in v5_server to 90.5%, screen text such as digital displays and screen captures rose by 14.4 points from 68.1% to 82.5%, ancient documents by 12.0 points from 60.4% to 72.4%, and printed English by 9.0 points from 85.1% to 94.1%. On the detection side, rotated text improved by 13.8 points from 80.0% to 93.8%, and industrial text by 9.0 points from 64.3% to 73.3%. By contrast, handwritten Chinese stays at 62.1% (v5_server 58.0%), so handwriting remains a weak spot.
PP-OCRv6_medium recognition: JP 90.5, Screen 82.5, Anc. 72.4, Print-EN 94.1. PP-OCRv5_server: JP 73.7, Screen 68.1, Anc. 60.4, Print-EN 85.1. — From the Text Recognition Accuracy (%) Multi-Scenario Benchmark
Accuracy Comparison with Large VLMs
PP-OCRv6's official benchmark also compares it with large VLMs (vision-language models) such as Qwen3-VL-235B and GPT-5.5.
VLMs are general-purpose image-understanding models that can do more than OCR, including describing image content and answering questions. Their parameter counts are orders of magnitude larger — tens to hundreds of billions — and they need high-end GPUs for inference.
With just 34.5M parameters, the PP-OCRv6 Medium model achieves higher OCR accuracy than these large VLMs. For the specific task of text recognition, a specialized model with roughly 1/6,800 of the parameters of Qwen3-VL-235B outperforms a general-purpose giant. This is best read as an effect of task specialization rather than of architectural superiority.
PP-OCRv6_medium with 34.5M parameters...surpasses VLMs such as Qwen3-VL-235B and GPT-5.5 in accuracy. — From the comparison with VLMs
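To put the size gap in concrete terms, the short calculation below compares parameter counts. It assumes the "235B" in the Qwen3-VL-235B name is the total parameter count. GPT-5.5 is left out because the source gives no figure for it.

```python
medium_params = 34.5e6  # PP-OCRv6 Medium
vlm_params = 235e9      # Qwen3-VL-235B, taken from the model name (assumption)

ratio = vlm_params / medium_params
print(f"The VLM has about {ratio:,.0f} times as many parameters as PP-OCRv6 Medium")
```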
That said, VLMs offer capabilities beyond OCR, such as layout understanding and grasping the meaning of image content. For pure text extraction use PP-OCRv6, and for understanding the whole image or context-dependent processing use a VLM — choose based on your goal.
Inference Speed Benchmarks by Hardware
Inference speed is compared as the time per image. The table below shows the end-to-end inference speed published in the official documentation.
| Hardware | v6 Medium | v6 Small | v6 Tiny | v5 Server | v5 Mobile |
|---|---|---|---|---|---|
| NVIDIA A100 | 0.29s | 0.25s | 0.13s | 0.32s | 0.25s |
| NVIDIA V100 | 0.72s | 0.49s | 0.21s | 0.66s | 0.50s |
| Intel Xeon 8350C | 2.05s | 0.79s | 0.32s | 2.04s | 0.80s |
| Apple M4 | 8.82s | 3.07s | 0.96s | >10s | 5.82s |
NVIDIA A100: v6_medium 0.29s, v6_tiny 0.13s. Apple M4: v6_tiny 0.96s vs v5_mobile 5.82s (6.1× speedup). — From the End-to-End Inference Speed (s/image)
On an Apple M4, the Tiny model runs in 0.96 seconds — under a second — a 6.1× speedup over the v5 Mobile model (5.82 seconds). On a GPU (A100), even the Medium model processes in 0.29 seconds, which is plenty of throughput for batch processing large volumes of documents.
Even on a CPU-only setup, Tiny runs in 0.32 seconds on an Intel Xeon. There are many cases where Tiny is enough even without a GPU.
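For capacity planning, per-image times are easier to work with as speedups and hourly throughput. The sketch below does that conversion using only the values from the speed table. The throughput figure assumes images are processed one after another with no batching or I/O overhead, which is a simplification.

```python
# End-to-end seconds per image from the speed table (default backend).
# None marks the ">10s" entry, which has no exact value.
seconds_per_image = {
    "NVIDIA A100": {"v6_medium": 0.29, "v6_small": 0.25, "v6_tiny": 0.13,
                    "v5_server": 0.32, "v5_mobile": 0.25},
    "NVIDIA V100": {"v6_medium": 0.72, "v6_small": 0.49, "v6_tiny": 0.21,
                    "v5_server": 0.66, "v5_mobile": 0.50},
    "Intel Xeon 8350C": {"v6_medium": 2.05, "v6_small": 0.79, "v6_tiny": 0.32,
                         "v5_server": 2.04, "v5_mobile": 0.80},
    "Apple M4": {"v6_medium": 8.82, "v6_small": 3.07, "v6_tiny": 0.96,
                 "v5_server": None, "v5_mobile": 5.82},
}


def speedup(hardware, new, old):
    """How many times faster `new` is than `old` on the given hardware."""
    times = seconds_per_image[hardware]
    if times[new] is None or times[old] is None:
        return None
    return times[old] / times[new]


def images_per_hour(hardware, model):
    """Sequential throughput, ignoring batching and I/O overhead."""
    t = seconds_per_image[hardware][model]
    return None if t is None else 3600 / t


tiny_vs_v5_mobile = {
    hw: speedup(hw, "v6_tiny", "v5_mobile") for hw in seconds_per_image
}
for hw, s in tiny_vs_v5_mobile.items():
    print(f"{hw}: v6 Tiny is {s:.1f}x faster than v5 Mobile")

a100_medium = images_per_hour("NVIDIA A100", "v6_medium")
print(f"A100, v6 Medium: about {a100_medium:,.0f} images/hour")
```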
The choice of inference backend also changes speed. On an Intel Xeon, where the standard Paddle Inference backend takes 2.05 seconds for Medium, switching to the OpenVINO backend brings it down to 1.40 seconds, and Tiny to 0.20 seconds. On an Apple M4, Tiny runs in 0.35 seconds with ONNX Runtime — less than half the standard backend's 0.96 seconds. The official docs cite up to a 2.37× GPU inference speedup, so the combination of hardware and backend can raise effective speed further.
Intel Xeon 8350C OpenVINO: v6_medium 1.40s, v6_tiny 0.20s. Apple M4 ONNX Runtime: v6_tiny 0.35s. "2.37× GPU inference speedup." — From the End-to-End Inference Speed table / performance highlights
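The backend figures quoted above can be turned into speedup factors in the same way. This sketch covers only the three hardware and tier combinations for which the text gives both a default and an alternative backend time.

```python
# (hardware, tier): (default Paddle Inference seconds, backend name, backend seconds)
backend_times = {
    ("Intel Xeon 8350C", "medium"): (2.05, "OpenVINO", 1.40),
    ("Intel Xeon 8350C", "tiny"): (0.32, "OpenVINO", 0.20),
    ("Apple M4", "tiny"): (0.96, "ONNX Runtime", 0.35),
}

backend_speedup = {
    key: round(default / alt, 2)
    for key, (default, _name, alt) in backend_times.items()
}

for (hardware, tier), factor in backend_speedup.items():
    name = backend_times[(hardware, tier)][1]
    print(f"{hardware}, {tier}: {name} is {factor:.2f}x faster than the default")
```

On these numbers the gain from switching backends ranges from about 1.5x to about 2.7x, so it is worth measuring on your own hardware before settling on a tier.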
How to Install PP-OCRv6 and Choose a Model
PP-OCRv6 can be installed from Python's package manager, and you can extract text from an image in just a few lines of code. This section explains the setup steps and how to choose among the three model tiers.
The PP-OCRv6 setup flow
1. Install: pip install paddleocr
2. Run code: PaddleOCR() → ocr.predict(image path)
3. Get results: detection boxes + recognized text + confidence
From pip install to Extracting OCR Text
PP-OCRv6 is published on PyPI as the paddleocr package and runs on Python 3.8–3.13. Installation is a single line.
```bash
pip install paddleocr
```
After installing, the following code extracts text from an image.
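```python
from paddleocr import PaddleOCR

# Disable the optional stages that this example does not need.
ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
)

result = ocr.predict("sample.png")
for res in result:
    res.print()                 # print the result to the console
    res.save_to_json("output")  # save the result as JSON under "output"
```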
from paddleocr import PaddleOCR; ocr = PaddleOCR(use_doc_orientation_classify=False, use_doc_unwarping=False, use_textline_orientation=False); result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png") (From the Quick Start code example)
use_doc_orientation_classify and use_doc_unwarping are options for document orientation correction and dewarping, and use_textline_orientation controls text-line orientation classification. Setting them to False when you do not need them makes processing lighter.
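Once a result has been saved as JSON, a common next step is to keep only the lines the model is confident about. The sketch below assumes the JSON contains parallel `rec_texts` and `rec_scores` lists. Those key names are an assumption, so inspect a real output file and adjust them if yours differ. The `confident_lines` helper and the sample data are hypothetical and made up for illustration.

```python
def confident_lines(result, min_score=0.8):
    """Return (text, score) pairs whose score is at least min_score.

    `result` is a dict shaped like the assumed OCR JSON, with parallel
    "rec_texts" and "rec_scores" lists. Missing keys yield an empty list.
    """
    texts = result.get("rec_texts", [])
    scores = result.get("rec_scores", [])
    return [(text, score) for text, score in zip(texts, scores) if score >= min_score]


# Made-up sample shaped like the assumed JSON; not real OCR output.
sample = {
    "rec_texts": ["Invoice No. 1042", "T0tal: 58.2O", "Total: 58.20"],
    "rec_scores": [0.98, 0.41, 0.91],
}

kept = confident_lines(sample, min_score=0.8)
for text, score in kept:
    print(f"{score:.2f}  {text}")
```

To apply this to a real run, load the saved file with `json.load` and pass the resulting dict to `confident_lines`. The 0.8 threshold is arbitrary and should be tuned on your own documents.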
You can choose from several inference backends. Besides the default Paddle Inference, you can use the Hugging Face Transformers backend with engine="transformers" and the ONNX Runtime backend with engine="onnxruntime". Choosing ONNX Runtime removes the need to install PaddlePaddle itself, which is handy when you want to keep the environment simple.
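One way to keep the backend choice configurable is to build the constructor arguments in a small helper. The sketch below takes the `engine` parameter and its values ("transformers", "onnxruntime") from the description above and has not been checked against the library. `ocr_kwargs` and `make_ocr` are hypothetical names introduced for this example.

```python
def ocr_kwargs(engine=None):
    """Build keyword arguments for the PaddleOCR constructor.

    `engine` follows the values named in this article; None keeps the
    default Paddle Inference backend by omitting the argument.
    """
    kwargs = {
        "use_doc_orientation_classify": False,
        "use_doc_unwarping": False,
        "use_textline_orientation": False,
    }
    if engine is not None:
        kwargs["engine"] = engine  # assumed parameter name, per the text above
    return kwargs


def make_ocr(engine=None):
    """Create a PaddleOCR instance for the chosen backend."""
    # Imported lazily so the helper above works even where paddleocr is absent.
    from paddleocr import PaddleOCR

    return PaddleOCR(**ocr_kwargs(engine))


print(ocr_kwargs("onnxruntime"))
```

With this in place, `make_ocr("onnxruntime")` would select ONNX Runtime and `make_ocr()` would fall back to the default backend, provided the installed version accepts `engine` as described.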
If you keep your PDFs and images in a text-ready state, it becomes easier to feed OCR results into search or data-entry automation.
Choosing Between Tiny, Small, and Medium
The three tiers are chosen by the trade-off between parameter count and accuracy. The table below summarizes how to choose by use case.
| Use case | Recommended tier | Reason |
|---|---|---|
| Mobile apps / edge devices | Tiny (1.5M) | Lightest and fastest. Ideal if Japanese is not needed |
| Desktop apps / web services | Small (7.7M) | Good balance of accuracy and speed. 50 languages |
| Server-side batch processing | Medium (34.5M) | Highest accuracy. Suited to high-volume GPU processing |
| Jobs that include Japanese | Small or higher | Tiny does not support Japanese |
The accuracy gap shows up most clearly in handwriting and industrial text (digital displays, dot-matrix characters, and so on). The Medium model's handwritten-Chinese recognition is 62.1%, whereas Tiny drops to 40.1%. For handwriting, Medium is the safe choice.
On the other hand, for printed English text, even Tiny achieves 88.4% recognition — enough for digitizing receipts and invoices. When you compress or convert image formats as preprocessing, be careful not to degrade quality too much before OCR.
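The selection rules in this section can be written down as a small decision function. This is a sketch of the guidance above, not an official selection tool. The `target` labels ("edge", "desktop", "server") and the function name are made up for this example.

```python
def recommend_tier(target, needs_japanese=False, handwriting=False):
    """Suggest a PP-OCRv6 tier from the rules of thumb in this section.

    target: "edge", "desktop", or "server" (labels chosen for this sketch).
    """
    by_target = {"edge": "tiny", "desktop": "small", "server": "medium"}
    if target not in by_target:
        raise ValueError(f"unknown target: {target!r}")
    if handwriting:
        # Handwriting is where the accuracy gap between tiers is largest.
        return "medium"
    tier = by_target[target]
    if needs_japanese and tier == "tiny":
        # Tiny covers 49 languages and does not include Japanese.
        tier = "small"
    return tier


print(recommend_tier("edge"))
print(recommend_tier("edge", needs_japanese=True))
print(recommend_tier("desktop", handwriting=True))
```

The function deliberately lets accuracy requirements override the deployment target, which matches the advice that Japanese needs Small or higher and handwriting is safest on Medium.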
Benefits and Caveats of PP-OCRv6
Here we summarize PP-OCRv6's strengths and the constraints to understand before adopting it.
The first benefit is simpler operations thanks to the 50-language unified model. You no longer need to manage a model per language, and you can process even mixed-language environments without switching models.
Another major benefit is that it is free to use, including commercially, because it is released under the Apache License 2.0. Cloud OCR services (such as Google Cloud Vision API and Amazon Textract) charge usage-based fees, while PP-OCRv6 is self-hosted, so there is no per-page charge beyond the cost of your own hardware. The cost difference grows the more documents you process.
Its parameter efficiency also stands out. Because it delivers accuracy above VLMs with 34.5M parameters, it is easy to adopt even in environments with tight GPU memory.
As for caveats, handwriting recognition is lower than for printed text. Even the Medium model is 62.1% for handwritten Chinese and 67.8% for handwritten English, so if reading handwritten notes is your main goal, verify the accuracy with real data first.
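One simple way to verify accuracy on your own data is to transcribe a handful of samples by hand and compare them with the OCR output character by character. The sketch below computes a character-level accuracy from Levenshtein edit distance. This is a common metric, but it is an assumption that it is comparable to the benchmark's own scoring. The sample pairs are made up.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_accuracy(pairs) -> float:
    """1 minus the character error rate over (ground_truth, ocr_output) pairs."""
    pairs = list(pairs)
    total_chars = sum(len(truth) for truth, _ in pairs)
    if total_chars == 0:
        return 0.0
    errors = sum(edit_distance(truth, output) for truth, output in pairs)
    return max(0.0, 1 - errors / total_chars)


# Made-up ground truth and OCR output, for illustration only.
samples = [
    ("hello world", "hello wor1d"),
    ("total 42", "total 42"),
]
accuracy = char_accuracy(samples)
print(f"Character accuracy: {accuracy:.1%}")
```

A few dozen representative samples scored this way usually give a clearer picture of whether a tier is good enough for a given kind of handwriting than the published averages do.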
Also, PP-OCRv6 specializes in text detection and recognition, so layout analysis (recognizing table structure and reading order) requires separately combining PaddleOCR's layout-analysis module. In a workflow that converts PDFs to images before OCR, building in the layout-analysis combination from the start saves trouble later.
Finally, you cannot use Tiny for Japanese. Tiny supports 49 languages and Japanese is not among them, so choose Small or higher for documents that include Japanese.
License: Apache-2.0. Current Version: 3.7.0. Supported Python: 3.8, 3.9, 3.10, 3.11, 3.12, 3.13. — From the package information
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. — From the repository description