Mistral AI Releases OCR 3: A Small Optical Character Recognition (OCR) Model for Structured Document AI at Scale

Mistral AI has released Mistral OCR 3, its latest optical character recognition service, which powers the company’s Document AI stack. The model, named mistral-ocr-2512, is designed to extract interleaved text and images from PDFs and other documents while preserving their structure, and it does so at an aggressive price of $2 per 1,000 pages, with a 50% discount when used via the Batch API.

What is Mistral OCR 3 optimized for?

Mistral OCR 3 targets specific enterprise document workloads. The model is designed for forms, scanned documents, complex tables, and handwriting. It is evaluated on internal benchmarks taken from real business use cases, where it achieves an overall win rate of 74% over Mistral OCR 2 in these document categories using a fuzzy match metric against ground truth.

The model outputs Markdown that preserves the document layout, and when table formatting is enabled, it enriches the output with HTML based table representations. This combination gives downstream systems both the content and structural information needed for retrieval pipelines, analysis, and agent workflows.

Role in Mistral Document AI

OCR 3 sits inside Mistral Document AI, the company’s document processing capability that combines OCR with structured data extraction and document QnA.

It now powers the Document AI Playground in Mistral AI Studio. In this interface, users upload PDFs or images and get back clean text or structured JSON without writing code. The same underlying OCR pipeline is accessible via public APIs, allowing teams to move from interactive exploration to production workloads without changing the core model.

Input, Output and Structure

The OCR processor accepts multiple document formats through a single API. The document field may specify:

  • document_url for PDF, PPTX, DOCX and more
  • image_url for image types like PNG, JPEG or AVIF
  • Uploaded or Base64 encoded PDFs or images via the same schema

This is documented in the OCR Processor section of Mistral’s Document AI documentation.
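As a rough illustration of the input options above, the following Python sketch builds a request body for the /v1/ocr endpoint without sending it. The helper names build_ocr_request and to_data_url are hypothetical, and the "type" discriminator, the top-level "model" key, and the placement of table_format are assumptions that should be checked against the OCR Processor documentation.

```python
import base64
from typing import Optional


def build_ocr_request(source: str, *, is_image: bool = False,
                      model: str = "mistral-ocr-2512",
                      table_format: Optional[str] = None) -> dict:
    """Build a JSON-serialisable OCR request body (hypothetical helper).

    Assumption: the document field carries a "type" key naming which of
    document_url or image_url is present.
    """
    key = "image_url" if is_image else "document_url"
    body = {"model": model, "document": {"type": key, key: source}}
    if table_format is not None:
        # Assumption: table_format is a top-level request option.
        body["table_format"] = table_format
    return body


def to_data_url(raw: bytes, mime: str) -> str:
    """Encode raw bytes as a Base64 data URL, one way to pass local files."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"


pdf_request = build_ocr_request("https://example.com/report.pdf", table_format="html")
png_request = build_ocr_request(to_data_url(b"abc", "image/png"), is_image=True)
print(pdf_request)
print(png_request)
```

Keeping the payload construction separate from the HTTP call makes it easy to reuse the same body for interactive requests and for batch jobs.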

The response is a JSON object with a pages array. Each page contains an index, a markdown string, a list of images, a list of tables when table_format="html" is used, detected hyperlinks, optional header and footer fields when header or footer extraction is enabled, and a dimensions object with the page size. There is also a document_annotation field for structured annotations and a usage_info block for accounting information.
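To make the response shape concrete, here is a minimal Python sketch that walks a hand-written sample and counts what each page contains. The top-level and per-page field names follow the description above; the keys inside dimensions and usage_info, the table entry fields, and all values are placeholders, not real model output.

```python
from typing import Any


def summarize_pages(response: dict[str, Any]) -> list[dict[str, int]]:
    """Return per-page counts from an OCR response dictionary."""
    summary = []
    for page in response.get("pages", []):
        summary.append({
            "index": page["index"],
            "markdown_chars": len(page.get("markdown") or ""),
            "images": len(page.get("images") or []),
            # tables may be absent unless HTML table output was requested
            "tables": len(page.get("tables") or []),
        })
    return summary


# Hand-written sample; nested keys such as "width" are assumptions.
sample_response = {
    "pages": [
        {
            "index": 0,
            "markdown": "# Invoice\n\n[tbl-0.html](tbl-0.html)",
            "images": [],
            "tables": [{"id": "tbl-0.html", "content": "<table></table>"}],
            "dimensions": {"width": 1240, "height": 1754},
        }
    ],
    "document_annotation": None,
    "usage_info": {"pages_processed": 1},
}

page_summary = summarize_pages(sample_response)
print(page_summary)
```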

When images and HTML tables are extracted, the Markdown includes placeholders such as ![img-0.jpeg](img-0.jpeg) and [tbl-3.html](tbl-3.html). These placeholders are mapped back to the actual content using the images and tables arrays in the response, which simplifies downstream reconstruction.
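The placeholder mapping can be sketched in a few lines of Python. This example assumes that placeholders use standard Markdown image and link syntax, that each entry in images has "id" and "image_base64" fields, and that each entry in tables has "id" and "content" fields; the function name inline_placeholders is hypothetical.

```python
def inline_placeholders(page: dict) -> str:
    """Replace image and table placeholders in a page's markdown.

    Assumed entry fields: images use "id" and "image_base64",
    tables use "id" and "content".
    """
    markdown = page.get("markdown") or ""
    for table in page.get("tables") or []:
        tid = table["id"]
        # Swap the link-style placeholder for the table's HTML.
        markdown = markdown.replace(f"[{tid}]({tid})", table["content"])
    for image in page.get("images") or []:
        iid = image["id"]
        # Keep the image syntax but point it at the embedded data.
        markdown = markdown.replace(
            f"![{iid}]({iid})", f"![{iid}]({image['image_base64']})"
        )
    return markdown


# Hand-written sample page, not real model output.
sample_page = {
    "markdown": "Intro\n\n![img-0.jpeg](img-0.jpeg)\n\n[tbl-3.html](tbl-3.html)",
    "images": [{"id": "img-0.jpeg", "image_base64": "data:image/jpeg;base64,AAAA"}],
    "tables": [{"id": "tbl-3.html", "content": "<table><tr><td>1</td></tr></table>"}],
}

resolved = inline_placeholders(sample_page)
print(resolved)
```

Because each placeholder is keyed by its identifier, a page with no images or tables passes through unchanged.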

Upgrades over Mistral OCR 2

Mistral OCR 3 offers several solid upgrades relative to OCR 2. The public release notes emphasize four main areas.

  • Handwriting: Mistral OCR 3 more accurately interprets cursive, mixed content annotations and handwritten text placed on top of printed templates.
  • Forms: It improves detection of boxes, labels and handwritten entries in dense layouts such as invoices, receipts, compliance forms and government documents.
  • Scanned and complex documents: The model is more robust to compression artifacts, skew, distortion, low DPI, and background noise in scanned pages.
  • Complex tables: It reconstructs table structures with headers, merged cells, multi row blocks, and column hierarchies, and it can return HTML tables with proper colspan and rowspan attributes so that the layout is preserved.
https://mistral.ai/news/mistral-ocr-3
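Downstream code often needs merged cells flattened into a rectangular grid. The sketch below uses only Python's standard html.parser module to expand colspan and rowspan attributes; it is an illustrative approach, not Mistral's own tooling, and the sample table is hand-written.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect each table row as a list of (text, rowspan, colspan)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # cell currently being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            a = dict(attrs)
            self._cell = ["", int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append(tuple(self._cell))
            self._cell = None


def expand_table(html: str) -> list:
    """Return a rectangular grid, repeating text across merged cells."""
    parser = TableCells()
    parser.feed(html)
    grid = {}  # (row, column) -> cell text
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:  # skip slots filled by an earlier rowspan
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text.strip()
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


sample_html = (
    "<table><tr><th colspan='2'>Total</th><th rowspan='2'>Note</th></tr>"
    "<tr><td>1</td><td>2</td></tr></table>"
)
table_grid = expand_table(sample_html)
print(table_grid)
```

Repeating the text of a merged cell in every slot it covers is one design choice among several; leaving the covered slots empty is equally valid if the consumer tracks spans separately.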

Pricing, Batch Inference, and Annotations

The OCR 3 model card lists a price of $2 per 1,000 pages for standard OCR and $3 per 1,000 annotated pages when structured annotations are used.

Mistral also exposes OCR 3 through its Batch Inference API, /v1/batch, which is documented under the batching section of the platform. Batch processing halves the effective OCR price to $1 per 1,000 pages by applying a 50% discount to jobs run through the batch pipeline.
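The pricing arithmetic above can be checked with a small Python helper. The function name ocr_cost_usd is hypothetical, and because the figures quoted here give a batch price only for standard OCR, the sketch refuses to guess a batch rate for annotated pages.

```python
def ocr_cost_usd(pages: int, *, annotated: bool = False, batch: bool = False) -> float:
    """Estimate OCR cost in USD from the per-1,000-page prices quoted above."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if annotated and batch:
        # The quoted figures do not state a batch price for annotated pages.
        raise ValueError("batch pricing for annotated pages is not stated")
    rate_per_thousand = 3.0 if annotated else 2.0
    if batch:
        rate_per_thousand *= 0.5  # 50% Batch API discount
    return pages * rate_per_thousand / 1000


standard = ocr_cost_usd(250_000)
batched = ocr_cost_usd(250_000, batch=True)
annotated = ocr_cost_usd(250_000, annotated=True)
print(standard, batched, annotated)
```

For a 250,000 page archive, this gives $500 for standard OCR, $250 through the batch pipeline, and $750 with annotations.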

The model integrates with two important features, Annotation – Structured and Bbox Extraction, on a single endpoint. These allow developers to attach schema-driven labels to areas of the document and obtain bounding boxes for text and other elements, which is useful when mapping content to downstream systems or UI overlays.

Key Takeaways

  1. Model and role: Mistral OCR 3, named mistral-ocr-2512, is a new OCR service that powers Mistral’s Document AI stack for page-based document understanding.
  2. Accuracy gains: On internal benchmarks covering forms, scanned documents, complex tables, and handwriting, OCR 3 achieved an overall win rate of 74% over Mistral OCR 2, putting Mistral at the cutting edge of both traditional and AI native OCR systems.
  3. Structured output for RAG: The service extracts interleaved text and embedded images and returns Markdown enriched with HTML reconstructed tables, preserving layout and table structure so outputs can feed directly into RAG systems, agents, and search pipelines with minimal additional parsing.
  4. API and document formats: Developers access OCR 3 through the /v1/ocr endpoint or the SDK, passing PDFs as document_url and images like PNG or JPEG as image_url, and can enable options like HTML table output, header or footer extraction, and base64 images.
  5. Pricing and Batch Processing: OCR 3 is priced at $2 per 1,000 pages and $3 per 1,000 annotated pages, and when used via the Batch API the effective price of standard OCR drops to $1 per 1,000 pages for large-scale processing.


Michael Sutter is a data science professional and holds a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michael excels in transforming complex datasets into actionable insights.
