Mistral’s new OCR API turns PDF documents into AI-ready text

By News Desk, GCC Business News

Paris-based AI startup Mistral has launched the Mistral Optical Character Recognition (OCR) application programming interface (API), which can analyze and process PDF documents and convert them into an AI-ready text format.

The tool extracts data from PDFs, making it easily consumable by AI models. Mistral stated that the new OCR API will allow developers to build AI applications for PDF files and to create datasets for training new AI models.

AI models often struggle with PDF documents. Large language models (LLMs) cannot easily access content in this format through traditional Retrieval-Augmented Generation (RAG) techniques, because the data is not in a form they can process directly. Mistral says its OCR API sets a new benchmark in document understanding.

According to Mistral, the OCR model comprehends each element of a document, such as media, text, tables, and equations, with unprecedented accuracy. After analyzing a document, it extracts the information and presents it as Markdown or as a raw text file.

AI models can use the extracted text as input, and RAG systems can easily access it and answer queries about it. The AI firm claims that Mistral OCR has a state-of-the-art understanding of complex document elements, including interleaved imagery, mathematical expressions, tables, and advanced layouts such as LaTeX formatting.
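To illustrate why Markdown output is convenient for RAG systems, here is a minimal sketch in Python. It does not call Mistral's API. The sample text, the function names (split_markdown, retrieve) and the simple keyword-overlap scoring are all assumptions made for illustration; a production system would typically use embeddings rather than word overlap.

```python
import re

# Hypothetical sample standing in for Markdown returned by an OCR service.
SAMPLE = """# Quarterly report

Revenue grew in the third quarter.

## Methods

We measured latency with $t = d / v$.

| Region | Revenue |
| --- | --- |
| EU | 12 |
"""


def split_markdown(markdown: str) -> list[dict]:
    """Split Markdown into sections, one per heading line."""
    sections = []
    title, lines = "(preamble)", []
    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            body = "\n".join(lines).strip()
            if body:  # skip empty sections
                sections.append({"title": title, "text": body})
            title, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    body = "\n".join(lines).strip()
    if body:
        sections.append({"title": title, "text": body})
    return sections


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, sections: list[dict]) -> dict:
    """Return the section sharing the most words with the query."""
    query_words = tokenize(query)
    return max(
        sections,
        key=lambda s: len(query_words & tokenize(s["title"] + " " + s["text"])),
    )


sections = split_markdown(SAMPLE)
best = retrieve("how was latency measured", sections)
print(best["title"])
```

Because headings, tables and equations survive as plain text markers, each section can be handed to a language model as context without any PDF-specific parsing.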

The model also enables deeper understanding of rich documents such as scientific papers with embedded charts, graphs, equations and figures.

Mistral says the OCR model is faster than its peers, processing up to 2,000 pages per minute on a single node. According to the company, the ability to process documents this quickly supports continuous learning and improvement even in high-throughput environments.
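To put the quoted throughput in perspective, the short Python sketch below estimates processing time for a large archive. It treats 2,000 pages per minute as a sustained rate and assumes throughput scales linearly with the number of nodes. Both are simplifying assumptions, and the function name minutes_to_process is hypothetical.

```python
# Peak figure quoted by Mistral for a single node; real throughput may be lower.
PAGES_PER_MINUTE = 2000


def minutes_to_process(pages: int, nodes: int = 1,
                       pages_per_minute: int = PAGES_PER_MINUTE) -> float:
    """Estimate minutes needed, assuming linear scaling across nodes."""
    if pages < 0 or nodes < 1 or pages_per_minute <= 0:
        raise ValueError("pages must be >= 0, nodes >= 1, rate > 0")
    return pages / (pages_per_minute * nodes)


one_node = minutes_to_process(1_000_000)            # 500.0 minutes
four_nodes = minutes_to_process(1_000_000, nodes=4)  # 125.0 minutes
print(f"1 node: {one_node / 60:.1f} hours, 4 nodes: {four_nodes / 60:.1f} hours")
```

Under these assumptions, a one-million-page archive would take roughly eight hours on a single node.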

The AI firm claims that Mistral OCR outperformed Google Document AI, Azure OCR, and GPT-4o (version 2024-11-20) on “text-only” documents. It also reportedly outperformed Google and Azure in multilingual capabilities.
