2025-11-14

Released: 14 November 2025

We’re excited to announce the launch of our first dedicated OCR model, delivering significantly improved accuracy, speed, and layout understanding for image-based text extraction. This release expands our multimodal AI capabilities and strengthens support for enterprise-scale document processing.

What’s New?

1. Llama-4-Maverick-17B-128E-Instruct (OCR)

A high-precision Optical Character Recognition model designed for reliability across diverse real-world input conditions.

Key features:

High-accuracy text extraction from documents, photos, screenshots, and natural scenes
Robust layout understanding, including tables, multi-column layouts, and irregular formatting
Mixed-language recognition with improved multilingual performance
Noise-tolerant inference for low-resolution, blurred, or compressed images
Optimized throughput and latency for large-scale production workloads
Easy API integration with existing applications and pipelines

Performance Enhancements

This model introduces:

Faster inference time for high-volume OCR jobs
Improved reconstruction of structured and semi-structured documents
Higher accuracy across multilingual and complex-layout documents
Enhanced resilience to visual noise and distortion

Next Steps for You

Start using the OCR model through the standard API endpoint.
Evaluate improvements by testing your document and image samples.
Integrate the model into automated pipelines for data extraction.
Reach out to support if you need help with onboarding or migration.

We’re excited to introduce Llama-4-Maverick-17B-128E-Instruct (OCR) and look forward to supporting your adoption.