LangMart: Mistral: Pixtral Large 2411
Model Overview
| Property | Value |
|---|---|
| Model ID | openrouter/mistralai/pixtral-large-2411 |
| Name | Mistral: Pixtral Large 2411 |
| Provider | mistralai |
| Released | 2024-11-19 |
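As a quick reference, below is a minimal sketch of calling this model through an OpenAI-compatible chat completions endpoint. The base URL, the API-key environment variable, and the exact model slug are assumptions, not confirmed by this page; adjust them to the gateway you actually use.

```python
# Minimal chat completion sketch (assumed OpenAI-compatible gateway).
# BASE_URL, the API-key variable, and the model slug are placeholders.
import os
import requests

BASE_URL = "https://openrouter.ai/api/v1"      # assumed endpoint
API_KEY = os.environ["OPENROUTER_API_KEY"]     # assumed env var name

payload = {
    "model": "mistralai/pixtral-large-2411",   # assumed slug; your gateway may use a prefixed ID
    "messages": [
        {"role": "user", "content": "Give a one-paragraph overview of Pixtral Large."}
    ],
    "max_tokens": 256,
}

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```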
Description
Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of Mistral Large 2. The model is able to understand documents, charts and natural images.
The model is available under the Mistral Research License (MRL) for research and educational use, and the Mistral Commercial License for experimentation, testing, and production for commercial purposes.
Provider
mistralai
Specifications
| Spec | Value |
|---|---|
| Context Window | 131,072 tokens |
| Modalities | text+image->text |
| Input Modalities | text, image |
| Output Modalities | text |
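Because the model accepts text and image input and returns text, a vision request can be expressed with OpenAI-style content parts. The sketch below assumes that request format; the endpoint, environment variable, and image URL are placeholders.

```python
# Sketch of a text+image -> text request (assumed OpenAI-style content parts).
import os
import requests

BASE_URL = "https://openrouter.ai/api/v1"      # assumed endpoint
API_KEY = os.environ["OPENROUTER_API_KEY"]     # assumed env var name

payload = {
    "model": "mistralai/pixtral-large-2411",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "What trend does this chart show, and what could explain it?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/q3-revenue-chart.png"}},  # placeholder image
        ],
    }],
    "max_tokens": 512,
}

resp = requests.post(f"{BASE_URL}/chat/completions",
                     headers={"Authorization": f"Bearer {API_KEY}"},
                     json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```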
Pricing
| Type | Price |
|---|---|
| Input | $2.00 per 1M tokens |
| Output | $6.00 per 1M tokens |
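A request's cost follows directly from these per-million-token prices. The helper below is a small sketch of that arithmetic; the prices are copied from the table and the token counts are made-up example values.

```python
# Rough cost estimate from the per-1M-token prices listed above.
INPUT_PRICE_PER_M = 2.00   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 6.00  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: 20k prompt tokens (e.g. a long document) and 1k of output
# -> 0.02 * 2.00 + 0.001 * 6.00 = $0.046
print(f"${estimate_cost(20_000, 1_000):.4f}")
```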
Capabilities
- Frequency penalty
- Max tokens
- Presence penalty
- Response format
- Seed
- Stop
- Structured outputs
- Temperature
- Tool choice
- Tools
- Top p
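These capabilities map onto request parameters in the usual OpenAI-style schema. The sketch below assumes that schema (parameter names such as `tools`, `tool_choice`, and `response_format`) and uses a made-up weather tool purely for illustration.

```python
# Sketch of a request exercising several of the listed capabilities
# (assumed OpenAI-style parameter names; the weather tool is illustrative only).
payload = {
    "model": "mistralai/pixtral-large-2411",
    "messages": [{"role": "user",
                  "content": "What's the weather in Paris? Answer using the tool."}],
    # Sampling controls
    "temperature": 0.2,
    "top_p": 0.9,
    "seed": 42,
    "max_tokens": 300,
    "stop": ["\n\n"],
    # Repetition penalties
    "frequency_penalty": 0.1,
    "presence_penalty": 0.0,
    # Tool calling
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",
}
# For structured outputs, a response_format entry such as {"type": "json_object"}
# would be used instead of tools when a JSON reply is wanted directly.
```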
Detailed Analysis
Pixtral Large 2411 (released November 2024) is a 124B-parameter multimodal model built on Mistral Large 2 (2407), combining Mistral Large's strong reasoning and language understanding with advanced vision processing. Mistral reports state-of-the-art results across multimodal benchmarks, including 69.4% on MathVista (mathematical reasoning over visual data), ahead of competing frontier models at the time of release.

The model excels at document analysis (complex financial reports, legal documents with diagrams), chart and graph interpretation that requires deep reasoning, natural image understanding with nuanced context, and multi-image reasoning: its 128K-token context window fits at least 30 high-resolution images.

As one of the strongest open-weight multimodal models available, Pixtral Large enables vision-language applications that previously required proprietary models. Typical uses include enterprise document intelligence, advanced data-visualization analysis, scientific figure interpretation, accessibility solutions that need detailed image descriptions, and research into multimodal architectures. The 124B scale gives it reasoning depth comparable to text-only frontier models while it processes visual input.
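For multi-image use cases such as walking through the pages of a scanned report, several image parts can be packed into a single user turn. The sketch below assumes the same OpenAI-style content-parts format and base64 data URLs for local files; the file names are placeholders.

```python
# Sketch: sending several local page images in one turn
# (assumed OpenAI-style content parts with base64 data URLs; file paths are placeholders).
import base64
from pathlib import Path

def image_part(path: str) -> dict:
    """Encode a local PNG as a data-URL image part."""
    b64 = base64.b64encode(Path(path).read_bytes()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}

pages = [image_part(f"report_page_{i}.png") for i in range(1, 6)]  # placeholder files

payload = {
    "model": "mistralai/pixtral-large-2411",
    "messages": [{
        "role": "user",
        "content": [{"type": "text",
                     "text": "Summarize the findings across these report pages."}] + pages,
    }],
    "max_tokens": 1024,
}
# POST this payload to the chat completions endpoint as in the earlier sketches.
```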