LangMart: Mistral: Mistral Small 3
Model Overview
| Property | Value |
|---|---|
| Model ID | openrouter/mistralai/mistral-small-24b-instruct-2501 |
| Name | Mistral: Mistral Small 3 |
| Provider | mistralai |
| Released | 2025-01-30 |
Description
Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment.
The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models such as Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware.
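For orientation, here is a minimal sketch of calling this model through an OpenAI-compatible chat completions endpoint. The endpoint URL, the `LANGMART_API_KEY` environment variable, and the trimmed model slug are assumptions; substitute whatever your gateway actually expects.

```python
import os
import requests

# Assumed OpenAI-compatible endpoint; replace with your gateway's URL.
API_URL = "https://openrouter.ai/api/v1/chat/completions"
# Slug from the table above, minus the router prefix (assumption).
MODEL_ID = "mistralai/mistral-small-24b-instruct-2501"

def ask(prompt: str) -> str:
    """Send one user message and return the assistant's reply text."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['LANGMART_API_KEY']}"},
        json={
            "model": MODEL_ID,
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Summarize the Apache 2.0 license in one sentence."))
```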
Provider
mistralai
Specifications
| Spec | Value |
|---|---|
| Context Window | 32,768 tokens |
| Modalities | text->text |
| Input Modalities | text |
| Output Modalities | text |
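The 32,768-token window covers the prompt and the completion together. The helper below is a rough sketch for keeping a chat history inside that budget; the 4-characters-per-token ratio is an assumed heuristic, not the model's real tokenizer, so treat the counts as estimates.

```python
# Rough context-budget helper. The chars-per-token ratio is an
# approximation, not the model's actual tokenizer.
CONTEXT_WINDOW = 32_768
CHARS_PER_TOKEN = 4  # heuristic assumption

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // CHARS_PER_TOKEN)

def trim_history(messages: list[dict], reserve_for_output: int = 1024) -> list[dict]:
    """Drop the oldest messages until the estimated prompt size fits the window."""
    budget = CONTEXT_WINDOW - reserve_for_output
    kept: list[dict] = []
    used = 0
    # Walk newest-to-oldest so recent turns are kept preferentially.
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```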
Pricing
| Type | Price |
|---|---|
| Input | $0.03 per 1M tokens |
| Output | $0.11 per 1M tokens |
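At these rates, per-request cost is a simple linear function of token counts. The sketch below shows the arithmetic; the example token counts are illustrative, and in practice you would read them from the `usage` field that OpenAI-compatible responses typically include.

```python
# Prices from the table above, expressed per token.
INPUT_PRICE_PER_TOKEN = 0.03 / 1_000_000   # $0.03 per 1M input tokens
OUTPUT_PRICE_PER_TOKEN = 0.11 / 1_000_000  # $0.11 per 1M output tokens

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return (prompt_tokens * INPUT_PRICE_PER_TOKEN
            + completion_tokens * OUTPUT_PRICE_PER_TOKEN)

# Example: a 2,000-token prompt with a 500-token reply.
# 2,000 * $0.03/1M + 500 * $0.11/1M = $0.000060 + $0.000055 = $0.000115
print(f"${estimate_cost(2_000, 500):.6f}")
```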
Capabilities
- Frequency penalty
- Logit bias
- Max tokens
- Min p
- Presence penalty
- Repetition penalty
- Response format
- Seed
- Stop
- Structured outputs
- Temperature
- Tool choice
- Tools
- Top k
- Top p
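These map to the usual OpenAI-style request fields. The sketch below exercises several of them in one call (temperature, top_p, max_tokens, seed, stop, and a JSON response format); the endpoint URL and environment variable are the same assumptions as in the earlier sketch, and whether every field is honored depends on the upstream provider.

```python
import os
import requests

payload = {
    "model": "mistralai/mistral-small-24b-instruct-2501",
    "messages": [
        {"role": "system", "content": "Reply with a JSON object only."},
        {"role": "user", "content": "Give the capital and population of France."},
    ],
    # Sampling controls from the capabilities list above.
    "temperature": 0.2,
    "top_p": 0.9,
    "max_tokens": 256,
    "seed": 42,            # best-effort reproducibility
    "stop": ["\n\n"],
    # Structured outputs / response format.
    "response_format": {"type": "json_object"},
}

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",  # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['LANGMART_API_KEY']}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```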
Detailed Analysis
Mistral Small 24B Instruct 2501 (released January 2025) is a 24-billion-parameter model that delivers performance comparable to models roughly three times its size. It scores 81% on the MMLU benchmark, placing it alongside Llama 3.3 70B and Qwen 32B, while running at about three times their speed on equivalent hardware, so high-quality output comes at a substantially lower computational cost.

The model is a good fit for fast-response conversational agents that need intelligence without added latency, low-latency function calling for real-time tool integration, and cost-optimized text generation that still has to meet quality standards. The 24B size sits at a practical efficiency point: significantly more capable than 7B-class models while remaining deployable on a single high-end GPU. The January 2025 release also means a relatively recent knowledge cutoff, so the model reflects newer knowledge and current language patterns.
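Since the analysis highlights low-latency function calling, here is a hedged sketch of the usual two-round tool-calling loop, assuming OpenAI-style `tools` and `tool_calls` message shapes; the `get_weather` tool, endpoint URL, and environment variable are illustrative placeholders.

```python
import json
import os
import requests

API_URL = "https://openrouter.ai/api/v1/chat/completions"  # assumed endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['LANGMART_API_KEY']}"}
MODEL_ID = "mistralai/mistral-small-24b-instruct-2501"

# One locally implemented tool; the name and schema are illustrative.
def get_weather(city: str) -> str:
    return f"22°C and clear in {city}"  # stub in place of a real lookup

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris right now?"}]

# First round: let the model decide whether to call the tool.
first = requests.post(API_URL, headers=HEADERS, timeout=60, json={
    "model": MODEL_ID, "messages": messages, "tools": TOOLS, "tool_choice": "auto",
}).json()
assistant_msg = first["choices"][0]["message"]
messages.append(assistant_msg)

# Second round: execute any requested tool calls and send the results back.
for call in assistant_msg.get("tool_calls") or []:
    args = json.loads(call["function"]["arguments"])
    result = get_weather(**args)
    messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})

final = requests.post(API_URL, headers=HEADERS, timeout=60, json={
    "model": MODEL_ID, "messages": messages,
}).json()
print(final["choices"][0]["message"]["content"])
```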