Mistral: Pixtral 12B
Model Overview
| Property | Value |
| --- | --- |
| Model ID | mistralai/pixtral-12b |
| Full Name | Mistral: Pixtral 12B |
| Creator | Mistral AI |
| Release Date | September 10, 2024 |
| Model Type | Multi-modal (Vision + Language) |
Description
Pixtral 12B is Mistral AI's first multi-modal model: it accepts text and image input and produces text output. Its weights were first distributed via torrent and are released under the Apache 2.0 license, which permits both research and commercial use.
The model marks Mistral AI's entry into the vision-language space, pairing text understanding with image comprehension.
Specifications
| Specification | Value |
| --- | --- |
| Context Window | 32,000 tokens |
| Context Length | 32,768 tokens |
| Input Modalities | Text, Image |
| Output Modalities | Text |
| Architecture | Transformer-based multi-modal |
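The context length above can be turned into a simple pre-flight check. The following is a minimal sketch, assuming the prompt (including any tokens the images consume) and the completion share the 32,768-token window; the helper names `fits_context` and `max_completion_tokens` are hypothetical and not part of any SDK.

```python
CONTEXT_LENGTH = 32_768  # tokens, taken from the specification table


def fits_context(prompt_tokens: int, max_tokens: int,
                 context_length: int = CONTEXT_LENGTH) -> bool:
    """Return True if the prompt plus the requested completion fits the window.

    Assumption: prompt and completion tokens share a single window.
    """
    if prompt_tokens < 0 or max_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_tokens <= context_length


def max_completion_tokens(prompt_tokens: int,
                          context_length: int = CONTEXT_LENGTH) -> int:
    """Largest max_tokens value that still fits, never below zero."""
    return max(0, context_length - prompt_tokens)


if __name__ == "__main__":
    print(fits_context(prompt_tokens=30_000, max_tokens=500))
    print(max_completion_tokens(prompt_tokens=30_000))
```

The prompt token count has to come from a tokenizer or from the `usage` field of an earlier response; this sketch does not attempt to estimate it.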
Pricing
Via Hyperbolic Provider
| Type | Cost |
| --- | --- |
| Input | $0.10 per million tokens |
| Output | $0.10 per million tokens |
| Image Processing | $0.0001445 per image |
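As a worked illustration of these rates, the sketch below estimates the cost of a single request. It assumes the per-image fee is charged in addition to the token charges, which the table does not state explicitly; `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Rates copied from the pricing table (USD).
INPUT_PER_MILLION = Decimal("0.10")
OUTPUT_PER_MILLION = Decimal("0.10")
PER_IMAGE = Decimal("0.0001445")


def estimate_cost(input_tokens: int, output_tokens: int, images: int = 0) -> Decimal:
    """Estimate the USD cost of one request.

    Assumption: image fees are added on top of the token charges.
    """
    token_cost = (
        Decimal(input_tokens) * INPUT_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_PER_MILLION
    ) / Decimal(1_000_000)
    return token_cost + Decimal(images) * PER_IMAGE


if __name__ == "__main__":
    # 2,000 prompt tokens, 500 completion tokens, one image.
    print(estimate_cost(2_000, 500, images=1))
```

`Decimal` is used instead of `float` so that very small per-request amounts do not pick up binary rounding noise when summed over many calls.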
Capabilities
- Multi-modal Input: Accepts both text and image inputs
- Text Generation: Produces text-only outputs
- Standard Parameters: Supports temperature, top-p, frequency penalty, and presence penalty
- Tool Calling: Supported
- Reasoning: Not available (standard generation model)
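Since tool calling is listed as supported, a request might declare tools in the OpenAI-style `tools` schema. This is a sketch under assumptions: the `get_weather` tool is hypothetical, and whether the endpoint honors `tool_choice` is not confirmed by this page.

```python
import json

# Hypothetical tool definition in the OpenAI-style "tools" schema.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(question: str) -> dict:
    """Build keyword arguments for a chat completion that may call a tool."""
    return {
        "model": "mistralai/pixtral-12b",
        "messages": [{"role": "user", "content": question}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",  # assumption: the endpoint accepts this field
    }


if __name__ == "__main__":
    request = build_tool_request("What is the weather in Paris?")
    print(json.dumps(request, indent=2))
```

With an OpenAI-compatible client, the resulting dictionary could be passed as `client.chat.completions.create(**request)`; any tool call would then appear in `response.choices[0].message.tool_calls`.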
Supported Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| `temperature` | float | Controls randomness in output (0.0-2.0) |
| `top_p` | float | Nucleus sampling parameter (0.0-1.0) |
| `frequency_penalty` | float | Reduces token repetition (-2.0 to 2.0) |
| `presence_penalty` | float | Encourages new topics (-2.0 to 2.0) |
| `max_tokens` | integer | Maximum tokens to generate |
| `stop` | array | Stop sequences |
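The ranges in this table can be checked client-side before a request is sent. The following is a minimal sketch; `validate_params` is a hypothetical helper, and the rule that `max_tokens` must be a positive integer is an assumption rather than something the table states.

```python
# Inclusive ranges copied from the parameter table.
PARAM_RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def validate_params(params: dict) -> dict:
    """Return params unchanged if every value is in range, else raise ValueError."""
    for name, value in params.items():
        if name in PARAM_RANGES:
            low, high = PARAM_RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        elif name == "max_tokens":
            # Assumption: must be a positive integer (bool is rejected explicitly).
            if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                raise ValueError("max_tokens must be a positive integer")
        elif name == "stop":
            if not isinstance(value, list) or not all(isinstance(s, str) for s in value):
                raise ValueError("stop must be a list of strings")
        else:
            raise ValueError(f"unsupported parameter: {name}")
    return params


if __name__ == "__main__":
    print(validate_params({"temperature": 0.7, "max_tokens": 500}))
```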
Use Cases
- Image Analysis: Describe, analyze, and extract information from images
- Document Understanding: Process documents with text and visual elements
- Visual Q&A: Answer questions about image content
- Content Moderation: Analyze images for content classification
- Accessibility: Generate alt-text descriptions for images
- General Chat: Standard conversational AI tasks
Primary Provider: Hyperbolic
| Property | Value |
| --- | --- |
| Base URL | https://api.hyperbolic.xyz/v1 |
| Model Variant ID | mistralai/Pixtral-12B-2409 |
| Data Policy | Training disabled; no prompt retention |
Usage Examples
Basic Text Completion
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/pixtral-12b",
    "messages": [
      {
        "role": "user",
        "content": "Explain the concept of neural networks in simple terms."
      }
    ]
  }'
Image + Text Input (Vision)
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/pixtral-12b",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What do you see in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.jpg"
            }
          }
        ]
      }
    ]
  }'
Python SDK Example
from openai import OpenAI

client = OpenAI(
    base_url="https://api.langmart.ai/v1",
    api_key="your-langmart-api-key"
)

# Text-only request
response = client.chat.completions.create(
    model="mistralai/pixtral-12b",
    messages=[
        {"role": "user", "content": "What is machine learning?"}
    ],
    temperature=0.7,
    max_tokens=500
)
print(response.choices[0].message.content)
Vision Request with Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.langmart.ai/v1",
    api_key="your-langmart-api-key"
)

# Vision request: the image is referenced by URL
response = client.chat.completions.create(
    model="mistralai/pixtral-12b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"}
                }
            ]
        }
    ]
)
print(response.choices[0].message.content)
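For a local file, a common pattern with OpenAI-compatible APIs is to embed the image as a base64 data URL in the same `image_url` field. The sketch below assumes this endpoint accepts data URLs, which the page does not confirm; `image_to_data_url` and `build_vision_message` are hypothetical helper names.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL (data:<mime>;base64,<payload>)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_vision_message(prompt: str, image_url: str) -> dict:
    """Build a user message that pairs a text prompt with one image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

The message returned by `build_vision_message("Describe this image in detail.", image_to_data_url("photo.jpg"))` could then be passed in the `messages` list of `client.chat.completions.create`, where `photo.jpg` stands in for any local image path.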
Comparison with Similar Models
| Model | Context | Vision | Creator |
| --- | --- | --- | --- |
| Pixtral 12B | 32K | Yes | Mistral AI |
| Claude 3 Haiku | 200K | Yes | Anthropic |
| GPT-4o Mini | 128K | Yes | OpenAI |
| Llama 3.2 90B Vision | 128K | Yes | Meta |
Notes
- Weights are openly available via torrent
- First multi-modal model from Mistral AI
- Optimized for efficient inference with moderate parameter count
- Well-suited for applications requiring both text and image understanding
Last updated: December 2024