Mistral: Pixtral 12B

Model Overview

Property	Value
Model ID	`mistralai/pixtral-12b`
Full Name	Mistral: Pixtral 12B
Creator	Mistral AI
Release Date	September 10, 2024
Model Type	Multi-modal (Vision + Language)

Description

Pixtral 12B is the first multi-modal (text + image to text) model from Mistral AI. Its weights were first released via torrent and are openly available for research and commercial use.

It marks Mistral AI's entry into vision-language models, combining text understanding with image comprehension.

Specifications

Specification	Value
Context Length	32,768 tokens (32K)
Input Modalities	Text, Image
Output Modalities	Text
Architecture	Transformer-based multi-modal
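
The context length above bounds the prompt and the completion together, so it can help to check a request's token budget before sending it. The following is a minimal sketch, assuming the prompt token count (including any tokens the images consume) is already known from a tokenizer or a previous response's usage data; the helper names are hypothetical and not part of any SDK.

```python
CONTEXT_LENGTH = 32_768  # tokens, from the Specifications table


def max_completion_tokens(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens remain for the completion after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_length - prompt_tokens, 0)


def fits_in_context(prompt_tokens: int, max_tokens: int,
                    context_length: int = CONTEXT_LENGTH) -> bool:
    """Check that the prompt plus the requested completion fits in the window."""
    return prompt_tokens + max_tokens <= context_length


print(max_completion_tokens(30_000))   # 2768
print(fits_in_context(30_000, 4_000))  # False
```

A caller could use `max_completion_tokens` to cap `max_tokens` rather than letting a request fail.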

Pricing

Via Hyperbolic Provider

Type	Cost per Million Tokens
Input	$0.10
Output	$0.10
Image Processing	$0.0001445 per image
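
As a rough illustration of how these rates combine, the sketch below estimates the cost of a single request. It assumes images are billed at the flat per-image rate in addition to token charges; whether image tokens are also counted as input tokens is not stated here, so treat the result as an estimate. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Rates copied from the pricing table above (USD).
INPUT_PER_MILLION = Decimal("0.10")
OUTPUT_PER_MILLION = Decimal("0.10")
PER_IMAGE = Decimal("0.0001445")


def estimate_cost(input_tokens: int, output_tokens: int, images: int = 0) -> Decimal:
    """Estimate the USD cost of one request from token and image counts."""
    token_cost = (
        Decimal(input_tokens) * INPUT_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_PER_MILLION
    ) / Decimal(1_000_000)
    return token_cost + Decimal(images) * PER_IMAGE


# 1,200 input tokens, 300 output tokens, and 2 images
print(estimate_cost(1_200, 300, images=2))
```

`Decimal` is used instead of `float` so that very small per-request amounts do not pick up binary rounding noise when summed over many requests.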

Capabilities

Multi-modal Input: Accepts both text and image inputs
Text Generation: Produces text-only outputs
Standard Parameters: Supports temperature, top-p, frequency penalty, and presence penalty
Tool Calling: Supported
Reasoning: Not available (standard generation model)
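
Since tool calling is listed as supported, a request body with a tool definition might look like the sketch below. It assumes the endpoint accepts the OpenAI-style `tools` and `tool_choice` fields, which this page does not confirm; `get_weather` is a hypothetical tool used only for illustration.

```python
import json

# Hypothetical tool definition; `get_weather` is illustrative only.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Request body in the OpenAI chat completions shape (assumed to be accepted).
payload = {
    "model": "mistralai/pixtral-12b",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
}

print(json.dumps(payload, indent=2))
```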

Supported Parameters

Parameter	Type	Description
`temperature`	float	Controls randomness in output (0.0-2.0)
`top_p`	float	Nucleus sampling parameter (0.0-1.0)
`frequency_penalty`	float	Reduces token repetition (-2.0 to 2.0)
`presence_penalty`	float	Encourages new topics (-2.0 to 2.0)
`max_tokens`	integer	Maximum tokens to generate
`stop`	array	Stop sequences
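
The ranges in the table above can be checked on the client before a request is sent. This is a minimal sketch; `validate_params` is a hypothetical helper, and the rules for `max_tokens` (positive integer) and `stop` (list of strings) are assumptions, since the table gives only their types.

```python
# Ranges copied from the Supported Parameters table above.
PARAM_RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def validate_params(params: dict) -> dict:
    """Return params unchanged if every value is valid, else raise ValueError."""
    for name, value in params.items():
        if name in PARAM_RANGES:
            low, high = PARAM_RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        elif name == "max_tokens":
            # Assumption: must be a positive integer.
            if not isinstance(value, int) or value <= 0:
                raise ValueError("max_tokens must be a positive integer")
        elif name == "stop":
            # Assumption: a list of strings.
            if not isinstance(value, list) or not all(isinstance(s, str) for s in value):
                raise ValueError("stop must be a list of strings")
        else:
            raise ValueError(f"unsupported parameter: {name}")
    return params


params = validate_params({"temperature": 0.7, "top_p": 0.9, "max_tokens": 500})
```

The validated dictionary can then be unpacked into the request, for example `client.chat.completions.create(model=..., messages=..., **params)`.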

Use Cases

Image Analysis: Describe, analyze, and extract information from images
Document Understanding: Process documents with text and visual elements
Visual Q&A: Answer questions about image content
Content Moderation: Analyze images for content classification
Accessibility: Generate alt-text descriptions for images
General Chat: Standard conversational AI tasks

Provider Information

Primary Provider: Hyperbolic

Property	Value
Base URL	`https://api.hyperbolic.xyz/v1`
Model Variant ID	`mistralai/Pixtral-12B-2409`
Data Policy	Training disabled; no prompt retention

Usage Examples

Basic Text Completion

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/pixtral-12b",
    "messages": [
      {
        "role": "user",
        "content": "Explain the concept of neural networks in simple terms."
      }
    ]
  }'

Image + Text Input (Vision)

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/pixtral-12b",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What do you see in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.jpg"
            }
          }
        ]
      }
    ]
  }'

Python SDK Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.langmart.ai/v1",
    api_key="your-langmart-api-key"
)

# Text-only request
response = client.chat.completions.create(
    model="mistralai/pixtral-12b",
    messages=[
        {"role": "user", "content": "What is machine learning?"}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)

Vision Request with Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.langmart.ai/v1",
    api_key="your-langmart-api-key"
)

# With image URL
response = client.chat.completions.create(
    model="mistralai/pixtral-12b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"}
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
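
For a local file rather than a public URL, one common approach with OpenAI-compatible APIs is to send the image as a base64 data URL. The sketch below assumes this endpoint accepts data URLs in the `image_url` field, which this page does not confirm; `image_to_data_url` and `build_vision_message` are hypothetical helpers.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognized image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_vision_message(prompt: str, image_path: str) -> dict:
    """Build a user message that pairs a text prompt with a local image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_to_data_url(image_path)}},
        ],
    }
```

The returned message can be passed as `messages=[build_vision_message("Describe this image in detail.", "photo.jpg")]` in the same `client.chat.completions.create` call used for URL-based images.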

Comparison with Similar Models

Model	Context	Vision	Creator
Pixtral 12B	32K	Yes	Mistral AI
Claude 3 Haiku	200K	Yes	Anthropic
GPT-4o Mini	128K	Yes	OpenAI
Llama 3.2 90B Vision	128K	Yes	Meta

Notes

Weights are openly available via torrent
First multi-modal model from Mistral AI
At 12 billion parameters, small enough for comparatively efficient inference
Well-suited for applications requiring both text and image understanding

Last updated: December 2024