Mistral: Pixtral 12B

Creator: Mistral AI · Vision · Context: 32K · Input: $0.10 /1M tokens · Output: $0.10 /1M tokens · Max Output: N/A

Model Overview

| Property | Value |
|---|---|
| Model ID | mistralai/pixtral-12b |
| Full Name | Mistral: Pixtral 12B |
| Creator | Mistral AI |
| Release Date | September 10, 2024 |
| Model Type | Multi-modal (Vision + Language) |

Description

Pixtral 12B is the first multi-modal (text and image in, text out) model from Mistral AI. Its weights were released via torrent and are openly available for research and commercial use.

Pixtral 12B represents Mistral AI's entry into the vision-language model space, combining strong text understanding with image comprehension capabilities.

Specifications

| Specification | Value |
|---|---|
| Context Window (nominal) | 32K tokens |
| Context Length (exact) | 32,768 tokens |
| Input Modalities | Text, Image |
| Output Modalities | Text |
| Architecture | Transformer-based multi-modal |
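As a rough illustration of working within the context length above, the sketch below checks whether a prompt plus a requested completion fits in 32,768 tokens. It assumes that prompt and completion tokens share one window and that token counts are already known (for example, from a tokenizer or from the `usage` field of an earlier response). The function names are hypothetical.

```python
# Context length taken from the Specifications table.
CONTEXT_LENGTH = 32_768


def fits_context(prompt_tokens: int, max_tokens: int,
                 context_length: int = CONTEXT_LENGTH) -> bool:
    """Return True if the prompt and the requested completion fit together.

    Assumption: input and output tokens share a single context window.
    """
    if prompt_tokens < 0 or max_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_tokens <= context_length


def max_completion_budget(prompt_tokens: int,
                          context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens remain for the completion (never negative)."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_length - prompt_tokens, 0)
```

A caller could use `max_completion_budget` to cap `max_tokens` before sending a long prompt, rather than relying on the API to reject an oversized request.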

Pricing

Via Hyperbolic Provider

| Type | Cost |
|---|---|
| Input | $0.10 per million tokens |
| Output | $0.10 per million tokens |
| Image Processing | $0.0001445 per image |
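To make the pricing concrete, here is a minimal cost estimator using the rates listed above. It assumes the per-image fee is charged in addition to the token charges, which the table does not state explicitly. The function name is hypothetical.

```python
from decimal import Decimal

# Rates from the pricing table (USD).
INPUT_PER_MILLION = Decimal("0.10")
OUTPUT_PER_MILLION = Decimal("0.10")
PER_IMAGE = Decimal("0.0001445")
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int, images: int = 0) -> Decimal:
    """Estimate the USD cost of one request.

    Assumption: the image fee is additive to the token charges.
    """
    token_cost = (Decimal(input_tokens) / MILLION * INPUT_PER_MILLION
                  + Decimal(output_tokens) / MILLION * OUTPUT_PER_MILLION)
    return token_cost + Decimal(images) * PER_IMAGE
```

For example, under these assumptions a request with 10,000 input tokens, 500 output tokens, and 3 images comes to $0.001 + $0.00005 + $0.0004335 = $0.0014835.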

Capabilities

  • Multi-modal Input: Accepts both text and image inputs
  • Text Generation: Produces text-only outputs
  • Standard Parameters: Supports temperature, top-p, frequency penalty, and presence penalty
  • Tool Calling: Supported
  • Reasoning: Not available (standard generation model)
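The list above says tool calling is supported but does not show a request shape. The sketch below assumes the endpoint follows the OpenAI-compatible `tools` / `tool_choice` schema; that is an assumption, not something this page confirms. The `get_weather` tool and both helper functions are hypothetical and only build or parse data, without making a network call.

```python
import json

MODEL_ID = "mistralai/pixtral-12b"

# Hypothetical tool definition; the name and fields are illustrative only.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(user_text: str) -> dict:
    """Build a chat-completions payload that offers one tool to the model.

    Assumption: the gateway accepts OpenAI-style `tools` and `tool_choice`.
    """
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_text}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",
    }


def parse_tool_arguments(tool_call: dict) -> dict:
    """Decode the JSON-encoded arguments string of an OpenAI-style tool call."""
    return json.loads(tool_call["function"]["arguments"])
```

The payload from `build_tool_request` can be serialized with `json.dumps` and sent as the body of a chat-completions request; if the model returns a tool call, `parse_tool_arguments` turns its arguments string into a dictionary.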

Supported Parameters

| Parameter | Type | Description |
|---|---|---|
| temperature | float | Controls randomness in output (0.0 to 2.0) |
| top_p | float | Nucleus sampling parameter (0.0 to 1.0) |
| frequency_penalty | float | Reduces token repetition (-2.0 to 2.0) |
| presence_penalty | float | Encourages new topics (-2.0 to 2.0) |
| max_tokens | integer | Maximum tokens to generate |
| stop | array | Stop sequences |
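A client-side check can catch out-of-range values before a request is sent. The sketch below validates a parameter dictionary against the ranges in the table above. The function name is hypothetical, and treating `stop` as a list of strings and `max_tokens` as a positive integer are assumptions; the server's own validation may differ.

```python
# Numeric ranges copied from the Supported Parameters table.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}
ALLOWED = set(RANGES) | {"max_tokens", "stop"}


def validate_params(params: dict) -> dict:
    """Return a copy of params, raising ValueError on unknown or invalid values."""
    unknown = set(params) - ALLOWED
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    if "max_tokens" in params:
        value = params["max_tokens"]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError("max_tokens must be a positive integer")
    if "stop" in params:
        stop = params["stop"]
        if not isinstance(stop, list) or not all(isinstance(s, str) for s in stop):
            raise ValueError("stop must be a list of strings")
    return dict(params)
```

The validated dictionary can be unpacked into a request, for example `client.chat.completions.create(model=..., messages=..., **validate_params(params))`.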

Use Cases

  • Image Analysis: Describe, analyze, and extract information from images
  • Document Understanding: Process documents with text and visual elements
  • Visual Q&A: Answer questions about image content
  • Content Moderation: Analyze images for content classification
  • Accessibility: Generate alt-text descriptions for images
  • General Chat: Standard conversational AI tasks
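The image-centric use cases above mostly differ in the prompt that accompanies the image. The sketch below maps each one to a prompt template and builds a request payload in the same message format as the usage examples on this page. The prompt wording, the dictionary keys, and the `build_request` helper are all hypothetical.

```python
MODEL_ID = "mistralai/pixtral-12b"

# Hypothetical prompt templates, one per image-centric use case listed above.
PROMPTS = {
    "image_analysis": "Describe this image and list the key objects and any visible text.",
    "document_understanding": "Summarize this document page, including tables, charts, or figures.",
    "visual_qa": "{question}",
    "content_moderation": "Classify this image as 'safe' or 'needs_review' and give a one-sentence reason.",
    "accessibility": "Write concise alt text (under 125 characters) for this image.",
}


def build_request(use_case: str, image_url: str, question: str = "") -> dict:
    """Build a chat-completions payload for one of the use cases above."""
    if use_case not in PROMPTS:
        raise ValueError(f"unknown use case: {use_case}")
    if use_case == "visual_qa" and not question:
        raise ValueError("visual_qa requires a question")
    text = PROMPTS[use_case].format(question=question)
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
```

Keeping the templates in one dictionary makes it straightforward to tune the wording per use case without touching the request-building code.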

Provider Information

Primary Provider: Hyperbolic

| Property | Value |
|---|---|
| Base URL | https://api.hyperbolic.xyz/v1 |
| Model Variant ID | mistralai/Pixtral-12B-2409 |
| Data Policy | Training disabled; no prompt retention |

Usage Examples

Basic Text Completion

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/pixtral-12b",
    "messages": [
      {
        "role": "user",
        "content": "Explain the concept of neural networks in simple terms."
      }
    ]
  }'

Image + Text Input (Vision)

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/pixtral-12b",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What do you see in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.jpg"
            }
          }
        ]
      }
    ]
  }'

Python SDK Example

import os

from openai import OpenAI

# Read the API key from the environment (same variable as the curl examples)
client = OpenAI(
    base_url="https://api.langmart.ai/v1",
    api_key=os.environ["LANGMART_API_KEY"],
)

# Text-only request
response = client.chat.completions.create(
    model="mistralai/pixtral-12b",
    messages=[
        {"role": "user", "content": "What is machine learning?"}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)

Vision Request with Python

import os

from openai import OpenAI

# Read the API key from the environment (same variable as the curl examples)
client = OpenAI(
    base_url="https://api.langmart.ai/v1",
    api_key=os.environ["LANGMART_API_KEY"],
)

# With image URL
response = client.chat.completions.create(
    model="mistralai/pixtral-12b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"}
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
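For images that are not hosted at a public URL, a common approach with OpenAI-compatible APIs is to embed the file as a base64 data URL in the `image_url` field. Whether this endpoint accepts data URLs is an assumption here, and the helper names are hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def bytes_to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_to_data_url(path: str) -> str:
    """Read a local image file and return it as a data URL."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {path}")
    return bytes_to_data_url(Path(path).read_bytes(), mime_type)


def local_image_message(path: str, prompt: str) -> dict:
    """Build a user message pairing a text prompt with a local image.

    Assumption: the endpoint accepts data URLs in the image_url field.
    """
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
        ],
    }
```

The returned message can be passed in the `messages` list of `client.chat.completions.create` in place of the URL-based message. Note that base64 encoding increases the payload size by roughly a third.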

Comparison with Similar Models

| Model | Context | Vision | Provider |
|---|---|---|---|
| Pixtral 12B | 32K | Yes | Mistral AI |
| Claude 3 Haiku | 200K | Yes | Anthropic |
| GPT-4o Mini | 128K | Yes | OpenAI |
| Llama 3.2 90B Vision | 128K | Yes | Meta |

Notes

  • Weights are openly available via torrent
  • First multi-modal model from Mistral AI
  • Moderate parameter count (12B) keeps inference relatively efficient
  • Well-suited for applications requiring both text and image understanding

Last updated: December 2024