LangMart: Mistral: Mixtral 8x22B Instruct

Model Overview

Property   Value
Model ID   openrouter/mistralai/mixtral-8x22b-instruct
Name       Mistral: Mixtral 8x22B Instruct
Provider   mistralai
Released   2024-04-17
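
As a minimal sketch of how this model is typically called, the snippet below uses OpenRouter's OpenAI-compatible chat completions endpoint. It assumes an API key in an OPENROUTER_API_KEY environment variable; the "openrouter/" prefix in the model ID above is the catalog's routing prefix, while the request itself uses the provider slug.

```python
import os
import requests

# Minimal sketch: call Mixtral 8x22B Instruct through OpenRouter's
# OpenAI-compatible chat completions endpoint. Assumes an API key is
# available in the OPENROUTER_API_KEY environment variable.
API_URL = "https://openrouter.ai/api/v1/chat/completions"

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "mistralai/mixtral-8x22b-instruct",  # provider slug, without the catalog prefix
        "messages": [
            {"role": "user", "content": "Summarize the Mixtral 8x22B architecture in two sentences."}
        ],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```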

Description

Mistral's official instruct fine-tuned version of Mixtral 8x22B. It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include:

  • strong math, coding, and reasoning
  • large context length (64k)
  • fluency in English, French, Italian, German, and Spanish

Benchmarks are available in Mistral's launch announcement. #moe

Specifications

Spec                Value
Context Window      65,536 tokens
Modalities          text->text
Input Modalities    text
Output Modalities   text
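
As a rough sketch of budgeting against the 65,536-token window, the snippet below estimates whether a prompt plus the requested completion fits. The characters-per-token ratio is a crude assumption for illustration, not the model's actual tokenizer.

```python
# Rough context-budget check for the 65,536-token window. The
# characters-per-token ratio is a crude heuristic, not the model's
# actual tokenizer, so treat the result as an estimate only.
CONTEXT_WINDOW = 65_536
CHARS_PER_TOKEN = 4  # assumption: rough average for English text

def fits_in_context(prompt: str, max_output_tokens: int) -> bool:
    """Estimate whether prompt + requested completion fit in the window."""
    estimated_prompt_tokens = len(prompt) // CHARS_PER_TOKEN
    return estimated_prompt_tokens + max_output_tokens <= CONTEXT_WINDOW

print(fits_in_context("Explain mixture-of-experts routing. " * 100, max_output_tokens=2048))
```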

Pricing

Type     Price
Input    $2.00 per 1M tokens
Output   $6.00 per 1M tokens
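
The per-request cost follows directly from these rates. A small worked example, using the listed prices:

```python
# Cost estimate from the listed rates: $2.00 per 1M input tokens,
# $6.00 per 1M output tokens.
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 6.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: 50K prompt tokens and 2K completion tokens
# -> 0.05 * $2.00 + 0.002 * $6.00 = $0.112
print(f"${request_cost(50_000, 2_000):.3f}")
```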

Capabilities

  • Frequency penalty
  • Max tokens
  • Presence penalty
  • Response format
  • Seed
  • Stop
  • Structured outputs
  • Temperature
  • Tool choice
  • Tools
  • Top p
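
As a sketch of how several of the parameters listed above (temperature, top_p, max_tokens, seed, stop, tools, tool_choice) might be supplied, the request below follows the same OpenAI-compatible schema as the earlier example. Exact parameter support can vary by upstream provider, and the tool definition is a hypothetical function used only for illustration.

```python
import os
import requests

# Sketch of a request exercising several of the capabilities listed above.
# The get_weather tool is a hypothetical function for illustration only.
payload = {
    "model": "mistralai/mixtral-8x22b-instruct",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "temperature": 0.3,
    "top_p": 0.9,
    "max_tokens": 512,
    "seed": 42,
    "stop": ["</answer>"],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"])
```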

Detailed Analysis

Mixtral 8x22B Instruct is Mistral's scaled-up sparse Mixture of Experts (MoE) architecture: eight experts per MoE layer give 141B total parameters, of which only about 39B are active for any given token through sparse top-2 routing. This keeps compute per token close to that of a 39B dense model, roughly a 3.6x parameter-to-compute advantage over an equivalently sized dense model. The larger experts (22B versus 7B in Mixtral 8x7B) provide more capacity for specialized behavior across advanced mathematics, complex code, scientific reasoning, creative writing, and multilingual nuance.

The model performs strongly on reasoning benchmarks, code generation (including architectural design), mathematical problem solving, and sophisticated language understanding, and its 64K (65,536-token) context window handles extensive documents and codebases. At release it represented the leading edge of open-weight MoE capability, offering near-frontier intelligence at significantly lower computational and financial cost. It is well suited to enterprise applications that need strong capability with cost optimization, research into frontier MoE architectures, and self-hosted deployments requiring GPT-4-class capabilities.
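
To illustrate the sparse routing described above, the following is a toy NumPy sketch of top-k expert selection: each token is sent to only the k best-scoring experts (Mixtral uses k=2 of 8), so only a fraction of the total parameters is active per token. Dimensions, the router, and the single-matrix "experts" are illustrative stand-ins, not the actual Mixtral implementation.

```python
import numpy as np

# Toy sketch of sparse mixture-of-experts routing: every token is
# dispatched to only the top-k experts, so most expert parameters stay
# inactive for that token. All shapes here are illustrative.
rng = np.random.default_rng(0)
n_experts, d_model, k = 8, 16, 2

gate_w = rng.normal(size=(d_model, n_experts))            # router weights
experts = rng.normal(size=(n_experts, d_model, d_model))  # one toy "FFN" matrix per expert

def moe_layer(x: np.ndarray) -> np.ndarray:
    """x: (tokens, d_model) -> (tokens, d_model) via top-k expert routing."""
    logits = x @ gate_w                           # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]    # indices of the k best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = topk[t]
        weights = np.exp(logits[t, sel])
        weights /= weights.sum()                  # softmax over the selected experts only
        for w, e in zip(weights, sel):
            out[t] += w * (x[t] @ experts[e])     # weighted sum of the chosen experts' outputs
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_layer(tokens).shape)  # (4, 16)
```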