LangMart: Mistral: Mixtral 8x22B Instruct
Model Overview
| Property | Value |
|---|---|
| Model ID | openrouter/mistralai/mixtral-8x22b-instruct |
| Name | Mistral: Mixtral 8x22B Instruct |
| Provider | mistralai |
| Released | 2024-04-17 |
Description
Mistral's official instruct fine-tuned version of Mixtral 8x22B. It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include:
- strong math, coding, and reasoning
- large context length (64k)
- fluency in English, French, Italian, German, and Spanish
See benchmarks in Mistral's launch announcement. #moe
Provider
mistralai
Specifications
| Spec | Value |
|---|---|
| Context Window | 65,536 tokens |
| Modalities | text->text |
| Input Modalities | text |
| Output Modalities | text |
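As a rough pre-flight check against the 65,536-token window, the sketch below uses an approximate characters-per-token heuristic; this is an illustrative assumption, and a real tokenizer would give exact counts.

```python
# Rough guard to keep a prompt plus requested completion under the
# 65,536-token context window. The 4-characters-per-token heuristic is
# only an approximation, not an exact tokenizer.
CONTEXT_WINDOW = 65_536

def fits_in_context(prompt: str, max_output_tokens: int, chars_per_token: float = 4.0) -> bool:
    """Return True if the estimated prompt tokens plus the requested
    completion budget stay within the context window."""
    est_prompt_tokens = len(prompt) / chars_per_token
    return est_prompt_tokens + max_output_tokens <= CONTEXT_WINDOW

# A ~300,000-character prompt (~75k estimated tokens) does not fit
print(fits_in_context("hello " * 50_000, max_output_tokens=1_024))  # False
```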
Pricing
| Type | Price |
|---|---|
| Input | $2.00 per 1M tokens |
| Output | $6.00 per 1M tokens |
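For a quick sense of what these rates mean per request, here is a small sketch that estimates cost from token counts at the listed prices; the example token counts are placeholders, not measured values.

```python
# Cost estimate for a single request at the listed per-1M-token rates.
INPUT_PRICE_PER_M = 2.00   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 6.00  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a 4,000-token prompt with a 1,000-token completion
print(f"${estimate_cost(4_000, 1_000):.4f}")  # $0.0140
```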
Capabilities
- Frequency penalty
- Max tokens
- Presence penalty
- Response format
- Seed
- Stop
- Structured outputs
- Temperature
- Tool choice
- Tools
- Top p
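As a rough illustration of how these parameters map onto a request, here is a minimal sketch assuming an OpenAI-compatible gateway; the base URL, environment variable names, and exact model slug are assumptions rather than details confirmed by this listing.

```python
# Minimal sketch of a chat completion request exercising several of the
# parameters listed above. Assumes an OpenAI-compatible gateway; the
# base URL and environment variable names are placeholders, and the
# exact model slug may differ depending on how the gateway exposes it.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("GATEWAY_BASE_URL", "https://api.example-gateway.com/v1"),  # placeholder
    api_key=os.environ["GATEWAY_API_KEY"],  # placeholder variable name
)

response = client.chat.completions.create(
    model="openrouter/mistralai/mixtral-8x22b-instruct",  # Model ID as listed above
    messages=[
        {"role": "user", "content": "Summarize the Mixtral 8x22B architecture in two sentences."}
    ],
    temperature=0.3,        # Temperature
    top_p=0.9,              # Top p
    max_tokens=256,         # Max tokens
    seed=42,                # Seed (best-effort reproducibility)
    stop=["\n\n"],          # Stop sequences
    frequency_penalty=0.0,  # Frequency penalty
    presence_penalty=0.0,   # Presence penalty
)
print(response.choices[0].message.content)
```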
Detailed Analysis
Mixtral 8x22B Instruct is Mistral's scaled-up Sparse Mixture of Experts (MoE) architecture: eight experts of roughly 22B parameters each, 141B parameters in total, of which only about 39B are active per token through sparse top-2 routing. Per-token compute therefore stays close to that of a ~39B dense model while the full 141B of capacity remains available, roughly a 3.6x ratio of total to active parameters.

The larger experts (22B versus 7B in Mixtral 8x7B) give each routed path substantially more capacity, and the model's strengths span advanced mathematics, code generation (including higher-level design tasks), scientific reasoning, creative writing, and multilingual understanding. Its 64K (65,536-token) context window handles extensive documents and codebases.

At release, Mixtral 8x22B sat at the top of open-weight MoE capability, narrowing the gap to GPT-4-class systems at significantly lower computational and financial cost. It is well suited to enterprise applications that need strong quality with cost control, research into large-scale MoE architectures, and self-hosted deployments seeking near-frontier capability from open weights.
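To make the sparse-activation idea concrete, here is a toy sketch of top-2 expert routing, the mechanism that keeps only a fraction of the total parameters active per token. The dimensions and module layout are illustrative assumptions, not Mixtral's real configuration.

```python
# Toy sketch of a sparse MoE feed-forward layer with top-2 routing.
# Each token is sent to only 2 of the 8 experts, so only a fraction of
# the layer's parameters participate in any single forward pass.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                           # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # pick 2 experts per token
        weights = F.softmax(weights, dim=-1)              # renormalize over the chosen 2
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                     # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

tokens = torch.randn(4, 64)
print(SparseMoELayer()(tokens).shape)  # torch.Size([4, 64])
```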