# MiMo-V2-Flash (Free)

## Overview
| Property | Value |
|---|---|
| Model ID | xiaomi/mimo-v2-flash:free |
| Provider | Xiaomi |
| Created | December 14, 2025 |
| Context Length | 262,144 tokens |
| Max Completion Tokens | 65,536 |
| Modalities | Text (input/output) |
## Description
MiMo-V2-Flash is an open-source language model developed by Xiaomi featuring a Mixture-of-Experts (MoE) architecture with 309B total parameters and 15B active parameters. It employs hybrid attention mechanisms and supports a 256K context window, excelling in reasoning, coding, and agent scenarios.
The model ranks #1 among open-source options on SWE-bench Verified and SWE-bench Multilingual benchmarks, with performance comparable to Claude Sonnet 4.5 while costing only about 3.5% as much (when using paid tiers).
## Pricing
| Type | Cost per Million |
|---|---|
| Input | $0.00 (Free) |
| Output | $0.00 (Free) |
This is a completely free tier with no token costs.
## Capabilities

- Reasoning Support: Yes (configurable via the `reasoning.enabled` boolean)
- Tool Support: Full support for `tools` and `tool_choice`
- Multipart Support: Yes
- Streaming: Supported
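As a sketch of how the tool support listed above might be exercised, the following builds a tool-calling request body in the OpenAI-compatible chat completions shape. The `get_weather` function is a hypothetical example for illustration; only the `tools`/`tool_choice` parameter names come from this document.

```python
import json

# Sketch of a tool-calling request body. The `get_weather` tool is a
# hypothetical example; `tools` and `tool_choice` are the parameters
# this model is documented to support.
payload = {
    "model": "xiaomi/mimo-v2-flash:free",
    "messages": [
        {"role": "user", "content": "What's the weather in Singapore?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Let the model decide whether to call the tool.
    "tool_choice": "auto",
}

body = json.dumps(payload)
```

This body would be POSTed to the chat completions endpoint shown in the usage examples below; the model replies either with content or with a `tool_calls` entry to execute.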
## Default System Prompt

```text
You are MiMo, an AI assistant developed by Xiaomi. Your knowledge cutoff date is December 2024.
```
## Supported Parameters

| Parameter | Description |
|---|---|
| `reasoning` | Enable/disable reasoning mode |
| `include_reasoning` | Include reasoning in response |
| `max_tokens` | Maximum tokens in response |
| `temperature` | Sampling temperature |
| `top_p` | Nucleus sampling parameter |
| `stop` | Stop sequences |
| `response_format` | Output format specification |
| `tools` | Tool definitions for function calling |
| `tool_choice` | Tool selection preference |
| `frequency_penalty` | Frequency penalty for token repetition |
| `presence_penalty` | Presence penalty for topic diversity |
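To make the table concrete, here is a minimal sketch of a request body combining several of the sampling parameters above. The specific values are illustrative only, not recommended settings.

```python
import json

# Illustrative request body using several supported sampling
# parameters from the table above. Values are examples, not
# recommendations from the model documentation.
payload = {
    "model": "xiaomi/mimo-v2-flash:free",
    "messages": [
        {"role": "user", "content": "List three prime numbers."}
    ],
    "max_tokens": 256,        # cap on completion length
    "temperature": 0.7,       # sampling temperature
    "top_p": 0.95,            # nucleus sampling
    "stop": ["\n\n"],         # stop sequences
    "frequency_penalty": 0.1,
    "presence_penalty": 0.0,
}

body = json.dumps(payload)
```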
## Related Models

Other Xiaomi models in the MiMo family:

- `xiaomi/mimo-v2-flash` (paid tier with potentially higher rate limits)
## Architecture
- Total Parameters: 309B
- Active Parameters: 15B (Mixture-of-Experts)
- Attention: Hybrid attention mechanisms
- Context Window: 256K tokens
## Reasoning Mode

The model supports reasoning with special tokens:

- Start token: `<think>`
- End token: `</think>`
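When reasoning content arrives inline in the completion text rather than in a separate field, the delimiters above can be split out client-side. Below is a small sketch of such a parser; it assumes at most one `<think>...</think>` block per message, which is an assumption, not a documented guarantee.

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) using the
    <think>...</think> delimiters. Assumes at most one reasoning
    block; returns empty reasoning when none is present."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer
```

For example, `split_reasoning("<think>23*47 = 1081</think>The answer is 1081.")` yields the reasoning trace and the final answer as separate strings.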
Important: For agentic integrations (Claude Code, Cline, Roo Code), it is recommended to disable reasoning mode for optimal performance, as the model is specifically optimized for this configuration.
## Provider Infrastructure
| Property | Details |
|---|---|
| Primary Provider | Xiaomi |
| API Endpoint | https://api.langmart.ai/v1 |
| Data Centers | Singapore, Netherlands |
| Adapter | XiaomiAdapter |
## Performance Benchmarks
- SWE-bench Verified: #1 among open-source models
- SWE-bench Multilingual: #1 among open-source models
- Performance comparable to Claude Sonnet 4.5
## Usage Example

```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xiaomi/mimo-v2-flash:free",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```
### With Reasoning Enabled

```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xiaomi/mimo-v2-flash:free",
    "messages": [
      {"role": "user", "content": "Solve this step by step: What is 23 * 47?"}
    ],
    "reasoning": {
      "enabled": true
    }
  }'
```
## Notes
- This is a free tier model with potential rate limiting
- Knowledge cutoff: December 2024
- Optimized for coding, reasoning, and agentic workflows
- Best performance in agent scenarios when reasoning is disabled