
MiMo-V2-Flash (Free)


Overview

| Property | Value |
|---|---|
| Model ID | xiaomi/mimo-v2-flash:free |
| Provider | Xiaomi |
| Created | December 14, 2025 |
| Context Length | 262,144 tokens |
| Max Completion Tokens | 65,536 |
| Modalities | Text (input/output) |

Description

MiMo-V2-Flash is an open-source language model developed by Xiaomi featuring a Mixture-of-Experts (MoE) architecture with 309B total parameters and 15B active parameters. It employs hybrid attention mechanisms and supports a 256K context window, excelling in reasoning, coding, and agent scenarios.

The model ranks #1 among open-source options on SWE-bench Verified and SWE-bench Multilingual benchmarks, with performance comparable to Claude Sonnet 4.5 while costing only about 3.5% as much (when using paid tiers).

Pricing

| Type | Cost per Million Tokens |
|---|---|
| Input | $0.00 (Free) |
| Output | $0.00 (Free) |

This is a completely free tier with no token costs.

Capabilities

  • Reasoning Support: Yes (with configurable reasoning.enabled boolean)
  • Tool Support: Full support for tools and tool_choice
  • Multipart Support: Yes
  • Streaming: Supported
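Since the model supports tools and tool_choice, a tool-calling request can be sketched as below. This assumes the endpoint follows the OpenAI-compatible chat completions schema (the tools/tool_choice field shapes come from that schema, not from this page), and the weather tool is a hypothetical example.

```python
import json

# Sketch of a tool-calling request body, assuming an OpenAI-compatible
# chat completions schema. The get_weather tool is hypothetical.
payload = {
    "model": "xiaomi/mimo-v2-flash:free",
    "messages": [
        {"role": "user", "content": "What's the weather in Singapore?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool for illustration
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

print(json.dumps(payload, indent=2))
```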

Default System Prompt

You are MiMo, an AI assistant developed by Xiaomi. Your knowledge cutoff date is December 2024.

Supported Parameters

| Parameter | Description |
|---|---|
| reasoning | Enable/disable reasoning mode |
| include_reasoning | Include reasoning in response |
| max_tokens | Maximum tokens in response |
| temperature | Sampling temperature |
| top_p | Nucleus sampling parameter |
| stop | Stop sequences |
| response_format | Output format specification |
| tools | Tool definitions for function calling |
| tool_choice | Tool selection preference |
| frequency_penalty | Frequency penalty for token repetition |
| presence_penalty | Presence penalty for topic diversity |
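A request body exercising several of the parameters above can be sketched as follows; the field names match the table, while the specific values are purely illustrative.

```python
import json

# Sketch: a request body combining several supported parameters.
# Values are illustrative, not recommended defaults.
payload = {
    "model": "xiaomi/mimo-v2-flash:free",
    "messages": [{"role": "user", "content": "List three prime numbers."}],
    "max_tokens": 256,         # cap the completion length
    "temperature": 0.7,        # moderate sampling randomness
    "top_p": 0.9,              # nucleus sampling cutoff
    "stop": ["\n\n"],          # stop at the first blank line
    "frequency_penalty": 0.2,  # discourage token repetition
    "presence_penalty": 0.1,   # nudge toward new topics
}

print(json.dumps(payload))
```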

Other Xiaomi models in the MiMo family:

  • xiaomi/mimo-v2-flash (Paid tier with potentially higher rate limits)

Architecture

  • Total Parameters: 309B
  • Active Parameters: 15B (Mixture-of-Experts)
  • Attention: Hybrid attention mechanisms
  • Context Window: 256K tokens

Reasoning Mode

The model supports reasoning with special tokens:

  • Start token: <think>
  • End token: </think>

Important: For agentic integrations (Claude Code, Cline, Roo Code), disabling reasoning mode is recommended; the model is specifically optimized for agent use in this configuration.
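When the raw <think>…</think> tokens appear in returned text, the reasoning can be separated from the final answer with a small helper. This is a minimal sketch assuming at most one think block at the start of the content.

```python
import re

# Sketch: split a <think>...</think> reasoning block from the final answer.
# Assumes at most one think block, at the start of the text.
def split_reasoning(text):
    match = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, flags=re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return None, text.strip()  # no think block present

reasoning, answer = split_reasoning(
    "<think>23 * 47 = 23 * 40 + 23 * 7 = 920 + 161 = 1081</think>"
    "The answer is 1081."
)
print(answer)  # -> The answer is 1081.
```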

Provider Infrastructure

| Property | Details |
|---|---|
| Primary Provider | Xiaomi |
| API Endpoint | https://api.langmart.ai/v1 |
| Data Centers | Singapore, Netherlands |
| Adapter | XiaomiAdapter |

Performance Benchmarks

  • SWE-bench Verified: #1 among open-source models
  • SWE-bench Multilingual: #1 among open-source models
  • Performance comparable to Claude Sonnet 4.5

Usage Example

```shell
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xiaomi/mimo-v2-flash:free",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```
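The same request can be assembled in Python. This sketch only builds the URL, headers, and body (mirroring the curl example above) and leaves the actual HTTP send, e.g. via urllib.request, to the caller; the API key shown is a placeholder.

```python
import json

# Sketch: assemble the chat completions request from the curl example.
# Endpoint and auth header mirror the example; sending is left out.
def build_chat_request(api_key, model, messages):
    url = "https://api.langmart.ai/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

url, headers, body = build_chat_request(
    "sk-example",  # placeholder key for illustration
    "xiaomi/mimo-v2-flash:free",
    [{"role": "user", "content": "Hello, how are you?"}],
)
print(url)
```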

With Reasoning Enabled

```shell
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xiaomi/mimo-v2-flash:free",
    "messages": [
      {"role": "user", "content": "Solve this step by step: What is 23 * 47?"}
    ],
    "reasoning": {
      "enabled": true
    }
  }'
```

Notes

  • Free-tier model; requests may be rate limited
  • Knowledge cutoff: December 2024
  • Optimized for coding, reasoning, and agentic workflows
  • Best performance in agent scenarios when reasoning is disabled