# MiMo-V2-Flash (Free)

## Overview
| Property | Value |
|---|---|
| Model ID | xiaomi/mimo-v2-flash:free |
| Provider | Xiaomi |
| Created | December 14, 2025 |
| Context Length | 262,144 tokens |
| Max Completion Tokens | 65,536 |
| Modalities | Text (input/output) |
## Description
MiMo-V2-Flash is an open-source language model developed by Xiaomi featuring a Mixture-of-Experts (MoE) architecture with 309B total parameters and 15B active parameters. It employs hybrid attention mechanisms and supports a 256K context window, excelling in reasoning, coding, and agent scenarios.
The model ranks #1 among open-source options on SWE-bench Verified and SWE-bench Multilingual benchmarks, with performance comparable to Claude Sonnet 4.5 while costing only about 3.5% as much (when using paid tiers).
## Pricing
| Type | Cost per Million |
|---|---|
| Input | $0.00 (Free) |
| Output | $0.00 (Free) |
This is a completely free tier with no token costs.
## Capabilities

- Reasoning Support: Yes (configurable via the `reasoning.enabled` boolean)
- Tool Support: Full support for `tools` and `tool_choice`
- Multipart Support: Yes
- Streaming: Supported
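As a sketch of how the tool support listed above might be exercised, the following builds a tool-calling request body in the OpenAI-compatible chat completions shape. The `get_weather` function is a hypothetical example for illustration; only the `tools`/`tool_choice` parameter names come from this document.

```python
import json

# Sketch of a tool-calling request body. The `get_weather` tool is a
# hypothetical example; `tools` and `tool_choice` are the parameters
# this model is documented to support.
payload = {
    "model": "xiaomi/mimo-v2-flash:free",
    "messages": [
        {"role": "user", "content": "What's the weather in Singapore?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Let the model decide whether to call the tool.
    "tool_choice": "auto",
}

body = json.dumps(payload)
```

This body would be POSTed to the chat completions endpoint shown in the usage examples below; the model replies either with content or with a `tool_calls` entry to execute.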
## Default System Prompt

```text
You are MiMo, an AI assistant developed by Xiaomi. Your knowledge cutoff date is December 2024.
```
## Supported Parameters

| Parameter | Description |
|---|---|
| `reasoning` | Enable/disable reasoning mode |
| `include_reasoning` | Include reasoning in response |
| `max_tokens` | Maximum tokens in response |
| `temperature` | Sampling temperature |
| `top_p` | Nucleus sampling parameter |
| `stop` | Stop sequences |
| `response_format` | Output format specification |
| `tools` | Tool definitions for function calling |
| `tool_choice` | Tool selection preference |
| `frequency_penalty` | Frequency penalty for token repetition |
| `presence_penalty` | Presence penalty for topic diversity |
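To make the table concrete, here is a minimal sketch of a request body combining several of the sampling parameters above. The specific values are illustrative only, not recommended settings.

```python
import json

# Illustrative request body using several supported sampling
# parameters from the table above. Values are examples, not
# recommendations from the model documentation.
payload = {
    "model": "xiaomi/mimo-v2-flash:free",
    "messages": [
        {"role": "user", "content": "List three prime numbers."}
    ],
    "max_tokens": 256,        # cap on completion length
    "temperature": 0.7,       # sampling temperature
    "top_p": 0.95,            # nucleus sampling
    "stop": ["\n\n"],         # stop sequences
    "frequency_penalty": 0.1,
    "presence_penalty": 0.0,
}

body = json.dumps(payload)
```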
## Related Models

Other Xiaomi models in the MiMo family:

- `xiaomi/mimo-v2-flash` (paid tier with potentially higher rate limits)
## Architecture
- Total Parameters: 309B
- Active Parameters: 15B (Mixture-of-Experts)
- Attention: Hybrid attention mechanisms
- Context Window: 256K tokens
## Reasoning Mode

The model supports reasoning with special tokens:

- Start token: `<think>`
- End token: `</think>`
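When reasoning content arrives inline in the completion text rather than in a separate field, the delimiters above can be split out client-side. Below is a small sketch of such a parser; it assumes at most one `<think>...</think>` block per message, which is an assumption, not a documented guarantee.

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) using the
    <think>...</think> delimiters. Assumes at most one reasoning
    block; returns empty reasoning when none is present."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer
```

For example, `split_reasoning("<think>23*47 = 1081</think>The answer is 1081.")` yields the reasoning trace and the final answer as separate strings.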
Important: For agentic integrations (Claude Code, Cline, Roo Code), it is recommended to disable reasoning mode for optimal performance, as the model is specifically optimized for this configuration.
## Provider Infrastructure
| Property | Details |
|---|---|
| Primary Provider | Xiaomi |
| API Endpoint | https://api.langmart.ai/v1 |
| Data Centers | Singapore, Netherlands |
| Adapter | XiaomiAdapter |
## Performance Benchmarks
- SWE-bench Verified: #1 among open-source models
- SWE-bench Multilingual: #1 among open-source models
- Performance comparable to Claude Sonnet 4.5
## Usage Example

```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xiaomi/mimo-v2-flash:free",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```
### With Reasoning Enabled

```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xiaomi/mimo-v2-flash:free",
    "messages": [
      {"role": "user", "content": "Solve this step by step: What is 23 * 47?"}
    ],
    "reasoning": {
      "enabled": true
    }
  }'
```
## Notes
- This is a free tier model with potential rate limiting
- Knowledge cutoff: December 2024
- Optimized for coding, reasoning, and agentic workflows
- Best performance in agent scenarios when reasoning is disabled