Google Gemini 2.0 Flash Lite - Complete Model Details
Overview
Model Name: Google: Gemini 2.0 Flash Lite
Model ID: google/gemini-2.0-flash-lite-001
Provider: Google Vertex
Created: February 25, 2025
Context Length: 1,048,576 tokens
Description: This model emphasizes "significantly faster time to first token (TTFT) compared to Gemini Flash 1.5" while matching quality expectations of larger variants, at reduced costs.
Pricing
| Metric | Cost |
|---|---|
| Input Tokens | $0.075 per million |
| Output Tokens | $0.30 per million |
Detailed Analysis
Gemini 2.0 Flash-Lite-001 is a stable, numbered release version providing production-ready performance with guaranteed behavior. This specific version ensures consistency across deployments. Why choose over 2.0 Flash: Flash-Lite-001 offers the same cost and latency advantages as the auto-updated Flash-Lite alias but with version pinning for production stability. It excels at high-throughput, structured tasks where you need predictable model behavior across time. Use this over 2.0 Flash when extreme speed and cost efficiency matter more than advanced reasoning capabilities. Stable vs Preview vs Auto-Updated: The -001 suffix indicates this is a numbered stable release, recommended for production use. Unlike the auto-updated alias (gemini-2.0-flash-lite), this version won't change unexpectedly, preventing behavior shifts in production systems. Preview versions may have billing enabled but can be deprecated with 2 weeks notice. 2.0 vs 2.5 generation: As a 2.0-generation model, Flash-Lite-001 represents the first iteration of Google's Lite optimization. Gemini 2.5 Flash-Lite (newer generation) delivers all-around higher quality with better instruction following, reduced verbosity, and stronger multimodal capabilities. The 2.5 generation uses 50% fewer output tokens, making it more cost-effective despite the same pricing. Best for: Production deployments requiring version stability, high-volume classification tasks, content moderation at scale, real-time routing decisions, and batch extraction jobs where output determinism is critical.
Related Models
- Gemini Flash 1.5
- Gemini Pro 1.5
- Other Google Gemini variants via LangMart
Providers
Platform: Google Vertex
Adapter: GoogleVertexGeminiAdapter
Data Policy:
- No training on user data
- No prompt retention
- Requires User IDs
- Terms: https://cloud.google.com/terms/
Performance Metrics
Recent Usage (December 2025):
| Date | Requests | Prompt Tokens | Completion Tokens | Tool Calls |
|---|---|---|---|---|
| Dec 23 | 4.5M | 4.07B | 446M | 29K |
| Dec 22 | 3.5M | 4.41B | 667M | 31K |
| Dec 6 | 8.4M | 6.70B | 1.26B | 143K |
Parameters
Input Modalities:
- Text
- Image
- File
- Audio
- Video
Output Modalities:
- Text only
Supported Parameters:
- max_tokens
- temperature
- top_p
- seed
- response_format
- stop
- structured_outputs
- tools
- tool_choice
Features:
- File URL support
- Input audio capability
- Base64 video input
- Tool choice support (none, auto, required)
- Multipart message support
- Chat completions