# Claude 3 Haiku

## Model Overview

| Property | Value |
|---|---|
| Provider | Anthropic |
| Model Name | Claude 3 Haiku |
| Model ID (for inference) | anthropic/claude-3-haiku |
| Created | March 13, 2024 |
| Context Length | 200,000 tokens |
| Max Output Tokens | 4,096 tokens |
## Description

Claude 3 Haiku is Anthropic's fastest and most compact model, built for near-instant responsiveness and quick, accurate performance on targeted tasks. It excels where rapid responses are required while still producing high-quality output.
Key characteristics include:
- Near-instant response times for real-time applications
- Compact model size optimized for efficiency
- Strong performance on targeted tasks
- Multimodal support (text and images)
- Cost-effective pricing for high-volume applications
The model is ideal for use cases where speed is critical, such as chatbots, real-time assistants, content moderation, and high-throughput processing tasks.
## Technical Specifications

| Specification | Value |
|---|---|
| Context Window | 200,000 tokens |
| Max Completion Tokens | 4,096 tokens |
| Data Retention | 30 days |
| Moderation | Required for API usage |
| Deprecation Date | Not announced |
## Pricing

### Standard Pricing

| Type | Rate |
|---|---|
| Input | $0.25 / 1M tokens |
| Output | $1.25 / 1M tokens |
| Image Input | $0.40 / 1K images |
| Input Cache Read | $0.03 / 1M tokens |
| Input Cache Write | $0.30 / 1M tokens |
### Price per Token (Detailed)

| Type | Price per Token |
|---|---|
| Input | $0.00000025 |
| Output | $0.00000125 |
| Cache Read | $0.00000003 |
| Cache Write | $0.00000030 |
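The per-token rates make request costs straightforward to estimate. A minimal sketch (the `estimate_cost` helper is hypothetical; the rates are copied from the tables above):

```python
# Rates in USD per token, from the pricing tables above.
RATES = {
    "input": 0.25 / 1_000_000,        # $0.25 per 1M input tokens
    "output": 1.25 / 1_000_000,       # $1.25 per 1M output tokens
    "cache_read": 0.03 / 1_000_000,   # $0.03 per 1M cached tokens read
    "cache_write": 0.30 / 1_000_000,  # $0.30 per 1M cached tokens written
}

def estimate_cost(input_tokens=0, output_tokens=0, cache_read=0, cache_write=0):
    """Return the estimated cost in USD for one request."""
    return (input_tokens * RATES["input"]
            + output_tokens * RATES["output"]
            + cache_read * RATES["cache_read"]
            + cache_write * RATES["cache_write"])

# Example: 10,000 input tokens and 1,000 output tokens.
cost = estimate_cost(input_tokens=10_000, output_tokens=1_000)  # ~$0.00375
```

At these rates, a request an order of magnitude larger still costs well under a cent, which is what makes the model attractive for high-volume workloads.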
## Capabilities

| Capability | Supported |
|---|---|
| Reasoning Mode | No |
| Tool/Function Calling | Yes |
| Vision (Image Analysis) | Yes |
| File Processing | No |
| Streaming | Yes |
| Caching | Yes |
| Multi-Part Input | Yes |
## Supported Parameters

| Parameter | Description |
|---|---|
| max_tokens | Maximum number of tokens to generate (up to 4,096) |
| temperature | Controls randomness (0-1) |
| top_p | Nucleus sampling threshold |
| top_k | Top-k sampling parameter |
| stop | Stop sequences to end generation |
| tools | List of available tools/functions |
| tool_choice | Controls tool selection behavior |
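These limits can be enforced client-side before a request is sent. A minimal sketch (the `build_payload` helper is hypothetical; the bounds come from the tables above):

```python
def build_payload(prompt, max_tokens=1024, temperature=0.7, top_p=None,
                  top_k=None, stop=None):
    """Build a chat-completion payload, validating Claude 3 Haiku's limits."""
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if max_tokens > 4096:
        raise ValueError("max_tokens cannot exceed 4,096 for Claude 3 Haiku")
    payload = {
        "model": "anthropic/claude-3-haiku",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    # Optional sampling and stopping controls are only sent when set.
    if top_p is not None:
        payload["top_p"] = top_p
    if top_k is not None:
        payload["top_k"] = top_k
    if stop is not None:
        payload["stop"] = stop
    return payload
```

Validating before sending avoids a round trip that would fail server-side with a 400 error.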
## Best Practices
- For High-Volume Applications: Leverage the low cost per token for batch processing tasks
- For Real-Time Chat: Take advantage of near-instant response times for conversational AI
- For Cost Optimization: Use Haiku for simpler tasks, reserving larger models for complex reasoning
- For Image Analysis: Utilize multimodal capability for quick image understanding tasks
- For Content Moderation: Ideal for high-throughput content screening
- For Caching: Use cache features for repeated context to further reduce costs
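To put the caching recommendation in numbers: cache reads ($0.03/1M) cost 88% less than regular input ($0.25/1M), so a prompt prefix reused across many requests quickly pays back its one-time cache write. A rough sketch, assuming one cache write followed by cache reads on every later request:

```python
# Rates in USD per 1M tokens, from the pricing tables above.
INPUT_RATE = 0.25
CACHE_READ_RATE = 0.03
CACHE_WRITE_RATE = 0.30

def cached_vs_uncached(prefix_tokens, requests):
    """Cost in USD of re-sending a shared prefix with vs. without caching."""
    uncached = prefix_tokens / 1e6 * INPUT_RATE * requests
    # First request writes the cache; the remaining requests read it.
    cached = (prefix_tokens / 1e6 * CACHE_WRITE_RATE
              + prefix_tokens / 1e6 * CACHE_READ_RATE * (requests - 1))
    return uncached, cached

# A 100,000-token shared prefix reused across 100 requests:
uncached, cached = cached_vs_uncached(100_000, 100)  # ~$2.50 vs ~$0.33
```

The break-even point arrives after only a couple of requests, since the write premium ($0.30 vs $0.25 per 1M) is small relative to the read discount.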
## API Usage Example

Basic request:

```bash
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-haiku",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'
```
Setting a token limit:

```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-haiku",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ],
    "max_tokens": 1024
  }'
```
Image input (vision):

```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-haiku",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,..."
            }
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'
```
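The `image_url` field above takes a base64 data URL. One way to produce it (the `to_data_url` helper is a hypothetical sketch):

```python
import base64

def to_data_url(image_bytes, mime="image/jpeg"):
    """Encode raw image bytes as a base64 data URL for the image_url field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Usage: to_data_url(open("photo.jpg", "rb").read())
```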
Tool calling:

```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-haiku",
    "messages": [
      {"role": "user", "content": "What is the weather in Tokyo?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string"}
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'
```
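If the model decides to use the tool, the assistant message comes back with a `tool_calls` entry that your code must execute and answer. A minimal dispatch sketch, assuming an OpenAI-compatible response shape (`get_weather` here is a stand-in implementation):

```python
import json

def get_weather(location):
    # Stand-in implementation; replace with a real weather lookup.
    return {"location": location, "temperature_c": 22}

TOOLS = {"get_weather": get_weather}

def handle_tool_calls(message):
    """Execute each requested tool and build the follow-up tool messages."""
    results = []
    for call in message.get("tool_calls", []):
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results

# An assistant message as it might appear in a tool-calling response:
message = {
    "tool_calls": [{
        "id": "call_1",
        "function": {"name": "get_weather",
                     "arguments": "{\"location\": \"Tokyo\"}"},
    }]
}
follow_up = handle_tool_calls(message)
```

The returned tool messages are appended to the conversation and sent back so the model can compose its final answer.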
## Claude 3 Family

| Model | Context | Use Case |
|---|---|---|
| Claude 3 Opus | 200K tokens | Highest capability, complex reasoning |
| Claude 3.5 Sonnet | 200K tokens | Balanced performance and efficiency |
| Claude 3 Haiku | 200K tokens | Speed-optimized, cost-effective |
## Newer Generations

| Model | Context | Notes |
|---|---|---|
| Claude 3.7 Sonnet | 200K tokens | Enhanced reasoning |
| Claude Sonnet 4 | 1M tokens | Latest Sonnet generation |
| Claude Opus 4 | 200K tokens | Latest flagship model |
## Providers

### Available Providers

| Provider | Status |
|---|---|
| Anthropic | Primary |
| Amazon Bedrock | Available |
| Google Vertex | Available |
## Supported Modalities

- Input: text, images
- Output: text

Claude 3 Haiku is optimized for:
- Speed: Near-instant responsiveness for real-time applications
- Efficiency: Compact model architecture for lower latency
- Accuracy: Quick and accurate targeted performance
- Throughput: High volume processing capability
## Use Case Recommendations

| Use Case | Suitability |
|---|---|
| Real-time chatbots | Excellent |
| Content moderation | Excellent |
| Quick Q&A | Excellent |
| High-volume processing | Excellent |
| Image captioning | Good |
| Simple tool calling | Good |
| Complex reasoning | Consider larger models |
| Long-form generation | Consider larger models |
## Source