Claude 3 Haiku

Anthropic · Vision · Tools · Streaming
200K context · $0.25 / 1M input tokens · $1.25 / 1M output tokens · 4K max output

Model Overview

| Property | Value |
|---|---|
| Provider | Anthropic |
| Model Name | Claude 3 Haiku |
| Model ID (for inference) | anthropic/claude-3-haiku |
| Created | March 13, 2024 |
| Context Length | 200,000 tokens |
| Max Output Tokens | 4,096 tokens |

Description

Claude 3 Haiku is Anthropic's fastest and most compact model, designed for near-instant responsiveness and quick, accurate performance on targeted tasks. It excels where rapid responses are required while maintaining high-quality output.

Key characteristics include:

  • Near-instant response times for real-time applications
  • Compact model size optimized for efficiency
  • Strong performance on targeted tasks
  • Multimodal support (text and images)
  • Cost-effective pricing for high-volume applications

The model is ideal for use cases where speed is critical, such as chatbots, real-time assistants, content moderation, and high-throughput processing tasks.

Technical Specifications

| Specification | Value |
|---|---|
| Context Window | 200,000 tokens |
| Max Completion Tokens | 4,096 tokens |
| Data Retention | 30 days |
| Moderation | Required for API usage |
| Deprecation Date | Not announced |

Pricing

Standard Pricing

| Type | Rate |
|---|---|
| Input | $0.25 / 1M tokens |
| Output | $1.25 / 1M tokens |
| Image Input | $0.40 / 1K images |
| Input Cache Read | $0.03 / 1M tokens |
| Input Cache Write | $0.30 / 1M tokens |

Price per Token (Detailed)

| Type | Price per Token |
|---|---|
| Input | $0.00000025 |
| Output | $0.00000125 |
| Cache Read | $0.00000003 |
| Cache Write | $0.00000030 |
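These per-token rates make request costs easy to estimate. A minimal sketch in Python, with the rates hardcoded from the tables above (the helper function is illustrative, not part of any SDK):

```python
# Claude 3 Haiku rates in USD per million tokens (from the pricing tables above)
INPUT_PER_M = 0.25
OUTPUT_PER_M = 1.25

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from token counts."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# A request with 10,000 input tokens and 2,000 output tokens:
cost = estimate_cost(10_000, 2_000)
print(f"${cost:.4f}")  # $0.0050
```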

Capabilities

| Capability | Supported |
|---|---|
| Reasoning Mode | No |
| Tool/Function Calling | Yes |
| Vision (Image Analysis) | Yes |
| File Processing | No |
| Streaming | Yes |
| Caching | Yes |
| Multi-Part Input | Yes |

Supported Parameters

| Parameter | Description |
|---|---|
| max_tokens | Maximum number of tokens to generate (up to 4,096) |
| temperature | Controls randomness (0-1) |
| top_p | Nucleus sampling threshold |
| top_k | Top-k sampling parameter |
| stop | Stop sequences to end generation |
| tools | List of available tools/functions |
| tool_choice | Control tool selection behavior |
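The parameters above slot into a standard chat-completions request body. A sketch of building such a payload in Python (all values are illustrative):

```python
import json

# Request body for anthropic/claude-3-haiku using the parameters listed above
payload = {
    "model": "anthropic/claude-3-haiku",
    "messages": [{"role": "user", "content": "Summarize this in one line."}],
    "max_tokens": 512,    # must not exceed the model's 4,096 cap
    "temperature": 0.3,   # 0-1; lower = more deterministic
    "top_p": 0.9,         # nucleus sampling threshold
    "top_k": 40,          # top-k sampling
    "stop": ["\n\n"],     # stop sequences that end generation
}
body = json.dumps(payload)
```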

Best Practices

  1. For High-Volume Applications: Leverage the low cost per token for batch processing tasks
  2. For Real-Time Chat: Take advantage of near-instant response times for conversational AI
  3. For Cost Optimization: Use Haiku for simpler tasks, reserving larger models for complex reasoning
  4. For Image Analysis: Utilize multimodal capability for quick image understanding tasks
  5. For Content Moderation: Ideal for high-throughput content screening
  6. For Caching: Use cache features for repeated context to further reduce costs
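Point 6 is worth quantifying: with the cache rates from the pricing section, re-reading a long shared prefix from cache is far cheaper than resending it as regular input. A back-of-the-envelope sketch (the request counts are illustrative, and real cache behavior depends on provider-specific TTL rules):

```python
# Rates in USD per million tokens (from the pricing section above)
INPUT_PER_M = 0.25
CACHE_READ_PER_M = 0.03
CACHE_WRITE_PER_M = 0.30

prefix_tokens = 50_000  # shared system prompt / context reused across requests
requests = 100

# Without caching: the prefix is billed as regular input on every request
no_cache = requests * prefix_tokens * INPUT_PER_M / 1_000_000

# With caching: one cache write, then cheap cache reads
with_cache = (prefix_tokens * CACHE_WRITE_PER_M
              + (requests - 1) * prefix_tokens * CACHE_READ_PER_M) / 1_000_000

print(f"no cache: ${no_cache:.2f}, cached: ${with_cache:.4f}")
# no cache: $1.25, cached: $0.1635
```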

API Usage Example

LangMart Format

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-haiku",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'

With max_tokens

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-haiku",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ],
    "max_tokens": 1024
  }'

With Image Input

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-haiku",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,..."
            }
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'
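The `data:` URL in the image example is just a base64-encoded image with a media-type prefix. A sketch of building one in Python (the helper is illustrative):

```python
import base64

def to_data_url(image_bytes: bytes, media_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data: URL for the image_url content part."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{media_type};base64,{b64}"

# JPEG files start with the magic bytes FF D8 FF
url = to_data_url(b"\xff\xd8\xff")
print(url)  # data:image/jpeg;base64,/9j/
```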

With Tool Calling

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-haiku",
    "messages": [
      {"role": "user", "content": "What is the weather in Tokyo?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string"}
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'
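When the model decides to call `get_weather`, the response contains a `tool_calls` entry instead of text; the caller executes the function locally and sends the result back as a `tool` message. A sketch of that dispatch step, using a simulated assistant message in the OpenAI-compatible shape (the exact field layout may vary by gateway, and `get_weather` here is a stub):

```python
import json

def get_weather(location: str) -> dict:
    # Stand-in for a real weather lookup
    return {"location": location, "condition": "sunny", "temp_c": 22}

TOOLS = {"get_weather": get_weather}

# Simulated assistant message containing a tool call (illustrative shape)
message = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": '{"location": "Tokyo"}'},
    }],
}

# Execute each requested tool and build the tool-result messages to send back
tool_messages = []
for call in message.get("tool_calls", []):
    fn = TOOLS[call["function"]["name"]]
    args = json.loads(call["function"]["arguments"])
    result = fn(**args)
    tool_messages.append({
        "role": "tool",
        "tool_call_id": call["id"],
        "content": json.dumps(result),
    })
```

The resulting `tool_messages` are appended to the conversation and sent in a follow-up request so the model can compose its final answer.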

Claude 3 Family

| Model | Context | Use Case |
|---|---|---|
| Claude 3 Opus | 200K tokens | Highest capability, complex reasoning |
| Claude 3.5 Sonnet | 200K tokens | Balanced performance and efficiency |
| Claude 3 Haiku | 200K tokens | Speed-optimized, cost-effective |

Newer Generations

| Model | Context | Notes |
|---|---|---|
| Claude 3.7 Sonnet | 200K tokens | Enhanced reasoning |
| Claude Sonnet 4 | 1M tokens | Latest Sonnet generation |
| Claude Opus 4 | 200K tokens | Latest flagship model |

Providers

Available Providers

| Provider | Status |
|---|---|
| Anthropic | Primary |
| Amazon Bedrock | Available |
| Google Vertex AI | Available |

Supported Modalities

Input Modalities

  • Text
  • Images

Output Modalities

  • Text only

Performance Characteristics

Claude 3 Haiku is optimized for:

  • Speed: Near-instant responsiveness for real-time applications
  • Efficiency: Compact model architecture for lower latency
  • Accuracy: Quick and accurate targeted performance
  • Throughput: High volume processing capability

Use Case Recommendations

| Use Case | Suitability |
|---|---|
| Real-time chatbots | Excellent |
| Content moderation | Excellent |
| Quick Q&A | Excellent |
| High-volume processing | Excellent |
| Image captioning | Good |
| Simple tool calling | Good |
| Complex reasoning | Consider larger models |
| Long-form generation | Consider larger models |
