
Meta Llama 2 13B Chat

Model Overview

  • Model ID: meta-llama/llama-2-13b-chat
  • Creator: Meta (Llama Team)
  • Release Date: July 18, 2023
  • Model Type: Chat-Optimized Language Model

Description

Meta Llama 2 13B Chat is a 13 billion parameter language model fine-tuned specifically for chat completions and conversational tasks. It is Meta's openly available model, released under the Llama 2 Community License, designed for dialogue-based applications and instruction following.

Technical Specifications

Model Architecture

  • Parameter Count: 13 billion parameters
  • Model Family: Llama 2
  • Prompt Format: Llama 2 instruction template ([INST]-style)
  • Fine-tuning: Chat-optimized through instruction fine-tuning

Input/Output Configuration

  • Context Window: 4,096 tokens
  • Input Modalities: Text
  • Output Modalities: Text
  • Default Stop Sequences: </s>, [INST] (from the Llama 2 prompt template; see the sketch below)
  • Max Output Tokens: not specified on LangMart (listed as N/A)
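The default stop sequences reflect the Llama 2 chat prompt template, in which each user turn is wrapped in [INST] ... [/INST] tags and an optional system prompt sits inside <<SYS>> markers. Chat-completions endpoints such as LangMart's normally apply this template server-side, but a minimal Python sketch of it (the helper name is illustrative, not part of any SDK) can help when debugging raw prompts:

# Canonical Llama 2 chat template (illustrative helper).
def build_llama2_prompt(system_prompt: str, user_message: str) -> str:
    # The system prompt goes inside <<SYS>> markers; the user turn is
    # wrapped in [INST] ... [/INST]; the model's reply ends with </s>.
    return (
        f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

print(build_llama2_prompt("You are a helpful assistant.", "Hello!"))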

Model Variants

  • Full Model: meta-llama/Llama-2-13b-chat-hf (Hugging Face)
  • LangMart Endpoint: meta-llama/llama-2-13b-chat

Pricing

Note: Pricing varies by provider and API platform. On LangMart, check the model pricing page for current rates. As a guide:

  • Input tokens: typically $0.10 per 1M tokens (subject to variation)
  • Output tokens: typically $0.10 per 1M tokens (subject to variation)
  • Consult LangMart pricing directly for exact rates; a back-of-envelope estimate is sketched below
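Given per-token rates, estimating a workload's cost is simple arithmetic. A minimal Python sketch, assuming the indicative $0.10 per 1M-token rates above (verify against LangMart's live pricing):

# Rough cost estimator at the indicative rates above; not an official calculator.
INPUT_RATE_PER_M = 0.10   # USD per 1M input tokens (assumed)
OUTPUT_RATE_PER_M = 0.10  # USD per 1M output tokens (assumed)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# Example: 40k prompt tokens + 10k completion tokens ≈ $0.005
print(f"${estimate_cost(40_000, 10_000):.4f}")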

Performance Characteristics

Capabilities

  • Chat-optimized for conversational tasks
  • High-quality text generation and completion
  • Multi-turn conversation support
  • Instruction following and chat-based reasoning
  • Suitable base for fine-tuning on domain-specific tasks

Context and Limitations

  • 4,096 token context window (suitable for most conversations; see the history-trimming sketch below)
  • Optimized for chat interactions rather than general text processing
  • Open-source model with community support
  • No specific performance benchmarks provided on LangMart
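Because the window is fixed at 4,096 tokens, long conversations eventually need their history trimmed before each request. A minimal sketch using the rough ~4 characters/token heuristic (the helper name is illustrative; swap in a real tokenizer for accuracy):

# Keep only the most recent turns that fit a token budget (hypothetical helper).
def trim_history(messages: list[dict], budget_tokens: int = 3000) -> list[dict]:
    kept, used = [], 0
    for msg in reversed(messages):           # walk from newest to oldest
        cost = len(msg["content"]) // 4 + 8  # ~4 chars/token + per-message overhead
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))              # restore chronological order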

API Integration

LangMart API Usage

# Request Format
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-2-13b-chat",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how can you help me?"
      }
    ],
    "max_tokens": 2048,
    "temperature": 0.7
  }'

API Parameters

Parameter          Type     Description                  Default
model              string   Model identifier             Required: meta-llama/llama-2-13b-chat
messages           array    Conversation history         Required
max_tokens         integer  Maximum response length      2048
temperature        float    Response randomness (0-2)    0.7
top_p              float    Nucleus sampling parameter   1.0
frequency_penalty  float    Reduce repetition            0.0
presence_penalty   float    Encourage new tokens         0.0
stop               array    Stop sequences               ["</s>", "[INST]"]

The 4,096-token context window is a model property, not a request parameter; input and output together must fit within it.
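These parameters map one-to-one onto the JSON request body. A minimal Python sketch using the requests library against the LangMart endpoint shown above (error handling kept deliberately thin):

import requests

payload = {
    "model": "meta-llama/llama-2-13b-chat",
    "messages": [{"role": "user", "content": "Summarize Llama 2 in one sentence."}],
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "stop": ["</s>", "[INST]"],
}

resp = requests.post(
    "https://api.langmart.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])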

Usage Examples

Basic Chat Completion

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-2-13b-chat",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms."
      }
    ]
  }'

Multi-Turn Conversation

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-2-13b-chat",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      },
      {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      {
        "role": "user",
        "content": "What is its population?"
      }
    ]
  }'
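The same pattern generalizes to a loop: append each assistant reply to the messages array and resend the full history every turn. A minimal Python sketch, assuming the OpenAI SDK pointed at LangMart's OpenAI-compatible endpoint (see Integration Notes below):

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.langmart.ai/v1")
history = []

for user_input in ["What is the capital of France?", "What is its population?"]:
    history.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="meta-llama/llama-2-13b-chat",
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})  # keep context for the next turn
    print(reply)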

Code Generation

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-2-13b-chat",
    "messages": [
      {
        "role": "user",
        "content": "Write a Python function to calculate factorial."
      }
    ],
    "temperature": 0.5
  }'

Creative Writing

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-2-13b-chat",
    "messages": [
      {
        "role": "user",
        "content": "Write a short science fiction story about space exploration."
      }
    ],
    "temperature": 0.9,
    "max_tokens": 1024
  }'

Model Availability

Platform         Status     Notes
Hugging Face     Available  Model: meta-llama/Llama-2-13b-chat-hf
LangMart         Available  Accessible via API endpoint
Replicate        Available  Inference platform option
Together AI      Available  API endpoint available
Other Providers  Variable   Check provider status

Training and Fine-tuning

This model is suitable for fine-tuning on:

  • Domain-specific chat applications
  • Customer service automation
  • Q&A systems
  • Dialogue-based applications
  • Instruction-following tasks

Fine-tuning on custom datasets requires acceptance of the Llama 2 Community License and sufficient compute resources; a minimal sketch follows.
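As one illustration, parameter-efficient fine-tuning with LoRA keeps memory requirements manageable for a 13B model. A minimal sketch using Hugging Face transformers and peft (assumes the Llama 2 license has been accepted on Hugging Face and GPU capacity is available; dataset preparation and the training loop are omitted):

# LoRA fine-tuning sketch; hyperparameters are illustrative, not recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-13b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a small fraction of the 13B weights train
# ...continue with a standard Trainer/SFTTrainer loop over your chat dataset...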

Safety and Ethical Considerations

  • Built with safety techniques from Meta's responsible AI research
  • Suitable for production deployment with appropriate monitoring
  • Supported by a large open-source community (the model itself is maintained by Meta)
  • Designed to reduce harmful outputs through instruction fine-tuning

Comparison with Other Models

Similar Models

  • Llama 2 7B Chat: Smaller, faster variant (7B parameters)
  • Llama 2 70B Chat: Larger, more capable variant (70B parameters)
  • Mistral 7B Instruct: Similar size, alternative architecture
  • Neural Chat 7B: Instruction-tuned alternative

Size vs Capability Tradeoff

  • 7B: Faster inference, lower memory, acceptable quality
  • 13B: Balance of speed and quality (recommended for most use cases)
  • 70B: Best quality, slower inference, higher resource requirements

Integration Notes

LangMart/LangChain Integration

from langchain_openai import ChatOpenAI  # LangMart exposes an OpenAI-compatible API

llm = ChatOpenAI(
    model="meta-llama/llama-2-13b-chat",
    api_key="YOUR_API_KEY",
    base_url="https://api.langmart.ai/v1",
    temperature=0.7,
    max_tokens=2048,
)

response = llm.invoke("Hello, how can you help me?")
print(response.content)

OpenAI-Compatible Endpoint

The model is compatible with OpenAI-style API calls:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_LANGMART_API_KEY",
    base_url="https://api.langmart.ai/v1",
)

response = client.chat.completions.create(
    model="meta-llama/llama-2-13b-chat",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)
print(response.choices[0].message.content)

Last Updated

November 10, 2025

This documentation was generated from LangMart model data. For the most current information, visit the LangMart model documentation.