Nous: Hermes 3.1 Llama 3.1 405B

Nous Research · 128K context · $1.00/1M input tokens · $1.00/1M output tokens · 16K max output

Model Overview

| Property | Value |
|----------|-------|
| Model Name | Nous: Hermes 3.1 Llama 3.1 405B Instruct |
| Model ID | nousresearch/nous-hermes-3.1-llama-3.1-405b |
| Author/Organization | Nous Research |
| Release Date | November 2024 |
| Base Model | Llama-3.1 405B (full-parameter finetune) |
| Architecture | Transformer (Llama 3.1 architecture) |

Description

Nous Hermes 3.1 is an iterative refinement of the Hermes 3 model family, built on top of Llama-3.1 405B. It is one of the most capable open-source instruction-tuned models, with improved reasoning, agentic behavior, and long-context understanding.

Key Improvements Over Hermes 3

  • Enhanced Reasoning Capabilities: Improved logical reasoning and problem-solving with better accuracy
  • Advanced Agentic Performance: Superior autonomous agent behavior with improved planning
  • Extended Context Handling: Better utilization of full 131K token context window
  • Improved Instruction Following: More precise adherence to complex instructions
  • Better Multi-turn Coherence: Enhanced context retention and conversation continuity
  • Refined Function Calling: More reliable structured output and tool invocation
  • Advanced Code Generation: Improved code quality across programming languages
  • Enhanced Roleplay Capabilities: Better character consistency and creativity

Technical Specifications

| Specification | Value |
|---------------|-------|
| Context Window | 131,072 tokens (128K) |
| Max Completion Tokens | 16,384 tokens |
| Input Modalities | Text |
| Output Modalities | Text |
| Instruction Format | ChatML |
| Quantization | FP8 |
| Parameters | 405 billion |
| Training Data Cutoff | November 2024 |

Pricing

| Type | Price |
|------|-------|
| Input Tokens | $1.00 per 1M tokens |
| Output Tokens | $1.00 per 1M tokens |

Cost Examples

| Use Case | Input Tokens | Output Tokens | Estimated Cost |
|----------|--------------|---------------|----------------|
| Short conversation | 1,000 | 500 | $0.0015 |
| Code generation task | 5,000 | 2,000 | $0.007 |
| Long document analysis | 50,000 | 10,000 | $0.06 |
| Extended agent session | 100,000 | 50,000 | $0.15 |
| Full-context research task | 130,000 | 10,000 | $0.14 |
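The arithmetic behind the table is simple enough to sketch directly. The prices are the ones listed above; the function name is just illustrative.

```python
# Estimated request cost at $1.00 per 1M input tokens and $1.00 per 1M output tokens.
INPUT_PRICE_PER_M = 1.00
OUTPUT_PRICE_PER_M = 1.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Reproduces the "Long document analysis" row:
print(f"${estimate_cost(50_000, 10_000):.2f}")  # → $0.06
```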

Capabilities

Core Capabilities

  • Text Generation: High-quality text completion and generation
  • Function Calling: Structured tool invocation with JSON schemas
  • Code Generation: Multi-language code writing, debugging, and refactoring
  • Reasoning: Complex logical reasoning, analysis, and problem-solving
  • Multi-turn Conversation: Extended dialogue with superior context retention
  • Agentic Tasks: Autonomous task execution with planning and tool use
  • Long-context Processing: Efficient handling of documents up to 131K tokens
  • Instruction Following: Precise adherence to complex, multi-step instructions

Tool Use Support

| Tool Choice Option | Description |
|--------------------|-------------|
| none | Disable tool calling |
| auto | Model decides whether to use tools |
| required | Force tool usage for all responses |
| function | Specify an exact function to call |

Structured Outputs

Supports response_format parameter for:

  • JSON Mode: Generate valid JSON output
  • JSON Schema: Validate output against custom schemas
  • Custom Structured Outputs: Define specific response structures
  • XML Mode: Generate XML-formatted outputs

Supported Parameters

| Parameter | Type | Range | Default | Description |
|-----------|------|-------|---------|-------------|
| temperature | float | 0.0 - 2.0 | 0.7 | Controls randomness in responses |
| top_p | float | 0.0 - 1.0 | 0.9 | Nucleus sampling threshold |
| top_k | integer | 1 - 100 | 40 | Limits vocabulary to the top K tokens |
| stop | array | - | - | Stop sequences that end generation |
| frequency_penalty | float | -2.0 - 2.0 | 0.0 | Penalizes tokens in proportion to how often they have appeared |
| presence_penalty | float | -2.0 - 2.0 | 0.0 | Penalizes any token that has already appeared |
| repetition_penalty | float | 0.0 - 2.0 | 1.0 | Alternative repetition control |
| seed | integer | 0 - 2^32 | - | Random seed for reproducibility |
| min_p | float | 0.0 - 1.0 | 0.0 | Minimum probability threshold |
| max_tokens | integer | 1 - 16,384 | - | Maximum tokens to generate |
| response_format | object | - | - | Structured output format specification |
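A minimal sketch of a request body using these parameters, following the same OpenAI-compatible shape as the curl examples later in this page. The specific values are illustrative, not recommendations.

```python
import json

# Illustrative request payload; field names follow the OpenAI-compatible schema.
payload = {
    "model": "nousresearch/nous-hermes-3.1-llama-3.1-405b",
    "messages": [{"role": "user", "content": "Summarize the CAP theorem."}],
    "temperature": 0.3,   # deterministic-leaning, within the 0.0-2.0 range
    "top_p": 0.9,
    "seed": 42,           # fixed seed for reproducibility
    "max_tokens": 1024,   # must not exceed the 16,384 completion limit
}
print(json.dumps(payload, indent=2))
```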

Use Cases

  1. Advanced Agentic Applications: Complex autonomous agents, multi-step workflows
  2. Technical Problem Solving: Debugging, optimization, architectural design
  3. High-quality Code Development: Complex algorithms, system design, refactoring
  4. Advanced Reasoning Tasks: Logic puzzles, mathematical proofs, analysis
  5. Long-form Content Creation: Books, technical documentation, research papers
  6. Complex Multi-turn Interactions: Extended conversations with context preservation
  7. Roleplaying & Creative Writing: Character development, narrative creation
  8. Knowledge Integration: Combining information from extensive documents

Not Recommended For

  1. Low-latency Requirements: Large model size results in higher latency
  2. Cost-sensitive Applications: Premium pricing compared to smaller models
  3. Simple Queries: Overkill for basic Q&A or simple tasks
  4. Mobile/Edge Deployment: Requires cloud infrastructure
  5. Real-time Requirements: Not suitable for sub-second response needs

Hermes 3.1 Family

| Model | Parameters | Context | Characteristics |
|-------|------------|---------|-----------------|
| Nous Hermes 3.1 Llama 3.1 405B | 405B | 131K | Maximum capability, latest iteration |
| Nous Hermes 3 Llama 3.1 405B | 405B | 131K | Previous version, stable |
| Nous Hermes 3.1 Llama 3.1 70B | 70B | 131K | Balanced performance/cost |
| Nous Hermes 3.1 Llama 3.1 8B | 8B | 131K | Fast, cost-effective |

Alternative 405B Options

| Model | Provider | Context |
|-------|----------|---------|
| Llama 3.1 405B Instruct | Meta | 131K |
| Llama 4 Maverick | Meta | 131K |
| Mixtral 8x22B | Mistral | 65K |

Providers

Available Providers

| Provider | Status | Details |
|----------|--------|---------|
| DeepInfra | Primary | Full model availability |
| OpenRouter | Secondary | Aggregated access (if available) |
| Nous Research | Official | Direct access via Nous API |

Provider Details

DeepInfra:

  • Provider Model ID: NousResearch/Nous-Hermes-3.1-Llama-3.1-405B
  • Max Completion Tokens: 16,384
  • Request Rate: Default limits
  • Availability: Full model access

Performance Characteristics

Strengths

  • Exceptional Reasoning: Superior logical reasoning from 405B parameters
  • Industry-leading Agentic Performance: Top-tier autonomous agent capabilities for open models
  • Strong Multi-turn Coherence: Maintains context quality over 100K+ token conversations
  • Reliable Function Calling: Consistent structured output and tool invocation
  • Excellent Code Quality: High-quality code generation across multiple languages
  • Superior Instruction Following: Precise execution of complex instructions
  • Extended Context Utilization: Efficiently uses full 131K token context window

Considerations

  • Higher Latency: Larger model size increases response time vs. smaller models
  • Premium Pricing: Higher costs compared to 70B or smaller models
  • Infrastructure Requirements: Requires substantial compute resources
  • Memory Footprint: Large model requires significant GPU/TPU memory (FP8 quantization required for practical deployment)

Comparison with Other 405B Models

| Model | Organization | Parameters | Context | Use Case |
|-------|--------------|------------|---------|----------|
| Nous Hermes 3.1 Llama 3.1 405B | Nous Research | 405B | 131K | Advanced reasoning & agentic |
| Llama 3.1 405B Instruct | Meta | 405B | 131K | General-purpose, official base |
| Claude 3 Opus | Anthropic | ~200B* | 200K | Enterprise, safety-focused |
| GPT-4 Turbo | OpenAI | ~1.7T* | 128K | Premium closed-source |
| Mixtral 8x22B | Mistral | 141B | 65K | Efficient, open-source |

*Estimated parameters


API Usage Examples

Basic Chat Completion

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nousresearch/nous-hermes-3.1-llama-3.1-405b",
    "messages": [
      {
        "role": "system",
        "content": "You are an expert software architect with deep knowledge of system design patterns."
      },
      {
        "role": "user",
        "content": "Design a distributed caching system for a high-traffic web application. Consider consistency, fault tolerance, and performance."
      }
    ],
    "temperature": 0.7,
    "max_tokens": 4096
  }'

Function Calling Example

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nousresearch/nous-hermes-3.1-llama-3.1-405b",
    "messages": [
      {
        "role": "user",
        "content": "I need to retrieve the weather for New York, Boston, and Los Angeles for my trip planning."
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather and forecast for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City and state or country"
              },
              "days": {
                "type": "integer",
                "description": "Number of forecast days (1-7)"
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
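When the model decides to call a tool, the response message carries a `tool_calls` array; the result of each call is sent back as a `"tool"` message in the next request. The sketch below uses a hand-written sample response in the OpenAI-compatible shape (not captured from the API), with a stub standing in for the real weather backend.

```python
import json

# Illustrative assistant message containing a tool call (hand-written sample).
response_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": "{\"location\": \"New York, NY\", \"days\": 3}",
            },
        }
    ],
}

# Execute each requested call and build the "tool" messages to append to the
# next request's messages array.
follow_up_messages = []
for call in response_message["tool_calls"]:
    args = json.loads(call["function"]["arguments"])
    result = {"location": args["location"], "forecast": "sunny"}  # stub tool
    follow_up_messages.append({
        "role": "tool",
        "tool_call_id": call["id"],
        "content": json.dumps(result),
    })
```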

Structured Output with JSON Schema

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nousresearch/nous-hermes-3.1-llama-3.1-405b",
    "messages": [
      {
        "role": "user",
        "content": "Analyze this code snippet and provide a detailed review with improvements."
      }
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "CodeReview",
        "schema": {
          "type": "object",
          "properties": {
            "issues": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "severity": {"type": "string"},
                  "description": {"type": "string"},
                  "fix": {"type": "string"}
                }
              }
            },
            "improvements": {
              "type": "array",
              "items": {"type": "string"}
            },
            "overall_rating": {"type": "number"}
          }
        }
      }
    }
  }'
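Even with schema-constrained output, it is worth validating the reply before trusting it. A minimal hand-rolled check against the CodeReview schema above might look like this (a real application could use the `jsonschema` package instead; this sketch avoids the dependency):

```python
import json

def is_valid_review(raw: str) -> bool:
    """Check that a raw model reply matches the CodeReview schema's top-level shape."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(doc.get("issues"), list)
        and all(isinstance(i, dict) for i in doc["issues"])
        and isinstance(doc.get("improvements"), list)
        and isinstance(doc.get("overall_rating"), (int, float))
    )

sample = ('{"issues": [{"severity": "low", "description": "x", "fix": "y"}], '
          '"improvements": ["z"], "overall_rating": 8}')
print(is_valid_review(sample))  # → True
```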

Long-context Processing

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nousresearch/nous-hermes-3.1-llama-3.1-405b",
    "messages": [
      {
        "role": "system",
        "content": "You are a research assistant. Analyze the provided documents and extract key insights."
      },
      {
        "role": "user",
        "content": "[Long document content - up to 131K tokens]\n\nProvide a comprehensive summary with key findings."
      }
    ],
    "temperature": 0.3,
    "max_tokens": 2048
  }'
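Before sending a very long prompt, it can help to check that it fits the 131,072-token context with room left for the completion. The chars/4 ratio below is a coarse heuristic, not the model's actual tokenizer; for exact counts you would need the Llama 3.1 tokenizer.

```python
# Rough pre-flight context check. chars/4 is an approximation only.
CONTEXT_LIMIT = 131_072

def fits_context(prompt_text: str, max_tokens: int) -> bool:
    """True if the approximate prompt tokens plus the completion budget fit the window."""
    approx_prompt_tokens = len(prompt_text) // 4
    return approx_prompt_tokens + max_tokens <= CONTEXT_LIMIT

print(fits_context("word " * 100_000, max_tokens=2048))  # → True
```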

Agentic Task with Multiple Tool Calls

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nousresearch/nous-hermes-3.1-llama-3.1-405b",
    "messages": [
      {
        "role": "user",
        "content": "Help me plan a business trip to San Francisco. I need flight bookings, hotel recommendations, and weather information."
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "search_flights",
          "description": "Search for flight options",
          "parameters": {
            "type": "object",
            "properties": {
              "from": {"type": "string"},
              "to": {"type": "string"},
              "date": {"type": "string"}
            },
            "required": ["from", "to", "date"]
          }
        }
      },
      {
        "type": "function",
        "function": {
          "name": "search_hotels",
          "description": "Find hotel recommendations",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string"},
              "check_in": {"type": "string"},
              "check_out": {"type": "string"},
              "price_range": {"type": "string"}
            },
            "required": ["city", "check_in", "check_out"]
          }
        }
      },
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get weather forecast",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string"},
              "days": {"type": "integer"}
            },
            "required": ["city"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
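With several tools registered, the application side typically dispatches each requested call by name. A sketch of that loop, with stub functions standing in for real flight, hotel, and weather backends (the names match the tool definitions above; the sample call is hand-written):

```python
import json

# Stub backends; a real application would call actual services here.
def search_flights(**kw): return {"flights": [], **kw}
def search_hotels(**kw): return {"hotels": [], **kw}
def get_weather(**kw): return {"forecast": "fog", **kw}

DISPATCH = {
    "search_flights": search_flights,
    "search_hotels": search_hotels,
    "get_weather": get_weather,
}

def run_tool_calls(tool_calls):
    """Execute each call the model requested; return results keyed by call id."""
    results = {}
    for call in tool_calls:
        fn = DISPATCH[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results[call["id"]] = fn(**args)
    return results

sample_calls = [{"id": "c1", "type": "function",
                 "function": {"name": "get_weather",
                              "arguments": "{\"city\": \"San Francisco\"}"}}]
print(run_tool_calls(sample_calls))
```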

ChatML Format

Nous Hermes 3.1 uses the ChatML instruction format:

<|im_start|>system
You are a helpful and knowledgeable AI assistant with expertise in multiple domains.<|im_end|>
<|im_start|>user
What is quantum entanglement?<|im_end|>
<|im_start|>assistant
Quantum entanglement is a phenomenon in quantum mechanics where two or more particles become correlated in such a way that the quantum state of each particle cannot be described independently, even when the particles are separated by large distances.<|im_end|>
<|im_start|>user
How is it used in quantum computing?<|im_end|>
<|im_start|>assistant
In quantum computing, entanglement is fundamental to creating quantum gates and circuits. It allows quantum computers to process information in ways that classical computers cannot...
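When calling the raw model directly rather than through a chat-completions endpoint, the messages array must be rendered into this ChatML string yourself. A minimal sketch of that assembly (the function name and trailing generation prompt are conventions, not a specification from this page):

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render OpenAI-style messages into a ChatML prompt string."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What is quantum entanglement?"},
])
```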

Performance Benchmarks

Reasoning & Analysis

| Task | Performance | Notes |
|------|-------------|-------|
| Complex Problem Solving | Excellent | Superior reasoning chains |
| Mathematical Proofs | Excellent | Handles complex logic |
| Code Generation | Excellent | Production-quality code |
| Analysis & Synthesis | Excellent | Integrates complex information |

Conversation Quality

| Metric | Performance |
|--------|-------------|
| Context Retention (50K tokens) | Excellent |
| Context Retention (100K+ tokens) | Excellent |
| Multi-turn Coherence | Excellent |
| Instruction Following | Excellent |

Optimization Tips

For Best Results

  1. Use Clear System Prompts: Provide detailed role definitions for optimal performance
  2. Structure Complex Requests: Break multi-step tasks into clear steps
  3. Leverage Tool Use: Use function calling for structured information needs
  4. Set Appropriate Temperature: Use 0.3-0.5 for deterministic tasks, 0.7-0.9 for creative content
  5. Use Full Context: This model excels with extended context (50K+ tokens)
  6. Enable Structured Output: Use JSON schema for consistent, machine-readable responses

Cost Optimization

  • Consider Hermes 3.1 70B for similar tasks to reduce costs
  • Batch requests to reduce overhead
  • Use max_tokens to avoid unnecessary token generation
  • Implement prompt caching for repeated queries

Limitations & Considerations

  • Knowledge Cutoff: Information current only through November 2024
  • No Real-time Information: Cannot access current data, weather, or news
  • No Internet Access: Cannot browse the web or fetch external URLs
  • Training Data Bias: May reflect biases present in training data
  • Hallucinations: Can generate plausible but incorrect information
  • Token Limits: Context and completion tokens have hard limits
  • Processing Speed: Large model means slower response times than smaller alternatives

Last updated: December 23, 2024

Note: This model documentation is based on publicly available information. Model availability and pricing may vary by provider. Please verify current availability and pricing with your chosen provider before implementation.