# Nous: Hermes 3.1 Llama 3.1 405B

## Model Overview

| Property | Value |
|---|---|
| Model Name | Nous: Hermes 3.1 Llama 3.1 405B Instruct |
| Model ID | `nousresearch/nous-hermes-3.1-llama-3.1-405b` |
| Author/Organization | Nous Research |
| Release Date | November 2024 |
| Base Model | Llama 3.1 405B (full-parameter finetune) |
| Architecture | Transformer (Llama 3.1 architecture) |
## Description
Nous Hermes 3.1 is an iterative refinement of the Hermes 3 model family, built on top of Llama 3.1 405B. It is among the most capable open-weight instruction-tuned models, with improved reasoning, agentic capabilities, and long-context understanding.
## Key Improvements Over Hermes 3
- Enhanced Reasoning Capabilities: Improved logical reasoning and problem-solving with better accuracy
- Advanced Agentic Performance: Superior autonomous agent behavior with improved planning
- Extended Context Handling: Better utilization of full 131K token context window
- Improved Instruction Following: More precise adherence to complex instructions
- Better Multi-turn Coherence: Enhanced context retention and conversation continuity
- Refined Function Calling: More reliable structured output and tool invocation
- Advanced Code Generation: Improved code quality across programming languages
- Enhanced Roleplay Capabilities: Better character consistency and creativity
Technical Specifications
| Specification |
Value |
| Context Window |
128,000 tokens |
| Context Length |
131,072 tokens |
| Max Completion Tokens |
16,384 tokens |
| Input Modalities |
Text |
| Output Modalities |
Text |
| Instruction Format |
ChatML |
| Quantization |
FP8 |
| Parameters |
405 Billion |
| Training Data Cutoff |
November 2024 |
## Pricing

| Type | Price |
|---|---|
| Input Tokens | $1.00 per 1M tokens |
| Output Tokens | $1.00 per 1M tokens |
### Cost Examples

| Use Case | Input Tokens | Output Tokens | Estimated Cost |
|---|---|---|---|
| Short conversation | 1,000 | 500 | $0.0015 |
| Code generation task | 5,000 | 2,000 | $0.007 |
| Long document analysis | 50,000 | 10,000 | $0.06 |
| Extended agent session | 100,000 | 50,000 | $0.15 |
| Full-context research task | 130,000 | 10,000 | $0.14 |
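At a flat $1.00 per million tokens for both input and output, the estimate is simply `(input_tokens + output_tokens) / 1,000,000` dollars. A minimal sketch of that arithmetic (the helper name and constants are illustrative, not part of any SDK):

```python
# Rough cost estimator for this model's flat per-token pricing.
# Rates are taken from the pricing table above; verify current
# pricing with your provider before relying on these numbers.

INPUT_PRICE_PER_M = 1.00   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.00  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Matches the "Short conversation" row: 1,000 in + 500 out
print(f"${estimate_cost(1_000, 500):.4f}")  # → $0.0015
```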
## Capabilities

### Core Capabilities
- Text Generation: High-quality text completion and generation
- Function Calling: Structured tool invocation with JSON schemas
- Code Generation: Multi-language code writing, debugging, and refactoring
- Reasoning: Complex logical reasoning, analysis, and problem-solving
- Multi-turn Conversation: Extended dialogue with superior context retention
- Agentic Tasks: Autonomous task execution with planning and tool use
- Long-context Processing: Efficient handling of documents up to 131K tokens
- Instruction Following: Precise adherence to complex, multi-step instructions
### Tool Calling

The `tool_choice` parameter controls tool invocation:

| Tool Choice Option | Description |
|---|---|
| `none` | Disable tool calling |
| `auto` | Model decides whether to use tools |
| `required` | Force tool usage for all responses |
| `function` | Specify an exact function to call |
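In the OpenAI-compatible request format, the first three options are plain strings, while forcing a specific function is conventionally expressed as an object naming it. A sketch assuming that convention (verify the exact shape against your provider's API reference):

```python
from typing import Optional

# Build a tool_choice value for an OpenAI-compatible chat request.
# The string modes come from the table above; the object form for a
# forced function follows the common convention and is an assumption.

def make_tool_choice(mode: str, function_name: Optional[str] = None):
    if mode == "function":
        if function_name is None:
            raise ValueError("mode='function' requires a function name")
        return {"type": "function", "function": {"name": function_name}}
    if mode in ("none", "auto", "required"):
        return mode
    raise ValueError(f"unknown tool_choice mode: {mode}")

print(make_tool_choice("auto"))
print(make_tool_choice("function", "get_weather"))
```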
### Structured Outputs

Supports the `response_format` parameter for:
- JSON Mode: Generate valid JSON output
- JSON Schema: Validate output against custom schemas
- Custom Structured Outputs: Define specific response structures
- XML Mode: Generate XML-formatted outputs
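Even in JSON mode it is worth validating the decoded payload before acting on it. A minimal stdlib-only sketch; the field names echo the `CodeReview` schema used in the API examples below and the sample string stands in for a model reply:

```python
import json

# Defensively parse a JSON-mode response. json.loads raises on
# malformed output, and the explicit field check catches a reply
# that is valid JSON but missing what the application needs.

sample_reply = '{"issues": [], "overall_rating": 9.5}'

def parse_review(raw: str) -> dict:
    data = json.loads(raw)
    if "overall_rating" not in data:
        raise ValueError("missing required field: overall_rating")
    return data

print(parse_review(sample_reply)["overall_rating"])  # → 9.5
```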
## Supported Parameters

| Parameter | Type | Range | Default | Description |
|---|---|---|---|---|
| `temperature` | float | 0.0 – 2.0 | 0.7 | Controls randomness in responses |
| `top_p` | float | 0.0 – 1.0 | 0.9 | Nucleus sampling threshold |
| `top_k` | integer | 1 – 100 | 40 | Limits sampling to the top K tokens |
| `stop` | array | - | - | Stop sequences to end generation |
| `frequency_penalty` | float | -2.0 – 2.0 | 0.0 | Penalizes tokens in proportion to how often they have appeared |
| `presence_penalty` | float | -2.0 – 2.0 | 0.0 | Penalizes tokens that have appeared at all |
| `repetition_penalty` | float | 0.0 – 2.0 | 1.0 | Alternative repetition control |
| `seed` | integer | 0 – 2^32 | - | Random seed for reproducibility |
| `min_p` | float | 0.0 – 1.0 | 0.0 | Minimum probability threshold |
| `max_tokens` | integer | 1 – 16,384 | - | Maximum tokens to generate |
| `response_format` | object | - | - | Structured output format specification |
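Requests built client-side can be kept within the documented ranges before sending. A sketch (the helper is illustrative; the bounds mirror the table above):

```python
# Clamp sampling parameters to the documented ranges so an
# out-of-range value never reaches the API. Bounds are taken
# from the Supported Parameters table above.

PARAM_RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "top_k": (1, 100),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "repetition_penalty": (0.0, 2.0),
    "min_p": (0.0, 1.0),
    "max_tokens": (1, 16_384),
}

def clamp_params(params: dict) -> dict:
    """Return a copy of params with known values clamped into range."""
    out = dict(params)
    for name, (lo, hi) in PARAM_RANGES.items():
        if name in out:
            out[name] = min(max(out[name], lo), hi)
    return out

print(clamp_params({"temperature": 3.0, "max_tokens": 50_000}))
# temperature is clamped to 2.0, max_tokens to 16384
```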
## Use Cases

### Recommended For
- Advanced Agentic Applications: Complex autonomous agents, multi-step workflows
- Technical Problem Solving: Debugging, optimization, architectural design
- High-quality Code Development: Complex algorithms, system design, refactoring
- Advanced Reasoning Tasks: Logic puzzles, mathematical proofs, analysis
- Long-form Content Creation: Books, technical documentation, research papers
- Complex Multi-turn Interactions: Extended conversations with context preservation
- Roleplaying & Creative Writing: Character development, narrative creation
- Knowledge Integration: Combining information from extensive documents
### Not Recommended For
- Low-latency Requirements: Large model size results in higher latency
- Cost-sensitive Applications: Premium pricing compared to smaller models
- Simple Queries: Overkill for basic Q&A or simple tasks
- Mobile/Edge Deployment: Requires cloud infrastructure
- Real-time Requirements: Not suitable for sub-second response needs
## Hermes 3.1 Family

| Model | Parameters | Context | Characteristics |
|---|---|---|---|
| Nous Hermes 3.1 Llama 3.1 405B | 405B | 131K | Maximum capability, latest iteration |
| Nous Hermes 3 Llama 3.1 405B | 405B | 131K | Previous version, stable |
| Nous Hermes 3.1 Llama 3.1 70B | 70B | 131K | Balanced performance/cost |
| Nous Hermes 3.1 Llama 3.1 8B | 8B | 131K | Fast, cost-effective |
## Alternative 405B Options

| Model | Provider | Context |
|---|---|---|
| Llama 3.1 405B Instruct | Meta | 131K |
| Llama 4 Maverick | Meta | 131K |
| Mixtral 8x22B | Mistral | 65K |
## Providers

### Available Providers

| Provider | Status | Details |
|---|---|---|
| DeepInfra | Primary | Full model availability |
| OpenRouter | Secondary | Aggregated access (if available) |
| Nous Research | Official | Direct access via Nous API |
### Provider Details

**DeepInfra:**

- Provider Model ID: `NousResearch/Nous-Hermes-3.1-Llama-3.1-405B`
- Max Completion Tokens: 16,384
- Request Rate: Default limits
- Availability: Full model access
## Strengths
- Exceptional Reasoning: Superior logical reasoning from 405B parameters
- Industry-leading Agentic Performance: Top-tier autonomous agent capabilities for open models
- Strong Multi-turn Coherence: Maintains context quality over 100K+ token conversations
- Reliable Function Calling: Consistent structured output and tool invocation
- Excellent Code Quality: High-quality code generation across multiple languages
- Superior Instruction Following: Precise execution of complex instructions
- Extended Context Utilization: Efficiently uses full 131K token context window
## Considerations
- Higher Latency: Larger model size increases response time vs. smaller models
- Premium Pricing: Higher costs compared to 70B or smaller models
- Infrastructure Requirements: Requires substantial compute resources
- Memory Footprint: Large model requires significant GPU/TPU memory (FP8 quantization required for practical deployment)
## Comparison with Other 405B Models

| Model | Organization | Parameters | Context | Use Case |
|---|---|---|---|---|
| Nous Hermes 3.1 Llama 3.1 405B | Nous Research | 405B | 131K | Advanced reasoning & agentic |
| Llama 3.1 405B Instruct | Meta | 405B | 131K | General-purpose, official base |
| Claude 3 Opus | Anthropic | ~200B\* | 200K | Enterprise, safety-focused |
| GPT-4 Turbo | OpenAI | ~1.7T\* | 128K | Premium closed-source |
| Mixtral 8x22B | Mistral | 141B | 65K | Efficient, open-source |

\*Estimated parameters
## API Usage Examples

### Basic Chat Completion

```bash
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nousresearch/nous-hermes-3.1-llama-3.1-405b",
    "messages": [
      {
        "role": "system",
        "content": "You are an expert software architect with deep knowledge of system design patterns."
      },
      {
        "role": "user",
        "content": "Design a distributed caching system for a high-traffic web application. Consider consistency, fault tolerance, and performance."
      }
    ],
    "temperature": 0.7,
    "max_tokens": 4096
  }'
```
### Function Calling Example

```bash
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nousresearch/nous-hermes-3.1-llama-3.1-405b",
    "messages": [
      {
        "role": "user",
        "content": "I need to retrieve the weather for New York, Boston, and Los Angeles for my trip planning."
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather and forecast for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City and state or country"
              },
              "days": {
                "type": "integer",
                "description": "Number of forecast days (1-7)"
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
```
### Structured Output with JSON Schema

```bash
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nousresearch/nous-hermes-3.1-llama-3.1-405b",
    "messages": [
      {
        "role": "user",
        "content": "Analyze this code snippet and provide a detailed review with improvements."
      }
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "CodeReview",
        "schema": {
          "type": "object",
          "properties": {
            "issues": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "severity": {"type": "string"},
                  "description": {"type": "string"},
                  "fix": {"type": "string"}
                }
              }
            },
            "improvements": {
              "type": "array",
              "items": {"type": "string"}
            },
            "overall_rating": {"type": "number"}
          }
        }
      }
    }
  }'
```
### Long-context Processing

```bash
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nousresearch/nous-hermes-3.1-llama-3.1-405b",
    "messages": [
      {
        "role": "system",
        "content": "You are a research assistant. Analyze the provided documents and extract key insights."
      },
      {
        "role": "user",
        "content": "[Long document content - up to 131K tokens]\n\nProvide a comprehensive summary with key findings."
      }
    ],
    "temperature": 0.3,
    "max_tokens": 2048
  }'
```
### Multi-tool Agent Example

```bash
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nousresearch/nous-hermes-3.1-llama-3.1-405b",
    "messages": [
      {
        "role": "user",
        "content": "Help me plan a business trip to San Francisco. I need flight bookings, hotel recommendations, and weather information."
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "search_flights",
          "description": "Search for flight options",
          "parameters": {
            "type": "object",
            "properties": {
              "from": {"type": "string"},
              "to": {"type": "string"},
              "date": {"type": "string"}
            },
            "required": ["from", "to", "date"]
          }
        }
      },
      {
        "type": "function",
        "function": {
          "name": "search_hotels",
          "description": "Find hotel recommendations",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string"},
              "check_in": {"type": "string"},
              "check_out": {"type": "string"},
              "price_range": {"type": "string"}
            },
            "required": ["city", "check_in", "check_out"]
          }
        }
      },
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get weather forecast",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string"},
              "days": {"type": "integer"}
            },
            "required": ["city"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
```
## Prompt Format

Nous Hermes 3.1 uses the ChatML instruction format:

```
<|im_start|>system
You are a helpful and knowledgeable AI assistant with expertise in multiple domains.<|im_end|>
<|im_start|>user
What is quantum entanglement?<|im_end|>
<|im_start|>assistant
Quantum entanglement is a phenomenon in quantum mechanics where two or more particles become correlated in such a way that the quantum state of each particle cannot be described independently, even when the particles are separated by large distances.<|im_end|>
<|im_start|>user
How is it used in quantum computing?<|im_end|>
<|im_start|>assistant
In quantum computing, entanglement is fundamental to creating quantum gates and circuits. It allows quantum computers to process information in ways that classical computers cannot...
```
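Hosted APIs normally apply this template server-side, but for self-hosted inference or prompt debugging the message list can be rendered client-side. A minimal sketch (the helper name is illustrative):

```python
# Render an OpenAI-style message list into the ChatML format shown
# above: each turn is wrapped in <|im_start|>role ... <|im_end|>,
# and an open assistant header cues the model to generate.

def to_chatml(messages, add_generation_prompt=True):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
             for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What is quantum entanglement?"},
])
print(prompt)
```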
## Reasoning & Analysis

| Task | Performance | Notes |
|---|---|---|
| Complex Problem Solving | Excellent | Superior reasoning chains |
| Mathematical Proofs | Excellent | Handles complex logic |
| Code Generation | Excellent | Production-quality code |
| Analysis & Synthesis | Excellent | Integrates complex information |
## Conversation Quality

| Metric | Performance |
|---|---|
| Context Retention (50K tokens) | Excellent |
| Context Retention (100K+ tokens) | Excellent |
| Multi-turn Coherence | Excellent |
| Instruction Following | Excellent |
## Optimization Tips

### For Best Results
- Use Clear System Prompts: Provide detailed role definitions for optimal performance
- Structure Complex Requests: Break multi-step tasks into clear steps
- Leverage Tool Use: Use function calling for structured information needs
- Set Appropriate Temperature: Use 0.3-0.5 for deterministic tasks, 0.7-0.9 for creative content
- Use Full Context: This model excels with extended context (50K+ tokens)
- Enable Structured Output: Use JSON schema for consistent, machine-readable responses
### Cost Optimization

- Consider Hermes 3.1 70B for similar tasks to reduce costs
- Batch requests to reduce overhead
- Use `max_tokens` to avoid unnecessary token generation
- Implement prompt caching for repeated queries
## Limitations & Considerations
- Knowledge Cutoff: Information current only through November 2024
- No Real-time Information: Cannot access current data, weather, or news
- No Internet Access: Cannot browse the web or fetch external URLs
- Training Data Bias: May reflect biases present in training data
- Hallucinations: Can generate plausible but incorrect information
- Token Limits: Context and completion tokens have hard limits
- Processing Speed: Large model means slower response times than smaller alternatives
## Source & Documentation

## Support & Issues

For model-specific issues or questions:
Last updated: December 23, 2024
Note: This model documentation is based on publicly available information. Model availability and pricing may vary by provider. Please verify current availability and pricing with your chosen provider before implementation.