Nous: Hermes 3 405B Instruct

Nous Research · 128K context · $1.00/1M input · $1.00/1M output · 16K max output

Model Overview

Property              Value
Model Name            Nous: Hermes 3 405B Instruct
Model ID              nousresearch/hermes-3-llama-3.1-405b
Author/Organization   Nous Research
Release Date          August 16, 2024
Base Model            Llama-3.1 405B (full-parameter finetune)
Architecture          Transformer (Llama 3.1 architecture)

Description

Hermes 3 is a generalist language model with significant improvements over its predecessor Hermes 2. It is a full-parameter finetune of Llama-3.1 405B, making it one of the largest openly available instruction-tuned models.

Key Improvements Over Hermes 2

  • Advanced Agentic Capabilities: Enhanced ability to act as an autonomous agent
  • Improved Roleplaying: Better performance in character-based and roleplay scenarios
  • Enhanced Reasoning: Stronger logical reasoning and problem-solving abilities
  • Better Multi-turn Conversation: Improved coherence across extended dialogues
  • Long-context Coherence: Maintains context quality over very long conversations
  • Powerful Steering Capabilities: Gives end users significant control over model behavior
  • Improved Function Calling: Better structured output and tool use
  • Enhanced Code Generation: More reliable code generation compared to Hermes 2

Technical Specifications

Specification           Value
Context Length          131,072 tokens (128K)
Max Completion Tokens   16,384 tokens
Input Modalities        Text
Output Modalities       Text
Instruction Format      ChatML
Quantization            FP8
Parameters              405 Billion

Pricing

Type            Price
Input Tokens    $1.00 per 1M tokens
Output Tokens   $1.00 per 1M tokens

Cost Examples

Use Case                 Input Tokens   Output Tokens   Estimated Cost
Short conversation       1,000          500             $0.0015
Code generation task     5,000          2,000           $0.007
Long document analysis   50,000         10,000          $0.06
Extended agent session   100,000        50,000          $0.15
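
The estimates above follow directly from the flat $1.00/1M rate on both input and output; a quick sketch of the arithmetic:

```python
# Flat per-token pricing for Hermes 3 405B: both directions cost $1.00 per 1M tokens.
INPUT_PRICE_PER_M = 1.00
OUTPUT_PRICE_PER_M = 1.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one request at the listed rates."""
    cost = (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
    return round(cost, 6)

# Matches the table: a short conversation of 1,000 input and 500 output tokens.
print(estimate_cost(1_000, 500))      # 0.0015
print(estimate_cost(50_000, 10_000))  # 0.06
```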

Capabilities

Core Capabilities

  • Text Generation: General-purpose text completion and generation
  • Function Calling: Structured tool invocation with JSON schemas
  • Code Generation: Multi-language code writing and debugging
  • Reasoning: Complex logical reasoning and analysis
  • Multi-turn Conversation: Extended dialogue with context retention
  • Agentic Tasks: Autonomous task execution with tool use

Tool Use Support

Tool Choice Option   Description
none                 Disable tool calling
auto                 Model decides whether to use tools
required             Force the model to call a tool
function             Specify the exact function to call

Structured Outputs

Supports the response_format parameter for:

  • JSON mode
  • JSON Schema validation
  • Custom structured outputs
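
As a sketch of how JSON mode might be requested (the field names below follow the common OpenAI-style convention; verify the exact schema against LangMart's own API reference):

```python
import json

# Hypothetical request body enabling JSON mode via response_format.
# The "json_object" type asks the model to emit valid JSON only.
payload = {
    "model": "nousresearch/hermes-3-llama-3.1-405b",
    "messages": [
        {"role": "user", "content": "List three prime numbers as a JSON array."}
    ],
    "response_format": {"type": "json_object"},
}

print(json.dumps(payload, indent=2))
```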

Supported Parameters

Parameter            Type      Description
temperature          float     Controls randomness (0.0 - 2.0)
top_p                float     Nucleus sampling threshold (0.0 - 1.0)
top_k                integer   Limits sampling to the top K tokens
stop                 array     Stop sequences that end generation
frequency_penalty    float     Penalizes tokens in proportion to how often they have appeared
presence_penalty     float     Penalizes any token that has already appeared, regardless of count
repetition_penalty   float     Alternative, multiplicative repetition control
seed                 integer   Random seed for reproducibility
min_p                float     Minimum probability threshold (relative to the most likely token)
response_format      object    Structured output format specification

Use Cases

  1. Agentic Applications: Autonomous agents, workflow automation
  2. Complex Reasoning Tasks: Logic puzzles, mathematical problems, analysis
  3. Code Development: Code generation, debugging, refactoring
  4. Roleplaying & Creative Writing: Character-based interactions, storytelling
  5. Long-form Content: Documents requiring extensive context
  6. Multi-step Tool Use: Complex workflows requiring multiple tool calls

Less Suitable For

  1. Low-latency Requirements: Large model size increases response time
  2. Cost-sensitive Applications: Higher cost compared to smaller models
  3. Simple Q&A: Overkill for basic question answering

API Usage Example

LangMart API

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nousresearch/hermes-3-llama-3.1-405b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful AI assistant."
      },
      {
        "role": "user",
        "content": "Explain the concept of recursion with a code example."
      }
    ],
    "temperature": 0.7,
    "max_tokens": 2048
  }'
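
The same request can be issued from Python using only the standard library; this sketch builds the identical payload as the curl example (the endpoint and header shapes are taken from above, and the commented-out send requires a valid key):

```python
import json
import os
import urllib.request

# Build the same chat-completion request as the curl example above.
payload = {
    "model": "nousresearch/hermes-3-llama-3.1-405b",
    "messages": [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user",
         "content": "Explain the concept of recursion with a code example."},
    ],
    "temperature": 0.7,
    "max_tokens": 2048,
}

req = urllib.request.Request(
    "https://api.langmart.ai/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('LANGMART_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)

# Sending the request requires network access and a valid API key:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```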

Function Calling Example

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nousresearch/hermes-3-llama-3.1-405b",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather in San Francisco?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City and state"
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
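
When the model answers with a tool call instead of text, the client executes the function locally and returns the result as a "tool" role message on the next turn. A minimal dispatch sketch (the tool_calls shape assumes the OpenAI-style response format, and the get_weather body is a stand-in):

```python
import json

# Hypothetical local implementation backing the get_weather tool.
def get_weather(location: str) -> dict:
    return {"location": location, "temp_f": 62, "conditions": "fog"}

TOOLS = {"get_weather": get_weather}

def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call from an assistant message and build the
    'tool' role message to send back on the next turn."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = TOOLS[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

# Example tool call as it might appear in the assistant's response.
call = {
    "id": "call_1",
    "function": {"name": "get_weather",
                 "arguments": '{"location": "San Francisco, CA"}'},
}
msg = run_tool_call(call)
```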

Hermes 3 Family

Model                     Parameters   Context   Use Case
Hermes 3 Llama 3.1 405B   405B         131K      Maximum capability
Hermes 3 Llama 3.1 70B    70B          131K      Balanced performance/cost
Hermes 3 Llama 3.1 8B     8B           131K      Fast, cost-effective

Comparable Models

Model                     Provider    Parameters   Context
Llama 3.1 405B Instruct   Meta        405B         131K
Claude 3 Opus             Anthropic   ~200B*       200K
GPT-4 Turbo               OpenAI      ~1.7T*       128K
Mixtral 8x22B             Mistral     141B         65K

*Estimated parameters


Providers

Primary Provider

Detail                  Value
Name                    DeepInfra
Provider Model ID       NousResearch/Hermes-3-Llama-3.1-405B
Max Completion Tokens   16,384

Performance Characteristics

Strengths

  • Exceptional reasoning capabilities from 405B parameter count
  • Industry-leading agentic performance for open models
  • Strong multi-turn coherence up to 128K context
  • Reliable function calling and structured outputs
  • High-quality code generation across multiple languages

Considerations

  • Higher latency due to model size
  • Premium pricing compared to smaller models
  • Requires FP8 quantization for practical deployment

ChatML Format

Hermes 3 uses the ChatML instruction format:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello, how are you?<|im_end|>
<|im_start|>assistant
I'm doing well, thank you for asking! How can I help you today?<|im_end|>
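
A small helper that renders an OpenAI-style message list into the ChatML layout shown above (a sketch for illustration; production code should prefer the model's official chat template):

```python
def to_chatml(messages: list[dict]) -> str:
    """Render OpenAI-style messages into ChatML, leaving the prompt
    open at an assistant turn for the model to complete."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"},
])
```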

Last updated: December 2024