Nous: Hermes 3 405B Instruct

Nous Research · 128K context · $1.00/1M input · $1.00/1M output · 16K max output

Model Overview

Property              Value
Model Name            Nous: Hermes 3 405B Instruct
Model ID              nousresearch/hermes-3-llama-3.1-405b
Author/Organization   Nous Research
Release Date          August 16, 2024
Base Model            Llama-3.1 405B (full-parameter finetune)
Architecture          Transformer (Llama 3.1 architecture)

Description

Hermes 3 is a generalist language model with significant improvements over its predecessor Hermes 2. It is a full-parameter finetune of Llama-3.1 405B, making it one of the largest openly available instruction-tuned models.

Key Improvements Over Hermes 2

  • Advanced Agentic Capabilities: Enhanced ability to act as an autonomous agent
  • Improved Roleplaying: Better performance in character-based and roleplay scenarios
  • Enhanced Reasoning: Stronger logical reasoning and problem-solving abilities
  • Better Multi-turn Conversation: Improved coherence across extended dialogues
  • Long-context Coherence: Maintains context quality over very long conversations
  • Powerful Steering Capabilities: Gives end users significant control over model behavior
  • Improved Function Calling: Better structured output and tool use
  • Enhanced Code Generation: More reliable code generation compared to Hermes 2

Technical Specifications

Specification           Value
Context Length          131,072 tokens (128K)
Max Completion Tokens   16,384 tokens
Input Modalities        Text
Output Modalities       Text
Instruction Format      ChatML
Quantization            FP8
Parameters              405 Billion

Pricing

Type            Price
Input Tokens    $1.00 per 1M tokens
Output Tokens   $1.00 per 1M tokens

Cost Examples

Use Case                 Input Tokens   Output Tokens   Estimated Cost
Short conversation       1,000          500             $0.0015
Code generation task     5,000          2,000           $0.007
Long document analysis   50,000         10,000          $0.06
Extended agent session   100,000        50,000          $0.15
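
The estimates above follow directly from the flat $1.00/1M rate on both input and output; a quick sketch of the arithmetic:

```python
# Flat per-token pricing for Hermes 3 405B: both directions cost $1.00 per 1M tokens.
INPUT_PRICE_PER_M = 1.00
OUTPUT_PRICE_PER_M = 1.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one request at the listed rates."""
    cost = (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
    return round(cost, 6)

# Matches the table: a short conversation of 1,000 input and 500 output tokens.
print(estimate_cost(1_000, 500))      # 0.0015
print(estimate_cost(50_000, 10_000))  # 0.06
```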

Capabilities

Core Capabilities

  • Text Generation: General-purpose text completion and generation
  • Function Calling: Structured tool invocation with JSON schemas
  • Code Generation: Multi-language code writing and debugging
  • Reasoning: Complex logical reasoning and analysis
  • Multi-turn Conversation: Extended dialogue with context retention
  • Agentic Tasks: Autonomous task execution with tool use

Tool Use Support

Tool Choice Option   Description
none                 Disable tool calling
auto                 Model decides whether to use tools
required             Force the model to call a tool
function             Specify the exact function to call

Structured Outputs

Supports the response_format parameter for:

  • JSON mode
  • JSON Schema validation
  • Custom structured outputs
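
As a sketch of how JSON mode might be requested (the field names below follow the common OpenAI-style convention; verify the exact schema against LangMart's own API reference):

```python
import json

# Hypothetical request body enabling JSON mode via response_format.
# The "json_object" type asks the model to emit valid JSON only.
payload = {
    "model": "nousresearch/hermes-3-llama-3.1-405b",
    "messages": [
        {"role": "user", "content": "List three prime numbers as a JSON array."}
    ],
    "response_format": {"type": "json_object"},
}

print(json.dumps(payload, indent=2))
```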

Supported Parameters

Parameter            Type      Description
temperature          float     Controls randomness (0.0 - 2.0)
top_p                float     Nucleus sampling threshold (0.0 - 1.0)
top_k                integer   Limits sampling to the top K tokens
stop                 array     Stop sequences that end generation
frequency_penalty    float     Penalizes tokens in proportion to how often they have appeared
presence_penalty     float     Penalizes any token that has already appeared, regardless of count
repetition_penalty   float     Alternative, multiplicative repetition control
seed                 integer   Random seed for reproducibility
min_p                float     Minimum probability threshold (relative to the most likely token)
response_format      object    Structured output format specification

Use Cases

  1. Agentic Applications: Autonomous agents, workflow automation
  2. Complex Reasoning Tasks: Logic puzzles, mathematical problems, analysis
  3. Code Development: Code generation, debugging, refactoring
  4. Roleplaying & Creative Writing: Character-based interactions, storytelling
  5. Long-form Content: Documents requiring extensive context
  6. Multi-step Tool Use: Complex workflows requiring multiple tool calls

Less Suitable For

  1. Low-latency Requirements: Large model size increases response time
  2. Cost-sensitive Applications: Higher cost compared to smaller models
  3. Simple Q&A: Overkill for basic question answering

API Usage Example

LangMart API

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nousresearch/hermes-3-llama-3.1-405b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful AI assistant."
      },
      {
        "role": "user",
        "content": "Explain the concept of recursion with a code example."
      }
    ],
    "temperature": 0.7,
    "max_tokens": 2048
  }'
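
The same request can be issued from Python using only the standard library; this sketch builds the identical payload as the curl example (the endpoint and header shapes are taken from above, and the commented-out send requires a valid key):

```python
import json
import os
import urllib.request

# Build the same chat-completion request as the curl example above.
payload = {
    "model": "nousresearch/hermes-3-llama-3.1-405b",
    "messages": [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user",
         "content": "Explain the concept of recursion with a code example."},
    ],
    "temperature": 0.7,
    "max_tokens": 2048,
}

req = urllib.request.Request(
    "https://api.langmart.ai/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('LANGMART_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)

# Sending the request requires network access and a valid API key:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```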

Function Calling Example

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nousresearch/hermes-3-llama-3.1-405b",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather in San Francisco?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City and state"
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
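
When the model answers with a tool call instead of text, the client executes the function locally and returns the result as a "tool" role message on the next turn. A minimal dispatch sketch (the tool_calls shape assumes the OpenAI-style response format, and the get_weather body is a stand-in):

```python
import json

# Hypothetical local implementation backing the get_weather tool.
def get_weather(location: str) -> dict:
    return {"location": location, "temp_f": 62, "conditions": "fog"}

TOOLS = {"get_weather": get_weather}

def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call from an assistant message and build the
    'tool' role message to send back on the next turn."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = TOOLS[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

# Example tool call as it might appear in the assistant's response.
call = {
    "id": "call_1",
    "function": {"name": "get_weather",
                 "arguments": '{"location": "San Francisco, CA"}'},
}
msg = run_tool_call(call)
```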

Hermes 3 Family

Model                     Parameters   Context   Use Case
Hermes 3 Llama 3.1 405B   405B         131K      Maximum capability
Hermes 3 Llama 3.1 70B    70B          131K      Balanced performance/cost
Hermes 3 Llama 3.1 8B     8B           131K      Fast, cost-effective

Comparable Models

Model                     Provider    Parameters   Context
Llama 3.1 405B Instruct   Meta        405B         131K
Claude 3 Opus             Anthropic   ~200B*       200K
GPT-4 Turbo               OpenAI      ~1.7T*       128K
Mixtral 8x22B             Mistral     141B         65K

*Estimated parameters


Providers

Primary Provider

Detail                  Value
Name                    DeepInfra
Provider Model ID       NousResearch/Hermes-3-Llama-3.1-405B
Max Completion Tokens   16,384

Performance Characteristics

Strengths

  • Exceptional reasoning capabilities from 405B parameter count
  • Industry-leading agentic performance for open models
  • Strong multi-turn coherence up to 128K context
  • Reliable function calling and structured outputs
  • High-quality code generation across multiple languages

Considerations

  • Higher latency due to model size
  • Premium pricing compared to smaller models
  • Requires FP8 quantization for practical deployment

ChatML Format

Hermes 3 uses the ChatML instruction format:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello, how are you?<|im_end|>
<|im_start|>assistant
I'm doing well, thank you for asking! How can I help you today?<|im_end|>
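
A small helper that renders an OpenAI-style message list into the ChatML layout shown above (a sketch for illustration; production code should prefer the model's official chat template):

```python
def to_chatml(messages: list[dict]) -> str:
    """Render OpenAI-style messages into ChatML, leaving the prompt
    open at an assistant turn for the model to complete."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"},
])
```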

Last updated: December 2024