# Nous: Hermes 3 405B Instruct
## Model Overview

| Property | Value |
|---|---|
| Model Name | Nous: Hermes 3 405B Instruct |
| Model ID | `nousresearch/hermes-3-llama-3.1-405b` |
| Author/Organization | Nous Research |
| Release Date | August 16, 2024 |
| Base Model | Llama 3.1 405B (full-parameter finetune) |
| Architecture | Transformer (Llama 3.1 architecture) |
## Description

Hermes 3 is a generalist language model with significant improvements over its predecessor, Hermes 2. It is a full-parameter finetune of Llama 3.1 405B, making it one of the largest openly available instruction-tuned models.
### Key Improvements Over Hermes 2

- Advanced Agentic Capabilities: Enhanced ability to act as an autonomous agent
- Improved Roleplaying: Better performance in character-based and roleplay scenarios
- Enhanced Reasoning: Stronger logical reasoning and problem-solving abilities
- Better Multi-turn Conversation: Improved coherence across extended dialogues
- Long-context Coherence: Maintains context quality over very long conversations
- Powerful Steering Capabilities: Gives end users significant control over model behavior
- Improved Function Calling: Better structured output and tool use
- Enhanced Code Generation: More reliable code generation compared to Hermes 2
## Technical Specifications

| Specification | Value |
|---|---|
| Context Length | 131,072 tokens (128K) |
| Max Completion Tokens | 16,384 tokens |
| Input Modalities | Text |
| Output Modalities | Text |
| Instruction Format | ChatML |
| Quantization | FP8 |
| Parameters | 405 billion |
## Pricing

| Type | Price |
|---|---|
| Input Tokens | $1.00 per 1M tokens |
| Output Tokens | $1.00 per 1M tokens |
### Cost Examples

| Use Case | Input Tokens | Output Tokens | Estimated Cost |
|---|---|---|---|
| Short conversation | 1,000 | 500 | $0.0015 |
| Code generation task | 5,000 | 2,000 | $0.007 |
| Long document analysis | 50,000 | 10,000 | $0.06 |
| Extended agent session | 100,000 | 50,000 | $0.15 |
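With a flat $1.00 per million tokens for both input and output, the estimates in the table reduce to total tokens divided by one million. A minimal Python sketch using the rates from the pricing table:

```python
# Rates from the pricing table: $1.00 per 1M tokens, input and output alike.
INPUT_RATE = 1.00 / 1_000_000   # USD per input token
OUTPUT_RATE = 1.00 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Reproduce the "extended agent session" row from the table above.
session_cost = estimate_cost(100_000, 50_000)
```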
## Capabilities

### Core Capabilities

- Text Generation: General-purpose text completion and generation
- Function Calling: Structured tool invocation with JSON schemas
- Code Generation: Multi-language code writing and debugging
- Reasoning: Complex logical reasoning and analysis
- Multi-turn Conversation: Extended dialogue with context retention
- Agentic Tasks: Autonomous task execution with tool use
### Tool Choice Options

| Option | Description |
|---|---|
| `none` | Disable tool calling |
| `auto` | Model decides whether to use tools |
| `required` | Force tool usage |
| `function` | Specify an exact function to call |
### Structured Outputs

Supports the `response_format` parameter for:

- JSON mode
- JSON Schema validation
- Custom structured outputs
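A sketch of a request body that uses JSON Schema validation. The nested `json_schema` shape follows the common OpenAI-compatible convention and is an assumption here; check the provider's documentation for the exact field names it accepts.

```python
import json

# Sketch of an OpenAI-compatible request body with JSON Schema validation.
# The "json_schema" structure is assumed, not confirmed by this document.
payload = {
    "model": "nousresearch/hermes-3-llama-3.1-405b",
    "messages": [
        {"role": "user",
         "content": "Extract the city and country from: 'I live in Paris, France.'"}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "location",
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
                "required": ["city", "country"],
            },
        },
    },
}

body = json.dumps(payload)  # POST this to /v1/chat/completions
```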
### Supported Parameters

| Parameter | Type | Description |
|---|---|---|
| `temperature` | float | Controls randomness (0.0 - 2.0) |
| `top_p` | float | Nucleus sampling threshold (0.0 - 1.0) |
| `top_k` | integer | Limits sampling to the top K tokens |
| `stop` | array | Stop sequences that end generation |
| `frequency_penalty` | float | Penalizes tokens in proportion to how often they have already appeared |
| `presence_penalty` | float | Penalizes tokens that have appeared at all, regardless of count |
| `repetition_penalty` | float | Alternative repetition control |
| `seed` | integer | Random seed for reproducibility |
| `min_p` | float | Minimum probability threshold |
| `response_format` | object | Structured output format specification |
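Several of these parameters can be combined in one request. A sketch of a low-randomness, best-effort-reproducible request body; the specific values are illustrative, not recommendations:

```python
import json

# Illustrative request combining several sampling parameters from the table.
payload = {
    "model": "nousresearch/hermes-3-llama-3.1-405b",
    "messages": [{"role": "user", "content": "List three prime numbers."}],
    "temperature": 0.2,   # low randomness
    "top_p": 0.9,         # nucleus sampling threshold
    "top_k": 40,          # sample only from the 40 most likely tokens
    "seed": 42,           # best-effort reproducibility
    "stop": ["\n\n"],     # end generation at the first blank line
    "max_tokens": 256,
}
body = json.dumps(payload)
```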
## Use Cases

### Recommended For

- Agentic Applications: Autonomous agents, workflow automation
- Complex Reasoning Tasks: Logic puzzles, mathematical problems, analysis
- Code Development: Code generation, debugging, refactoring
- Roleplaying & Creative Writing: Character-based interactions, storytelling
- Long-form Content: Documents requiring extensive context
- Multi-step Tool Use: Complex workflows requiring multiple tool calls
### Not Recommended For

- Low-latency Requirements: Large model size increases response time
- Cost-sensitive Applications: Higher cost compared to smaller models
- Simple Q&A: Overkill for basic question answering
## API Usage Example

### LangMart API

```bash
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nousresearch/hermes-3-llama-3.1-405b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful AI assistant."
      },
      {
        "role": "user",
        "content": "Explain the concept of recursion with a code example."
      }
    ],
    "temperature": 0.7,
    "max_tokens": 2048
  }'
```
### Function Calling Example

```bash
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nousresearch/hermes-3-llama-3.1-405b",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather in San Francisco?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City and state"
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
```
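When the model decides to use a tool, the assistant message in the response carries a `tool_calls` array instead of text, and the caller executes the function and sends the result back as a `tool` message. A minimal Python sketch of that dispatch step, assuming the OpenAI-compatible response shape; the `get_weather` implementation is a stand-in:

```python
import json

def get_weather(location: str) -> dict:
    """Stand-in for a real weather lookup."""
    return {"location": location, "temperature_f": 62, "conditions": "fog"}

TOOLS = {"get_weather": get_weather}

def dispatch_tool_calls(message: dict) -> list:
    """Execute each tool call in an assistant message and build the
    follow-up 'tool' messages, assuming the OpenAI-compatible shape."""
    results = []
    for call in message.get("tool_calls", []):
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results

# An assistant message as it might appear in the API response:
assistant_msg = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": '{"location": "San Francisco, CA"}'},
    }],
}
tool_msgs = dispatch_tool_calls(assistant_msg)
```

The resulting `tool` messages are appended to the conversation and sent in a second request so the model can compose its final answer.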
## Hermes 3 Family

| Model | Parameters | Context | Use Case |
|---|---|---|---|
| Hermes 3 Llama 3.1 405B | 405B | 131K | Maximum capability |
| Hermes 3 Llama 3.1 70B | 70B | 131K | Balanced performance/cost |
| Hermes 3 Llama 3.1 8B | 8B | 131K | Fast, cost-effective |
## Comparable Models

| Model | Provider | Parameters | Context |
|---|---|---|---|
| Llama 3.1 405B Instruct | Meta | 405B | 131K |
| Claude 3 Opus | Anthropic | ~200B\* | 200K |
| GPT-4 Turbo | OpenAI | ~1.7T\* | 128K |
| Mixtral 8x22B | Mistral | 141B | 65K |

\*Estimated parameters
## Providers

### Primary Provider

| Property | Details |
|---|---|
| Name | DeepInfra |
| Provider Model ID | `NousResearch/Hermes-3-Llama-3.1-405B` |
| Max Completion Tokens | 16,384 |
## Strengths

- Exceptional reasoning capabilities from 405B parameter count
- Industry-leading agentic performance for open models
- Strong multi-turn coherence up to 128K context
- Reliable function calling and structured outputs
- High-quality code generation across multiple languages
## Considerations

- Higher latency due to model size
- Premium pricing compared to smaller models
- Requires FP8 quantization for practical deployment
## Instruction Format

Hermes 3 uses the ChatML instruction format:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello, how are you?<|im_end|>
<|im_start|>assistant
I'm doing well, thank you for asking! How can I help you today?<|im_end|>
```
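API providers typically apply this template server-side, but if you run the model yourself the prompt must be assembled manually. A minimal sketch that renders the ChatML template shown above, leaving an open assistant turn for the model to complete:

```python
def to_chatml(messages: list) -> str:
    """Render a list of {role, content} messages as a ChatML prompt,
    ending with an open assistant turn for the model to fill."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"},
])
```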
## Source

Last updated: December 2024