Nous: Hermes 2 Vision 7B (Alpha)
Model Overview
Model ID: nousresearch/nous-hermes-2-vision-7b
Organization: NousResearch
Release Date: December 7, 2023
Last Updated: November 10, 2025
Model Type: Vision-Language Model (Multimodal)
Description
Nous: Hermes 2 Vision 7B is an alpha-stage vision-language model that extends the capabilities of OpenHermes-2.5 by incorporating visual perception abilities. The model was developed using a specialized training dataset that emphasizes function-calling operations, enabling it to handle structured interactions alongside multimodal inputs.
Key Characteristics:
- Extends OpenHermes-2.5 with vision capabilities
- Specialized for function-calling operations
- Developed by NousResearch team (Leadership: qnguyen3, teknium)
- Early-stage alpha release
Technical Specifications
| Property | Value |
|---|---|
| Context Window | 4,096 tokens |
| Architecture | Mistral-based |
| Parameter Count | 7 billion |
| Input Modalities | Text and Image |
| Output Modalities | Text only |
Pricing
Pricing for this model on LangMart is not currently publicly documented. Check the provider's pricing page for current rates, which may vary based on:
- Input tokens
- Output tokens
- Request volume
- User tier
Capabilities
Multimodal Processing
- Processes both textual and visual information simultaneously
- Can analyze images and understand their content in context with user queries
- Suitable for vision-based reasoning tasks
Function Calling
- Specialized support for structured function calling within conversations
- Designed to handle API calls and tool interactions
- Supports function definitions and parameter passing
Text Processing
- Full text generation and completion capabilities
- Context-aware responses within the 4,096-token context window
- Integration with system prompts and instructions
Use Cases
Ideal For:
- Multimodal Chatbots: Vision-enabled conversational AI
- Document Analysis: Processing documents with text and images
- Function Calling Workflows: Structured API interactions with visual context
- Vision-Based Automation: Automated workflows that analyze visual input
- Interactive Agents: Assistants that combine vision and function calling
Not Ideal For:
- Image Generation: Model only analyzes, doesn't generate images
- Large Context Tasks: Limited 4,096 token window
- Production-Critical Systems: Still in alpha stage
- High-Frequency API Calls: May have rate limiting on some providers
Limitations
- Alpha Status: Early-stage release; may exhibit limitations or unexpected behaviors
- Context Window: Limited to 4,096 tokens (relatively small for long conversations)
- Text Output Only: Cannot generate images; it can only analyze them
- Vision Scope: Vision capabilities optimized for function-calling, not general vision tasks
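Given the small context window, callers typically need to trim older conversation turns before each request. A minimal sketch, assuming a rough 4-characters-per-token estimate (the model's actual tokenizer may count differently):

```python
# Rough sketch: drop the oldest non-system messages until the estimated
# token count fits the 4,096-token window. The 4-chars-per-token ratio
# is a heuristic, not the model's real tokenizer.
CONTEXT_LIMIT = 4096

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], reserve_for_output: int = 500) -> list[dict]:
    budget = CONTEXT_LIMIT - reserve_for_output
    trimmed = list(messages)

    def total(msgs: list[dict]) -> int:
        return sum(estimate_tokens(m["content"]) for m in msgs
                   if isinstance(m["content"], str))

    # Keep the system prompt (index 0) if present; drop oldest turns after it.
    start = 1 if trimmed and trimmed[0]["role"] == "system" else 0
    while len(trimmed) > start + 1 and total(trimmed) > budget:
        trimmed.pop(start)
    return trimmed
```

The `reserve_for_output` margin leaves room for the completion, since prompt and output share the same 4,096-token budget.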
Related Models
- Nous-Hermes-2 (Text): Text-only version without vision
- OpenHermes-2.5: Base model for text capabilities
- GPT-4 Vision: More capable commercial alternative
- Claude 3: Anthropic's multimodal alternative
- LLaVA: Vision-language model without function calling
Providers
Model Weights: Available via Hugging Face
- Repository: NousResearch/Nous-Hermes-2-Vision-Alpha (https://huggingface.co/NousResearch/Nous-Hermes-2-Vision-Alpha)
Available On:
- LangMart: https://langmart.ai/model-docs
- Hugging Face: Direct model weights available
Parameters & Configuration
Standard Parameters
- max_tokens: Configurable, up to context window limit (4,096)
- temperature: Adjustable for response creativity (typically 0.0-2.0)
- top_p: Nucleus sampling for diversity control
- top_k: Top-K sampling parameter
- frequency_penalty: Reduce repetition
- presence_penalty: Encourage new content
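The parameters above map directly onto the request body. A minimal sketch of assembling a payload, assuming the OpenAI-compatible schema used by the examples below (default values here are illustrative, not tuned recommendations):

```python
# Illustrative payload builder; parameter values are examples only.
def build_request(prompt: str,
                  max_tokens: int = 500,
                  temperature: float = 0.7,
                  top_p: float = 0.9,
                  top_k: int = 40,
                  frequency_penalty: float = 0.0,
                  presence_penalty: float = 0.0) -> dict:
    return {
        "model": "nousresearch/nous-hermes-2-vision-7b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
    }
```

The resulting dict is what gets POSTed as JSON to the chat completions endpoint, as shown in the Integration Guide below.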
Model-Specific Settings
- model: nousresearch/nous-hermes-2-vision-7b
- system: Optional system prompt for context setting
- images: Multiple images supported in a single request
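A sketch of building a multi-image message, assuming the same `image_url` content-part format used in the vision example below:

```python
# Builds a single user message carrying one text part and several image parts.
def vision_message(question: str, image_urls: list[str]) -> dict:
    content = [{"type": "text", "text": question}]
    content += [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]
    return {"role": "user", "content": content}
```

Note that every image consumes part of the 4,096-token context budget, so multi-image requests leave less room for text.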
Usage Examples
Text-Only Completion
{
"model": "nousresearch/nous-hermes-2-vision-7b",
"messages": [
{
"role": "user",
"content": "Explain quantum entanglement in simple terms."
}
],
"max_tokens": 500,
"temperature": 0.7
}
Vision with Text
{
"model": "nousresearch/nous-hermes-2-vision-7b",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What's in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image.jpg"
}
}
]
}
],
"max_tokens": 1000
}
Function Calling Example
{
"model": "nousresearch/nous-hermes-2-vision-7b",
"messages": [
{
"role": "user",
"content": "Get the weather for New York and Boston."
}
],
"functions": [
{
"name": "get_weather",
"description": "Get weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name"
}
},
"required": ["location"]
}
}
],
"function_call": "auto"
}
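When the model decides to call a function, the response message carries a `function_call` object with the function name and JSON-encoded arguments (per the OpenAI-style schema used above). A sketch of dispatching such a response to local handlers; the `get_weather` implementation here is a hypothetical stub:

```python
import json

# Hypothetical local implementation backing the get_weather definition above.
def get_weather(location: str) -> dict:
    return {"location": location, "forecast": "unknown"}  # stub

HANDLERS = {"get_weather": get_weather}

def dispatch(message: dict) -> dict:
    """Execute the function the model asked for, or pass through plain text."""
    call = message.get("function_call")
    if call is None:
        return {"type": "text", "content": message.get("content")}
    fn = HANDLERS[call["name"]]
    args = json.loads(call["arguments"])  # arguments arrive as a JSON string
    return {"type": "function_result", "content": fn(**args)}
```

In a full loop, the function result would be appended to the conversation and sent back to the model for a final answer.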
Integration Guide
LangMart API Integration
# Basic request using curl
curl -X POST https://api.langmart.ai/v1/chat/completions \
-H "Authorization: Bearer $LANGMART_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "nousresearch/nous-hermes-2-vision-7b",
"messages": [
{
"role": "user",
"content": "What is 2+2?"
}
],
"max_tokens": 100
}'
Python Integration
import requests
api_key = "your-langmart-api-key"
image_url = "https://example.com/image.jpg"
response = requests.post(
"https://api.langmart.ai/v1/chat/completions",
headers={
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
},
json={
"model": "nousresearch/nous-hermes-2-vision-7b",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "Describe this image:"},
{"type": "image_url", "image_url": {"url": image_url}}
]
}
],
"max_tokens": 500
}
)
print(response.json())
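For local files, many OpenAI-compatible endpoints accept base64 data URLs in the `image_url` field; whether this provider supports them should be verified before relying on it. A minimal sketch:

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL usable in an image_url part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be passed as the "url" value in place of a remote image URL, e.g. `to_data_url(open("photo.jpg", "rb").read())`.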
Performance Characteristics
Latency
- Alpha Stage: Latency may vary between requests and across providers
- Typical Response Time: Scales with prompt size, image count, and output length
- Cold Start: Initial requests may be slower on some platforms
Throughput
- Concurrent Requests: Varies by provider infrastructure
- Token Throughput: No official benchmarks published; expect throughput typical of 7B-parameter models
Accuracy
- Vision Understanding: Strong for standard image analysis tasks
- Text Generation: Inherits OpenHermes-2.5's capabilities
- Function Calling: Specialized for structured outputs
Comparison with Related Models
| Model | Parameters | Context | Vision | Function Calling |
|---|---|---|---|---|
| Nous-Hermes-2-Vision-7B | 7B | 4K | Yes | Yes |
| GPT-4 Vision | - | 128K | Yes | Yes |
| Claude 3 Vision | - | 200K | Yes | Yes |
| LLaVA | 7B-13B | 4K | Yes | No |
Documentation & Resources
- Model Card: Hugging Face Model Card
- Documentation: https://langmart.ai/model-docs
- Organization: NousResearch
Support & Community
- Issues: Report on NousResearch GitHub
- Community: Engage with NousResearch community for questions
- Updates: Follow releases for improvements and updates
Version History
| Date | Version | Updates |
|---|---|---|
| 2025-11-10 | Alpha (Current) | Last official update |
| 2023-12-07 | Initial Release | Model release |
Notes
- This model is in alpha stage and should be used with appropriate caution in production
- As an early release, behavior and capabilities may change with updates
- Performance metrics and pricing may vary across different providers
- The model is specialized for function-calling use cases, which sets it apart from general-purpose vision models
Last Updated: December 23, 2025
Source: OpenRouter Model Documentation
Status: Alpha (Early Production)