
Nous: Hermes 2 Vision 7B (Alpha)

Nous Research · Vision · 4K context · Input: N/A /1M · Output: N/A /1M · Max output: N/A

Model Overview

Model ID: nousresearch/nous-hermes-2-vision-7b
Organization: NousResearch
Release Date: December 7, 2023
Last Updated: November 10, 2025
Model Type: Vision-Language Model (Multimodal)

Description

Nous: Hermes 2 Vision 7B is an alpha-stage vision-language model that extends the capabilities of OpenHermes-2.5 by incorporating visual perception abilities. The model was developed using a specialized training dataset that emphasizes function-calling operations, enabling it to handle structured interactions alongside multimodal inputs.

Key Characteristics:

  • Extends OpenHermes-2.5 with vision capabilities
  • Specialized for function-calling operations
  • Developed by NousResearch team (Leadership: qnguyen3, teknium)
  • Early-stage production model (Alpha)

Technical Specifications

Property           Value
Architecture       Mistral-based
Parameter Count    7 billion
Context Window     4,096 tokens
Input Modalities   Text and Image
Output Modalities  Text only

Pricing

Pricing information for this model on LangMart is not currently publicly documented. Check OpenRouter's pricing page for current rates, which vary based on:

  • Input tokens
  • Output tokens
  • Request volume
  • User tier

Capabilities

Multimodal Processing

  • Processes both textual and visual information simultaneously
  • Can analyze images and understand their content in context with user queries
  • Suitable for vision-based reasoning tasks

Function Calling

  • Specialized support for structured function calling within conversations
  • Designed to handle API calls and tool interactions
  • Supports function definitions and parameter passing

Text Processing

  • Full text generation and completion capabilities
  • Context-aware responses within the 4,096-token window (shared between prompt and completion)
  • Integration with system prompts and instructions
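Because the prompt and the completion share the 4,096-token window, it can help to budget `max_tokens` from an estimate of the prompt size. A minimal sketch, using a rough 4-characters-per-token heuristic (an assumption; actual counts depend on the model's tokenizer):

```python
# Rough token budgeting for a 4,096-token context window.
# Assumes ~4 characters per token, a common rule of thumb;
# the model's real tokenizer may count differently.
CONTEXT_WINDOW = 4096

def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count."""
    return max(1, len(text) // 4)

def safe_max_tokens(prompt: str, reserve: int = 64) -> int:
    """Remaining budget for the completion, minus a safety margin."""
    used = estimate_tokens(prompt)
    return max(0, CONTEXT_WINDOW - used - reserve)

prompt = "Explain quantum entanglement in simple terms."
budget = safe_max_tokens(prompt)
```

The `reserve` margin absorbs estimation error; a long prompt simply yields a smaller completion budget rather than an API error.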

Use Cases

Ideal For:

  • Multimodal Chatbots: Vision-enabled conversational AI
  • Document Analysis: Processing documents with text and images
  • Function Calling Workflows: Structured API interactions with visual context
  • Vision-Based Automation: Automated workflows that analyze visual input
  • Interactive Agents: Assistants that combine vision and function calling

Not Ideal For:

  • Image Generation: Model only analyzes, doesn't generate images
  • Large Context Tasks: Limited 4,096 token window
  • Production-Critical Systems: Still in alpha stage
  • High-Frequency API Calls: May have rate limiting on some providers

Limitations

  1. Alpha Status: Still in early production stage; may have limitations or unexpected behaviors
  2. Context Window: Limited to 4,096 tokens (relatively small for long conversations)
  3. Output Only: Cannot generate images, only analyze them
  4. Vision Scope: Vision capabilities are optimized for function-calling workflows, not general vision tasks

Related Models

  • Nous-Hermes-2 (Text): Text-only version without vision
  • OpenHermes-2.5: Base model for text capabilities
  • GPT-4 Vision: More capable commercial alternative
  • Claude 3 Vision: Another multimodal alternative
  • Llava: Vision-language model without function calling

Providers

Model Weights: Available via Hugging Face

Available On:

Parameters & Configuration

Standard Parameters

  • max_tokens: Configurable, up to context window limit (4,096)
  • temperature: Adjustable for response creativity (typically 0.0-2.0)
  • top_p: Nucleus sampling for diversity control
  • top_k: Top-K sampling parameter
  • frequency_penalty: Reduce repetition
  • presence_penalty: Encourage new content

Model-Specific Settings

  • model: nousresearch/nous-hermes-2-vision-7b
  • system: Optional system prompt for context setting
  • images: Multiple images supported in single request
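The parameters above follow the OpenAI-compatible chat schema. A sketch of assembling a request payload that applies them, clamping `temperature` into the typical 0.0-2.0 range (the default values here are illustrative, not recommendations):

```python
# Assemble an OpenAI-style chat payload using the parameters listed above.
def build_payload(prompt: str, temperature: float = 0.7) -> dict:
    # Clamp temperature into the typical 0.0-2.0 range.
    temperature = min(2.0, max(0.0, temperature))
    return {
        "model": "nousresearch/nous-hermes-2-vision-7b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 500,          # must fit in the 4,096-token window
        "temperature": temperature,
        "top_p": 0.9,               # nucleus sampling
        "frequency_penalty": 0.0,   # reduce repetition
        "presence_penalty": 0.0,    # encourage new content
    }

payload = build_payload("Hello", temperature=3.5)  # clamped to 2.0
```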

Usage Examples

Text-Only Completion

{
  "model": "nousresearch/nous-hermes-2-vision-7b",
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum entanglement in simple terms."
    }
  ],
  "max_tokens": 500,
  "temperature": 0.7
}

Vision with Text

{
  "model": "nousresearch/nous-hermes-2-vision-7b",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        }
      ]
    }
  ],
  "max_tokens": 1000
}
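The example above references a remote URL. OpenAI-compatible APIs commonly also accept base64 data URLs for local images (whether this particular endpoint does is an assumption worth verifying). A sketch of building such a content array:

```python
import base64

def image_content(text: str, image_bytes: bytes, mime: str = "image/jpeg") -> list:
    """Build a multimodal content array embedding the image as a data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"type": "text", "text": text},
        {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
    ]

# Fake 3-byte "JPEG" payload for illustration; use real file bytes in practice.
content = image_content("What's in this image?", b"\xff\xd8\xff")
```

The resulting list drops straight into the `content` field of a user message, as in the JSON example above.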

Function Calling Example

{
  "model": "nousresearch/nous-hermes-2-vision-7b",
  "messages": [
    {
      "role": "user",
      "content": "Get the weather for New York and Boston."
    }
  ],
  "functions": [
    {
      "name": "get_weather",
      "description": "Get weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City name"
          }
        },
        "required": ["location"]
      }
    }
  ],
  "function_call": "auto"
}
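With `function_call` set to `"auto"`, the assistant message may carry a `function_call` object instead of plain text; the client is then expected to execute the named function and return the result. A hypothetical dispatch sketch (`get_weather` is a stand-in, and the response shape assumed here is the legacy OpenAI `functions` format shown above):

```python
import json

def get_weather(location: str) -> dict:
    # Stand-in for a real weather lookup.
    return {"location": location, "forecast": "sunny"}

FUNCTIONS = {"get_weather": get_weather}

def dispatch(message: dict):
    """Execute the function the model requested, or return plain text."""
    call = message.get("function_call")
    if call is None:
        return message.get("content")  # ordinary text reply
    args = json.loads(call["arguments"])  # arguments arrive as a JSON string
    return FUNCTIONS[call["name"]](**args)

# Example assistant message in the legacy functions format:
msg = {"function_call": {"name": "get_weather",
                         "arguments": '{"location": "New York"}'}}
result = dispatch(msg)
```

In a full loop, `result` would be sent back as a `function` role message so the model can compose its final answer.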

Integration Guide

LangMart API Integration

# Basic request using curl
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nousresearch/nous-hermes-2-vision-7b",
    "messages": [
      {
        "role": "user",
        "content": "What is 2+2?"
      }
    ],
    "max_tokens": 100
  }'

Python Integration

import requests

api_key = "your-langmart-api-key"
image_url = "https://example.com/image.jpg"

response = requests.post(
  "https://api.langmart.ai/v1/chat/completions",
  headers={
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
  },
  json={
    "model": "nousresearch/nous-hermes-2-vision-7b",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image:"},
          {"type": "image_url", "image_url": {"url": image_url}}
        ]
      }
    ],
    "max_tokens": 500
  }
)

print(response.json())
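The raw `response.json()` contains more than the reply itself. A small helper to pull out the assistant text while surfacing API errors (the `error` payload shape is an assumption based on common OpenAI-compatible APIs):

```python
def extract_reply(data: dict) -> str:
    """Return the assistant's text, or raise on an API error payload."""
    if "error" in data:
        raise RuntimeError(data["error"].get("message", "unknown API error"))
    return data["choices"][0]["message"]["content"]

# Illustrative response body in the OpenAI-compatible shape:
data = {"choices": [{"message": {"role": "assistant", "content": "4"}}]}
reply = extract_reply(data)
```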

Performance Characteristics

Latency

  • Alpha Stage: Latency is not yet well characterized and may vary
  • Typical Response Time: Depends on context size and output length
  • Cold Start: Initial requests may be slower on some platforms

Throughput

  • Concurrent Requests: Varies by provider infrastructure
  • Token Throughput: Optimized for efficient processing

Accuracy

  • Vision Understanding: Strong for standard image analysis tasks
  • Text Generation: Inherits OpenHermes-2.5's capabilities
  • Function Calling: Specialized for structured outputs

Model Comparison

Model                    Parameters  Context  Vision  Function Calling
Nous-Hermes-2-Vision-7B  7B          4K       Yes     Yes
GPT-4 Vision             -           128K     Yes     Yes
Claude 3 Vision          -           200K     Yes     Yes
Llava                    7B-13B      4K       Yes     No

Documentation & Resources

Support & Community

  • Issues: Report on NousResearch GitHub
  • Community: Engage with NousResearch community for questions
  • Updates: Follow releases for improvements and updates

Version History

Date        Version          Updates
2025-11-10  Alpha (Current)  Last official update
2023-12-07  Initial Release  Model release

Notes

  • This model is in alpha stage and should be used with appropriate caution in production
  • As an early release, behavior and capabilities may change with updates
  • Performance metrics and pricing may vary across different providers
  • The model excels at function calling use cases, unlike generic vision models

Last Updated: December 23, 2025 Source: OpenRouter Model Documentation Status: Alpha (Early Production)