
Qwen 2.5 7B Instruct

Model ID: qwen/qwen-2.5-7b-instruct
Provider: Qwen (Alibaba's Qwen family)
Canonical Slug: qwen/qwen-2.5-7b-instruct

Overview

Qwen2.5 7B Instruct is the 7B instruction-tuned model in Qwen2.5, the latest series of Qwen large language models. Qwen2.5 brings the following improvements over Qwen2:

  • Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.

  • Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. More resilient to diverse system prompts, enhancing role-play implementation and condition-setting for chatbots.

  • Long-context support for up to 128K tokens, with generation of up to 8K tokens.

  • Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

Usage of this model is subject to the Tongyi Qianwen LICENSE AGREEMENT.

Specifications

Specification        Value
Context Window       32,768 tokens
Max Output Tokens    Unknown
Modality             text->text
Model Architecture   text to text
Release Date         2024-10-16 (Unix timestamp 1729036800)
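The release date above is published as a Unix timestamp; a quick sketch of converting it to a calendar date with Python's standard library:

```python
from datetime import datetime, timezone

# Release Date field from the Specifications table (seconds since epoch, UTC)
release_ts = 1729036800

release_date = datetime.fromtimestamp(release_ts, tz=timezone.utc).date()
print(release_date)  # 2024-10-16
```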

Pricing

Metric           Price
Prompt Cost      $0.00000004 per token ($0.04 per 1M tokens)
Completion Cost  $0.0000001 per token ($0.10 per 1M tokens)
Currency         USD
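Taking the figures above as per-token rates (4.0e-8 USD for prompt tokens, 1.0e-7 USD for completion tokens), a request-cost estimate is simple arithmetic; a minimal sketch with illustrative token counts:

```python
# Per-token prices from the Pricing table
PROMPT_PRICE = 0.00000004      # USD per prompt token ($0.04 per 1M)
COMPLETION_PRICE = 0.0000001   # USD per completion token ($0.10 per 1M)

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost of a single request."""
    return prompt_tokens * PROMPT_PRICE + completion_tokens * COMPLETION_PRICE

# Example: a 1,200-token prompt with an 800-token completion
cost = request_cost(1200, 800)
print(f"${cost:.6f}")  # $0.000128
```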

Capabilities

  • Text Generation
  • Instruction Following

Supported Parameters

The model supports the following parameters in API requests:

  • temperature: Controls randomness (0.0 - 2.0), default: 1.0
  • top_p: Nucleus sampling (0.0 - 1.0), default: 1.0
  • top_k: Top-k filtering
  • frequency_penalty: Reduces repetition (-2.0 to 2.0)
  • presence_penalty: Encourages new topics (-2.0 to 2.0)
  • repetition_penalty: Alternative repetition control (0.5 - 2.0)
  • stop: Stop sequences
  • seed: Random seed for reproducibility
  • max_tokens: Maximum output length
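These parameters travel in the same JSON body as the model and messages fields. A minimal sketch of assembling a request payload and range-checking the sampling parameters against the documented bounds (the validation helper is illustrative, not part of any SDK):

```python
def build_payload(prompt: str, **sampling) -> dict:
    """Assemble a chat-completions request body, validating documented ranges."""
    ranges = {
        "temperature": (0.0, 2.0),
        "top_p": (0.0, 1.0),
        "frequency_penalty": (-2.0, 2.0),
        "presence_penalty": (-2.0, 2.0),
        "repetition_penalty": (0.5, 2.0),
    }
    for name, value in sampling.items():
        if name in ranges:
            low, high = ranges[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} outside [{low}, {high}]")
    return {
        "model": "qwen/qwen-2.5-7b-instruct",
        "messages": [{"role": "user", "content": prompt}],
        **sampling,
    }

payload = build_payload("Hello", temperature=0.7, top_p=0.9, seed=42, max_tokens=256)
```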

API Usage Example

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen-2.5-7b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms"
      }
    ],
    "temperature": 1.0,
    "max_tokens": 2048,
    "top_p": 1.0
  }'

Performance Metrics

Speed & Quality Tradeoff

  • Inference Speed: Fast
  • Quality Tier: Advanced
  • Cost Efficiency: Optimized for production

Recommended Use Cases

  • Long-form text generation
  • Code generation and analysis
  • Conversational AI
  • Complex reasoning tasks
  • Information synthesis

From Same Provider

  • qwen/qwen3-max
  • qwen/qwen3-coder-plus
  • qwen/qwen-2.5-72b-instruct
  • qwen/qwen-2.5-32b-instruct
  • qwen/qwen-2.5-14b-instruct

Comparable Models from Other Providers

  • OpenAI: GPT-4 Turbo, GPT-4o
  • Anthropic: Claude 3.5 Sonnet
  • Google: Gemini 2.0 Flash
  • DeepSeek: DeepSeek-R1

Python Integration

from openai import OpenAI

# LangMart exposes an OpenAI-compatible /v1/chat/completions endpoint
# (see the curl example above), so the OpenAI SDK is used here.
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.langmart.ai/v1",
)

completion = client.chat.completions.create(
    model="qwen/qwen-2.5-7b-instruct",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": "Your prompt here"
        }
    ],
)

print(completion.choices[0].message.content)

JavaScript/Node.js Integration

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.LANGMART_API_KEY,
  baseURL: "https://api.langmart.ai/v1",
});

const completion = await openai.chat.completions.create({
  model: "qwen/qwen-2.5-7b-instruct",
  messages: [
    {
      role: "user",
      content: "Your prompt here",
    },
  ],
  max_tokens: 2048,
});

console.log(completion.choices[0].message.content);

Performance Notes

Strengths

  • Efficient inference with good quality
  • Well-suited for production workloads
  • Strong instruction-following ability
  • Balanced performance and cost

Considerations

  • Context length may be limited for very long documents
  • As a 7B model, it may trail larger models on highly specialized or complex tasks

Additional Information

  • Hugging Face Model: Qwen/Qwen2.5-7B-Instruct
  • License: Open or Commercial (depends on provider)
  • Streaming: Supported
  • Function Calling: Depends on model configuration
  • Vision Capabilities: No
  • Web Search: No

Availability & Status

  • LangMart Status: Available
  • Rate Limits: Standard LangMart limits apply
  • SLA: Subject to provider availability

Documentation Generated: 2025-12-24
Source: LangMart API & Public Documentation
Last Updated: December 2025