SOLAR-10.7B-Instruct-v1.0 Model Documentation

Overview

SOLAR-10.7B-Instruct-v1.0 is an instruction-tuned variant of SOLAR-10.7B, a 10.7 billion parameter large language model developed by Upstage. It demonstrates state-of-the-art performance among models under 30B parameters and outperforms significantly larger models.

Release Date: December 13, 2023

Model Specifications

Basic Information

| Property | Value |
|---|---|
| Model Name | SOLAR-10.7B-Instruct-v1.0 |
| Provider | Upstage |
| Parameters | ~11 billion (10.7B) |
| Model Type | Instruction-tuned Language Model |
| Architecture | Transformer (Llama-based) |
| Data Type | Float16 (F16) |
| Context Window | 4,096 tokens |
| License | CC-BY-NC-4.0 (non-commercial) |

Architecture Details

SOLAR-10.7B uses an innovative Depth Up-Scaling (DUS) methodology (sketched below):

  • Base Weights: Mistral 7B weights on a Llama-style transformer architecture
  • Scaling Technique: Depthwise scaling that duplicates the base layer stack, drops the overlapping middle layers, and concatenates the halves into a deeper model
  • Training Method: Continued pre-training of the entire upscaled model
  • Advantage: Better performance than the base model without complex architectural changes (e.g. mixture-of-experts), while keeping inference efficient
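
A minimal sketch of the DUS layer-assembly step, following the description in the SOLAR paper: two copies of the 32-layer base stack are concatenated after dropping the final 8 layers of the first copy and the initial 8 layers of the second, yielding the 48-layer depth of SOLAR-10.7B. This is an illustration of the idea only, not Upstage's training code.

def depth_upscale(base_layers, n_drop=8):
    # Drop the last `n_drop` layers from the first copy and the first `n_drop`
    # layers from the second copy, then concatenate the two halves.
    first = base_layers[: len(base_layers) - n_drop]
    second = base_layers[n_drop:]
    return first + second

base = [f"block_{i}" for i in range(32)]  # stand-in for the 32 Mistral 7B layers
upscaled = depth_upscale(base)
print(len(upscaled))  # 48 layers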

Training Approach

Instruction Fine-Tuning Strategy

The model combines state-of-the-art techniques:

  1. Supervised Fine-Tuning (SFT) - Learning from high-quality examples
  2. Direct Preference Optimization (DPO) - Learning from preference pairs (the standard objective is shown below)
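
For reference, the DPO stage optimizes the standard objective from Rafailov et al. (2023), where y_w and y_l are the preferred and rejected responses, pi_ref is the frozen reference (SFT) model, and beta controls the strength of the implicit KL constraint. The sDPO work cited later in this document applies this objective in stages rather than on all preference data at once.

\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]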

Training Datasets

| Dataset | Purpose |
|---|---|
| c-s-ale/alpaca-gpt4-data | SFT |
| Open-Orca/OpenOrca | SFT |
| In-house Metamath-based data | SFT, DPO |
| Intel/orca_dpo_pairs | DPO |
| allenai/ultrafeedback_binarized_cleaned | DPO |

Data Contamination Prevention

Datasets were carefully filtered to prevent contamination with test benchmarks. Contamination rates on major benchmarks:

| Benchmark | Contamination Rate |
|---|---|
| ARC | 0.06% |
| MMLU | 0.15% |
| TruthfulQA | 0.28% |
| GSM8K | 0.70% |

Excluded tasks: ARC, HellaSwag, DROP, WinoGrande, and GSM8K variants.
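
As an illustration of what these contamination rates measure, the sketch below computes a simple n-gram overlap rate between training samples and a benchmark set. It is a generic example with an assumed 8-gram window, not Upstage's actual filtering pipeline.

def ngram_set(text, n=8):
    # Collect all n-grams of whitespace-separated, lowercased tokens.
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_rate(train_samples, benchmark_samples, n=8):
    # Fraction of benchmark samples sharing at least one n-gram with the training data.
    train_ngrams = set().union(*(ngram_set(s, n) for s in train_samples))
    hits = sum(1 for s in benchmark_samples if ngram_set(s, n) & train_ngrams)
    return hits / len(benchmark_samples)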

Performance & Benchmarks

Comparative Benchmarks (H6 Score)

| Model | H6 Score | Model Size | Notes |
|---|---|---|---|
| SOLAR-10.7B-Instruct-v1.0 | 74.20 | ~11B | Outperforms larger models |
| Mixtral-8x7B-Instruct-v0.1 | 72.62 | ~46.7B | 4x larger, lower score |
| Yi-34B-200K | 70.81 | ~34B | 3x larger |
| Yi-34B | 69.42 | ~34B | 3x larger |
| Llama-2-70b-hf | 67.87 | ~70B | 6x larger |

Key Performance Strengths

  • Outperforms models up to 30B parameters - Significantly more efficient
  • Surpasses Mixtral 8x7B - Despite being ~4.3x smaller
  • Excellent for fine-tuning - High-quality base for adaptation
  • Robust and adaptable - Works well across various NLP tasks
  • Instruction-following - Strong at following complex instructions
  • Reasoning & Mathematics - Good performance on analytical tasks

Capabilities & Use Cases

Intended Use Cases

  • Primary Use: Single-turn conversation and instruction following
  • Strengths:
    • Natural language understanding and generation
    • Instruction-following tasks
    • Reasoning and problem-solving
    • Mathematical tasks
    • Code understanding and generation
    • Content summarization and analysis

Limitations

  • Optimized for single-turn conversations only - Not ideal for multi-turn chat systems
  • Less suitable for extended dialogue - Multi-turn optimization not a design goal
  • Non-commercial license - CC-BY-NC-4.0 due to training data constraints
  • Context limit of 4K tokens - Shorter than some modern models

Inference Parameters

Supported Generation Parameters

When using the model through API inference endpoints, the following parameters are supported (an example request follows the table):

| Parameter | Type | Description |
|---|---|---|
| max_tokens | integer | Maximum number of tokens to generate in the response |
| temperature | float | Sampling temperature (0.0 - 2.0) |
| top_p | float | Nucleus sampling parameter (0.0 - 1.0) |
| stop | string/array | Stop sequences that halt generation |
| frequency_penalty | float | Penalizes tokens in proportion to how often they have already appeared |
| presence_penalty | float | Penalizes tokens that have already appeared, encouraging new topics |
| seed | integer | Random seed for reproducibility |
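
For illustration, the snippet below sends these parameters to an OpenAI-compatible chat completions endpoint. The endpoint URL, API key, and exact model identifier are placeholders; the real values depend on the provider hosting the model.

import requests

payload = {
    "model": "SOLAR-10.7B-Instruct-v1.0",  # placeholder model identifier
    "messages": [{"role": "user", "content": "Summarize Depth Up-Scaling in one sentence."}],
    "max_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.9,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "stop": ["### User:"],
    "seed": 42,
}
response = requests.post(
    "https://api.example.com/v1/chat/completions",  # placeholder endpoint
    headers={"Authorization": "Bearer <API_KEY>"},
    json=payload,
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])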

Integration & API Access

Providers Offering SOLAR-10.7B-Instruct

The model is available through multiple platforms:

  1. Hugging Face - Direct model download

    • Repository: upstage/SOLAR-10.7B-Instruct-v1.0
    • Format: HF Transformers compatible
  2. NVIDIA NIM - Optimized inference via TensorRT-LLM

    • Hardware: NVIDIA Ada Lovelace GPUs (L40S tested)
    • Engine: TensorRT-LLM for optimized inference on recent GPUs
  3. LangDB - Model API access

    • Provider: togetherai
    • Model ID: SOLAR-10.7B-Instruct-v1.0
  4. Clarifai - AI model registry

    • Organization: upstage
    • Model: solar-10_7b-instruct
  5. Ollama - Local inference

    • Model: solar:10.7b-instruct-v1-q5_0 (quantized)

Usage Examples

Loading with Hugging Face Transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("upstage/SOLAR-10.7B-Instruct-v1.0")
model = AutoModelForCausalLM.from_pretrained(
    "upstage/SOLAR-10.7B-Instruct-v1.0",
    device_map="auto",
    torch_dtype=torch.float16,
)

Single-Turn Conversation Example

conversation = [{'role': 'user', 'content': 'Hello?'}]

prompt = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    use_cache=True,
    max_length=4096,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9
)
output_text = tokenizer.decode(outputs[0])
print(output_text)

Expected Output:

<s> ### User:
Hello?

### Assistant:
Hello, how can I assist you today? Please feel free to ask any questions or request help with a specific task.</s>

Text Generation with Custom Parameters

# Generate with custom parameters
inputs = tokenizer("Explain quantum computing in simple terms:", return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,  # generate() expects max_new_tokens rather than max_tokens
    temperature=0.8,
    top_p=0.95,
    do_sample=True
)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

Installation Requirements

# Install required packages
pip install transformers==4.35.2
pip install "torch>=2.0.0"  # quoted so the shell does not treat ">" as a redirect; use a CUDA build for GPU acceleration
pip install accelerate      # for device_map="auto" and multi-GPU support

Installation & Setup

Requirements

  • Python 3.8+
  • PyTorch 2.0.0 or higher
  • Transformers 4.35.2
  • CUDA 11.8+ (for GPU inference, optional)
  • 16GB+ VRAM (for full model, lower with quantization)

Hardware Recommendations

| Use Case | Minimum Memory | Recommended |
|---|---|---|
| Inference (batch 1) | 16GB VRAM | 24GB VRAM |
| Inference (batch 4+) | 24GB VRAM | 40GB+ VRAM |
| Fine-tuning | 32GB VRAM | 48GB+ VRAM |
| CPU only | 32GB RAM | 64GB RAM (very slow) |

Quantization Options

For reduced memory usage, quantization variants are available (an 8-bit loading sketch follows this list):

  • q5_0 (Ollama) - 5-bit quantization, ~6GB VRAM
  • bfloat16 - 16-bit, optimized inference
  • int8 - 8-bit quantization, ~8GB VRAM
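
A minimal sketch of the int8 option using the Transformers quantization config, assuming the bitsandbytes package is installed and a CUDA GPU is available:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit weights roughly halve memory use compared to float16.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("upstage/SOLAR-10.7B-Instruct-v1.0")
model = AutoModelForCausalLM.from_pretrained(
    "upstage/SOLAR-10.7B-Instruct-v1.0",
    device_map="auto",
    quantization_config=quant_config,
)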

License Information

| Component | License | Notes |
|---|---|---|
| Base Model (SOLAR-10.7B-v1.0) | Apache 2.0 | Permissive, commercial use allowed |
| Instruct Model (SOLAR-10.7B-Instruct-v1.0) | CC-BY-NC-4.0 | Non-commercial only |
| Training Data Attribution | See model card | Includes Alpaca, OpenOrca, etc. |

  • Non-commercial restriction: The Instruct variant cannot be used for commercial purposes due to training data licensing
  • Attribution required: Must provide attribution to Upstage
  • Base model available: The base SOLAR-10.7B-v1.0 is Apache 2.0 licensed for commercial use
  • Data contamination: Datasets carefully filtered to prevent test set leakage

Community & Resources

Official Resources

  • Hugging Face model card: upstage/SOLAR-10.7B-Instruct-v1.0
  • Paper: SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv:2312.15166)
  • Paper: sDPO: Don't Use Your Data All at Once (arXiv:2403.19270)

Community Activity

  • Discussions: 44+ community discussions available
  • Model Usage: 82+ Hugging Face Spaces utilizing this model
  • Downloads: 29,291+ downloads per month
  • Integration: Available across 5+ major AI platforms

Citation

Paper Citations

@misc{kim2023solar,
      title={SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling},
      author={Dahyun Kim and Chanjun Park and Sanghoon Kim and Wonsung Lee and Wonho Song and Yunsu Kim and Hyeonwoo Kim and Yungi Kim and Hyeonju Lee and Jihoo Kim and Changbae Ahn and Seonghoon Yang and Sukyung Lee and Hyunbyung Park and Gyoungjin Gim and Mikyoung Cha and Hwalsuk Lee and Sunghun Kim},
      year={2023},
      eprint={2312.15166},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{kim2024sdpo,
      title={sDPO: Don't Use Your Data All at Once},
      author={Dahyun Kim and Yungi Kim and Wonho Song and Hyeonwoo Kim and Yunsu Kim and Sanghoon Kim and Chanjun Park},
      year={2024},
      eprint={2403.19270},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

API Endpoint Configuration

For Gateway Integration

To integrate SOLAR-10.7B-Instruct-v1.0 with LangMart Gateway systems:

{
  "provider": "upstage",
  "model_id": "solar-10.7b-instruct-v1",
  "model_name": "SOLAR-10.7B-Instruct-v1.0",
  "description": "10.7B parameter instruction-tuned model, outperforms 30B models",
  "parameters": 10700000000,
  "context_window": 4096,
  "capabilities": ["chat", "text-generation", "instruction-following"],
  "supports": {
    "temperature": true,
    "top_p": true,
    "frequency_penalty": true,
    "presence_penalty": true,
    "max_tokens": true,
    "stop_sequences": true
  },
  "license": "CC-BY-NC-4.0",
  "commercial_use": false,
  "base_model_commercial": true,
  "best_for": "single-turn conversations, instruction following, reasoning tasks"
}

Performance Metrics Summary

  • Model Efficiency: Best-in-class performance/parameter ratio
  • Inference Speed: Fast due to 10.7B parameter count
  • Quality: Competitive with 30B-70B models on benchmarks
  • Reliability: Well-tested, 44+ community discussions, 29K+ monthly downloads
  • Adaptability: Excellent foundation for fine-tuning

Comparison Matrix

| Aspect | SOLAR-10.7B | Mixtral-8x7B | Llama-2-70B |
|---|---|---|---|
| Parameters | 10.7B | 46.7B | 70B |
| H6 Score | 74.20 | 72.62 | 67.87 |
| Context | 4K | 32K | 4K |
| Efficiency | Excellent | Good | Poor |
| License | CC-BY-NC-4.0 | Apache 2.0 | Llama 2 Community License |
| Inference Speed | Fast | Moderate | Slow |

Troubleshooting & FAQs

Common Issues

Q: Can I use SOLAR-10.7B-Instruct commercially?
A: No, the Instruct variant has a non-commercial CC-BY-NC-4.0 license. Use the base SOLAR-10.7B-v1.0 (Apache 2.0) instead.

Q: Does it support multi-turn conversations?
A: The model can handle multi-turn input, but performance is optimized for single-turn interactions. For production chat systems, consider fine-tuning.

Q: What's the maximum context length?
A: 4,096 tokens, which is sufficient for most single-turn use cases.

Q: Can I fine-tune this model?
A: Yes, it's an excellent base for fine-tuning. Ensure compliance with CC-BY-NC-4.0 for the Instruct variant.


Last Updated: 2025-12-23 | Documentation Version: 1.0 | Source: Hugging Face Model Card, NVIDIA NIM Documentation