SOLAR-10.7B-Instruct-v1.0 Model Documentation
Overview
SOLAR-10.7B-Instruct-v1.0 is an instruction-tuned variant of SOLAR-10.7B, a 10.7-billion-parameter large language model developed by Upstage. It achieves state-of-the-art results among models with fewer than 30B parameters and outperforms several significantly larger models, including Mixtral 8x7B, on the H6 benchmark average.
Release Date: December 13, 2023
Model Specifications
Basic Information
| Property | Value |
|---|---|
| Model Name | SOLAR-10.7B-Instruct-v1.0 |
| Provider | Upstage |
| Parameters | ~11 billion (10.7B) |
| Model Type | Instruction-tuned Language Model |
| Architecture | Transformer (Llama-based) |
| Data Type | Float16 (F16) |
| Context Window | 4,096 tokens |
| License | CC-BY-NC-4.0 (non-commercial) |
Architecture Details
SOLAR-10.7B uses the Depth Up-Scaling (DUS) methodology:
- Base Architecture: a 32-layer Llama-2-style transformer initialized with Mistral 7B weights
- Scaling Technique: the layer stack is duplicated, trimmed at the seam, and concatenated into a deeper 48-layer model (see the sketch below)
- Training Method: continued pre-training of the entire up-scaled model
- Advantage: improves on the 7B base while avoiding the architectural complexity of scaling approaches such as mixture-of-experts
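The sketch below illustrates the layer arithmetic behind DUS as described in the SOLAR paper (arXiv:2312.15166). The layer counts (32 base layers, 8 trimmed, 48 final layers) follow the paper; the code is purely illustrative and is not Upstage's implementation.

# Illustrative sketch of Depth Up-Scaling (DUS); not Upstage's code.
# Layer counts follow the SOLAR paper: n=32 base layers, m=8 trimmed, 48 final layers.

def depth_up_scale(base_layers, m=8):
    """Duplicate a layer stack, trim m layers at the seam, and concatenate."""
    top = base_layers[:-m]    # original copy without its final m layers
    bottom = base_layers[m:]  # duplicate copy without its initial m layers
    return top + bottom       # 2 * (n - m) layers in total

base = [f"layer_{i}" for i in range(32)]  # Mistral-7B-style 32-layer stack
scaled = depth_up_scale(base)
print(len(scaled))  # -> 48, as in SOLAR-10.7B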
Training Approach
Instruction Fine-Tuning Strategy
The model combines state-of-the-art techniques:
- Supervised Fine-Tuning (SFT) - Learning from high-quality examples
- Direct Preference Optimization (DPO) - Learning from preference pairs
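For reference, a minimal sketch of the standard DPO objective (Rafailov et al., 2023) is shown below; it illustrates the loss only and is not Upstage's training code. Log-probabilities would come from the policy model and a frozen reference model.

import torch
import torch.nn.functional as F

# Minimal sketch of the DPO loss for (chosen, rejected) preference pairs.
def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Maximize the margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy example with random log-probabilities for a batch of 4 pairs
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
print(loss.item())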
Training Datasets
| Dataset | Purpose |
|---|---|
| c-s-ale/alpaca-gpt4-data | SFT |
| Open-Orca/OpenOrca | SFT |
| In-house Metamath-based data | SFT, DPO |
| Intel/orca_dpo_pairs | DPO |
| allenai/ultrafeedback_binarized_cleaned | DPO |
Data Contamination Prevention
Datasets were carefully filtered to prevent contamination with test benchmarks. Contamination rates on major benchmarks:
| Benchmark | Contamination Rate |
|---|---|
| ARC | 0.06% |
| MMLU | 0.15% |
| TruthfulQA | 0.28% |
| GSM8K | 0.70% |
Training examples overlapping the following benchmark tasks were excluded during filtering: ARC, HellaSwag, DROP, WinoGrande, and GSM8K variants (a generic sketch of this kind of overlap check appears below).
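The exact filtering pipeline is not published. One common approach to this kind of check is n-gram overlap between training examples and benchmark test items, as in the purely illustrative sketch below; the n-gram size and threshold are arbitrary.

# Generic n-gram overlap check for test-set contamination; not Upstage's pipeline.

def ngrams(text, n=8):
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(train_example, benchmark_items, n=8, threshold=0.5):
    """Flag a training example whose n-grams heavily overlap any benchmark item."""
    train_grams = ngrams(train_example, n)
    if not train_grams:
        return False
    return any(
        len(train_grams & ngrams(item, n)) / len(train_grams) >= threshold
        for item in benchmark_items
    )

# A GSM8K-style question used purely as an example benchmark item
benchmark = ["Natalia sold clips to 48 of her friends in April and then half as many in May."]
print(is_contaminated("Natalia sold clips to 48 of her friends in April and then half as many in May.", benchmark))  # True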
Performance & Benchmarks
Comparative Benchmarks (H6 Score)
H6 is the average score across six Open LLM Leaderboard benchmarks: ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, and GSM8K.
| Model | H6 Score | Model Size | Notes |
|---|---|---|---|
| SOLAR-10.7B-Instruct-v1.0 | 74.20 | ~11B | Outperforms larger models |
| Mixtral-8x7B-Instruct-v0.1 | 72.62 | ~46.7B | 4x larger, lower score |
| Yi-34B-200K | 70.81 | ~34B | 3x larger |
| Yi-34B | 69.42 | ~34B | 3x larger |
| Llama-2-70b-hf | 67.87 | ~70B | 6x larger |
Key Performance Strengths
- Outperforms models up to 30B parameters - Significantly more efficient
- Surpasses Mixtral 8x7B - Despite being ~4.3x smaller
- Excellent for fine-tuning - High-quality base for adaptation
- Robust and adaptable - Works well across various NLP tasks
- Instruction-following - Strong at following complex instructions
- Reasoning & Mathematics - Good performance on analytical tasks
Capabilities & Use Cases
Intended Use Cases
- Primary Use: Single-turn conversation and instruction following
- Strengths:
  - Natural language understanding and generation
  - Instruction-following tasks
  - Reasoning and problem-solving
  - Mathematical tasks
  - Code understanding and generation
  - Content summarization and analysis
Limitations
- Optimized for single-turn conversations only - Not ideal for multi-turn chat systems
- Less suitable for extended dialogue - Multi-turn optimization not a design goal
- Non-commercial license - CC-BY-NC-4.0 due to training data constraints
- Context limit of 4K tokens - Shorter than some modern models (see the token-count check below)
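Because of the 4,096-token limit, it is worth checking prompt length before generation. A minimal sketch using the model's tokenizer follows; the reserved output budget is an arbitrary illustration.

from transformers import AutoTokenizer

# Check that a prompt fits within SOLAR's 4,096-token context, leaving room for the reply.
tokenizer = AutoTokenizer.from_pretrained("upstage/SOLAR-10.7B-Instruct-v1.0")

MAX_CONTEXT = 4096
RESERVED_FOR_OUTPUT = 512  # illustrative output budget

prompt = "Explain quantum computing in simple terms:"
n_tokens = len(tokenizer(prompt)["input_ids"])

if n_tokens > MAX_CONTEXT - RESERVED_FOR_OUTPUT:
    print(f"Prompt too long ({n_tokens} tokens); truncate or summarize the input.")
else:
    print(f"Prompt uses {n_tokens} of {MAX_CONTEXT} context tokens.")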
Inference Parameters
Supported Generation Parameters
When using the model through API inference endpoints, the following parameters are supported:
| Parameter | Type | Description |
|---|---|---|
| max_tokens | integer | Maximum number of tokens to generate in the response |
| temperature | float | Sampling temperature (0.0 - 2.0) |
| top_p | float | Nucleus sampling probability mass (0.0 - 1.0) |
| stop | string/array | Stop sequences that halt generation |
| frequency_penalty | float | Penalizes tokens in proportion to how often they have already appeared |
| presence_penalty | float | Penalizes tokens that have already appeared, encouraging new content |
| seed | integer | Random seed for reproducible sampling |
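Many hosted endpoints for this model expose an OpenAI-compatible chat API (for example, Together AI, which backs the LangDB listing below). The following is a hedged sketch: the base_url, api_key, and model identifier are placeholders that vary by provider, and not every provider honors every parameter.

from openai import OpenAI

# Hypothetical OpenAI-compatible endpoint; replace base_url, api_key, and model
# with the values documented by your provider for SOLAR-10.7B-Instruct-v1.0.
client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="upstage/SOLAR-10.7B-Instruct-v1.0",
    messages=[{"role": "user", "content": "Summarize depth up-scaling in two sentences."}],
    max_tokens=256,
    temperature=0.7,
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    stop=["###"],
    seed=42,
)
print(response.choices[0].message.content)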
Integration & API Access
Providers Offering SOLAR-10.7B-Instruct
The model is available through multiple platforms:
Hugging Face - Direct model download
- Repository: upstage/SOLAR-10.7B-Instruct-v1.0
- Format: HF Transformers compatible
NVIDIA NIM - Optimized inference via TensorRT-LLM
- Hardware: NVIDIA Lovelace (L40S tested)
- Engine: TensorRT-LLM for latest GPU optimization
LangDB - Model API access
- Provider: togetherai
- Model ID: SOLAR-10.7B-Instruct-v1.0
Clarifai - AI model registry
- Organization: upstage
- Model: solar-10_7b-instruct
Ollama - Local inference
- Model: solar:10.7b-instruct-v1-q5_0 (quantized)
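For local inference via Ollama, a minimal sketch using the ollama Python client (an assumption; install with pip install ollama) against the quantized tag listed above, with the Ollama server running and the tag already pulled:

import ollama  # assumed client: pip install ollama; requires a running Ollama server

# Single-turn chat against the quantized SOLAR tag listed above (must be pulled first).
response = ollama.chat(
    model="solar:10.7b-instruct-v1-q5_0",
    messages=[{"role": "user", "content": "Give three good use cases for a 10.7B LLM."}],
)
print(response["message"]["content"])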
Usage Examples
Loading with Hugging Face Transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("upstage/SOLAR-10.7B-Instruct-v1.0")
model = AutoModelForCausalLM.from_pretrained(
    "upstage/SOLAR-10.7B-Instruct-v1.0",
    device_map="auto",          # place layers on the available GPU(s)
    torch_dtype=torch.float16,  # load the F16 weights in half precision
)
Single-Turn Conversation Example
conversation = [{'role': 'user', 'content': 'Hello?'}]
prompt = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    use_cache=True,
    max_length=4096,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9
)
output_text = tokenizer.decode(outputs[0])
print(output_text)
Expected Output:
<s> ### User:
Hello?
### Assistant:
Hello, how can I assist you today? Please feel free to ask any questions or request help with a specific task.</s>
Text Generation with Custom Parameters
# Generate with custom parameters
inputs = tokenizer("Explain quantum computing in simple terms:", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,  # transformers uses max_new_tokens, not max_tokens
    temperature=0.8,
    top_p=0.95,
    do_sample=True
)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
Installation Requirements
# Install required packages
pip install transformers==4.35.2
pip install "torch>=2.0.0"  # quote the spec so the shell does not treat '>' as redirection; use a CUDA build for GPU acceleration
pip install accelerate      # for device_map="auto" and multi-GPU support
Installation & Setup
Requirements
- Python 3.8+
- PyTorch 2.0.0 or higher
- Transformers 4.35.2
- CUDA 11.8+ (for GPU inference, optional)
- 16GB+ VRAM (for full model, lower with quantization)
Hardware Recommendations
| Use Case | Minimum GPU | Recommended |
|---|---|---|
| Inference (batch 1) | 16GB VRAM | 24GB VRAM |
| Inference (batch 4+) | 24GB VRAM | 40GB+ VRAM |
| Fine-tuning | 32GB VRAM | 48GB+ VRAM |
| CPU only | 32GB RAM | 64GB RAM (very slow) |
Quantization Options
For reduced memory usage, lower-precision and quantized variants are available:
- q5_0 (Ollama) - 5-bit quantization, ~6GB VRAM
- bfloat16 / float16 - native 16-bit precision (no quantization; roughly 21GB of weights)
- int8 - 8-bit quantization, roughly 11GB VRAM (about one byte per parameter); see the loading sketch below
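As a concrete example of the int8 option, the weights can be loaded in 8-bit through Hugging Face Transformers with bitsandbytes (an extra dependency not listed above). This is a sketch under those assumptions; exact memory use depends on hardware and library versions.

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit weight loading via bitsandbytes (pip install bitsandbytes); requires a CUDA GPU.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("upstage/SOLAR-10.7B-Instruct-v1.0")
model = AutoModelForCausalLM.from_pretrained(
    "upstage/SOLAR-10.7B-Instruct-v1.0",
    quantization_config=bnb_config,
    device_map="auto",
)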
Licensing & Legal
License Information
| Component | License | Notes |
|---|---|---|
| Base Model (SOLAR-10.7B-v1.0) | Apache 2.0 | Permissive, commercial use allowed |
| Instruct Model (SOLAR-10.7B-Instruct-v1.0) | CC-BY-NC-4.0 | Non-commercial only |
| Training Data Attribution | See model card | Includes Alpaca, OpenOrca, etc. |
Important Legal Notes
- Non-commercial restriction: The Instruct variant cannot be used for commercial purposes due to training data licensing
- Attribution required: Must provide attribution to Upstage
- Base model available: The base SOLAR-10.7B-v1.0 is Apache 2.0 licensed for commercial use
- Data contamination: Datasets carefully filtered to prevent test set leakage
Community & Resources
Official Resources
- Model Card: Hugging Face - upstage/SOLAR-10.7B-Instruct-v1.0
- Paper: "SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling" (arXiv:2312.15166)
- Contact: contact@upstage.ai
Community Activity
- Discussions: 44+ community discussions available
- Model Usage: 82+ Hugging Face Spaces utilizing this model
- Downloads: 29,291+ downloads per month
- Integration: Available across 5+ major AI platforms
Citation
Paper Citations
@misc{kim2023solar,
title={SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling},
author={Dahyun Kim and Chanjun Park and Sanghoon Kim and Wonsung Lee and Wonho Song and Yunsu Kim and Hyeonwoo Kim and Yungi Kim and Hyeonju Lee and Jihoo Kim and Changbae Ahn and Seonghoon Yang and Sukyung Lee and Hyunbyung Park and Gyoungjin Gim and Mikyoung Cha and Hwalsuk Lee and Sunghun Kim},
year={2023},
eprint={2312.15166},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@misc{kim2024sdpo,
title={sDPO: Don't Use Your Data All at Once},
author={Dahyun Kim and Yungi Kim and Wonho Song and Hyeonwoo Kim and Yunsu Kim and Sanghoon Kim and Chanjun Park},
year={2024},
eprint={2403.19270},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
API Endpoint Configuration
For Gateway Integration
To integrate SOLAR-10.7B-Instruct-v1.0 with LangMart Gateway systems:
{
  "provider": "upstage",
  "model_id": "solar-10.7b-instruct-v1",
  "model_name": "SOLAR-10.7B-Instruct-v1.0",
  "description": "10.7B parameter instruction-tuned model, outperforms 30B models",
  "parameters": 10700000000,
  "context_window": 4096,
  "capabilities": ["chat", "text-generation", "instruction-following"],
  "supports": {
    "temperature": true,
    "top_p": true,
    "frequency_penalty": true,
    "presence_penalty": true,
    "max_tokens": true,
    "stop_sequences": true
  },
  "license": "CC-BY-NC-4.0",
  "commercial_use": false,
  "base_model_commercial": true,
  "best_for": "single-turn conversations, instruction following, reasoning tasks"
}
Performance Metrics Summary
- Model Efficiency: Best-in-class performance/parameter ratio
- Inference Speed: Fast due to 10.7B parameter count
- Quality: Competitive with 30B-70B models on benchmarks
- Reliability: Well-tested, 44+ community discussions, 29K+ monthly downloads
- Adaptability: Excellent foundation for fine-tuning
Comparison Matrix
| Aspect | SOLAR-10.7B | Mixtral-8x7B | Llama-2-70B |
|---|---|---|---|
| Parameters | 10.7B | 46.7B | 70B |
| H6 Score | 74.20 | 72.62 | 67.87 |
| Context | 4K | 32K | 4K |
| Efficiency | Excellent | Good | Poor |
| License | CC-BY-NC-4.0 | Apache 2.0 | Llama 2 |
| Inference Speed | Fast | Moderate | Slow |
Troubleshooting & FAQs
Common Issues
Q: Can I use SOLAR-10.7B-Instruct commercially? A: No, the Instruct variant has a non-commercial CC-BY-NC-4.0 license. Use the base SOLAR-10.7B-v1.0 (Apache 2.0) instead.
Q: Does it support multi-turn conversations? A: The model can handle multi-turn input, but performance is optimized for single-turn interactions. For production chat systems, consider fine-tuning.
Q: What's the maximum context length? A: 4,096 tokens, which is sufficient for most single-turn use cases.
Q: Can I fine-tune this model? A: Yes, it's an excellent base for fine-tuning. Ensure compliance with CC-BY-NC-4.0 for the Instruct variant.
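Although the model is tuned for single-turn use, the chat template accepts multi-turn histories, so experiments like the sketch below are possible. It reuses the tokenizer and model loaded in the usage examples above; response quality on long dialogues is not guaranteed.

# Multi-turn input through the same chat template; the model is optimized for single
# turns, so treat this as an experiment. Reuses `tokenizer` and `model` from above.
conversation = [
    {"role": "user", "content": "What is depth up-scaling?"},
    {"role": "assistant", "content": "A way to deepen a model by duplicating and trimming its layer stack."},
    {"role": "user", "content": "How many layers does SOLAR-10.7B end up with?"},
]
prompt = tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))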
Last Updated: 2025-12-23 | Documentation Version: 1.0 | Source: Hugging Face Model Card, NVIDIA NIM Documentation