SOLAR-10.7B-Instruct-v1.0 Model Documentation

Overview

SOLAR-10.7B-Instruct-v1.0 is an instruction-tuned variant of SOLAR-10.7B, a 10.7 billion parameter large language model developed by Upstage. It demonstrates state-of-the-art performance among models under 30B parameters and outperforms significantly larger models.

Release Date: December 13, 2023

Model Specifications

Basic Information

| Property | Value |
|---|---|
| Model Name | SOLAR-10.7B-Instruct-v1.0 |
| Provider | Upstage |
| Parameters | ~11 billion (10.7B) |
| Model Type | Instruction-tuned Language Model |
| Architecture | Transformer (Llama-based) |
| Data Type | Float16 (F16) |
| Context Window | 4,096 tokens |
| License | CC-BY-NC-4.0 (non-commercial) |

Architecture Details

SOLAR-10.7B uses an innovative Depth Up-Scaling (DUS) methodology (sketched below):

  • Base Weights: Mistral 7B weights on a Llama-style transformer architecture
  • Scaling Technique: Depthwise scaling that duplicates the base layer stack, drops the overlapping middle layers, and concatenates the halves into a deeper model
  • Training Method: Continued pre-training of the entire upscaled model
  • Advantage: Better performance than the base model without complex architectural changes (e.g. mixture-of-experts), while keeping inference efficient
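
A minimal sketch of the DUS layer-assembly step, following the description in the SOLAR paper: two copies of the 32-layer base stack are concatenated after dropping the final 8 layers of the first copy and the initial 8 layers of the second, yielding the 48-layer depth of SOLAR-10.7B. This is an illustration of the idea only, not Upstage's training code.

def depth_upscale(base_layers, n_drop=8):
    # Drop the last `n_drop` layers from the first copy and the first `n_drop`
    # layers from the second copy, then concatenate the two halves.
    first = base_layers[: len(base_layers) - n_drop]
    second = base_layers[n_drop:]
    return first + second

base = [f"block_{i}" for i in range(32)]  # stand-in for the 32 Mistral 7B layers
upscaled = depth_upscale(base)
print(len(upscaled))  # 48 layers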

Training Approach

Instruction Fine-Tuning Strategy

The model combines state-of-the-art techniques:

  1. Supervised Fine-Tuning (SFT) - Learning from high-quality examples
  2. Direct Preference Optimization (DPO) - Learning from preference pairs (the standard objective is shown below)
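
For reference, the DPO stage optimizes the standard objective from Rafailov et al. (2023), where y_w and y_l are the preferred and rejected responses, pi_ref is the frozen reference (SFT) model, and beta controls the strength of the implicit KL constraint. The sDPO work cited later in this document applies this objective in stages rather than on all preference data at once.

\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]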

Training Datasets

| Dataset | Purpose |
|---|---|
| c-s-ale/alpaca-gpt4-data | SFT |
| Open-Orca/OpenOrca | SFT |
| In-house Metamath-based data | SFT, DPO |
| Intel/orca_dpo_pairs | DPO |
| allenai/ultrafeedback_binarized_cleaned | DPO |

Data Contamination Prevention

Datasets were carefully filtered to prevent contamination with test benchmarks. Contamination rates on major benchmarks:

| Benchmark | Contamination Rate |
|---|---|
| ARC | 0.06% |
| MMLU | 0.15% |
| TruthfulQA | 0.28% |
| GSM8K | 0.70% |

Excluded tasks: ARC, HellaSwag, DROP, WinoGrande, and GSM8K variants.
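
As an illustration of what these contamination rates measure, the sketch below computes a simple n-gram overlap rate between training samples and a benchmark set. It is a generic example with an assumed 8-gram window, not Upstage's actual filtering pipeline.

def ngram_set(text, n=8):
    # Collect all n-grams of whitespace-separated, lowercased tokens.
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_rate(train_samples, benchmark_samples, n=8):
    # Fraction of benchmark samples sharing at least one n-gram with the training data.
    train_ngrams = set().union(*(ngram_set(s, n) for s in train_samples))
    hits = sum(1 for s in benchmark_samples if ngram_set(s, n) & train_ngrams)
    return hits / len(benchmark_samples)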

Performance & Benchmarks

Comparative Benchmarks (H6 Score)

| Model | H6 Score | Model Size | Notes |
|---|---|---|---|
| SOLAR-10.7B-Instruct-v1.0 | 74.20 | ~11B | Outperforms larger models |
| Mixtral-8x7B-Instruct-v0.1 | 72.62 | ~46.7B | 4x larger, lower score |
| Yi-34B-200K | 70.81 | ~34B | 3x larger |
| Yi-34B | 69.42 | ~34B | 3x larger |
| Llama-2-70b-hf | 67.87 | ~70B | 6x larger |

Key Performance Strengths

  • Outperforms models up to 30B parameters - Significantly more efficient
  • Surpasses Mixtral 8x7B - Despite being ~4.3x smaller
  • Excellent for fine-tuning - High-quality base for adaptation
  • Robust and adaptable - Works well across various NLP tasks
  • Instruction-following - Strong at following complex instructions
  • Reasoning & Mathematics - Good performance on analytical tasks

Capabilities & Use Cases

Intended Use Cases

  • Primary Use: Single-turn conversation and instruction following
  • Strengths:
    • Natural language understanding and generation
    • Instruction-following tasks
    • Reasoning and problem-solving
    • Mathematical tasks
    • Code understanding and generation
    • Content summarization and analysis

Limitations

  • Optimized for single-turn conversations only - Not ideal for multi-turn chat systems
  • Less suitable for extended dialogue - Multi-turn optimization not a design goal
  • Non-commercial license - CC-BY-NC-4.0 due to training data constraints
  • Context limit of 4K tokens - Shorter than some modern models

Inference Parameters

Supported Generation Parameters

When using the model through API inference endpoints, the following parameters are supported (an example request follows the table):

| Parameter | Type | Description |
|---|---|---|
| max_tokens | integer | Maximum number of tokens to generate in the response |
| temperature | float | Sampling temperature (0.0 - 2.0) |
| top_p | float | Nucleus sampling parameter (0.0 - 1.0) |
| stop | string/array | Stop sequences that halt generation |
| frequency_penalty | float | Penalizes tokens in proportion to how often they have already appeared |
| presence_penalty | float | Penalizes tokens that have already appeared, encouraging new topics |
| seed | integer | Random seed for reproducibility |
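
For illustration, the snippet below sends these parameters to an OpenAI-compatible chat completions endpoint. The endpoint URL, API key, and exact model identifier are placeholders; the real values depend on the provider hosting the model.

import requests

payload = {
    "model": "SOLAR-10.7B-Instruct-v1.0",  # placeholder model identifier
    "messages": [{"role": "user", "content": "Summarize Depth Up-Scaling in one sentence."}],
    "max_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.9,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "stop": ["### User:"],
    "seed": 42,
}
response = requests.post(
    "https://api.example.com/v1/chat/completions",  # placeholder endpoint
    headers={"Authorization": "Bearer <API_KEY>"},
    json=payload,
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])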

Integration & API Access

Providers Offering SOLAR-10.7B-Instruct

The model is available through multiple platforms:

  1. Hugging Face - Direct model download

    • Repository: upstage/SOLAR-10.7B-Instruct-v1.0
    • Format: HF Transformers compatible
  2. NVIDIA NIM - Optimized inference via TensorRT-LLM

    • Hardware: NVIDIA Ada Lovelace GPUs (L40S tested)
    • Engine: TensorRT-LLM for optimized inference on recent GPUs
  3. LangDB - Model API access

    • Provider: togetherai
    • Model ID: SOLAR-10.7B-Instruct-v1.0
  4. Clarifai - AI model registry

    • Organization: upstage
    • Model: solar-10_7b-instruct
  5. Ollama - Local inference

    • Model: solar:10.7b-instruct-v1-q5_0 (quantized)

Usage Examples

Loading with Hugging Face Transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("upstage/SOLAR-10.7B-Instruct-v1.0")
model = AutoModelForCausalLM.from_pretrained(
    "upstage/SOLAR-10.7B-Instruct-v1.0",
    device_map="auto",
    torch_dtype=torch.float16,
)

Single-Turn Conversation Example

conversation = [{'role': 'user', 'content': 'Hello?'}]

prompt = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    use_cache=True,
    max_length=4096,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9
)
output_text = tokenizer.decode(outputs[0])
print(output_text)

Expected Output:

<s> ### User:
Hello?

### Assistant:
Hello, how can I assist you today? Please feel free to ask any questions or request help with a specific task.</s>

Text Generation with Custom Parameters

# Generate with custom parameters
inputs = tokenizer("Explain quantum computing in simple terms:", return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,  # generate() expects max_new_tokens rather than max_tokens
    temperature=0.8,
    top_p=0.95,
    do_sample=True
)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

Installation Requirements

# Install required packages
pip install transformers==4.35.2
pip install "torch>=2.0.0"  # quoted so the shell does not treat ">" as a redirect; use a CUDA build for GPU acceleration
pip install accelerate      # for device_map="auto" and multi-GPU support

Installation & Setup

Requirements

  • Python 3.8+
  • PyTorch 2.0.0 or higher
  • Transformers 4.35.2
  • CUDA 11.8+ (for GPU inference, optional)
  • 16GB+ VRAM (for full model, lower with quantization)

Hardware Recommendations

| Use Case | Minimum Memory | Recommended |
|---|---|---|
| Inference (batch 1) | 16GB VRAM | 24GB VRAM |
| Inference (batch 4+) | 24GB VRAM | 40GB+ VRAM |
| Fine-tuning | 32GB VRAM | 48GB+ VRAM |
| CPU only | 32GB RAM | 64GB RAM (very slow) |

Quantization Options

For reduced memory usage, quantization variants are available (an 8-bit loading sketch follows this list):

  • q5_0 (Ollama) - 5-bit quantization, ~6GB VRAM
  • bfloat16 - 16-bit, optimized inference
  • int8 - 8-bit quantization, ~8GB VRAM
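
A minimal sketch of the int8 option using the Transformers quantization config, assuming the bitsandbytes package is installed and a CUDA GPU is available:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit weights roughly halve memory use compared to float16.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("upstage/SOLAR-10.7B-Instruct-v1.0")
model = AutoModelForCausalLM.from_pretrained(
    "upstage/SOLAR-10.7B-Instruct-v1.0",
    device_map="auto",
    quantization_config=quant_config,
)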

License Information

| Component | License | Notes |
|---|---|---|
| Base Model (SOLAR-10.7B-v1.0) | Apache 2.0 | Permissive, commercial use allowed |
| Instruct Model (SOLAR-10.7B-Instruct-v1.0) | CC-BY-NC-4.0 | Non-commercial only |
| Training Data Attribution | See model card | Includes Alpaca, OpenOrca, etc. |

  • Non-commercial restriction: The Instruct variant cannot be used for commercial purposes due to training data licensing
  • Attribution required: Must provide attribution to Upstage
  • Base model available: The base SOLAR-10.7B-v1.0 is Apache 2.0 licensed for commercial use
  • Data contamination: Datasets carefully filtered to prevent test set leakage

Community & Resources

Official Resources

  • Hugging Face model card: upstage/SOLAR-10.7B-Instruct-v1.0
  • Paper: SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv:2312.15166)
  • Paper: sDPO: Don't Use Your Data All at Once (arXiv:2403.19270)

Community Activity

  • Discussions: 44+ community discussions available
  • Model Usage: 82+ Hugging Face Spaces utilizing this model
  • Downloads: 29,291+ downloads per month
  • Integration: Available across 5+ major AI platforms

Citation

Paper Citations

@misc{kim2023solar,
      title={SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling},
      author={Dahyun Kim and Chanjun Park and Sanghoon Kim and Wonsung Lee and Wonho Song and Yunsu Kim and Hyeonwoo Kim and Yungi Kim and Hyeonju Lee and Jihoo Kim and Changbae Ahn and Seonghoon Yang and Sukyung Lee and Hyunbyung Park and Gyoungjin Gim and Mikyoung Cha and Hwalsuk Lee and Sunghun Kim},
      year={2023},
      eprint={2312.15166},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{kim2024sdpo,
      title={sDPO: Don't Use Your Data All at Once},
      author={Dahyun Kim and Yungi Kim and Wonho Song and Hyeonwoo Kim and Yunsu Kim and Sanghoon Kim and Chanjun Park},
      year={2024},
      eprint={2403.19270},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

API Endpoint Configuration

For Gateway Integration

To integrate SOLAR-10.7B-Instruct-v1.0 with LangMart Gateway systems:

{
  "provider": "upstage",
  "model_id": "solar-10.7b-instruct-v1",
  "model_name": "SOLAR-10.7B-Instruct-v1.0",
  "description": "10.7B parameter instruction-tuned model, outperforms 30B models",
  "parameters": 10700000000,
  "context_window": 4096,
  "capabilities": ["chat", "text-generation", "instruction-following"],
  "supports": {
    "temperature": true,
    "top_p": true,
    "frequency_penalty": true,
    "presence_penalty": true,
    "max_tokens": true,
    "stop_sequences": true
  },
  "license": "CC-BY-NC-4.0",
  "commercial_use": false,
  "base_model_commercial": true,
  "best_for": "single-turn conversations, instruction following, reasoning tasks"
}

Performance Metrics Summary

  • Model Efficiency: Best-in-class performance/parameter ratio
  • Inference Speed: Fast due to 10.7B parameter count
  • Quality: Competitive with 30B-70B models on benchmarks
  • Reliability: Well-tested, 44+ community discussions, 29K+ monthly downloads
  • Adaptability: Excellent foundation for fine-tuning

Comparison Matrix

| Aspect | SOLAR-10.7B | Mixtral-8x7B | Llama-2-70B |
|---|---|---|---|
| Parameters | 10.7B | 46.7B | 70B |
| H6 Score | 74.20 | 72.62 | 67.87 |
| Context | 4K | 32K | 4K |
| Efficiency | Excellent | Good | Poor |
| License | CC-BY-NC-4.0 | Apache 2.0 | Llama 2 Community License |
| Inference Speed | Fast | Moderate | Slow |

Troubleshooting & FAQs

Common Issues

Q: Can I use SOLAR-10.7B-Instruct commercially?
A: No, the Instruct variant has a non-commercial CC-BY-NC-4.0 license. Use the base SOLAR-10.7B-v1.0 (Apache 2.0) instead.

Q: Does it support multi-turn conversations?
A: The model can handle multi-turn input, but performance is optimized for single-turn interactions. For production chat systems, consider fine-tuning.

Q: What's the maximum context length?
A: 4,096 tokens, which is sufficient for most single-turn use cases.

Q: Can I fine-tune this model?
A: Yes, it's an excellent base for fine-tuning. Ensure compliance with CC-BY-NC-4.0 for the Instruct variant.


Last Updated: 2025-12-23 | Documentation Version: 1.0 | Source: Hugging Face Model Card, NVIDIA NIM Documentation