Goliath 120B Model Documentation
Overview
Model Name: Goliath 120B
Creator: alpindale
Inference Model ID: alpindale/goliath-120b
Provider: Mancer (via LangMart)
Release Date: November 10, 2023
Description
Goliath 120B is a merged model that combines "two fine-tuned Llama 70B models into one 120B model" by merging Xwin and Euryale variants. The model was created using the mergekit framework by @chargoddard, with merge ratio optimization by @Undi95.
This model represents an advanced approach to model merging, leveraging the strengths of both fine-tuned variants to create a more capable 120B parameter model.
Technical Specifications
Model Architecture
- Model Group: Llama2
- Base Model: Llama 70B (merged variant)
- Total Parameters: 120 Billion
- Context Window: 6,144 tokens
- Instruction Format: Airoboros
Input/Output Capabilities
- Input Modalities: Text only
- Output Modalities: Text only
- Max Completion Tokens: 1,024 per request
- Default Stop Sequences: `USER:`, `</s>`
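Per-request stop sequences can typically override the defaults. A minimal sketch of assembling such a request body in Python (the `stop` field name follows the OpenAI-style chat completions schema and is an assumption for this provider):

```python
# Build an OpenAI-style chat completions payload that optionally overrides
# the default stop sequences (USER: and </s>) with custom ones.
# The "stop" field name is an assumption based on the OpenAI schema.

def build_request(prompt, stop=None):
    """Assemble a chat completions request body for Goliath 120B."""
    payload = {
        "model": "alpindale/goliath-120b",
        "messages": [{"role": "user", "content": prompt}],
    }
    if stop is not None:
        payload["stop"] = stop  # replaces the provider defaults for this request
    return payload

# Default behavior: rely on the provider's stop sequences.
default_request = build_request("Hello!")

# Custom stop sequences for a single request.
custom_request = build_request("Hello!", stop=["USER:", "\n\n"])
```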
Pricing
| Metric | Value |
|---|---|
| Context Window | 6,144 tokens |
| Input Token Cost | $6 / million tokens |
| Output Token Cost | $8 / million tokens |
| Max Completion Tokens | 1,024 per request |
Provider: Mancer 2
Cost Calculation Example
- Request: 100 input tokens + 500 output tokens
- Input cost: 100 × ($6/1M) = $0.0006
- Output cost: 500 × ($8/1M) = $0.004
- Total: $0.0046
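The arithmetic above can be wrapped in a small helper (a sketch; the per-million rates are taken directly from the pricing table):

```python
# Per-million-token rates from the pricing table above.
INPUT_RATE = 6.0   # USD per 1M input tokens
OUTPUT_RATE = 8.0  # USD per 1M output tokens

def request_cost(input_tokens, output_tokens):
    """Return the total USD cost of one request."""
    input_cost = input_tokens * INPUT_RATE / 1_000_000
    output_cost = output_tokens * OUTPUT_RATE / 1_000_000
    return input_cost + output_cost

# Worked example from the documentation: 100 input + 500 output tokens.
print(f"${request_cost(100, 500):.4f}")  # → $0.0046
```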
Capabilities
| Feature | Supported |
|---|---|
| Tool Use | No |
| Reasoning | No |
| Vision | No |
| Function Calling | No |
Supported Parameters
The model supports a comprehensive set of parameters for fine-grained control:
| Parameter Category | Supported Options |
|---|---|
| Response Format | JSON mode, text |
| Token Limits | max_tokens, min_tokens |
| Sampling | temperature, top_p, top_k, top_a, min_p |
| Penalties | frequency_penalty, presence_penalty, repetition_penalty |
| Control | stop sequences, logit_bias |
| Advanced | seed (for reproducibility), logprobs |
Parameter Details
- temperature: Controls randomness (0.0 = deterministic, 2.0 = high randomness)
- top_p: Nucleus sampling parameter (0.0-1.0)
- top_k: Restricts sampling to top K tokens
- top_a: Threshold for token amplitude
- min_p: Minimum probability threshold
- frequency_penalty: Reduces token repetition based on frequency
- presence_penalty: Reduces token repetition based on presence
- repetition_penalty: Alternative token repetition control
- logit_bias: Adjusts logit values for specific tokens
- seed: Ensures reproducible outputs
- logprobs: Returns log probabilities of generated tokens
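The sampling and penalty parameters above map onto fields of an OpenAI-compatible request. A minimal sketch of collecting and sanity-checking them before a call (field names follow the supported-parameters table; the exact wire names for provider-specific options like `top_a` and `min_p` are assumptions):

```python
def build_sampling_params(temperature=0.7, top_p=0.9, top_k=40,
                          frequency_penalty=0.0, presence_penalty=0.0,
                          repetition_penalty=1.1, seed=None):
    """Validate and collect sampling parameters for a chat completions call."""
    # Ranges per the parameter descriptions above.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    params = {
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
        "repetition_penalty": repetition_penalty,
    }
    if seed is not None:
        params["seed"] = seed  # fixed seed for reproducible outputs
    return params

# Deterministic-leaning settings with a fixed seed for reproducibility.
params = build_sampling_params(temperature=0.2, seed=42)
```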
Use Cases
Goliath 120B is suitable for:
- Long-form text generation
- General reasoning tasks (note: there is no dedicated extended-reasoning mode)
- Complex dialogue and instruction following
- Content creation and analysis
- Fine-tuned inference applications
- Creative writing and storytelling
- Code generation and technical explanations
- Document summarization and analysis
Limitations
- Context Window: 6,144 tokens is limited compared to modern models (32K-200K+)
- Modalities: Text-only (no vision, audio, or multimodal capabilities)
- Advanced Features: No function calling, tool use, or extended reasoning support
- Training Data: Based on Llama 2, may have knowledge cutoff limitations
Related Models
- Xwin-LM: One of the base models used in the merge (Llama 70B variant)
- Euryale: The second base model used in the merge (Llama 70B variant)
- Llama 2 70B: Base architecture for both merged models
- Goliath (Other Variants): Alternative Goliath model configurations
Development Credits
- Merge Framework: @chargoddard (mergekit)
- Merge Optimization: @Undi95 (ratio optimization)
- Creator: alpindale
Provider Information
Mancer 2 Details
- Provider Name: Mancer 2
- Hosting Endpoint: neuro.mancer.tech/oai/v1
- Data Retention Policy: No data retention for training purposes
- Privacy: Terms of service available at mancer.tech
- Hosting: Dedicated hosting for optimal performance
Integration & Access
LangMart Integration
The model is available through LangMart for:
- Chat completions interface
- Model comparison tools
- Batch processing
Model ID: alpindale/goliath-120b
Model Weights: Publicly available on Hugging Face for community use and local deployment
Usage
API Request Example
```shell
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "alpindale/goliath-120b",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ]
  }'
```
Python Example
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.langmart.ai/v1",
    api_key="YOUR_LANGMART_API_KEY",
)

response = client.chat.completions.create(
    model="alpindale/goliath-120b",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)
print(response.choices[0].message.content)
```
Performance Characteristics
- Size: 120 Billion parameters
- Context: 6,144 token context window (suitable for standard conversations and documents)
- Inference Speed: Optimized by Mancer for fast inference
- Quality: Result of advanced merging of fine-tuned Llama models
- Instruction Following: Specialized handling via the Airoboros instruction format
Model Comparison
Compared to other models available on LangMart:
- vs Claude 3.5 Sonnet: Less capable overall but significantly cheaper; text-only
- vs GPT-4: Often preferred for creative writing, weaker at complex reasoning
- vs Llama 2 70B: Combines the strengths of two fine-tuned 70B variants through merging
- vs Llama 3: Older architecture but still effective
Integration Guide
Using with LangMart
```javascript
// Axios client configured with the LangMart base URL and API key.
import axios from 'axios';

const client = axios.create({
  baseURL: 'https://api.langmart.ai',
  headers: { Authorization: `Bearer ${process.env.LANGMART_API_KEY}` },
});

const response = await client.post('/v1/chat/completions', {
  model: 'alpindale/goliath-120b',
  messages: [
    { role: 'user', content: 'Your prompt here' }
  ],
  temperature: 0.7,
  max_tokens: 1000
});
console.log(response.data.choices[0].message.content);
```
Environment Variables
LANGMART_API_KEY=your_api_key_here
LANGMART_MODEL_ID=alpindale/goliath-120b
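A sketch of reading these variables in Python; the fallback values mirror the entries above and are for demonstration only:

```python
import os

# For demonstration only: seed placeholder values if the variables are unset.
os.environ.setdefault("LANGMART_API_KEY", "your_api_key_here")
os.environ.setdefault("LANGMART_MODEL_ID", "alpindale/goliath-120b")

api_key = os.environ["LANGMART_API_KEY"]
model_id = os.environ["LANGMART_MODEL_ID"]
```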
References
- Creator: alpindale
- Mergekit Framework: Created by @chargoddard
- Merge Optimization: @Undi95
- Hosting Provider: Mancer 2
- Platform: LangMart.ai
- Repository: Hugging Face (model weights available)
Last Updated: December 23, 2025
Source: LangMart Model Registry
Model Card: Available on LangMart and Hugging Face