Goliath 120B Model Documentation
Overview
Model Name: Goliath 120B
Creator: alpindale
Inference Model ID: alpindale/goliath-120b
Provider: Mancer (via LangMart)
Release Date: November 10, 2023
Description
Goliath 120B is a merged model that combines "two fine-tuned Llama 70B models into one 120B model" by merging Xwin and Euryale variants. The model was created using the mergekit framework by @chargoddard, with merge ratio optimization by @Undi95.
This model represents an advanced approach to model merging, leveraging the strengths of both fine-tuned variants to create a more capable 120B parameter model.
Technical Specifications
Model Architecture
- Model Group: Llama2
- Base Model: Llama 70B (merged variant)
- Total Parameters: 120 Billion
- Context Window: 6,144 tokens
- Instruction Format: Airoboros
Input/Output Capabilities
- Input Modalities: Text only
- Output Modalities: Text only
- Max Completion Tokens: 1,024 per request
- Default Stop Sequences: `USER:`, `</s>`
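Per-request stop sequences can typically override the defaults. A minimal sketch of assembling such a request body in Python (the `stop` field name follows the OpenAI-style chat completions schema and is an assumption for this provider):

```python
# Build an OpenAI-style chat completions payload that optionally overrides
# the default stop sequences (USER: and </s>) with custom ones.
# The "stop" field name is an assumption based on the OpenAI schema.

def build_request(prompt, stop=None):
    """Assemble a chat completions request body for Goliath 120B."""
    payload = {
        "model": "alpindale/goliath-120b",
        "messages": [{"role": "user", "content": prompt}],
    }
    if stop is not None:
        payload["stop"] = stop  # replaces the provider defaults for this request
    return payload

# Default behavior: rely on the provider's stop sequences.
default_request = build_request("Hello!")

# Custom stop sequences for a single request.
custom_request = build_request("Hello!", stop=["USER:", "\n\n"])
```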
Pricing
| Metric | Value |
|---|---|
| Context Window | 6,144 tokens |
| Input Token Cost | $6 / million tokens |
| Output Token Cost | $8 / million tokens |
| Max Completion Tokens | 1,024 per request |
Provider: Mancer 2
Cost Calculation Example
- Request: 100 input tokens + 500 output tokens
- Input cost: 100 × ($6/1M) = $0.0006
- Output cost: 500 × ($8/1M) = $0.004
- Total: $0.0046
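The arithmetic above can be wrapped in a small helper (a sketch; the per-million rates are taken directly from the pricing table):

```python
# Per-million-token rates from the pricing table above.
INPUT_RATE = 6.0   # USD per 1M input tokens
OUTPUT_RATE = 8.0  # USD per 1M output tokens

def request_cost(input_tokens, output_tokens):
    """Return the total USD cost of one request."""
    input_cost = input_tokens * INPUT_RATE / 1_000_000
    output_cost = output_tokens * OUTPUT_RATE / 1_000_000
    return input_cost + output_cost

# Worked example from the documentation: 100 input + 500 output tokens.
print(f"${request_cost(100, 500):.4f}")  # → $0.0046
```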
Capabilities
| Feature | Supported |
|---|---|
| Tool Use | No |
| Reasoning | No |
| Vision | No |
| Function Calling | No |
Supported Parameters
The model supports a comprehensive set of parameters for fine-grained control:
| Parameter Category | Supported Options |
|---|---|
| Response Format | JSON mode, text |
| Token Limits | max_tokens, min_tokens |
| Sampling | temperature, top_p, top_k, top_a, min_p |
| Penalties | frequency_penalty, presence_penalty, repetition_penalty |
| Control | stop sequences, logit_bias |
| Advanced | seed (for reproducibility), logprobs |
Parameter Details
- temperature: Controls randomness (0.0 = deterministic, 2.0 = high randomness)
- top_p: Nucleus sampling parameter (0.0-1.0)
- top_k: Restricts sampling to top K tokens
- top_a: Threshold for token amplitude
- min_p: Minimum probability threshold
- frequency_penalty: Reduces token repetition based on frequency
- presence_penalty: Reduces token repetition based on presence
- repetition_penalty: Alternative token repetition control
- logit_bias: Adjusts logit values for specific tokens
- seed: Ensures reproducible outputs
- logprobs: Returns log probabilities of generated tokens
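The sampling and penalty parameters above map onto fields of an OpenAI-compatible request. A minimal sketch of collecting and sanity-checking them before a call (field names follow the supported-parameters table; the exact wire names for provider-specific options like `top_a` and `min_p` are assumptions):

```python
def build_sampling_params(temperature=0.7, top_p=0.9, top_k=40,
                          frequency_penalty=0.0, presence_penalty=0.0,
                          repetition_penalty=1.1, seed=None):
    """Validate and collect sampling parameters for a chat completions call."""
    # Ranges per the parameter descriptions above.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    params = {
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
        "repetition_penalty": repetition_penalty,
    }
    if seed is not None:
        params["seed"] = seed  # fixed seed for reproducible outputs
    return params

# Deterministic-leaning settings with a fixed seed for reproducibility.
params = build_sampling_params(temperature=0.2, seed=42)
```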
Use Cases
Goliath 120B is suitable for:
- Long-form text generation
- General reasoning tasks (note: there is no dedicated extended-reasoning mode)
- Complex dialogue and instruction following
- Content creation and analysis
- Fine-tuned inference applications
- Creative writing and storytelling
- Code generation and technical explanations
- Document summarization and analysis
Limitations
- Context Window: 6,144 tokens is limited compared to modern models (32K-200K+)
- Modalities: Text-only (no vision, audio, or multimodal capabilities)
- Advanced Features: No function calling, tool use, or extended reasoning support
- Training Data: Based on Llama 2, may have knowledge cutoff limitations
Related Models
- Xwin-LM: One of the base models used in the merge (Llama 70B variant)
- Euryale: The second base model used in the merge (Llama 70B variant)
- Llama 2 70B: Base architecture for both merged models
- Goliath (Other Variants): Alternative Goliath model configurations
Development Credits
- Merge Framework: @chargoddard (mergekit)
- Merge Optimization: @Undi95 (ratio optimization)
- Creator: alpindale
Provider Information
Mancer 2 Details
- Provider Name: Mancer 2
- Hosting Endpoint: neuro.mancer.tech/oai/v1
- Data Retention Policy: No data retention for training purposes
- Privacy: Terms of service available at mancer.tech
- Hosting: Dedicated hosting for optimal performance
Integration & Access
LangMart Integration
The model is available through LangMart for:
- Chat completions interface
- Model comparison tools
- Batch processing
Model ID: alpindale/goliath-120b
Model Weights: Publicly available on Hugging Face for community use and local deployment
Usage
API Request Example
```shell
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "alpindale/goliath-120b",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ]
  }'
```
Python Example
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.langmart.ai/v1",
    api_key="YOUR_LANGMART_API_KEY",
)

response = client.chat.completions.create(
    model="alpindale/goliath-120b",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)
print(response.choices[0].message.content)
```
Performance Characteristics
- Size: 120 Billion parameters
- Context: 6,144 token context window (suitable for standard conversations and documents)
- Inference Speed: Optimized by Mancer for fast inference
- Quality: Result of advanced merging of fine-tuned Llama models
- Instruction Following: Specialized handling via the Airoboros instruction format
Model Comparison
Compared to other models available on LangMart:
- vs Claude 3.5 Sonnet: Less capable overall but significantly cheaper; text-only
- vs GPT-4: Often preferred for creative writing, weaker at complex reasoning
- vs Llama 2 70B: Combines the strengths of two fine-tuned 70B variants through merging
- vs Llama 3: Older architecture but still effective
Integration Guide
Using with LangMart
```javascript
// Axios client configured with the LangMart base URL and API key.
import axios from 'axios';

const client = axios.create({
  baseURL: 'https://api.langmart.ai',
  headers: { Authorization: `Bearer ${process.env.LANGMART_API_KEY}` },
});

const response = await client.post('/v1/chat/completions', {
  model: 'alpindale/goliath-120b',
  messages: [
    { role: 'user', content: 'Your prompt here' }
  ],
  temperature: 0.7,
  max_tokens: 1000
});
console.log(response.data.choices[0].message.content);
```
Environment Variables
LANGMART_API_KEY=your_api_key_here
LANGMART_MODEL_ID=alpindale/goliath-120b
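A sketch of reading these variables in Python; the fallback values mirror the entries above and are for demonstration only:

```python
import os

# For demonstration only: seed placeholder values if the variables are unset.
os.environ.setdefault("LANGMART_API_KEY", "your_api_key_here")
os.environ.setdefault("LANGMART_MODEL_ID", "alpindale/goliath-120b")

api_key = os.environ["LANGMART_API_KEY"]
model_id = os.environ["LANGMART_MODEL_ID"]
```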
References
- Creator: alpindale
- Mergekit Framework: Created by @chargoddard
- Merge Optimization: @Undi95
- Hosting Provider: Mancer 2
- Platform: LangMart.ai
- Repository: Hugging Face (model weights available)
Last Updated: December 23, 2025
Source: LangMart Model Registry
Model Card: Available on LangMart and Hugging Face