ERNIE 4.5 VL 424B A47B Model Details
Overview
ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu's ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. The model handles both text and image inputs, supporting long-context generation up to 123,000 tokens with reasoning capabilities enabled through <think> and </think> tokens.
Key Characteristics:
- Input Modalities: Text, Images
- Output Modalities: Text
- Context Length: 123,000 tokens
- Created: June 30, 2025
- Architecture: Heterogeneous MoE with modality-isolated routing
- Fine-tuning Techniques: SFT, DPO, UPO, and RLVR
Pricing
| Metric | Cost |
|---|---|
| Input (per 1M tokens) | $0.336 |
| Output (per 1M tokens) | $1.00 |
| Max Completion Tokens | 16,000 |
Performance
Reasoning Support
- Status: Yes
- Modes: Thinking and non-thinking inference modes supported
- Use Cases: Optimized for English and Chinese tasks
Inference Capabilities
- High-fidelity cross-modal reasoning
- Image understanding with text processing
- Efficient scaling under 4-bit/8-bit quantization
Related Models
Model Family: Baidu ERNIE 4.5 series
Model Weights: Available on Hugging Face at baidu/ERNIE-4.5-VL-424B-A47B-PT
Note: No specific benchmark data or comparative performance metrics are publicly available on LangMart.
Providers
Primary Provider: Novita AI (NovitaAI)
- API Endpoint: https://api.novita.ai/v3/openai
- Quantization: FP16
- Capabilities: Supports chat completions API, multipart requests
Parameters
Supported parameters include:
- Reasoning
- Temperature
- Top P
- Stop sequences
- Frequency penalties
- Presence penalties
- Seed
- Top K
- Repetition penalty