B

ERNIE 4.5 VL 424B A47B Model Details

Baidu
Vision
123K
Context
$0.3360
Input /1M
$1.00
Output /1M
N/A
Max Output

ERNIE 4.5 VL 424B A47B Model Details

Overview

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu's ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. The model handles both text and image inputs, supporting long-context generation up to 123,000 tokens with reasoning capabilities enabled through <think> and </think> tokens.

Key Characteristics:

  • Input Modalities: Text, Images
  • Output Modalities: Text
  • Context Length: 123,000 tokens
  • Created: June 30, 2025
  • Architecture: Heterogeneous MoE with modality-isolated routing
  • Fine-tuning Techniques: SFT, DPO, UPO, and RLVR

Pricing

Metric Cost
Input (per 1M tokens) $0.336
Output (per 1M tokens) $1.00
Max Completion Tokens 16,000

Performance

Reasoning Support

  • Status: Yes
  • Modes: Thinking and non-thinking inference modes supported
  • Use Cases: Optimized for English and Chinese tasks

Inference Capabilities

  • High-fidelity cross-modal reasoning
  • Image understanding with text processing
  • Efficient scaling under 4-bit/8-bit quantization

Model Family: Baidu ERNIE 4.5 series

Model Weights: Available on Hugging Face at baidu/ERNIE-4.5-VL-424B-A47B-PT

Note: No specific benchmark data or comparative performance metrics are publicly available on LangMart.

Providers

Primary Provider: Novita AI (NovitaAI)

Parameters

Supported parameters include:

  • Reasoning
  • Temperature
  • Top P
  • Stop sequences
  • Frequency penalties
  • Presence penalties
  • Seed
  • Top K
  • Repetition penalty