Baidu’s ERNIE 4.5 is a state-of-the-art multimodal large language model released in early 2025, designed to process and integrate text, images, audio, and video through a novel heterogeneous Mixture-of-Experts (MoE) architecture. With variants scaling up to 424B total parameters (47B active), it emphasizes efficiency, accuracy, and multimodal synergy while drastically reducing operational costs compared to Western models. Below is a detailed analysis of its capabilities, benchmarks, cost, and comparisons to Claude 4 (Opus) and OpenAI o3 Pro.
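To make the total-versus-active distinction concrete, here is a minimal sketch of top-k expert routing in a generic MoE layer. It is not Baidu's implementation: the router scores, the number of experts, and the `top_k_route` helper are hypothetical and chosen only for illustration. Only the 424B and 47B figures come from the text above.

```python
import math

TOTAL_PARAMS_B = 424   # total parameters in billions (from the text)
ACTIVE_PARAMS_B = 47   # parameters active per token in billions (from the text)


def active_fraction(active_b, total_b):
    """Share of the model's parameters that take part in one forward pass."""
    return active_b / total_b


def top_k_route(router_logits, k):
    """Select the k highest-scoring experts for one token.

    Returns (expert_indices, weights); the weights are a softmax over the
    selected experts only, so they sum to 1. Generic MoE routing, assumed
    here for illustration.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    chosen = order[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return chosen, [e / total for e in exps]


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"active share per token: {share:.1%}")
    # Four hypothetical experts; only two of them run for this token.
    experts, weights = top_k_route([0.5, 2.0, -1.0, 1.0], k=2)
    print(experts, weights)
```

Because only the selected experts run for each token, compute per token scales with the active parameters rather than the total, which is the efficiency argument the paragraph above makes.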
1. Core Capabilities of ERNIE 4.5
- Multimodal Integration: Jointly processes text, images, audio, and video with modality-specific experts, reducing hallucinations and improving cross-modal reasoning.
- Reasoning Modes:
  - Non-thinking mode: Optimized for fast visual perception (e.g., object detection, chart analysis).
  - Thinking mode: Enhanced for complex reasoning (e.g., math puzzles, document summarization).
- Technical Innovations:
  - FlashMask dynamic attention for accuracy.
  - FP8 mixed-precision training and 4-bit quantization for efficient deployment.
- Languages: Primarily optimized for Chinese, but supports English in multimodal tasks.
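As an illustration of what 4-bit quantization means in practice, the sketch below applies plain symmetric round-to-nearest quantization to a short list of weights. The text does not describe Baidu's actual scheme, so this is an assumption: a generic textbook method with hypothetical function names (`quantize_int4`, `dequantize`), not the ERNIE 4.5 deployment pipeline.

```python
def quantize_int4(values):
    """Symmetric per-tensor quantization to signed 4-bit integers in [-8, 7].

    Generic round-to-nearest scheme, assumed for illustration only.
    Returns (quantized_ints, scale).
    """
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 7; fall back to 1.0 for an all-zero input.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floating-point values from the 4-bit codes."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.7, -0.3, 0.1, 0.0]  # hypothetical weights
    codes, scale = quantize_int4(weights)
    restored = dequantize(codes, scale)
    print(codes, scale)
    print(restored)
```

Each weight is stored in 4 bits plus one shared scale, compared with 16 bits in FP16 or BF16, at the cost of a rounding error of at most half a quantization step per weight.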
2. Comparison with Claude 4 & o3 Pro
Performance Benchmarks
Table: Key benchmark averages (higher = better):

Key Strengths/Weaknesses:
- ERNIE 4.5:
  - Dominates multimodal tasks (e.g., DocVQA, MathVista) and Chinese benchmarks.
  - Excels in document analysis, audio transcription, and image-based reasoning.
  - Weaker in coding vs. Claude 4 and advanced math vs. o3 Pro.
- Claude 4:
  - Superior coding output quality (e.g., clean implementations, bug fixes).
- o3 Pro:
  - Leads in pure math/reasoning but struggles with coding and multimodal coherence.
3. Cost Difference
ERNIE 4.5 is reported to be ~100x cheaper than GPT-4.5 and 50–80% cheaper than Claude 4 or o3 Pro, although the list prices below imply a larger gap of roughly 94–97%. Pricing per million tokens:
| Model | Input Cost | Output Cost |
|---|---|---|
| ERNIE 4.5 | $0.55 | $2.20 |
| Claude 4 | $15.00 | $75.00 |
| OpenAI o3 Pro | $10.00 | $40.00 |
- ERNIE 4.5 is free via Ernie Bot for individual users (API costs apply for enterprises).
- Baidu’s cost edge stems from 4-bit quantization, dynamic load balancing, and optimized PaddlePaddle inference.
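The per-token prices in the table can be turned into a per-request cost and a relative saving with a few lines of arithmetic. The sketch below uses only the prices listed above; the workload of one million input and one million output tokens is a hypothetical example, and the helper names are illustrative.

```python
# USD per million tokens (input, output), copied from the table above.
PRICES_USD_PER_M = {
    "ERNIE 4.5": (0.55, 2.20),
    "Claude 4": (15.00, 75.00),
    "OpenAI o3 Pro": (10.00, 40.00),
}


def request_cost(model, input_tokens, output_tokens):
    """Cost in USD for a workload with the given token counts."""
    input_price, output_price = PRICES_USD_PER_M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def savings_vs(baseline, model, input_tokens, output_tokens):
    """Fraction saved by using `model` instead of `baseline` (0.9 = 90% cheaper)."""
    base = request_cost(baseline, input_tokens, output_tokens)
    return 1 - request_cost(model, input_tokens, output_tokens) / base


if __name__ == "__main__":
    # Hypothetical workload: 1M input tokens and 1M output tokens.
    tokens_in, tokens_out = 1_000_000, 1_000_000
    for name in PRICES_USD_PER_M:
        print(f"{name}: ${request_cost(name, tokens_in, tokens_out):.2f}")
    for baseline in ("Claude 4", "OpenAI o3 Pro"):
        saved = savings_vs(baseline, "ERNIE 4.5", tokens_in, tokens_out)
        print(f"ERNIE 4.5 vs {baseline}: {saved:.1%} cheaper")
```

The saving depends on the input-to-output ratio of the workload, so substitute real token counts before drawing conclusions for a specific deployment.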
4. Benchmark Results Highlights
- Multimodal:
  - Outscored GPT-4o, 77.68 vs. 72.76, on integrated vision-language tasks (MathVista, MMMU).
- Text/Reasoning:
  - Beat DeepSeek-V3 (79.6 vs. 79.14) in MMLU-Pro and GSM8K.
  - Surpassed GPT-4.5 in instruction following (IFEval) and Chinese knowledge (CMMLU).
- Efficiency:
  - Achieved 47% Model FLOPs Utilization (MFU) during training, enabling high-throughput inference.
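Model FLOPs Utilization is the ratio of the FLOPs a training run actually performs to the theoretical peak of the hardware. The sketch below uses the common rule of thumb of about 6 FLOPs per active parameter per training token (forward plus backward pass, ignoring attention). The device count, per-device peak, and throughput are hypothetical values picked so the result lands near the 47% quoted above; they are not Baidu's reported cluster figures.

```python
def model_flops_utilization(active_params, tokens_per_second,
                            num_devices, peak_flops_per_device):
    """Approximate training MFU.

    Uses the ~6 * N FLOPs-per-token approximation, where N is the number of
    parameters active per token. Attention FLOPs are ignored.
    """
    achieved_flops_per_second = 6 * active_params * tokens_per_second
    peak_flops_per_second = num_devices * peak_flops_per_device
    return achieved_flops_per_second / peak_flops_per_second


if __name__ == "__main__":
    mfu = model_flops_utilization(
        active_params=47e9,           # 47B active parameters (from the text)
        tokens_per_second=1_666_667,  # hypothetical cluster throughput
        num_devices=1000,             # hypothetical accelerator count
        peak_flops_per_device=1e15,   # hypothetical peak FLOPs/s per device
    )
    print(f"MFU: {mfu:.1%}")
```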
5. Conclusion: Is ERNIE 4.5 “Better”?
- For multimodal tasks & cost-sensitive deployments: Yes. ERNIE 4.5 leads in multimodal integration and offers unmatched price-performance.
- For coding/creative tasks: No. Claude 4 remains stronger in code quality.
- For pure reasoning/math: Mixed. o3 Pro narrowly wins in math, but ERNIE excels in structured multimodal reasoning.
Baidu’s aggressive pricing and open-source release (Apache 2.0) could pressure competitors to lower costs, accelerating AI democratization. For global users, note that ERNIE Bot currently requires a Chinese account for full access.