YuanLab AI Releases Yuan 3.0 Ultra: A Flagship Multimodal MOE Foundation Model, Built for Strong Intelligence and Unmatched Efficiency


How can a trillion-parameter large language model achieve state-of-the-art enterprise performance while reducing its total parameter count by 33.3% and increasing pre-training efficiency by 49%? YuanLab AI has released Yuan3.0 Ultra, an open-source mixture-of-experts (MoE) large language model with 1T total parameters and 68.8B active parameters. The architecture is designed to optimize performance on enterprise-specific tasks while maintaining competitive general-purpose capabilities. Unlike traditional dense models, Yuan3.0 Ultra uses sparsity to increase capacity without a linear increase in computational cost.

Layer-Adaptive Expert Pruning (LAEP)

The primary innovation in Yuan3.0 Ultra’s training is the Layer-Adaptive Expert Pruning (LAEP) algorithm. While expert pruning is usually applied after training, LAEP identifies and removes underutilized experts directly during the pre-training phase.

Research into expert load distribution revealed two distinct phases during pre-training:

  1. Fluctuation phase: characterized by high volatility in expert weights inherited from random initialization.
  2. Stationary phase: expert weights converge, and the relative ranking of experts by token assignments remains largely fixed.
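One way the phase transition above could be detected is by checking when the per-layer expert load rankings stop changing between training intervals. The sketch below illustrates that idea; the ranking-stability criterion, function names, and threshold are assumptions for illustration, not details from the paper:

```python
# Hypothetical sketch: detect when expert token-load rankings stabilize,
# signalling the transition from the fluctuation phase to the stationary
# phase. Names and the max_rank_shift threshold are illustrative.

def load_ranking(token_counts):
    """Rank experts by token load (0 = most-loaded expert)."""
    order = sorted(range(len(token_counts)), key=lambda e: -token_counts[e])
    ranks = [0] * len(token_counts)
    for rank, expert in enumerate(order):
        ranks[expert] = rank
    return ranks

def is_stationary(prev_counts, curr_counts, max_rank_shift=1):
    """Stationary if no expert's rank moved more than max_rank_shift."""
    prev, curr = load_ranking(prev_counts), load_ranking(curr_counts)
    return all(abs(p - c) <= max_rank_shift for p, c in zip(prev, curr))

# Fluctuation phase: rankings still churning between intervals.
print(is_stationary([90, 10, 50, 30], [10, 90, 30, 50]))  # False
# Stationary phase: relative ranking essentially fixed.
print(is_stationary([90, 12, 50, 30], [88, 11, 52, 29]))  # True
```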

Once the stationary phase is reached, LAEP applies pruning based on two constraints:

  • Individual weight constraint (α): targets experts whose per-layer token load is well below the average.
  • Cumulative load constraint (β): identifies the subgroup of experts that contribute the least to total token processing.

By applying LAEP with β = 0.1 and varying α, the model was reduced from its initial 1.5T parameters down to 1T parameters. This 33.3% decrease in total parameters preserved the model’s multi-domain performance while significantly reducing memory requirements for deployment. In the 1T configuration, the number of experts per layer was reduced from 64 to at most 48 retained experts.
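The two constraints could be combined per layer roughly as follows. This is a minimal sketch assuming the individual constraint selects candidates and the cumulative constraint caps how much total load may be pruned; the exact interplay in the paper may differ, and all names and example values are illustrative:

```python
# Hypothetical sketch of an LAEP-style pruning rule for one MoE layer.

def laep_prune(token_loads, alpha=0.5, beta=0.1, min_experts=48):
    """Return indices of experts to prune in one layer.

    token_loads : tokens routed to each expert during the stationary phase
    alpha       : individual constraint - prune if load < alpha * mean load
    beta        : cumulative constraint - pruned experts together may account
                  for at most beta of the layer's total token load
    min_experts : never prune the layer below this many experts
    """
    total = sum(token_loads)
    mean = total / len(token_loads)
    # Candidates failing the individual weight constraint, lightest first.
    candidates = sorted(
        (e for e, load in enumerate(token_loads) if load < alpha * mean),
        key=lambda e: token_loads[e],
    )
    pruned, cum = [], 0.0
    for e in candidates:
        if len(token_loads) - len(pruned) <= min_experts:
            break  # respect the per-layer expert floor
        if (cum + token_loads[e]) / total > beta:
            break  # cumulative load budget exhausted
        cum += token_loads[e]
        pruned.append(e)
    return pruned

# Toy layer of 8 experts; two near-idle experts fall under both constraints.
loads = [120, 115, 110, 100, 95, 90, 4, 2]
print(laep_prune(loads, alpha=0.5, beta=0.1, min_experts=6))  # [7, 6]
```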

https://github.com/Yuan-lab-LLM/Yuan3.0-Ultra/blob/main/Docs/Yuan3.0_Ultra%20Paper.pdf

Hardware efficiency and expert rearrangement

MoE models often suffer from device-level load imbalance when experts are distributed across a computing cluster. To address this, Yuan3.0 Ultra implements an Expert Rearrangement Algorithm.

This algorithm ranks experts by token load and uses a greedy strategy to distribute them across GPUs so that the variance in cumulative token load per device is minimized.
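The greedy strategy described above can be sketched as a classic balancing heuristic: place each expert, heaviest first, on the GPU with the lightest cumulative load. The function name and toy numbers are illustrative assumptions, not the paper’s implementation:

```python
# Hypothetical sketch of the greedy expert rearrangement heuristic.
import heapq

def rearrange_experts(token_loads, num_gpus):
    """Return gpu_id -> list of expert indices, balanced by token load."""
    # Min-heap of (cumulative_load, gpu_id); heaviest experts placed first.
    heap = [(0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    placement = {gpu: [] for gpu in range(num_gpus)}
    for expert in sorted(range(len(token_loads)),
                         key=lambda e: -token_loads[e]):
        load, gpu = heapq.heappop(heap)   # lightest GPU so far
        placement[gpu].append(expert)
        heapq.heappush(heap, (load + token_loads[expert], gpu))
    return placement

loads = [90, 70, 60, 40, 30, 10]          # toy per-expert token counts
placement = rearrange_experts(loads, num_gpus=2)
per_gpu = {g: sum(loads[e] for e in es) for g, es in placement.items()}
print(per_gpu)  # → {0: 160, 1: 140}
```

Heaviest-first placement keeps the per-device totals close, which is what minimizes the cumulative token variance across GPUs.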

| Method | TFLOPS per GPU |
| --- | --- |
| Base Model (1515B) | 62.14 |
| DeepSeek-V3 aux loss | 80.82 |
| Yuan3.0 Ultra (LAEP) | 92.60 |

Overall pre-training efficiency improved by 49%. This improvement is attributed to two factors:

  • Model pruning: a 32.4% contribution to the efficiency gain.
  • Expert rearrangement: a 15.9% contribution to the efficiency gain.
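The 49% figure follows directly from the per-GPU throughput table; a quick check of the arithmetic:

```python
# Verify the reported speedups from the TFLOPS-per-GPU figures above.
tflops = {
    "Base Model (1515B)": 62.14,
    "DeepSeek-V3 aux loss": 80.82,
    "Yuan3.0 Ultra (LAEP)": 92.60,
}
base = tflops["Base Model (1515B)"]
for name, t in tflops.items():
    print(f"{name}: +{t / base - 1:.1%}")
# Base Model (1515B): +0.0%
# DeepSeek-V3 aux loss: +30.1%
# Yuan3.0 Ultra (LAEP): +49.0%
```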

Reducing Overthinking with the Revised RIRM

In the reinforcement learning (RL) stage, the model employs a revised Reflection Inhibition Reward Mechanism (RIRM) to prevent excessively long reasoning chains for simple tasks.

The reflection reward, $R_{ver}$, is calculated using a threshold-based penalty system:

  • $R_{min} = 0$: the ideal number of reflection steps for direct answers.
  • $R_{max} = 3$: the maximum tolerable reflection limit.

For correct samples, the reward decreases as the number of reflection steps approaches $R_{max}$, whereas erroneous samples that “overthink” (exceeding $R_{max}$) receive the maximum penalty. This mechanism yielded a 16.33% gain in training accuracy and a 14.38% reduction in output token length.
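A threshold-based penalty of this shape could be sketched as below. The exact functional form of $R_{ver}$ in the paper may differ; the linear decay, the penalty value of -1.0, and the function name are assumptions for illustration:

```python
# Hypothetical sketch of a threshold-based reflection reward in the
# spirit of RIRM. R_MIN/R_MAX match the thresholds quoted above; the
# decay shape and penalty magnitude are illustrative.
R_MIN, R_MAX = 0, 3   # ideal and maximum tolerable reflection steps

def reflection_reward(num_reflections, is_correct):
    """Reward shrinks as reflections approach R_MAX; a wrong answer
    that overthinks past R_MAX receives the maximum penalty."""
    if not is_correct and num_reflections > R_MAX:
        return -1.0   # maximum penalty for erroneous overthinking
    # Linearly decay from 1.0 at R_MIN to 0.0 at R_MAX, floored at 0.
    return max(0.0, 1.0 - (num_reflections - R_MIN) / (R_MAX - R_MIN))

print(reflection_reward(0, True))   # 1.0   direct correct answer
print(reflection_reward(2, True))   # ~0.33 correct but reflective
print(reflection_reward(5, False))  # -1.0  wrong and overthinking
```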


Enterprise benchmark performance

Yuan3.0 Ultra was evaluated on specialized enterprise benchmarks against multiple industry models, including GPT-5.2 and Gemini 3.1 Pro.

| Benchmark | Task Category | Yuan3.0 Ultra Score | Leading Competitor Score |
| --- | --- | --- | --- |
| Docmatix | Multimodal RAG | 67.4% | 48.4% (GPT-5.2) |
| ChatRAG | Text retrieval (average) | 68.2% | 53.6% (Kimi K2.5) |
| MMTab | Table reasoning | 62.3% | 66.2% (Kimi K2.5) |
| Summary | Text summarization | 62.8% | 49.9% (Claude Opus 4.6) |
| Spider 1.0 | Text-to-SQL | 83.9% | 82.7% (Kimi K2.5) |
| BFCL V3 | Tool invocation | 67.8% | 78.8% (Gemini 3.1 Pro) |

The results show that Yuan3.0 Ultra achieves state-of-the-art accuracy in multimodal retrieval (Docmatix) and long-context retrieval (ChatRAG) while maintaining strong performance in structured data processing and tool calling.


Check out the paper and repo. Also, feel free to follow us on Twitter, and don’t forget to join our 120k+ ML SubReddit and subscribe to our newsletter. Are you on Telegram? Now you can connect with us on Telegram as well.

