Sparse Mixture-of-Experts Transformers with Dynamic Routing for Efficient Large Language Model Inference

Ruohan Zhang; Aditya Sharma; Yuki Sato

doi:10.55001/faids.v1i1.42

Research Article



Published: 2026-03-20; Vol. 1, No. 1 (2026)


Abstract

We propose DynaMoE, a sparse Mixture-of-Experts (MoE) architecture with learned dynamic routing that achieves a 2.8× inference speedup over dense Transformers of equivalent quality. Unlike conventional top-k gating, which sends every token to a fixed number k of experts, DynaMoE uses a lightweight auxiliary network that predicts, from the complexity of each input token, how many experts that token should use, allocating between 2 and 8 experts dynamically. Evaluated on a 47B-parameter model trained on 1.2T tokens, DynaMoE reaches GPT-4-level scores on MMLU (87.2%), HumanEval (82.3%), and GSM8K (94.1%) while reducing FLOPs per token by 64%. We provide a theoretical analysis showing that dynamic routing preserves model expressiveness while enabling conditional computation, and we release the training code and model weights.
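The abstract does not describe the router's internals, so the following is only a minimal Python sketch of per-token dynamic-k routing under stated assumptions. It assumes a softmax gate over per-expert scores and a hypothetical auxiliary predictor (one linear layer followed by a sigmoid) that maps each token to an expert count between 2 and 8. Only the selected experts are run, and their gate probabilities are renormalized to sum to 1. The names `predict_num_experts`, `route_token`, and `moe_forward`, and the linear-plus-sigmoid predictor design, are illustrative assumptions, not the authors' released code.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Sequence[float]], Vector]

K_MIN, K_MAX = 2, 8  # expert-count range stated in the abstract


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_num_experts(token: Sequence[float], w_aux: Sequence[float], b_aux: float) -> int:
    """HYPOTHETICAL auxiliary predictor: linear score -> sigmoid -> integer k in [K_MIN, K_MAX]."""
    complexity = sigmoid(dot(token, w_aux) + b_aux)  # assumed "input complexity" in (0, 1)
    return K_MIN + round(complexity * (K_MAX - K_MIN))


def route_token(token: Sequence[float], gate_weights: Sequence[Sequence[float]],
                w_aux: Sequence[float], b_aux: float) -> List[Tuple[int, float]]:
    """Return (expert index, mixing weight) pairs for one token."""
    k = min(predict_num_experts(token, w_aux, b_aux), len(gate_weights))  # never exceed #experts
    probs = softmax([dot(token, w) for w in gate_weights])
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]  # renormalized over the selected experts


def moe_forward(token: Sequence[float], experts: Sequence[Expert],
                gate_weights: Sequence[Sequence[float]],
                w_aux: Sequence[float], b_aux: float) -> Vector:
    """Weighted sum of the selected experts' outputs; unselected experts are never evaluated."""
    out = [0.0] * len(token)
    for idx, weight in route_token(token, gate_weights, w_aux, b_aux):
        y = experts[idx](token)
        out = [o + weight * v for o, v in zip(out, y)]
    return out


if __name__ == "__main__":
    # Toy example: 4 experts in 2 dimensions, each expert just scales its input.
    experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
    gate_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    token = [0.5, -0.2]
    print(route_token(token, gate_weights, w_aux=[0.0, 0.0], b_aux=0.0))
    print(moe_forward(token, experts, gate_weights, w_aux=[0.0, 0.0], b_aux=0.0))
```

The compute saving comes from the loop in `moe_forward`: a token routed to k experts pays for k expert evaluations rather than all of them, so tokens the predictor judges simple cost less. As a sanity check on the headline numbers, if runtime scaled exactly with FLOPs, a 64% reduction in FLOPs per token would give a speedup of 1 / (1 − 0.64) ≈ 2.78×, in line with the reported 2.8×.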

Author Biographies

Ruohan Zhang, Shanghai AI Laboratory, Shanghai 200232, China

Ruohan Zhang is a senior researcher at Shanghai AI Laboratory, Shanghai 200232, China. Their research focuses on computational science, with over 19 publications in peer-reviewed journals.
Aditya Sharma, Google DeepMind, Mountain View, CA 94043, USA

Aditya Sharma is a senior researcher at Google DeepMind, Mountain View, CA 94043, USA. Their research focuses on advanced materials, with over 68 publications in peer-reviewed journals.
Yuki Sato, RIKEN Center for AIP, Tokyo 103-0027, Japan

Yuki Sato is a research fellow at RIKEN Center for AIP, Tokyo 103-0027, Japan. Their research focuses on computational science, with over 66 publications in peer-reviewed journals.


Zhang, R., Sharma, A., & Sato, Y. (2026). Sparse Mixture-of-Experts Transformers with Dynamic Routing for Efficient Large Language Model Inference. 人工智能与数据科学前沿 (Frontiers in Artificial Intelligence and Data Science), 1(1). https://doi.org/10.55001/faids.v1i1.42

