Sparse Mixture-of-Experts Transformers with Dynamic Routing for Efficient Large Language Model Inference

Ruohan Zhang; Aditya Sharma; Yuki Sato

doi:10.55001/faids.v1i1.42

Published: 2026-03-20 | DOI: https://doi.org/10.55001/faids.v1i1.42 | Vol. 1 No. 1 (2026)

Abstract

We propose DynaMoE, a sparse Mixture-of-Experts (MoE) architecture with learned dynamic routing that achieves a 2.8× inference speedup over dense Transformers of equivalent quality. Unlike conventional top-k gating, which activates a fixed number of experts for every token, DynaMoE uses a lightweight auxiliary network to predict the number of experts each token needs based on input complexity, allocating between 2 and 8 experts per token. Evaluated on a 47B-parameter model trained on 1.2T tokens, DynaMoE matches GPT-4-level performance on MMLU (87.2%), HumanEval (82.3%), and GSM8K (94.1%) while reducing FLOPs per token by 64%. We provide a theoretical analysis showing that dynamic routing preserves model expressiveness while enabling conditional computation, and we release the training code and model weights.
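
The routing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the scalar `complexity_score` input, and the linear mapping from score to expert count are all hypothetical stand-ins for the learned auxiliary network, and renormalizing the softmax weights over the selected experts is an assumption borrowed from common top-k gating practice. Only the 2 to 8 expert range comes from the abstract.

```python
import math

# Expert-count range stated in the abstract.
MIN_EXPERTS, MAX_EXPERTS = 2, 8


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def predict_num_experts(complexity_score):
    """Hypothetical stand-in for the auxiliary network.

    Maps a complexity score in [0, 1] linearly onto an expert count
    between MIN_EXPERTS and MAX_EXPERTS. The real predictor is learned;
    this mapping is an assumption made for illustration only.
    """
    score = min(max(complexity_score, 0.0), 1.0)
    return MIN_EXPERTS + round(score * (MAX_EXPERTS - MIN_EXPERTS))


def route_token(router_logits, complexity_score):
    """Select a per-token number of experts and return their gate weights.

    Returns a dict {expert_index: weight} whose weights sum to 1.
    Renormalizing over the selected experts is an assumed design choice.
    """
    k = min(predict_num_experts(complexity_score), len(router_logits))
    probs = softmax(router_logits)
    # Indices of the k experts with the highest routing probability.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


if __name__ == "__main__":
    logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
    print(route_token(logits, 0.0))  # simple token: fewest experts
    print(route_token(logits, 1.0))  # complex token: most experts
```

In this sketch a low-complexity token activates two experts and a high-complexity token activates eight, which is the source of the conditional-computation savings. As a rough consistency check on the reported figures, a 64% reduction in FLOPs per token corresponds to 1 / (1 - 0.64) ≈ 2.8 times fewer FLOPs, which would line up with the stated 2.8× speedup if inference cost scaled directly with FLOPs.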

Author Biographies

Ruohan Zhang Shanghai AI Laboratory, Shanghai 200232, China

Ruohan Zhang is a senior researcher at Shanghai AI Laboratory, Shanghai 200232, China. Their research focuses on computational science, with over 19 publications in peer-reviewed journals.

Aditya Sharma Google DeepMind, Mountain View, CA 94043, USA

Aditya Sharma is a senior researcher at Google DeepMind, Mountain View, CA 94043, USA. Their research focuses on advanced materials, with over 68 publications in peer-reviewed journals.

Yuki Sato RIKEN Center for AIP, Tokyo 103-0027, Japan

Yuki Sato is a research fellow at RIKEN Center for AIP, Tokyo 103-0027, Japan. Their research focuses on computational science, with over 66 publications in peer-reviewed journals.

Sparse Mixture-of-Experts Transformers with Dynamic Routing for Efficient Large Language Model Inference. (2026). Frontiers in Artificial Intelligence and Data Science, 1(1). https://doi.org/10.55001/faids.v1i1.42
