Research Paper

Sparse Mixture-of-Experts Transformers with Dynamic Routing for Efficient Large Language Model Inference


Abstract

We propose DynaMoE, a sparse Mixture-of-Experts (MoE) architecture with learned dynamic routing that achieves a 2.8× inference speedup over dense Transformers of equivalent quality. Unlike conventional top-k gating, which activates a fixed number of experts for every token, DynaMoE uses a lightweight auxiliary network to predict the number of experts each token needs from its input complexity, allocating between 2 and 8 experts per token. Evaluated on a 47B-parameter model trained on 1.2T tokens, DynaMoE reaches GPT-4-level performance on MMLU (87.2%), HumanEval (82.3%), and GSM8K (94.1%) while reducing FLOPs per token by 64%. We provide a theoretical analysis showing that dynamic routing preserves model expressiveness while enabling conditional computation, and we release the training code and model weights.
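The routing idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the released DynaMoE code: the function name `dynamic_route`, the weight matrices `w_gate` and `w_count`, the choice of 16 experts, and the use of a single linear layer as the auxiliary predictor are all assumptions made for illustration. Only the range of 2 to 8 experts per token comes from the abstract.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_route(tokens, w_gate, w_count, k_min=2, k_max=8):
    """Choose a per-token expert count k in [k_min, k_max], then the top-k experts.

    tokens:  (n_tokens, d_model) token representations
    w_gate:  (d_model, n_experts) hypothetical gating weights
    w_count: (d_model, k_max - k_min + 1) hypothetical auxiliary-predictor weights
    Returns (k, weights); weights has shape (n_tokens, n_experts), is zero for
    unselected experts, and each row sums to 1.
    """
    n_experts = w_gate.shape[1]
    if n_experts < k_max:
        raise ValueError("need at least k_max experts")
    if w_count.shape[1] != k_max - k_min + 1:
        raise ValueError("w_count must have one column per allowed k")

    # Standard gating scores over all experts.
    gate_probs = softmax(tokens @ w_gate)

    # Assumed auxiliary predictor: a linear classifier over the allowed k values.
    k = k_min + np.argmax(tokens @ w_count, axis=-1)

    # Rank of each expert per token (0 = highest gate probability).
    order = np.argsort(-gate_probs, axis=-1)
    ranks = np.argsort(order, axis=-1)

    # Keep each token's top-k experts and renormalise their weights.
    mask = ranks < k[:, None]
    weights = np.where(mask, gate_probs, 0.0)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return k, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, n_experts, n_tokens = 32, 16, 6
    tokens = rng.normal(size=(n_tokens, d_model))
    w_gate = rng.normal(size=(d_model, n_experts)) * 0.1
    w_count = rng.normal(size=(d_model, 7)) * 0.1
    k, weights = dynamic_route(tokens, w_gate, w_count)
    print("experts per token:", k)
    print("active experts per row:", (weights > 0).sum(axis=-1))
```

In this sketch the expert count is chosen by a hard `argmax`, which is not differentiable; how the actual auxiliary network is trained is not specified in the abstract, so a trainable variant would need its own relaxation or auxiliary loss.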

Author Biographies

  • Ruohan Zhang, Shanghai AI Laboratory, Shanghai 200232, China
    Ruohan Zhang is a senior researcher at Shanghai AI Laboratory, Shanghai 200232, China. Their research focuses on computational science, with over 19 publications in peer-reviewed journals.
  • Aditya Sharma, Google DeepMind, Mountain View, CA 94043, USA
    Aditya Sharma is a senior researcher at Google DeepMind, Mountain View, CA 94043, USA. Their research focuses on advanced materials, with over 68 publications in peer-reviewed journals.
  • Yuki Sato, RIKEN Center for AIP, Tokyo 103-0027, Japan
    Yuki Sato is a research fellow at RIKEN Center for AIP, Tokyo 103-0027, Japan. Their research focuses on computational science, with over 66 publications in peer-reviewed journals.