Poster in Workshop: The Fourth Workshop on Efficient Natural Language and Speech Processing (ENLSP-IV): Highlighting New Architectures for Future Foundation Models
StructMoE: Structured Mixture of Experts Using Low Rank Experts
Zain Sarwar · Ashwinee Panda · Benjamin Thérien · Stephen Rawls · Anirban Das · Kartik Balasubramaniam · Berkcan Kapusuzoglu · Shixiong Zhang · Sambit Sahu · Milind Naphade · Supriyo Chakraborty
Keywords: [ Efficient Architectures ]
Abstract:
We introduce StructMoE, a method for scaling MoE architectures by augmenting experts with dynamic capacity through structured matrices we call Low Rank Experts (LoREs). LoREs are selected on a per-expert, per-token basis by a secondary router specific to each expert, and the selected LoRE is entangled with the main expert's up-projection before the activation function. Empirically, we find this approach outperforms an MoE baseline in terms of loss on a held-out validation set.
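To make the routing and entanglement concrete, below is a minimal PyTorch sketch of a single StructMoE-style expert, written from the abstract alone. The number of LoREs, their rank, the top-1 selection, and the additive, router-weighted combination with the up-projection are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StructMoEExpert(nn.Module):
    """One MoE expert augmented with Low Rank Experts (LoREs) chosen per token
    by a secondary router; hyperparameters and combination rule are assumptions."""

    def __init__(self, d_model: int, d_ff: int, num_lores: int = 4, lore_rank: int = 8):
        super().__init__()
        self.up_proj = nn.Linear(d_model, d_ff, bias=False)    # main expert up-projection
        self.down_proj = nn.Linear(d_ff, d_model, bias=False)  # main expert down-projection
        self.lore_router = nn.Linear(d_model, num_lores, bias=False)  # secondary, per-expert router
        # Each LoRE is a pair of low-rank factors: (d_model -> rank) and (rank -> d_ff).
        self.lore_a = nn.Parameter(torch.randn(num_lores, d_model, lore_rank) * 0.02)
        self.lore_b = nn.Parameter(torch.randn(num_lores, lore_rank, d_ff) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model) -- tokens already dispatched to this expert.
        scores = F.softmax(self.lore_router(x), dim=-1)  # (tokens, num_lores)
        weight, idx = scores.max(dim=-1)                 # top-1 LoRE per token (assumption)
        a = self.lore_a[idx]                             # (tokens, d_model, rank)
        b = self.lore_b[idx]                             # (tokens, rank, d_ff)
        lore_out = torch.bmm(torch.bmm(x.unsqueeze(1), a), b).squeeze(1)  # (tokens, d_ff)
        # Entangle the LoRE output with the main up-projection before the activation;
        # a router-weighted addition is used here as an assumed combination rule.
        hidden = F.gelu(self.up_proj(x) + weight.unsqueeze(-1) * lore_out)
        return self.down_proj(hidden)


# Usage: run 16 tokens through one expert.
expert = StructMoEExpert(d_model=64, d_ff=256)
tokens = torch.randn(16, 64)
print(expert(tokens).shape)  # torch.Size([16, 64])
```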