

Poster in Workshop: Adaptive Foundation Models: Evolving AI for Personalized and Efficient Learning

MD-DiT: Step-aware Mixture-of-Depths for Efficient Diffusion Transformers

Mingzhu Shen · pengtao chen · Peng Ye · Guoxuan Xia · Tao Chen · Christos Bouganis · Yiren Zhao


Abstract:

Diffusion models (DMs) excel at vision generation tasks such as text-to-image synthesis, but their inference is computationally expensive because the denoising network must be evaluated at many timesteps. While previous studies have focused primarily on reducing the number of timesteps, our research aims to improve DM inference efficiency by reconfiguring the model architecture, particularly for diffusion transformers (DiT). Drawing inspiration from mixture-of-depths (MD) models, we account for the computational asymmetry across timesteps: each computational block contributes differently at each timestep. This observation leads us to explore strategies that bypass certain computational blocks (block skipping) or reuse their results from previous timesteps (block caching). To this end, we introduce MD-DiT, a unified framework that optimizes diffusion transformers by integrating block skipping and caching through gradient-free search, allowing the model to select which blocks to execute at each timestep for improved inference efficiency. Our findings demonstrate a 20% reduction in computational cost for a 4-step Latent Consistency Model (LCM) and a 59% reduction in a 40-step setup. MD-DiT outperforms state-of-the-art training-free methods such as DeepCache, TGATE, and T-Stitch.
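The abstract does not spell out the search procedure. As a rough illustration only, the sketch below models a per-timestep schedule in which each block is either computed, skipped (treated as identity), or served from a cache holding its residual from an earlier timestep. A simple greedy, gradient-free search then removes block evaluations while the final output stays within a tolerance of the full-compute run. The toy blocks, the sampler update, the greedy strategy, and names such as `run_sampler` and `greedy_search` are hypothetical assumptions, not MD-DiT's actual architecture or search algorithm.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical illustration of block skipping/caching; not the MD-DiT search algorithm.
COMPUTE, SKIP, CACHE = "compute", "skip", "cache"
Block = Callable[[List[float], int], List[float]]
Schedule = List[List[str]]  # schedule[t][i] = action for block i at timestep t


def make_block(i: int) -> Block:
    """Hypothetical toy block: returns a residual update (not the new state)."""
    def block(h: List[float], t: int) -> List[float]:
        scale = 0.1 / (1 + i) * (1 + 0.05 * t)  # in this toy, deeper blocks contribute less
        return [scale * math.tanh(x + 0.1 * i) for x in h]
    return block


def run_sampler(blocks: List[Block], schedule: Schedule,
                x0: List[float]) -> Tuple[List[float], int]:
    """Toy denoising loop. Returns the final latent and the number of block evaluations."""
    x = list(x0)
    cache: Dict[int, List[float]] = {}  # block index -> residual from its last computed step
    computed = 0
    for t, row in enumerate(schedule):
        h = list(x)
        for i, block in enumerate(blocks):
            action = row[i]
            if action == SKIP:
                continue  # block skipping: the residual path reduces to the identity
            if action == CACHE and i in cache:
                delta = cache[i]  # block caching: reuse an earlier timestep's residual
            else:
                delta = block(h, t)  # compute (also the fallback when nothing is cached)
                cache[i] = delta
                computed += 1
            h = [a + d for a, d in zip(h, delta)]
        x = [0.5 * a + 0.5 * b for a, b in zip(x, h)]  # toy stand-in for a sampler update
    return x, computed


def greedy_search(blocks: List[Block], x0: List[float], num_steps: int,
                  tol: float) -> Schedule:
    """Gradient-free greedy search: drop block evaluations while output error stays <= tol."""
    n = len(blocks)
    schedule = [[COMPUTE] * n for _ in range(num_steps)]
    ref, _ = run_sampler(blocks, schedule, x0)

    def evaluate(s: Schedule) -> Tuple[float, int]:
        out, cost = run_sampler(blocks, s, x0)
        return max(abs(a - b) for a, b in zip(out, ref)), cost

    _, best_cost = evaluate(schedule)
    for t in range(num_steps):
        for i in range(n):
            for action in (CACHE, SKIP):  # try the milder approximation first
                trial = [r[:] for r in schedule]
                trial[t][i] = action
                err, cost = evaluate(trial)
                # Accept only changes that actually save compute and keep quality.
                if err <= tol and cost < best_cost:
                    schedule, best_cost = trial, cost
                    break
    return schedule


blocks = [make_block(i) for i in range(4)]
x0 = [0.5, -1.0, 2.0]
num_steps = 8
full = [[COMPUTE] * len(blocks) for _ in range(num_steps)]
_, full_cost = run_sampler(blocks, full, x0)
found = greedy_search(blocks, x0, num_steps, tol=0.05)
_, found_cost = run_sampler(blocks, found, x0)
print(f"block evaluations: full={full_cost}, searched={found_cost}")
```

Counting executed blocks is only a crude proxy for FLOPs; a real DiT search would weight blocks by their actual cost and judge quality with a generation metric rather than a max-abs difference on a toy latent.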
