

Poster in Workshop: Adaptive Foundation Models: Evolving AI for Personalized and Efficient Learning

UniTMGE: Uniform Text-Motion Generation and Editing via Diffusion Model

Ruoyu Wang · Xiang Li · Tengjiao Sun · Yangfan He · TIANYU SHI · yitingxie


Abstract:

Diffusion models excel at controllable generation in continuous modalities, which makes them well suited to continuous motion generation. However, existing diffusion-based motion models are inflexible: they focus solely on text-to-motion generation and lack motion editing capabilities. To address these issues, we introduce UniTMGE, a uniform text-motion generation and editing framework based on diffusion. UniTMGE overcomes single-modality limitations and performs efficiently and effectively across multiple tasks, including text-driven motion generation, motion captioning, motion completion, and multi-modal motion editing. UniTMGE comprises three components: CTMV, which maps text and motion into a shared latent space using contrastive learning; a controllable diffusion model customized for the CTMV space; and MCRE, which unifies multimodal conditions into CLIP representations, enabling precise multimodal control and flexible motion editing through simple linear operations. We conducted both closed-world and open-world experiments on the Motion-X dataset, which provides detailed text descriptions, and the results demonstrate our model's effectiveness and generalizability across multiple tasks.
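
The abstract does not give the contrastive objective or the exact linear operations, so the following is only an illustrative sketch under stated assumptions. It assumes a CLIP-style symmetric InfoNCE loss for aligning paired text and motion embeddings in a shared space, and it assumes that a "simple linear operation" can be modeled as vector arithmetic on a condition embedding. The names `info_nce` and `linear_edit`, the temperature value, and the toy two-dimensional embeddings are all hypothetical and are not taken from UniTMGE.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(text_emb, motion_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired (text, motion) embeddings.

    Assumption: row i of text_emb is the caption of row i of motion_emb.
    This is a generic CLIP-style objective, not the paper's confirmed loss.
    """
    t = [l2_normalize(v) for v in text_emb]
    m = [l2_normalize(v) for v in motion_emb]
    n = len(t)
    # logits[i][j] = scaled cosine similarity between text i and motion j
    logits = [[dot(t[i], m[j]) / temperature for j in range(n)] for i in range(n)]

    def cross_entropy(rows):
        # The correct "class" for row i is column i (the matching pair).
        total = 0.0
        for i, row in enumerate(rows):
            mx = max(row)  # subtract the max for numerical stability
            log_z = mx + math.log(sum(math.exp(x - mx) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    columns = [list(c) for c in zip(*logits)]  # motion-to-text direction
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


def linear_edit(source, remove, add, strength=1.0):
    """Hypothetical linear edit: move `source` along the direction (add - remove)."""
    return [s + strength * (a - r) for s, r, a in zip(source, remove, add)]


# Toy batch: two captions and two motions in a 2-D shared space.
text = [[1.0, 0.0], [0.0, 1.0]]
aligned_motion = [[1.0, 0.0], [0.0, 1.0]]
shuffled_motion = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = info_nce(text, aligned_motion)
loss_shuffled = info_nce(text, shuffled_motion)
edited = linear_edit([1.0, 2.0], remove=[1.0, 0.0], add=[0.0, 1.0])
```

In this toy setup the loss is lower when each caption is paired with its matching motion than when the pairs are shuffled, which is the property a contrastive objective is meant to reward. The edit simply shifts a condition vector away from one concept and toward another.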
