Poster
Single Forward Video Generation Model
Zhixing Zhang · Yanyu Li · Yushu Wu · Yanwu Xu · Anil Kag · Ivan Skorokhodov · Willi Menapace · Aliaksandr Siarohin · Junli Cao · Dimitris Metaxas · Sergey Tulyakov · Jian Ren
Thu 12 Dec 11 a.m. PST — 2 p.m. PST
Abstract:
Diffusion models have demonstrated remarkable success in generating high-quality videos through iterative denoising processes. However, these models typically require many denoising steps at sampling time, resulting in high computational costs. In this work, we propose a novel approach to single-step video generation by leveraging adversarial distillation in video diffusion models. Our method builds upon the Stable-Video-Diffusion (SVD) framework, utilizing a generative adversarial network (GAN) to distill the complex multi-step denoising process of the diffusion model into a single forward pass. The GAN is trained to mimic the final output of the diffusion model, capturing both temporal and spatial dependencies in the video data. Extensive experiments demonstrate that our method synthesizes videos of competitive quality with significantly reduced computational overhead (i.e., around $23\times$ speedup compared with SVD and $6\times$ speedup compared with existing works, with even better generation quality), paving the way for real-time video synthesis. We plan to release the code and pre-trained models.
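
To make the adversarial-distillation idea concrete, the sketch below shows one illustrative training step: a teacher diffusion model produces a multi-step denoised video, and a one-step student generator is trained with a discriminator to reproduce it in a single forward pass. This is a minimal assumption-laden sketch, not the authors' SVD-based implementation; the module names (generator, discriminator, teacher_diffusion), the .sample() call, and the loss weighting are all hypothetical placeholders.

    # Minimal PyTorch-style sketch of adversarial distillation into a one-step generator.
    # All module names and hyperparameters below are illustrative assumptions.
    import torch
    import torch.nn.functional as F

    def distillation_step(generator, discriminator, teacher_diffusion,
                          image_cond, noise, g_opt, d_opt):
        # Teacher: full multi-step denoising produces the target video (no gradients).
        with torch.no_grad():
            target_video = teacher_diffusion.sample(image_cond, noise)  # many denoising steps

        # Student: a single forward pass maps noise (plus conditioning) to a video.
        fake_video = generator(noise, image_cond)

        # Discriminator update: separate teacher outputs from student outputs.
        d_opt.zero_grad()
        d_real = discriminator(target_video, image_cond)
        d_fake = discriminator(fake_video.detach(), image_cond)
        d_loss = F.softplus(-d_real).mean() + F.softplus(d_fake).mean()  # non-saturating GAN loss
        d_loss.backward()
        d_opt.step()

        # Generator update: fool the discriminator and stay close to the teacher output.
        g_opt.zero_grad()
        g_adv = F.softplus(-discriminator(fake_video, image_cond)).mean()
        g_rec = F.mse_loss(fake_video, target_video)  # reconstruction term toward teacher
        g_loss = g_adv + g_rec
        g_loss.backward()
        g_opt.step()

        return d_loss.item(), g_loss.item()

In such a setup, the expensive iterative sampler is only used to produce training targets; at inference time, only the single generator forward pass remains, which is where the reported speedups would come from.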