Poster
On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
Tariq Berrada Ifriqi · Jakob Verbeek · Pietro Astolfi · Michal Drozdzal · Adriana Romero · Marton Havasi · Matthew Muckley · Yohann Benchetrit · Karteek Alahari · Melissa Hall · Reyhane Askari Hemmat
Large-scale training of latent diffusion models (LDMs) has enabled unprecedented quality in image generation. However, large-scale end-to-end training of these models is computationally costly, and hence most research focuses either on finetuning pretrained models or on experiments at smaller scales. In this work we aim to improve the training efficiency and performance of LDMs with the goal of scaling to larger datasets and higher resolutions. We focus our study on two points that are critical for good performance and efficient training: (i) the mechanisms used for semantic-level (e.g., a text prompt or class name) and low-level (crop size, random flip, etc.) conditioning of the model, and (ii) pre-training strategies to transfer representations learned on smaller, lower-resolution datasets to larger ones. The main contributions of our work are the following: we present a systematic experimental study of these points, we propose a novel conditioning mechanism that disentangles semantic and low-level conditioning, and we obtain state-of-the-art performance on CC12M for text-to-image generation at 512 resolution.
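The abstract does not specify how the disentangled conditioning is implemented. Below is a minimal PyTorch-style sketch of one way to keep the two signals on separate pathways: semantic conditioning drives an adaLN-style modulation, while low-level metadata (e.g., crop size and flip flag) is folded into the timestep pathway. The class name `DisentangledConditioning`, the argument names, and this particular split are illustrative assumptions, not the authors' actual mechanism.

```python
# Hypothetical sketch (not the paper's exact architecture): keep semantic
# conditioning and low-level "micro-conditioning" on separate pathways.
import torch
import torch.nn as nn

class DisentangledConditioning(nn.Module):
    def __init__(self, dim: int, sem_dim: int, num_lowlevel: int):
        super().__init__()
        # Low-level metadata is folded into the timestep-style pathway.
        self.lowlevel_mlp = nn.Sequential(
            nn.Linear(num_lowlevel, dim), nn.SiLU(), nn.Linear(dim, dim)
        )
        # Semantic conditioning produces its own scale/shift modulation
        # (adaLN-style), so the two signals are never fused into one embedding.
        self.sem_mlp = nn.Sequential(
            nn.Linear(sem_dim, dim), nn.SiLU(), nn.Linear(dim, 2 * dim)
        )
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)

    def forward(self, x, t_emb, sem_emb, lowlevel):
        # x:        (B, N, dim) latent tokens
        # t_emb:    (B, dim)    timestep embedding
        # sem_emb:  (B, sem_dim) text/class embedding
        # lowlevel: (B, num_lowlevel) e.g. [crop_h, crop_w, flip_flag]
        t_mod = t_emb + self.lowlevel_mlp(lowlevel)             # low-level path
        scale, shift = self.sem_mlp(sem_emb).chunk(2, dim=-1)   # semantic path
        h = self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        return h + t_mod.unsqueeze(1)

# Usage with made-up dimensions:
block = DisentangledConditioning(dim=256, sem_dim=768, num_lowlevel=3)
out = block(torch.randn(2, 64, 256), torch.randn(2, 256),
            torch.randn(2, 768), torch.rand(2, 3))
```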