Poster in Workshop: NeurIPS 2022 Workshop on Score-Based Methods
Making Text-to-Image Diffusion Models Zero-Shot Image-to-Image Editors by Inferring "Random Seeds"
Chen Henry Wu · Fernando D De la Torre
Recent text-to-image diffusion models trained on large-scale data achieve remarkable performance on text-conditioned image synthesis (e.g., GLIDE, DALL·E 2, Imagen, Stable Diffusion). This paper presents an embarrassingly simple method for using these text-to-image diffusion models as zero-shot image-to-image editors. Our method, CycleDiffusion, is based on a recent finding that, when the "random seed" is fixed, sampling from two diffusion model distributions produces images with minimal differences. The core of our idea is therefore to infer the "random seed" that is likely to produce a source image conditioned on a source text. We formalize the "random seed" as a sequence of isotropic Gaussian noises, which we reformulate as the diffusion model's latent code. Using the "random seed" inferred from the source text-image pair, we generate a target image conditioned on a target text. Experiments show that CycleDiffusion can minimally edit images in a zero-shot manner.
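To make the idea concrete, below is a minimal sketch (not the authors' implementation) of inferring and reusing such a "random seed" with a toy DDPM schedule. Here `eps_model`, `infer_seed`, and `edit` are hypothetical names, and the noise predictor is a placeholder standing in for a pretrained text-conditioned model; the key steps are recovering the per-step reverse-process noises z_t that reproduce the source image under the source text, then re-running the sampler with those same noises under the target text.

```python
import torch

T = 50                                   # toy number of diffusion steps (assumption)
betas = torch.linspace(1e-4, 0.02, T)    # betas[t-1] is beta_t for t = 1..T
alphas = 1.0 - betas
alphas_bar = torch.cumprod(alphas, dim=0)
sigmas = torch.sqrt(betas)               # one common choice of reverse-step std

def eps_model(x_t, t, text_emb):
    """Hypothetical stand-in for a pretrained noise predictor eps_theta(x_t, t, text)."""
    return torch.zeros_like(x_t)         # placeholder so the sketch runs

def ddpm_mean(x_t, t, text_emb):
    """Standard DDPM posterior mean mu_theta(x_t, t, text) for timestep t in 1..T."""
    eps = eps_model(x_t, t, text_emb)
    coef = betas[t - 1] / torch.sqrt(1.0 - alphas_bar[t - 1])
    return (x_t - coef * eps) / torch.sqrt(alphas[t - 1])

def infer_seed(x0, src_emb, generator=None):
    """Infer the 'random seed' (x_T, z_T..z_1) that regenerates x0 under the source text."""
    g = generator or torch.Generator().manual_seed(0)
    xs = [x0]                            # xs[t] holds x_t
    for t in range(1, T + 1):            # sample a forward (noising) trajectory from x0
        noise = torch.randn(x0.shape, generator=g)
        xs.append(torch.sqrt(alphas[t - 1]) * xs[-1] + sigmas[t - 1] * noise)
    zs = {}                              # reverse-process noises z_t
    for t in range(T, 0, -1):            # choose z_t so that mu + sigma*z lands on x_{t-1}
        zs[t] = (xs[t - 1] - ddpm_mean(xs[t], t, src_emb)) / sigmas[t - 1]
    return xs[T], zs

def edit(x0, src_emb, tgt_emb):
    """Re-run the sampler with the inferred seed but conditioned on the target text."""
    x_T, zs = infer_seed(x0, src_emb)
    x = x_T
    for t in range(T, 0, -1):
        x = ddpm_mean(x, t, tgt_emb) + sigmas[t - 1] * zs[t]
    return x                             # minimally edited image

# Toy usage: a random "image" and dummy text embeddings.
source = torch.randn(1, 3, 8, 8)
edited = edit(source, src_emb=None, tgt_emb=None)
```

If the target text equals the source text, the decoding loop reproduces the source image by construction, which is why reusing the inferred seed with a different text yields only a minimal edit.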