Poster in Workshop: NeurIPS 2023 Workshop on Machine Learning for Creativity and Design
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images
Aaron Gokaslan · A. Feder Cooper · Jasmine Collins · Landan Seguin · Austin Jacobson · Mihir Patel · Jonathan Frankle · Cory Stephenson · Volodymyr Kuleshov
We assemble a dataset of Creative-Commons-licensed (CC) images and use it to train a set of open diffusion models that are competitive with Stable Diffusion 2 (SD2). This task presents two challenges: high-resolution CC images 1) lack the captions necessary to train text-to-image generative models, and 2) are relatively scarce (∼70 million, compared to LAION’s ∼2 billion). In turn, we first describe telephoning, a type of transfer learning, which we use to produce a dataset of high-quality synthetic captions paired with curated CC images. Second, we propose a more efficient training recipe to explore this question of data scarcity. Third, we implement a variety of ML-systems optimizations that achieve ∼3X training speed-ups. We train multiple versions of SD2, each on a differently sized subset of LAION-2B, and find that we can successfully train using <3% of LAION-2B. Our largest model, dubbed CommonCanvas, achieves performance comparable to SD2 on human evaluation, even though we train it only on synthetic captions and a CC dataset that is <3% the size of LAION.