

Poster in Workshop: Compositional Learning: Perspectives, Methods, and Paths Forward

Learning Via Imagination: Controlled Diffusion Image Augmentation

Judah Goldfeder · Patrick Puma · Gabriel Guo · Gabriel Trigo · Hod Lipson

Keywords: [ Image Classification ] [ Diffusion Models ]


Abstract:

Synthetic data generated by diffusion models has been shown to improve task performance, but current approaches face two key challenges: the high cost of fine-tuning diffusion models for specific datasets, and the domain gap between real and synthetic data, which limits utility in fine-grained classification. To address these issues, we propose CDaug, a novel compositional approach to data augmentation using controlled diffusion. Instead of generating entirely new images, CDaug conditions generated images on existing data in a self-supervised manner, akin to how humans use imagination to compose new scenarios from existing concepts. This leverages the compositionality of learned representations to infuse meaningful variations. Our pipeline uses ControlNet, conditioned on the original data and on captions generated by the multi-modal LLM LLaVA2, to guide the generative process. By recombining the underlying structure and semantic priors of the data, CDaug achieves high-quality augmentations without fine-tuning. Using open-source models, our modular approach improves performance across seven fine-grained datasets in both few-shot and full-dataset settings, showing promise for compositional generalization in fine-grained environments.
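
The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Sample`, `augment_dataset`, `extract_control`, `caption`, and `generate` are all hypothetical, and the three model stages are passed in as plain callables so the sketch runs without any model weights. In a real pipeline, `caption` would wrap the multi-modal LLM and `generate` would wrap a ControlNet-guided diffusion model. The form of the control signal, and the choice to let each generated image inherit the label of its source image, are assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class Sample:
    """One labeled image; `synthetic` marks generated augmentations."""
    image: Any
    label: int
    synthetic: bool = False


def augment_dataset(
    samples: Sequence[Sample],
    extract_control: Callable[[Any], Any],     # hypothetical: image -> control signal
    caption: Callable[[Any], str],             # hypothetical: image -> text prompt
    generate: Callable[[Any, str, int], Any],  # hypothetical: (control, prompt, seed) -> image
    n_per_image: int = 2,
) -> List[Sample]:
    """Return the real samples followed by generated variants of each one.

    Each variant is conditioned on a control signal and a caption, both
    derived from the source image, so no extra annotation is needed.
    """
    augmented = list(samples)
    for sample in samples:
        control = extract_control(sample.image)
        prompt = caption(sample.image)
        for seed in range(n_per_image):
            new_image = generate(control, prompt, seed)
            # Assumption: a variant keeps the label of its source image.
            augmented.append(Sample(new_image, sample.label, synthetic=True))
    return augmented
```

Because the stages are injected as callables, any one of them can be swapped for a different model without touching the loop, which matches the modular design the abstract describes.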
