

Poster in Workshop: Attributing Model Behavior at Scale (ATTRIB)

The Association Between Training Data and Text-to-Image Generation Capabilities

Preethi Seshadri · Yasaman Razeghi · Sameer Singh · Yanai Elazar


Abstract:

Text-to-image (T2I) models are often touted for their supposed ability to create compositional images with many components. However, these models can fail to faithfully generate images when presented with prompts containing just two or three entities. In this work, we seek an explanation for such failures with respect to the training data. We introduce the training appearance ratio, which compares the number of training images depicting specific entities vs. the number of training captions mentioning those same entities, and examine how well this measure correlates with generation success rates. We find positive and significant correlations between these ratios and successful image generations. Furthermore, our proposed measure yields stronger correlations with model success rates than existing training data frequency measures. These associations suggest that our proposed measure (training appearance ratio) better captures the relationship between training data statistics and generation success.
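To make the measure concrete, here is a minimal sketch of how such a ratio could be computed and correlated with generation success. It assumes the ratio is a simple quotient (images depicting the entities divided by captions mentioning them) and uses Pearson correlation; the abstract specifies neither, so both are assumptions. The function names and the toy counts are hypothetical and are not taken from the paper.

```python
from math import sqrt


def training_appearance_ratio(n_images_depicting: int, n_captions_mentioning: int) -> float:
    """Assumed form of the measure: images depicting the entities
    divided by captions mentioning the same entities."""
    if n_captions_mentioning <= 0:
        raise ValueError("need at least one caption mentioning the entities")
    return n_images_depicting / n_captions_mentioning


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient (an assumed choice of statistic)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data, one row per entity combination:
# (images depicting the entities, captions mentioning them, generation success rate)
toy_rows = [
    (20, 100, 0.10),
    (50, 100, 0.35),
    (80, 100, 0.60),
    (95, 100, 0.90),
]

ratios = [training_appearance_ratio(imgs, caps) for imgs, caps, _ in toy_rows]
success_rates = [rate for _, _, rate in toy_rows]
r = pearson(ratios, success_rates)
print(f"correlation between ratio and success rate: {r:.3f}")
```

In this toy data every entity combination is mentioned in the same number of captions, so a caption-frequency measure alone could not distinguish them, while the ratio still varies with how often the entities are actually depicted.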
