Oral in Workshop: Workshop on Responsibly Building Next Generation of Multimodal Foundation Models
Consistency-diversity-realism Pareto fronts of conditional image generative models
Pietro Astolfi · Melissa Hall · Jakob Verbeek · Marlene Careil · Oscar Mañas · Matthew Muckley · Adriana Romero · Michal Drozdzal
Keywords: [ Image generative models ] [ world models ]
Building world models that accurately and comprehensively represent the real world is a holy grail for image generative models, as it would enable their use as world simulators. For conditional image generative models to be successful world models, they should not only excel at image quality and prompt-image consistency but also ensure high representation diversity. However, current research in generative models mostly focuses on creative applications that are predominantly concerned with human preferences of image quality and aesthetics. We note that generative models have inference-time mechanisms (or knobs) that allow the control of generation consistency, quality, and diversity. In this paper, we use state-of-the-art text-to-image models and their knobs to draw consistency-diversity-realism Pareto fronts that provide a holistic view of the consistency-diversity-realism multi-objective. Our experiments suggest that realism and consistency can both be improved simultaneously; however, there exists a clear tradeoff between realism/consistency and diversity. By looking at Pareto optimal points, we note that earlier models are better at representation diversity and worse in consistency-realism, and more recent models excel in consistency-realism while significantly decreasing the representation diversity. Overall, our analysis clearly shows that there is no best model and the choice of model should be determined by the downstream application. With this analysis, we invite the research community to consider Pareto fronts as an analytical tool to measure progress towards world models.
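To make the notion of a Pareto front concrete, the following is a minimal sketch, not the authors' evaluation code. It assumes each model-plus-knob configuration has already been scored on three objectives (consistency, diversity, realism) where higher is better. The configuration names and score values are hypothetical placeholders chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical score triple: (consistency, diversity, realism), higher is better.
Scores = Tuple[float, float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """Return True if a is at least as good as b on every objective
    and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(configs: Dict[str, Scores]) -> List[str]:
    """Return the names of configurations that no other configuration dominates."""
    return [
        name
        for name, scores in configs.items()
        if not any(dominates(other, scores) for other in configs.values())
    ]


# Illustrative, made-up scores for four knob settings (not results from the paper).
example_configs: Dict[str, Scores] = {
    "setting_a": (0.60, 0.90, 0.55),  # most diverse, least consistent
    "setting_b": (0.75, 0.70, 0.70),  # balanced
    "setting_c": (0.85, 0.40, 0.80),  # most consistent and realistic, least diverse
    "setting_d": (0.70, 0.65, 0.60),  # worse than setting_b on all three objectives
}

front = pareto_front(example_configs)
print(front)
```

In this toy example `setting_d` is dropped because `setting_b` is better on all three objectives, while the other three settings each win on at least one objective and so remain on the front. This mirrors the point made above: when objectives trade off, there is a set of non-dominated choices, not a single best one.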