Poster
Beyond Aesthetics: Cultural Competence in Text-to-Image Models
Nithish Kannen Senthilkumar · Arif Ahmad · Marco Andreetto · Vinodkumar Prabhakaran · Utsav Prabhu · Adji Bousso Dieng · Pushpak Bhattacharyya · Shachi Dave
The use of Text-to-Image (T2I) models is expanding beyond generating generic objects, as they are increasingly adopted by diverse global communities to create visual representations of their unique cultures. Current T2I benchmarks primarily evaluate image-text alignment, aesthetics, and fidelity of generations for complex prompts with generic objects, overlooking the critical dimension of cultural understanding. In this work, we address this gap by defining a framework to evaluate the cultural competence of T2I models and by presenting a scalable approach to collecting cultural artifacts unique to a particular culture, using a Knowledge Graph (KG) with a Large Language Model (LLM) in the loop. We assess the ability of state-of-the-art T2I models to generate culturally faithful and realistic images across eight countries and three cultural domains. Furthermore, we emphasize the importance of T2I models reflecting a culture's diversity and introduce cultural diversity as a novel metric for T2I evaluation, drawing inspiration from the Vendi Score. We introduce T2I-CUBE, a first-of-its-kind benchmark for T2I evaluation. T2I-CUBE includes cultural prompts, metrics, and cultural concept spaces, enabling a comprehensive assessment of T2I models' cultural knowledge and diversity. Our evaluations reveal significant gaps in the cultural knowledge of existing models and provide valuable insights into the cultural diversity of image outputs for under-specified prompts. By introducing a novel approach to evaluating cultural competence in T2I models, T2I-CUBE will be instrumental in fostering the development of models with a good understanding of global culture.
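For readers unfamiliar with the Vendi Score that the abstract cites as inspiration, the sketch below computes the standard Vendi Score: the exponential of the Shannon entropy of the eigenvalues of K/n, where K is an n-by-n similarity matrix with ones on the diagonal. This is not the cultural diversity metric defined in the paper, whose exact form is not given in the abstract. The function names `vendi_score` and `cosine_kernel`, and the choice of a cosine kernel over hypothetical image embeddings, are assumptions made for illustration.

```python
import numpy as np


def vendi_score(K):
    """Standard Vendi Score for a symmetric PSD similarity matrix K with unit diagonal."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    # Eigenvalues of K/n are non-negative and sum to 1 when diag(K) == 1.
    eigvals = np.linalg.eigvalsh(K / n)
    # Drop numerically zero eigenvalues, using the convention 0 * log(0) = 0.
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


def cosine_kernel(X):
    """Cosine similarity matrix for row vectors in X (an assumed choice of kernel)."""
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ X.T


# Hypothetical embeddings: three mutually orthogonal items and one duplicate.
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
])
score = vendi_score(cosine_kernel(embeddings))
```

The score can be read as an effective number of distinct items: n fully dissimilar items score n, and n identical items score 1, so the duplicated embedding above yields a value strictly between 1 and 4.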