

Lightning Talk in Workshop: Data Centric AI

Feminist Curation of Text for Data-centric AI


Abstract:

Language models are becoming increasingly central to artificial intelligence through their use in online search, recommendation engines, and language generation technologies. However, concepts of gender can be deeply embedded in the textual datasets used to train language models, and models trained on these datasets can in turn have a profound influence on societal conceptions of gender. There is therefore an urgent need for scalable methods to evaluate how gender is represented in large-scale text datasets and language models. We propose a framework founded in feminist theory and feminist linguistics for assessing the gender ideology embedded in textual datasets and language models, and we also propose strategies to mitigate bias.