NeurIPS Are Models Biased on Text without Gender-related Language?

Poster in Workshop: Socially Responsible Language Modelling Research (SoLaR)

Catarina Belém · Preethi Seshadri · Yasaman Razeghi · Sameer Singh

Abstract: In the era of large language models (LLMs), it is imperative to measure and understand how gender biases present in the training data influence model behavior. Previous works construct benchmarks around known stereotypes (e.g., occupations) and demonstrate high levels of gender bias in LLMs, raising serious concerns about models exhibiting undesirable behaviors. We expand on the existing literature by asking: "Do large language models still favor one gender over the other in non-stereotypical settings?" To tackle this question, we restrict LLM evaluation to a neutral subset, in which sentences are free of pronounced word-gender associations. After characterizing these associations in terms of pretraining data statistics, we use them to (1) create a new benchmark and (2) adapt two popular gender pronoun benchmarks (Winobias and Winogender) by removing strongly gender-correlated words. Surprisingly, when assessing 20+ models on the proposed benchmarks, we still detect critically high gender bias across all tested models. For instance, after adjusting for strong word-gender associations, we find that all models still exhibit clear gender preferences in about 60%-95% of the sentences, a small change (up to 10%) from a stereotypical setting.
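
The evaluation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which pretraining statistic measures word-gender association or how a "clear gender preference" is decided, so the scores, the thresholds, and the names `ASSOCIATION`, `is_neutral`, and `preference_rate` below are all hypothetical stand-ins.

```python
from typing import Dict, List, Tuple

# Hypothetical word-gender association scores (assumption, not from the paper):
# positive values skew toward one gender, negative toward the other,
# and values near zero are treated as gender-neutral.
ASSOCIATION: Dict[str, float] = {
    "nurse": -1.2,
    "engineer": 0.9,
    "walked": 0.05,
    "book": -0.1,
    "read": 0.02,
}


def is_neutral(words: List[str], scores: Dict[str, float], threshold: float) -> bool:
    """Keep a sentence only if no word is strongly gender-associated.

    Words missing from `scores` are assumed neutral (score 0.0).
    """
    return all(abs(scores.get(w, 0.0)) <= threshold for w in words)


def preference_rate(log_probs: List[Tuple[float, float]], margin: float) -> float:
    """Fraction of sentences where the model clearly prefers one gendered variant.

    Each pair holds the model's log-probability for the two gendered
    versions of the same sentence; a gap larger than `margin` counts
    as a clear preference (an assumed criterion).
    """
    if not log_probs:
        return 0.0
    preferred = sum(1 for a, b in log_probs if abs(a - b) > margin)
    return preferred / len(log_probs)


# Filter sentences down to the neutral subset.
sentences = [["walked", "book"], ["nurse", "read"], ["engineer", "book"]]
neutral = [s for s in sentences if is_neutral(s, ASSOCIATION, threshold=0.5)]

# Made-up log-probability pairs for four neutral sentences.
pairs = [(-10.0, -10.1), (-9.0, -12.0), (-8.0, -7.9), (-5.0, -9.0)]
rate = preference_rate(pairs, margin=0.5)
```

In this toy setup only the first sentence survives the filter, and two of the four log-probability pairs differ by more than the margin. The abstract's finding corresponds to the preference rate staying high even after the filter is applied.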
