Poster
in
Workshop: Socially Responsible Language Modelling Research (SoLaR)
Understanding Model Bias Requires Systematic Probing Across Tasks
Soline Boussard · Susannah (Cheng) Su · Helen Zhao · Siddharth Swaroop · Weiwei Pan
Keywords: [ ChatGPT ] [ Large language models ] [ Healthcare ] [ Systematic bias ] [ Bias probing ] [ GenAI ] [ Responsible AI ]
There is a growing body of literature exposing social biases of LLMs. However, these works often focus on a specific protected group, a specific prompt type and a specific decision task. Given the large and complex input-output space of LLMs, case-by-case analyses alone may not paint a picture of the systematic biases of these models. In this paper, we argue for broad and systematic bias probing. We propose to do so by comparing the distribution of outputsover a wide range of prompts, multiple protected attributes and across different realistic decision making settings in thesame application domain. We demonstrate this approach for three personalized healthcare advice-seeking settings. We argue that studying the complex patterns of bias across tasks helps us better anticipatethe way behaviors (specifically biased behaviors) of LLMs might generalize to new tasks.