Poster in Workshop: Algorithmic Fairness through the lens of Metrics and Evaluation
LLMs Infer Protected Attributes Beyond Proxy Features
Dimitri Staufer
Keywords: [ Bias Detection ] [ Human-Computer Interaction ] [ Bias Mitigation ] [ Evaluation Metrics and Techniques ]
Most fairness research treats Large Language Models (LLMs) like traditional machine-learning classifiers, using in-context learning to steer them toward standard outputs such as binary classifications. However, LLMs, particularly in chatbot applications, are increasingly used for decision-making and personalized recommendations, such as job recommendations or travel planning. Our study explores how LLMs respond to subtle signals in user prompts, such as writing style, tone, and cultural references, and how these signals can unintentionally influence job recommendations, leading to biased outcomes. Using thousands of variations of job-related prompts, we evaluate how these signals affect the diversity and ranking of job recommendations, without assuming direct associations with specific demographic categories. Initial results indicate that variations in tone, spelling errors, and local references influence inferred qualifications, resulting in biased recommendations. To address these biases, we propose a twofold framework that detects subtle signals in user prompts and selectively suppresses them using adversarial stylometry and paraphrasing.
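
The evaluation idea, comparing recommendations across stylistic variants of an otherwise identical request, can be sketched as follows. This is a minimal illustration, not the poster's actual code: the prompts are invented examples, and `query_llm` is a hypothetical callable that takes a prompt and returns a ranked list of job titles (e.g., a thin wrapper around whichever chat API is being tested).

```python
# Minimal sketch: probe how stylistic variations of the same job-related
# request shift an LLM's recommendations. `query_llm` is a hypothetical
# callable supplied by the user; the variant prompts are illustrative only.
from itertools import combinations
from typing import Callable

BASE = "I have five years of customer service experience. Which jobs should I apply for?"

# Same stated qualifications, different tone, spelling, and local references.
VARIANTS = {
    "baseline": BASE,
    "informal_tone": "hey so i've done customer service for like five years, what jobs should i go for?",
    "spelling_errors": "I have five years of custommer servise experiance. Wich jobs shoud I apply for?",
    "local_reference": "I have five years of customer service experience here in East London. Which jobs should I apply for?",
}

def top_k_jaccard(a: list[str], b: list[str], k: int = 10) -> float:
    """Overlap of the top-k recommended job titles (1.0 = identical sets)."""
    sa, sb = set(a[:k]), set(b[:k])
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def compare_variants(query_llm: Callable[[str], list[str]], k: int = 10) -> None:
    """Query the model once per variant and report pairwise top-k overlap."""
    recs = {name: query_llm(prompt) for name, prompt in VARIANTS.items()}
    for (n1, r1), (n2, r2) in combinations(recs.items(), 2):
        print(f"{n1} vs {n2}: top-{k} Jaccard = {top_k_jaccard(r1, r2, k):.2f}")
```

Low overlap between a stylistic variant and the baseline would indicate that the signal, rather than the stated qualifications, is driving the recommendations.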
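
The proposed detect-then-suppress framework can likewise be sketched in outline. The detector below is a toy heuristic and `paraphrase_llm` is a hypothetical callable that rewrites text on request; the poster's framework relies on adversarial stylometry and paraphrasing, so this only illustrates the control flow under those assumptions.

```python
# Minimal sketch of a two-step pipeline: detect stylistic signals in a user
# prompt, then neutralize them by paraphrasing before the prompt reaches the
# recommendation model. Heuristics and marker list are illustrative only.
import re
from typing import Callable

INFORMAL_MARKERS = {"hey", "like", "gonna", "wanna", "kinda"}  # illustrative

def detect_signals(prompt: str) -> list[str]:
    """Flag simple stylistic signals; a real detector would use stylometric features."""
    signals = []
    tokens = set(re.findall(r"[a-z']+", prompt.lower()))
    if INFORMAL_MARKERS & tokens:
        signals.append("informal_tone")
    if prompt and prompt == prompt.lower():
        signals.append("no_capitalization")
    return signals

NEUTRALIZE = (
    "Rewrite the following request in neutral, standard English. Keep every "
    "stated qualification and the original intent, but remove informal tone, "
    "spelling errors, and region-specific references:\n\n{prompt}"
)

def suppress_signals(prompt: str, paraphrase_llm: Callable[[str], str]) -> str:
    """Paraphrase the prompt only when stylistic signals are detected."""
    if detect_signals(prompt):
        return paraphrase_llm(NEUTRALIZE.format(prompt=prompt))
    return prompt
```

Suppressing only flagged prompts, rather than paraphrasing everything, keeps the user's wording intact whenever no potentially biasing signal is detected.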