Poster in Workshop: Pluralistic Alignment Workshop
From Distributional to Overton Pluralism: Investigating Large Language Model Alignment
Thom Lake · Eunsol Choi · Greg Durrett
The alignment process changes several properties of a large language model's (LLM's) output distribution. In this work, we re-examine previously reported reductions in response diversity post-alignment in open-ended QA. Our analysis suggests that the apparent drop in response diversity is largely explained by quality control and information aggregation. Both fine-tuning and prompting-based alignment techniques suppress irrelevant and unhelpful content while shifting the output distribution toward longer responses that cover information spanning multiple samples from the base LLM, essentially presenting diverse information in a single response. We argue these changes are better characterized as a shift from distributional pluralism to Overton pluralism than as an overall reduction in response diversity, and highlight the need for decoupled measures of semantic and lexical diversity.