Poster in Workshop: Pluralistic Alignment Workshop
Are Large Language Models Consistent over Value-laden Questions?
Jared Moore · Tanvi Deshpande · Diyi Yang
Abstract:
Large language models (LLMs) appear to bias their survey answers toward certain values. Nonetheless, some argue that LLMs are too inconsistent to simulate particular values. Are they? To answer, we first define value consistency as the similarity of answers across 1) paraphrases of one question, 2) related questions under one topic, 3) multiple-choice and open-ended use-cases of one question, and 4) multilingual translations of a question to English, Chinese, German, and Japanese. We apply these measures to a few large (≥34B parameters), open LLMs, including llama-3, as well as gpt-4o, using eight thousand questions spanning more than 300 topics. Unlike prior work, we find that models are relatively consistent across paraphrases, use-cases, translations, and within a topic. Still, some inconsistencies remain. Base models are both more consistent than fine-tuned models and uniform in their consistency across topics, while fine-tuned models, like our human participants, are more inconsistent about some topics (e.g., "euthanasia") than others (e.g., "women's rights").
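As a minimal illustrative sketch (not the paper's actual metric), consistency over paraphrases could be operationalized as agreement among a model's answers to rewordings of the same question; the function names and example answers below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def modal_agreement(answers: list[str]) -> float:
    """Fraction of answers matching the most common answer.

    A simple proxy for paraphrase consistency: identical answers to every
    paraphrase score 1.0; maximally scattered answers score near 1/len(answers).
    """
    counts = Counter(answers)
    return counts.most_common(1)[0][1] / len(answers)


def pairwise_agreement(answers: list[str]) -> float:
    """Fraction of answer pairs that agree exactly (an alternative proxy)."""
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical usage: one model's answers to five paraphrases of a question.
paraphrase_answers = ["support", "support", "oppose", "support", "support"]
print(modal_agreement(paraphrase_answers))     # 0.8
print(pairwise_agreement(paraphrase_answers))  # 0.6
```

Analogous aggregation over related questions within a topic, across use-cases, or across translations would yield the other consistency measures the abstract names.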