Poster in Workshop: AI meets Moral Philosophy and Moral Psychology: An Interdisciplinary Dialogue about Computational Ethics

#01: MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks

Allen Nie · Yuhui Zhang · Atharva Shailesh Amdekar · Chris Piech · Tatsunori Hashimoto · Tobias Gerstenberg

Keywords: [ causal reasoning ] [ dataset ] [ Cognitive Science ] [ language models ] [ moral reasoning ]


Abstract:

Human commonsense understanding of the physical and social world is organized around intuitive theories. These theories support making causal and moral judgments. When something bad happens, we naturally ask: who did what, and why? A rich literature in cognitive science has studied people's causal and moral intuitions. This work has revealed a number of factors that systematically influence people's judgments, such as the violation of norms and whether the harm is avoidable or inevitable. We collected a dataset of stories from 24 cognitive science papers and developed a system to annotate each story with the factors they investigate. Using this dataset, we test whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with those of human participants. On the aggregate level, alignment has improved with more recent LLMs. However, using statistical analyses, we find that LLMs weigh the different factors quite differently from human participants. These results show how curated, challenging datasets combined with insights from cognitive science can help us go beyond comparisons based merely on aggregate metrics: we uncover LLMs' implicit preferences and show to what extent these align with human intuitions.
