NeurIPS Poster Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context

Poster

Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context

Jingru (Jessica) Jia · Zehua Yuan · Junhao Pan · Paul McNamara · Deming Chen

West Ballroom A-D #5503

[ Abstract ]

[ Paper] [ Slides] [ Poster] [ Poster] [ OpenReview]

Wed 11 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

When making decisions under uncertainty, individuals often deviate from rational behavior, which can be evaluated across three dimensions: risk preference, probability weighting, and loss aversion. Given the widespread use of large language models (LLMs) in supporting decision-making processes, it is crucial to assess whether their behavior aligns with human norms and ethical expectations or exhibits potential biases. Although several empirical studies have investigated the rationality and social behavior performance of LLMs, their internal decision-making tendencies and capabilities remain inadequately understood. This paper proposes a framework, grounded in behavioral economics theories, to evaluate the decision-making behaviors of LLMs. With a multiple-choice-list experiment, we initially estimate the degree of risk preference, probability weighting, and loss aversion in a context-free setting for three commercial LLMs: ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro. Our results reveal that LLMs generally exhibit patterns similar to humans, such as risk aversion and loss aversion, with a tendency to overweight small probabilities, but there are significant variations in the degree to which these behaviors are expressed across different LLMs. Further, we explore their behavior when embedded with socio-demographic features of human beings, uncovering significant disparities across various demographic characteristics.

Chat is not available.