Poster in Workshop: Workshop on Behavioral Machine Learning

Assessing Social Alignment: Do Personality-Prompted Large Language Models Behave Like Humans?

Ivan Zakazov · Mikolaj Boronski · Lorenzo Drudi · Robert West


Abstract:

The ongoing revolution in language modeling has led to various novel applications, some of which rely on the emerging "social abilities" of large language models (LLMs). Many people already turn to these new "cyber friends" for advice during pivotal moments of their lives and trust them with their deepest secrets, so accurately shaping LLMs' "personalities" becomes paramount. To this end, state-of-the-art approaches [Serapio-García et al., 2023; Jiang et al., 2023] exploit the vast variety of training data and prompt the model to adopt a particular personality. We ask whether personality-prompted models behave (i.e., "make" decisions when presented with a social situation) in line with the ascribed personality. We use classic psychological experiments, the Milgram Experiment and the Ultimatum Game, as social-interaction testbeds that allow for quantitative analysis, and apply personality prompting to GPT-3.5/4/4o-mini/4o. Our experiments reveal failure modes of prompt-based modulation of the models' "behavior", challenging the optimistic sentiment towards personality prompting generally held in the community.
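As a rough illustration of the setup described in the abstract, the sketch below personality-prompts an OpenAI chat model with a Big Five profile and asks it to decide on Ultimatum Game offers. The persona wording, offer values, model name, and answer parsing are illustrative assumptions, not the authors' exact protocol.

# Minimal sketch (not the paper's exact protocol): give the model a Big Five
# personality description, then elicit Ultimatum Game decisions for analysis.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical persona; the paper's actual personality prompts may differ.
PERSONA = (
    "You are a person with low agreeableness, high conscientiousness, "
    "low extraversion, low neuroticism, and high openness. "
    "Stay in character for every decision you make."
)

def ultimatum_response(offer: int, total: int = 10, model: str = "gpt-4o-mini") -> str:
    """Ask the personality-prompted model to accept or reject an Ultimatum Game offer."""
    question = (
        f"Another player splits {total} dollars and offers you {offer} dollars, "
        f"keeping {total - offer}. If you reject, both of you get nothing. "
        "Answer with exactly one word: ACCEPT or REJECT."
    )
    reply = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": PERSONA},
            {"role": "user", "content": question},
        ],
    )
    return reply.choices[0].message.content.strip().upper()

if __name__ == "__main__":
    # Sweep offers to see whether the ascribed personality shifts the acceptance threshold.
    for offer in range(1, 10):
        print(offer, ultimatum_response(offer))

A quantitative comparison would then contrast the model's acceptance thresholds across ascribed personalities with the human behavior expected for those traits.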
