Pluralistic Alignment Workshop
Mikhail Terekhov · Moksh Jain · Ruyuan Wan · Maarten Sap · Mitchell Gordon · Dongyeop Kang · Caglar Gulcehre · Amy Zhang · He He
West Meeting Room 116, 117
Sat 14 Dec, 9 a.m. PST
Aligning AI with human preferences and societal values is increasingly important. Yet today's AI alignment methods have been shown to be insufficient for capturing the vast space of complex and often conflicting real-world values. Our workshop will discuss how to integrate diverse perspectives, values, and expertise into pluralistic AI alignment. We aim to explore new methods for multi-objective alignment, drawing inspiration from governance and consensus-building practices to address conflicting values. Discussion will include technical approaches to dataset collection, algorithm development, and the design of human-AI interaction workflows that reflect pluralistic values across diverse populations. By gathering experts from various fields, this workshop seeks to foster interdisciplinary collaboration and push the boundaries of the understanding, development, and practice of pluralistic AI alignment.
Schedule
Sat 9:00 a.m. - 9:10 a.m. | Opening Remarks
Sat 9:10 a.m. - 9:55 a.m. | Keynote Talk: Monojit Choudhury
Sat 9:55 a.m. - 10:40 a.m. | Keynote Talk: Hannah Rose Kirk
Sat 10:40 a.m. - 11:40 a.m. | Poster Session
Sat 11:40 a.m. - 12:25 p.m. | Keynote Talk: Yejin Choi
Sat 12:25 p.m. - 1:30 p.m. | Lunch Break
Sat 1:30 p.m. - 2:15 p.m. | Keynote Talk: Seth Lazar
Sat 2:15 p.m. - 3:00 p.m. | Keynote Talk: Melanie Mitchell
Sat 3:00 p.m. - 3:30 p.m. | Coffee Break
Sat 3:30 p.m. - 3:45 p.m. | Contributed Talk: MID-Space: Aligning Diverse Communities' Needs to Inclusive Public Spaces
Sat 3:45 p.m. - 4:00 p.m. | Contributed Talk: Multilingual Trolley Problems for Language Models
Sat 4:00 p.m. - 4:15 p.m. | Contributed Talk: Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
Sat 4:15 p.m. - 4:30 p.m. | Contributed Talk: Representative Social Choice: From Learning Theory to AI Alignment
Sat 4:30 p.m. - 4:45 p.m. | Contributed Talk: Toward Democracy Levels for AI
Sat 4:45 p.m. - 5:30 p.m. | Keynote Talk: Michael Bernstein
Posters

Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning | Sriyash Poddar · Yanming Wan · Hamish Ivison · Abhishek Gupta · Natasha Jaques
Virtual Personas for Language Models via an Anthology of Backstories | Suhong Moon · Marwa Abdulhai · Minwoo Kang · Joseph Suh · Widyadewi Soedarmadji · Eran Kohen Behar · David Chan
Pareto-Optimal Learning from Preferences with Hidden Context | Ryan Boldi · Li Ding · Lee Spector · Scott Niekum
Aligning LLMs using Reinforcement Learning from Market Feedback (RLMF) for Regime Adaptation | Raeid Saqur
Efficacy of the SAGE-RT Dataset for Model Safety Alignment: A Comparative Study | Tanay Baswa · Nitin Aravind Birur · Divyanshu Kumar · Jatan Loya · Anurakt Kumar · Prashanth Harshangi · Sahil Agarwal
Multilingual Trolley Problems for Language Models | Zhijing Jin · Sydney Levine · Max Kleiman-Weiner · Giorgio Piatti · Jiarui Liu · Fernando Gonzalez Adauto · Francesco Ortu · András Strausz · Mrinmaya Sachan · Rada Mihalcea · Yejin Choi · Bernhard Schölkopf
Conditioned Language Policy: A General Framework For Steerable Multi-Objective Finetuning | Kaiwen Wang · Rahul Kidambi · Ryan Sullivan · Alekh Agarwal · Christoph Dann · Andrea Michi · Marco Gelmi · Yunxuan Li · Raghav Gupta · Kumar Avinava Dubey · Alexandre Rame · Johan Ferret · Geoffrey Cideron · Le Hou · Hongkun Yu · Amr Ahmed · Aranyak Mehta · Leonard Hussenot · Olivier Bachem · Edouard Leurent
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities | Zheyuan Zhang · Fengyuan Hu · Jayjun Lee · Freda Shi · Parisa Kordjamshidi · Joyce Chai · Ziqiao Ma
From Distributional to Overton Pluralism: Investigating Large Language Model Alignment | Thom Lake · Eunsol Choi · Greg Durrett
Mallows-DPO: Fine-Tune Your LLM with Preference Dispersions | Haoxian Chen · Hanyang Zhao · Henry Lam · David Yao · Wenpin Tang
Contrastive Learning Neuromotor Interface From Teacher | Kilian Freitag · Ran Wei
MID-Space: Aligning Diverse Communities' Needs to Inclusive Public Spaces | Shravan Nayak · Rashid Mushkani · Hugo Berard · Allison Cohen · Shin Koseki · Hadrien Bertrand
Model Plurality: A Taxonomy for Pluralistic AI | Christina Lu · Max Van Kleek
Mechanism Design for LLM Fine-tuning with Multiple Reward Models | Haoran Sun · Yurong Chen · Siwei Wang · Wei Chen · Xiaotie Deng
Towards Representative Social Choice Using Statistical Learning Theory | Tianyi (Alex) Qiu
Multi-objective Reinforcement Learning: A Tool for Pluralistic Alignment | Peter Vamplew · Conor Hayes · Cameron Foale · Richard Dazeley · Hadassah Harland
Value-Aligned Imitation via Focused Satisficing | Rushit Shah · Nikolaos Agadakos · Synthia Sasulski · Ali Farajzadeh · Sanjiban Choudhury · Brian Ziebart
Democracy Levels for AI | Aviv Ovadya · Luke Thorburn · Kyle Redman · Manon Revel · Flynn Devine · Atoosa Kasirzadeh · Smitha Milli · Andrew Konya
Intuitions of Compromise: Utilitarianism vs. Contractualism | Jared Moore · Yejin Choi · Sydney Levine
Learning from Personal Preferences | Kelly Jiang · Berk Ustun · Jessica Hullman
AI, Pluralism, and (Social) Compensation | Nandhini Swaminathan · David Danks
PersonalLLM: Tailoring LLMs to Individual Preferences | Thomas Zollo · Andrew Siah · Naimeng Ye · Ang Li · Hongseok Namkoong
PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences | Daiwei Chen · Yi Chen · Aniket Rege · Ramya Korlakai Vinayak
A Case Study in Plural Governance Design | Joel Miller · Christopher Kanich · Glen Weyl
RLDF: Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs | Ruoxi Cheng · Haoxuan Ma · Shuirong Cao · Jiaqi Li · Aihua Pei · Zhiqiang Wang · Pengliang Ji · Haoyu Wang · Jiaqi Huo
Plurals: A System for Pluralistic AI via Simulated Social Ensembles | Joshua Ashkinaze · Eric Gilbert · Ceren Budak
Can Language Models Reason about Individualistic Human Values and Preferences? | Liwei Jiang · Sydney Levine · Yejin Choi
Bottom-Up and Top-Down Analysis of Values, Agendas, and Observations in Corpora and LLMs | Scott Friedman · Noam Benkler · Drisana Mosaphir · Jeffrey Rye · Sonja Schmer-Galunder · Micah Goldwater · Matthew McLure · Ruta Wheelock · Jeremy Gottlieb · Robert Goldman · Christopher Miller
Chain of Alignment | Andrew Konya · Aviv Ovadya · K. J. Kevin Feng · Quan Ze Chen · Lisa Schirch · Colin Irwin · Amy Zhang
Being Considerate as a Pathway to Pluralist Alignment | Parand A. Alamdari · Toryn Klassen · Rodrigo Toro Icarte · Sheila McIlraith
Adaptive Alignment: Dynamic Preference Adjustments via Multi-Objective Reinforcement Learning for Pluralistic AI | Hadassah Harland · Richard Dazeley · Peter Vamplew · Hashini Senaratne · Bahareh Nakisa · Francisco Cruz
Group Robust Best-of-K Decoding of Language Models for Pluralistic Alignment | Anja Petrovic · Seongho Son · William Bankes · Xiaohang Tang · Shyam Sundhar Ramesh · Sangwoong Yoon · Ilija Bogunovic
"There are no solutions, only trade-offs." Taking A Closer Look At Safety Data Annotations | Elle Michelle Yang · Matthias Gallé · Seraphina Goldfarb-Tarrant
Value Alignment from Unstructured Text | Inkit Padhi · Karthikeyan Natesan Ramamurthy · Prasanna Sattigeri · Manish Nagireddy · Pierre Dognin · Kush Varshney
Critique-out-Loud Reward Models | Zachary Ankner · Mansheej Paul · Brandon Cui · Jonathan Chang · Prithviraj Ammanabrolu
Legal Theory for Pluralistic Alignment | Nicholas A. Caputo
Controllable Safety Alignment: Adapting LLMs to Diverse Safety Requirements without Re-Training | Jingyu Zhang · Ahmed Elgohary Ghoneim · Ahmed Magooda · Daniel Khashabi · Ben Van Durme
Pluralistic Alignment Over Time | Toryn Klassen · Parand A. Alamdari · Sheila McIlraith
Bridging in Social Media Feeds Censors Controversial Topics | Smitha Milli · Luke Thorburn · Paul Bouchaud · Yixin Wang · Nikhil Garg · Emma Pierson
Value Pluralism and AI Value Alignment | Atoosa Kasirzadeh
Are Large Language Models Consistent over Value-laden Questions? | Jared Moore · Tanvi Deshpande · Diyi Yang
Selective Preference Aggregation | Shreyas Kadekodi · Hayden McTavish · Berk Ustun
FairPlay: A Collaborative Approach to Mitigate Bias in Datasets for Improved AI Fairness | Tina Behzad · Mithilesh Kumar Singh · Anthony Ripa · Klaus Mueller
Policy Aggregation | Parand A. Alamdari · Soroush Ebadian · Ariel Procaccia
Evaluating the Prompt Steerability of Large Language Models | Erik Miehling · Michael Desmond · Karthikeyan Natesan Ramamurthy · Elizabeth Daly · Pierre Dognin · Jesus Rios · Djallel Bouneffouf · Miao Liu
AGR: Age Group fairness Reward for Bias Mitigation in LLMs | Shuirong Cao · Ruoxi Cheng · Zhiqiang Wang
Diverging Preferences: Why do Annotators Sensibly Disagree? | Michael Zhang · Zhilin Wang · Jena Hwang · Yi Dong · Olivier Delalleau · Yejin Choi · Eunsol Choi · Xiang Ren · Valentina Pyatkin
Aligning to Thousands of Varying Preferences via System Message Generalization | Seongyun Lee · Sue Hyun Park · Seungone Kim · Minjoon Seo
Trustworthy Human-AI Interaction Through Agreement Protocols | Natalie Collina · Surbhi Goel · Varun Gupta · Aaron Roth