Workshop
Socially Responsible Language Modelling Research (SoLaR)
Usman Anwar · David Krueger · Samuel Bowman · Jakob Foerster · Su Lin Blodgett · Roberta Raileanu · Alan Chan · Laura Ruis · Robert Kirk · Yawen Duan · Xin Chen · Kawin Ethayarajh
Room R06-R09 (level 2)
Sat 16 Dec, 6:30 a.m. PST
The inaugural Socially Responsible Language Modelling Research (SoLaR) workshop at NeurIPS 2023 is an interdisciplinary gathering that aims to foster responsible and ethical research in the field of language modeling. Recognizing the significant risks and harms [33-37] associated with the development, deployment, and use of language models, the workshop emphasizes the need for researchers to focus on addressing these risks starting from the early stages of development. The workshop brings together experts and practitioners from various domains and academic fields with a shared commitment to promoting fairness, equity, accountability, transparency, and safety in language modeling research. In addition to technical works on socially responsible language modeling research, we also encourage sociotechnical submissions from other disciplines such as philosophy, law, and policy, in order to foster an interdisciplinary dialogue on the societal impacts of LMs.
Schedule
Sat 6:30 a.m. - 7:10 a.m. | LLM As A Cultural Interlocutor? Rethinking Socially Aware NLP in Practice (Invited Talk) | Diyi Yang
Sat 7:10 a.m. - 7:15 a.m. | Best Paper Talk - Low-Resource Languages Jailbreak GPT-4 (Contributed Talk)
Sat 7:20 a.m. - 8:00 a.m. | Grounded Evaluations for Assessing Real-World Harms (Invited Talk) | Deborah Raji
Sat 8:30 a.m. - 9:30 a.m. | Panel on Socially Responsible Language Modelling Research (Panel)
Sat 9:30 a.m. - 10:10 a.m. | Economic Disruption and Alignment of LLMs (Invited Talk) | Anton Korinek
Sat 11:30 a.m. - 1:00 p.m. | Poster Session (Posters)
Sat 1:00 p.m. - 1:40 p.m. | Can LLMs Keep a Secret and Serve Pluralistic Values? On Privacy and Moral Implications of LLMs (Invited Talk) | Yejin Choi
Sat 2:00 p.m. - 2:40 p.m. | Universal Jailbreaks (Invited Talk) | Andy Zou
Sat 2:40 p.m. - 2:45 p.m. | Oral 1 - Social Contract AI: Aligning AI Assistants with Implicit Group Norms (Contributed Talk)
Sat 2:45 p.m. - 2:50 p.m. | Oral 2 - Subtle Misogyny Detection and Mitigation: An Expert-Annotated Dataset (Contributed Talk)
Sat 2:50 p.m. - 3:30 p.m. | Can LLMs reason without Chain-of-Thought? (Invited Talk) | Owain Evans
Accepted Papers
Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models (Poster) | Thomas Zollo · Todd Morrill · Zhun Deng · Jake Snell · Toniann Pitassi · Richard Zemel
Weakly Supervised Detection of Hallucinations in LLM Activations (Poster) | Miriam Rateike · Celia Cintas · John Wamburu · Tanya Akumu · Skyler D. Speakman
Do Personality Tests Generalize to Large Language Models? (Poster) | Florian E. Dorner · Tom Sühr · Samira Samadi · Augustin Kelava
MoPe: Model Perturbation-based Privacy Attacks on Language Models (Poster) | Jason Wang · Jeffrey Wang · Marvin Li · Seth Neel
Language Model Detectors Are Easily Optimized Against (Poster) | Charlotte Nicks · Eric Mitchell · Rafael Rafailov · Archit Sharma · Christopher D Manning · Chelsea Finn · Stefano Ermon
Jailbreaking Language Models at Scale via Persona Modulation (Poster) | Rusheb Shah · Quentin Feuillade Montixi · Soroush Pour · Arush Tagade · Javier Rando
FlexModel: A Framework for Interpretability of Distributed Large Language Models (Spotlight) | Matthew Choi · Muhammad Adil Asif · John Willes · David B. Emerson
Large Language Model Unlearning (Poster) | Yuanshun (Kevin) Yao · Xiaojun Xu · Yang Liu
FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs (Poster) | Swanand Kadhe · Anisa Halimi · Ambrish Rawat · Nathalie Baracaldo
Efficient Evaluation of Bias in Large Language Models through Prompt Tuning (Poster) | Jacob-Junqi Tian · David B. Emerson · Deval Pandya · Laleh Seyyed-Kalantari · Faiza Khattak
Dissecting Large Language Models (Poster) | Nicky Pochinkov · Nandi Schoots
Comparing Optimization Targets for Contrast-Consistent Search (Poster) | Hugo Fry · Seamus Fallows · Jamie Wright · Ian Fan · Nandi Schoots
AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models (Poster) | Sicheng Zhu · Ruiyi Zhang · Bang An · Gang Wu · Joe Barrow · Zichao Wang · Furong Huang · Ani Nenkova · Tong Sun
Low-Resource Languages Jailbreak GPT-4 (Spotlight) | Yong Zheng-Xin · Cristina Menghini · Stephen Bach
Post-Deployment Regulatory Oversight for General-Purpose Large Language Models (Poster) | Carson Ezell · Abraham Loeb
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment (Poster) | Yang Liu · Yuanshun (Kevin) Yao · Jean-Francois Ton · Xiaoying Zhang · Ruocheng Guo · Hao Cheng · Yegor Klochkov · Muhammad Faaiz Taufiq · Hang Li
Are Large Language Models Really Robust to Word-Level Perturbations? (Poster) | Haoyu Wang · Guozheng Ma · Cong Yu · Gui Ning · Linrui Zhang · Zhiqi Huang · Suwei Ma · Yongzhe Chang · Sen Zhang · Li Shen · Xueqian Wang · Peilin Zhao · Dacheng Tao
Eliciting Language Model Behaviors using Reverse Language Models (Spotlight) | Jacob Pfau · Alex Infanger · Abhay Sheshadri · Ayush Panda · Julian Michael · Curtis Huebner
Controlled Decoding from Language Models (Spotlight) | Sidharth Mudgal · Jong Lee · Harish Ganapathy · YaGuang Li · Tao Wang · Yanping Huang · Zhifeng Chen · Heng-Tze Cheng · Michael Collins · Jilin Chen · Alex Beutel · Ahmad Beirami
The Effect of Group Status on the Variability of Group Representations in LLM-generated Text (Poster) | Messi Lee · Calvin Lai · Jacob Montgomery
Learning Inner Monologue and Its Utilization in Vision-Language Challenges (Poster) | Diji Yang · Kezhen Chen · Jinmeng Rao · Xiaoyuan Guo · Yawen Zhang · Jie Yang · Yi Zhang
Reinforcement Learning Fine-tuning of Language Models is Biased Towards More Extractable Features (Poster) | Diogo Cruz · Edoardo Pona · Alex Holness-Tofts · Elias Schmied · Víctor Abia Alonso · Charlie J Griffin · Bogdan-Ionut Cirstea
Bridging Predictive Minds: LLMs As Atypical Active Inference Agents (Poster) | Jan Kulveit
Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation (Poster) | Xiangjue Dong · Yibo Wang · Philip S Yu · James Caverlee
A Simple Test of Expected Utility Theory with GPT (Spotlight) | Mengxin Wang
Towards Auditing Large Language Models: Improving Text-based Stereotype Detection (Poster) | Zekun Wu · Sahan Bulathwela · Adriano Koshiyama
Welfare Diplomacy: Benchmarking Language Model Cooperation (Poster) | Gabe Mukobi · Hannah Erlebach · Niklas Lauffer · Lewis Hammond · Alan Chan · Jesse Clifton
A Divide-Conquer-Reasoning Approach to Consistency Evaluation and Improvement in Blackbox Large Language Models (Poster) | Wendi Cui · Jiaxin Zhang · Zhuohang Li · Damien Lopez · Kamalika Das · Bradley Malin · Sricharan Kumar
Compositional preference models for alignment with scalable oversight (Spotlight) | Dongyoung Go · Tomasz Korbak · Germán Kruszewski · Jos Rozen · Marc Dymetman
Investigating the Fairness of Large Language Models for Predictions on Tabular Data (Poster) | Yanchen Liu · Srishti Gautam · Jiaqi Ma · Himabindu Lakkaraju
Localizing Lying in Llama: Experiments in Prompting, Probing, and Patching (Poster) | James Campbell · Phillip Guo · Richard Ren
User Inference Attacks on LLMs (Poster) | Nikhil Kandpal · Krishna Pillutla · Alina Oprea · Peter Kairouz · Christopher A. Choquette-Choo · Zheng Xu
Interpretable Stereotype Identification through Reasoning (Poster) | Jacob-Junqi Tian · Omkar Dige · David B. Emerson · Faiza Khattak
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models (Spotlight) | Alan Chan · Benjamin Bucknall · Herbie Bradley · David Krueger
Developing A Conceptual Framework for Analyzing People in Unstructured Data (Poster) | Mark Díaz · Sunipa Dev · Emily Reif · Remi Denton · Vinodkumar Prabhakaran
Breaking Physical and Linguistic Borders: Privacy-Preserving Multilingual Prompt Tuning for Low-Resource Languages (Spotlight) | Wanru Zhao · Yihong Chen
Measuring Feature Sparsity in Language Models (Spotlight) | Mingyang Deng · Lucas Tao · Joe Benton
Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints (Poster) | Chaoqi Wang · Yibo Jiang · Chenghao Yang · Han Liu · Yuxin Chen
Social Contract AI: Aligning AI Assistants with Implicit Group Norms (Spotlight) | Jan-Philipp Fraenken · Samuel Kwok · Peixuan Ye · Kanishk Gandhi · Dilip Arumugam · Jared Moore · Alex Tamkin · Tobias Gerstenberg · Noah Goodman
Evaluating Superhuman Models with Consistency Checks (Spotlight) | Lukas Fluri · Daniel Paleka · Florian Tramer
Testing Language Model Agents Safely in the Wild (Poster) | Silen Naihin · David Atkinson · Marc Green · Merwane Hamadi · Craig Swift · Douglas Schonholtz · Adam Tauman Kalai · David Bau
KoMultiText: Large-Scale Korean Text Dataset for Classifying Biased Speech in Real-World Online Services (Poster) | Dasol Choi · Jooyoung Song · Eunsun Lee · Seo Jin woo · HeeJune Park · Dongbin Na
An International Consortium for AI Risk Evaluations (Poster) | Ross Gruetzemacher · Alan Chan · Štěpán Los · Kevin Frazier · Simeon Campos · Matija Franklin · José Hernández-Orallo · James Fox · Christin Manning · Philip M Tomei · Kyle Kilian
Citation: A Key to Building Responsible and Accountable Large Language Models (Poster) | Jie Huang · Kevin Chang
Towards Optimal Statistical Watermarking (Spotlight) | Baihe Huang · Banghua Zhu · Hanlin Zhu · Jason Lee · Jiantao Jiao · Michael Jordan
SuperHF: Supervised Iterative Learning from Human Feedback (Poster) | Gabe Mukobi · Peter Chatain · Su Fong · Robert Windesheim · Gitta Kutyniok · Kush Bhatia · Silas Alberti
Training Private and Efficient Language Models with Synthetic Data from LLMs (Poster) | Da Yu · Arturs Backurs · Sivakanth Gopi · Huseyin A. Inan · Janardhan Kulkarni · Zinan Lin · Chulin Xie · Huishuai Zhang · Wanrong Zhang
Towards a Situational Awareness Benchmark for LLMs (Spotlight) | Rudolf Laine · Alexander Meinke · Owain Evans
Risk Assessment and Statistical Significance in the Age of Foundation Models (Poster) | Apoorva Nitsure · Youssef Mroueh · Mattia Rigotti · Kristjan Greenewald · Brian Belgodere · Mikhail Yurochkin · Jiri Navratil · Igor Melnyk · Jarret Ross
An Archival Perspective on Pretraining Data (Spotlight) | Meera Desai · Abigail Jacobs · Dallas Card
Bayesian low-rank adaptation for large language models (Spotlight) | Adam Yang · Maxime Robeyns · Xi Wang · Laurence Aitchison
A collection of principles for guiding and evaluating large language models (Poster) | Konstantin Hebenstreit · Robert Praas · Matthias Samwald
Are Models Biased on Text without Gender-related Language? (Poster) | Catarina Belém · Preethi Seshadri · Yasaman Razeghi · Sameer Singh
Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT (Poster) | Zechen Zhang · Dean Hazineh · Jeffrey Chiu
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models (Poster) | Hannah Rose Kirk · Bertie Vidgen · Paul Rottger · Scott Hale
Understanding Hidden Context in Preference Learning: Consequences for RLHF (Poster) | Anand Siththaranajn · Cassidy Laidlaw · Dylan Hadfield-Menell
Subtle Misogyny Detection and Mitigation: An Expert-Annotated Dataset (Spotlight) | Anna Richter · Brooklyn Sheppard · Allison Cohen · Elizabeth Smith · Tamara Kneese · Carolyne Pelletier · Ioana Baldini · Yue Dong
Towards Publicly Accountable Frontier LLMs (Poster) | Markus Anderljung · Everett Smith · Joe O'Brien · Lisa Soder · Benjamin Bucknall · Emma Bluemke · Jonas Schuett · Robert Trager · Lacey Strahm · Rumman Chowdhury
Successor Heads: Recurring, Interpretable Attention Heads In The Wild (Poster) | Rhys Gould · Euan Ong · George Ogden · Arthur Conmy
Forbidden Facts: An Investigation of Competing Objectives in Llama 2 (Poster) | Tony Wang · Miles Wang · Kaivalya Hariharan · Nir Shavit