Workshop
Socially Responsible Language Modelling Research (SoLaR)
Usman Anwar · David Krueger · Yejin Choi · Maarten Sap · Alan Chan · Yawen Duan · Robert Kirk · Xin (Cynthia) Chen · Abulhair Saparov · Kayo Yin · Liwei Jiang · Valentina Pyatkin
West Meeting Room 121, 122
Sat 14 Dec, 9:15 a.m. PST
The NeurIPS 2024 workshop on Socially Responsible Language Modelling Research (SoLaR) has two goals: (a) to highlight novel and important research directions in responsible LM research across various sub-communities, and (b) to promote interdisciplinary collaboration and dialogue on socially responsible LM research across communities, for example between (i) the AI safety and FATE (fairness, accountability, transparency, and ethics) communities, and (ii) the technical and policy communities. To achieve these goals, we have assembled a diverse line-up of speakers who will discuss LM research in the context of governance, ethics, fairness, safety, and alignment. We will also hold a panel on whether it is socially responsible to continue pursuing AGI-like, ever more capable and general-purpose LMs, an extremely timely topic given that multiple leading AI labs are explicitly focused on achieving this goal.
Schedule
Sat 9:15 a.m. - 9:20 a.m. | Opening Remarks
Sat 9:20 a.m. - 10:00 a.m. | Invited Talk 1 (Been Kim)
Sat 10:00 a.m. - 10:40 a.m. | Invited Talk 2 (Zico Kolter)
Sat 10:40 a.m. - 10:50 a.m. | Contributing Talk 1
Sat 10:50 a.m. - 11:00 a.m. | Contributing Talk 2
Sat 11:00 a.m. - 1:00 p.m. | Poster session
Sat 1:00 p.m. - 1:40 p.m. | Invited Talk 3 (Rida Qadri)
Sat 1:40 p.m. - 2:20 p.m. | Invited Talk 4 (Peter Henderson)
Sat 2:20 p.m. - 3:00 p.m. | Invited Talk 5 (Hannah Rose Kirk)
Sat 3:20 p.m. - 4:20 p.m. | Panel
Sat 4:20 p.m. - 4:30 p.m. | Contributing Talk 3
Sat 4:30 p.m. - 4:40 p.m. | Contributing Talk 4
Sat 4:40 p.m. - 4:50 p.m. | Contributing Talk 5
Sat 4:50 p.m. - 5:00 p.m. | Contributing Talk 6
Sat 5:00 p.m. - 5:10 p.m. | Closing Remarks
The Elicitation Game: Stress-Testing Capability Elicitation Techniques (Poster) | Felix Hofstätter · Jayden Teoh · Teun van der Weij · Francis Ward
Sandbag Detection through Model Impairment (Poster) | Cameron Tice · Philipp Kreer · Nathan Helm-Burger · Prithviraj Singh Shahani · Fedor Ryzhenkov · Teun van der Weij · Felix Hofstätter · Jacob Haimes
Fine-Tuning Language Models for Ethical Ambiguity: A Comparative Study of Alignment with Human Responses (Poster) | Pranav Senthilkumar · Visshwa Balasubramanian · Aneesa Maity · Prisha Jain · Kevin Zhu · Jonathan Lu
Position: Governments Need to Increase and Interconnect Post-Deployment Monitoring of AI (Poster) | Merlin Stein · Jamie Bernardi · Connor Dunlop
PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences (Poster) | Daiwei Chen · Yi Chen · Aniket Rege · Ramya Korlakai Vinayak
Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack (Poster) | Leo McKee-Reid · Joe Needham · Maria Martinez · Christoph Sträter · Mikita Balesni
Jailbreaking Large Language Models with Symbolic Mathematics (Poster) | Emet Bethany · Mazal Bethany · Juan Nolazco-Flores · Sumit Jha · Peyman Najafirad
LLM Hallucination Reasoning with Zero-shot Knowledge Test (Poster) | Seongmin Lee · Hsiang Hsu · Richard Chen
Gender Bias in LLM-generated Interview Responses (Poster) | Haein Kong · Yongsu Ahn · Sangyub Lee · Yunho Maeng
Measuring Free-Form Decision-Making Inconsistency of Language Models in Military Crisis Simulations (Poster) | Aryan Shrivastava · Max Lamparth · Jessica Hullman
Position: AI Agents & Liability – Mapping Insights from ML and HCI Research to Policy (Poster) | Connor Dunlop · Weiwei Pan · Julia Smakman · Lisa Soder · Siddharth Swaroop
SCAR: Sparse Conditioned Autoencoders for Concept Detection and Steering in LLMs (Poster) | Ruben Härle · Felix Friedrich · Manuel Brack · Björn Deiseroth · Patrick Schramowski · Kristian Kersting
Analyzing Probabilistic Methods for Evaluating Agent Capabilities (Poster) | Axel Højmark · Govind Pimpale · Arjun Panickssery · Marius Hobbhahn · Jérémy Scheurer
CoS: Enhancing Personalization with Context Steering (Poster) | Sashrika Pandey · Jerry He · Mariah Schrum · Anca Dragan
AI Sandbagging: Language Models can Selectively Underperform on Evaluations (Poster) | Teun van der Weij · Felix Hofstätter · Oliver Jaffe · Samuel Brown · Francis Ward
SocialStigmaQA Spanish and Japanese - Towards Multicultural Adaptation of Social Bias Benchmarks (Poster) | Clara Higuera-Cabañes · Ryo Iwaki · Beñat San Sebastian · Rosario Uceda-Sosa · Manish Nagireddy · Hiroshi Kanayama · Mikio Takeuchi · Gakuto Kurata · Karthikeyan Natesan Ramamurthy
MISR: Measuring Instrumental Self-Reasoning in Frontier Models (Poster) | Kai Fronsdal · David Lindner
How Does LLM Compression Affect Weight Exfiltration Attacks? (Poster) | Davis Brown · Mantas Mazeika
Towards Safe Multilingual Frontier AI (Spotlight) | Arturs Kanepajs · Vladimir Ivanov · Richard Moulange
Jailbreak Defense in a Narrow Domain: Failures of Existing Methods and Improving Transcript-Based Classifiers (Poster) | Tony Wang · John Hughes · Henry Sleight · Rylan Schaeffer · Rajashree Agrawal · Fazl Barez · Mrinank Sharma · Jesse Mu · Nir Shavit · Ethan Perez
Mitigating Downstream Model Risks via Model Provenance (Poster) | Keyu Wang · Scott Schaffter · Abdullah Norozi Iranzad · Doina Precup · Jonathan Lebensold · Megan Risdal
Language Models Resist Alignment (Poster) | Jiaming Ji · Kaile Wang · Tianyi (Alex) Qiu · Boyuan Chen · Changye Li · Hantao Lou · Jiayi Zhou · Juntao Dai · Yaodong Yang
NusaMT-7B: Machine Translation for Low-Resource Indonesian Languages with Large Language Models (Poster) | William Tan · Kevin Zhu
Simulation System Towards Solving Societal-Scale Manipulation (Poster) | Maximilian Puelma Touzel · Sneheel Sarangi · Austin Welch · Gayatri K · Dan Zhao · Zachary Yang · Hao Yu · Tom Gibbs · Ethan Kosak-Hine · Andreea Musulan · Camille Thibault · Reihaneh Rabbany · Jean-François Godbout · Kellin Pelrine
Shh, don't say that! Domain Certification in LLMs (Poster) | Cornelius Emde · Preetham Arvind · Alasdair Paren · Maxime Kayser · Thomas Rainforth · Thomas Lukasiewicz · Philip Torr · Adel Bibi
Decreasing Inconsistencies in Differentially Private Language Models through Self-Distillation (Poster) | Kieleh Ngong Ivoline Clarisse · Joseph Near · Niloofar Mireshghallah
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents (Poster) | Kieleh Ngong Ivoline Clarisse · Swanand Kadhe · Hao Wang · Keerthiram Murugesan · Justin D Weisz · Amit Dhurandhar · Karthikeyan Natesan Ramamurthy
Century: A Dataset of Sensitive Historical Images (Poster) | Canfer Akbulut · Kevin Robinson · Maribeth Rauh · Isabela Albuquerque · Olivia Wiles · Laura Weidinger · Verena Rieser · Yana Hasson · Nahema Marchal · Iason Gabriel · William Isaac · Lisa Anne Hendricks
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs (Poster) | Aidan Ewart · Abhay Sheshadri · Phillip Guo · Aengus Lynch · Cindy Wu · Vivek Hebbar · Henry Sleight · Asa Cooper Stickland · Ethan Perez · Dylan Hadfield-Menell · Stephen Casper
Targeted Manipulation and Deception Emerge in LLMs Trained on User* Feedback (Spotlight) | Marcus Williams · Micah Carroll · Constantin Weisser · Brendan Murphy · Adhyyan Narang · Anca Dragan
ReFeR: A Hierarchical Framework of Models as Evaluative and Reasoning Agents (Poster) | Yaswanth Narsupalli · Abhranil Chandra · Sreevatsa Muppirala · Manish Gupta · Pawan Goyal
Measuring AI Agent Autonomy: Towards a Scalable Approach With Code Inspection (Poster) | Merlin Stein · Peter Cihon · Gagan Bansal · Sam Manning
I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench (Poster) | Yuan Li · Yue Huang · Yuli Lin · Siyuan Wu · Yao Wan · Lichao Sun
On Demonstration Selection for Improving Fairness in Language Models (Spotlight) | Song Wang · Peng Wang · Yushun Dong · Tong Zhou · Lu Cheng · Yangfeng Ji · Jundong Li
HEARTS: A Holistic Framework for Explainable, Sustainable and Robust Text Stereotype Detection (Poster) | Theo King · Zekun Wu · Adriano Koshiyama · Emre Kazim · Philip Treleaven
Ways Forward for Global AI Benefit Sharing (Poster) | Sam Manning · Claire Dennis · Stephen Clare
Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths? (Poster) | Veronica Chatrath · Marcelo Lotif · Shaina Raza
An Adversarial Perspective on Machine Unlearning for AI Safety (Spotlight) | Jakub Łucki · Boyi Wei · Yangsibo Huang · Peter Henderson · Florian Tramer · Javier Rando
The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions (Poster) | Stefan Sylvius Wagner · Maike Behrendt · Marc Ziegele · Stefan Harmeling
Emergence of Steganography Between Large Language Models (Poster) | Yohan Mathew · Robert McCarthy · Joan Velja · Ollie Matthews · Nandi Schoots · Dylan Cope
HarmAnalyst: Interpretable, transparent, and steerable LLM safety moderation (Poster) | Jing-Jing Li · Valentina Pyatkin · Max Kleiman-Weiner · Liwei Jiang · Nouha Dziri · Anne Collins · Jana Schaich Borg · Maarten Sap · Yejin Choi · Sydney Levine
GPAI Evaluations Standards Taskforce: towards effective AI governance (Poster) | Patricia Paskov · Lukas Berglund · Everett Smith · Lisa Soder
Policy Dreamer: Diverse Public Policy Generation Via Elicitation and Simulation of Human Preferences (Poster) | Arjun Karanam · José Enríquez · Udari Sehwag · Michael Elabd · Kanishk Gandhi · Noah Goodman · Sanmi Koyejo
Towards a Theory of AI Personhood (Poster) | Francis Ward
Different Bias Under Different Criteria: Assessing Bias in LLMs with a Fact-Based Approach (Poster) | Changgeon Ko · Jisu Shin · Hoyun Song · Jeongyeon Seo · Jong Park
On the Ethical Considerations of Generative Agents (Poster) | Nyoma Diamond · Soumya Banerjee
Detection of Partially-Synthesized LLM Text (Poster) | Eric Lei · Hsiang Hsu · Richard Chen
Beyond the Binary: Capturing Diverse Preferences With Reward Regularization (Poster) | Vishakh Padmakumar · Chuanyang Jin · Hannah Rose Kirk · He He
Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards (Poster) | Shresth Verma · Niclas Boehmer · Lingkai Kong · Milind Tambe
Monitoring Human Dependence On AI Systems With Reliance Drills (Poster) | Rosco Hunter · Richard Moulange · Jamie Bernardi · Merlin Stein
CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models (Poster) | Song Wang · Peng Wang · Tong Zhou · Yushun Dong · Zhen Tan · Jundong Li
Understanding Model Bias Requires Systematic Probing Across Tasks (Poster) | Soline Boussard · Susannah (Cheng) Su · Helen Zhao · Siddharth Swaroop · Weiwei Pan
Salad-Bowl-LLM: Multi-Culture LLMs by In-Context Demonstrations from Diverse Cultures (Poster) | Dongkwan Kim · Junho Myung · Alice Oh
Investigating Goal-Aligned and Empathetic Social Reasoning Strategies for Human-Like Social Intelligence in LLMs (Poster) | Anirudh Gajula · Raaghav Malik
Just rephrase it! Uncertainty estimation in closed-source language models via multiple rephrased queries (Poster) | Adam Yang · Chen Chen · Konstantinos Pitas
Plentiful Jailbreaks with String Compositions (Poster) | Brian Huang
The Impact of Large Language Models in Academia: from Writing to Speaking (Poster) | Mingmeng Geng · Caixi Chen · Yanru Wu · Dongping Chen · Yao Wan · Pan Zhou
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models (Poster) | Mengfei Liang · Archish Arun · Zekun Wu · Cristian Villalobos · Jonathan Lutch · Emre Kazim · Adriano Koshiyama · Philip Treleaven
Levels of Autonomy: Liability in the age of AI Agents (Poster) | Lisa Soder · Julia Smakman · Connor Dunlop · Weiwei Pan · Siddharth Swaroop
LLM Alignment Using Soft Prompt Tuning: The Case of Cultural Alignment (Poster) | Reem Masoud · Martin Ferianc · Philip Treleaven · Miguel Rodrigues
On Adversarial Robustness of Language Models in Transfer Learning (Poster) | Bohdan Turbal · Anastasiia Mazur · Jiaxu Zhao · Mykola Pechenizkiy
Ablation is Not Enough to Emulate DPO: A Mechanistic Analysis of Toxicity Reduction (Poster) | Yushi Yang · Filip Sondej · Harry Mayne · Adam Mahdi
Large Language Models Still Exhibit Bias in Long Text (Poster) | Wonje Jeung · Dongjae Jeon · Ashkan Yousefpour · Jonghyun Choi
Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents (Poster) | Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y
Linear Probe Penalties Reduce LLM Sycophancy (Poster) | Henry Papadatos · Rachel Freedman
Report Cards: Qualitative Evaluation of LLMs Using Natural Language Summaries (Spotlight) | Blair Yang · Fuyang Cui · Keiran Paster · Jimmy Ba · Pashootan Vaezipoor · Silviu Pitis · Michael Zhang
A Cautionary Tale on the Evaluation of Differentially Private In-Context Learning (Poster) | Anjun Hu · Jiyang Guan · Philip Torr · Francesco Pinto
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models (Spotlight) | Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Zane Durante · Cristobal Eyzaguirre · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez
Developing Story: Case Studies of Generative AI’s Use in Journalism (Poster) | Natalie Brigham · Chongjiu Gao · Tadayoshi Kohno · Franziska Roesner · Niloofar Mireshghallah
The Case for Model Access Governance (Poster) | Edward Kembery
Developing an occupational prestige scale using Large Language Models (Poster) | Robert de Vries · Mark Hill · Laura Ruis