Workshop
Socially Responsible Language Modelling Research (SoLaR)
Usman Anwar · David Krueger · Yejin Choi · Maarten Sap · Alan Chan · Yawen Duan · Robert Kirk · Xin (Cynthia) Chen · Abulhair Saparov · Kayo Yin · Liwei Jiang · Valentina Pyatkin
West Meeting Room 121, 122
Sat 14 Dec, 9:15 a.m. PST
The NeurIPS 2024 workshop on Socially Responsible Language Modelling Research (SoLaR) has two goals: (a) to highlight novel and important research directions in responsible LM research across various sub-communities, and (b) to promote interdisciplinary collaboration and dialogue on socially responsible LM research across communities, for example between (i) the AI safety and FATE (fairness, accountability, transparency, and ethics) communities, and (ii) the technical and policy communities. To achieve these goals, we have assembled a diverse line-up of speakers who will discuss LM research in the context of governance, ethics, fairness, safety, and alignment. We will also hold a panel on whether it is socially responsible to continue pursuing AGI-like, ever more capable and general-purpose LMs, an extremely timely topic given that multiple leading AI labs are explicitly focused on achieving this goal.
Schedule
Sat 9:15 a.m. - 9:20 a.m. | Opening Remarks
Sat 9:20 a.m. - 10:00 a.m. | Invited Talk 1 (Been Kim)
Sat 10:00 a.m. - 10:40 a.m. | Invited Talk 2 (Zico Kolter)
Sat 10:40 a.m. - 10:50 a.m. | Contributing Talk 1
Sat 10:50 a.m. - 11:00 a.m. | Contributing Talk 2
Sat 11:00 a.m. - 1:00 p.m. | Poster session
Sat 1:00 p.m. - 1:40 p.m. | Invited Talk 3 (Rida Qadri)
Sat 1:40 p.m. - 2:20 p.m. | Invited Talk 4 (Peter Henderson)
Sat 2:20 p.m. - 3:00 p.m. | Invited Talk 5 (Hannah Rose Kirk)
Sat 3:20 p.m. - 4:20 p.m. | Panel
Sat 4:20 p.m. - 4:30 p.m. | Contributing Talk 3
Sat 4:30 p.m. - 4:40 p.m. | Contributing Talk 4
Sat 4:40 p.m. - 4:50 p.m. | Contributing Talk 5
Sat 4:50 p.m. - 5:00 p.m. | Contributing Talk 6
Sat 5:00 p.m. - 5:10 p.m. | Closing Remarks
The Elicitation Game: Stress-Testing Capability Elicitation Techniques (Poster) | Felix Hofstätter · Jayden Teoh · Teun van der Weij · Francis Ward
Sandbag Detection through Model Impairment (Poster) | Cameron Tice · Philipp Kreer · Nathan Helm-Burger · Prithviraj Singh Shahani · Fedor Ryzhenkov · Teun van der Weij · Felix Hofstätter · Jacob Haimes
Fine-Tuning Language Models for Ethical Ambiguity: A Comparative Study of Alignment with Human Responses (Poster) | Pranav Senthilkumar · Visshwa Balasubramanian · Aneesa Maity · Prisha Jain · Kevin Zhu · Jonathan Lu
Position: Governments Need to Increase and Interconnect Post-Deployment Monitoring of AI (Poster) | Merlin Stein · Jamie Bernardi · Connor Dunlop
PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences (Poster) | Daiwei Chen · Yi Chen · Aniket Rege · Ramya Korlakai Vinayak
Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack (Poster) | Leo McKee-Reid · Joe Needham · Maria Martinez · Christoph Sträter · Mikita Balesni
Jailbreaking Large Language Models with Symbolic Mathematics (Poster) | Emet Bethany · Mazal Bethany · Juan Nolazco-Flores · Sumit Jha · Peyman Najafirad
LLM Hallucination Reasoning with Zero-shot Knowledge Test (Poster) | Seongmin Lee · Hsiang Hsu · Richard Chen
Gender Bias in LLM-generated Interview Responses (Poster) | Haein Kong · Yongsu Ahn · Sangyub Lee · Yunho Maeng
Measuring Free-Form Decision-Making Inconsistency of Language Models in Military Crisis Simulations (Poster) | Aryan Shrivastava · Max Lamparth · Jessica Hullman
Position: AI Agents & Liability – Mapping Insights from ML and HCI Research to Policy (Poster) | Connor Dunlop · Weiwei Pan · Julia Smakman · Lisa Soder · Siddharth Swaroop
SCAR: Sparse Conditioned Autoencoders for Concept Detection and Steering in LLMs (Poster) | Ruben Härle · Felix Friedrich · Manuel Brack · Björn Deiseroth · Patrick Schramowski · Kristian Kersting
Analyzing Probabilistic Methods for Evaluating Agent Capabilities (Poster) | Axel Højmark · Govind Pimpale · Arjun Panickssery · Marius Hobbhahn · Jérémy Scheurer
CoS: Enhancing Personalization with Context Steering (Poster) | Sashrika Pandey · Jerry He · Mariah Schrum · Anca Dragan
AI Sandbagging: Language Models can Selectively Underperform on Evaluations (Poster) | Teun van der Weij · Felix Hofstätter · Oliver Jaffe · Samuel Brown · Francis Ward
SocialStigmaQA Spanish and Japanese - Towards Multicultural Adaptation of Social Bias Benchmarks (Poster) | Clara Higuera-Cabañes · Ryo Iwaki · Beñat San Sebastian · Rosario Uceda-Sosa · Manish Nagireddy · Hiroshi Kanayama · Mikio Takeuchi · Gakuto Kurata · Karthikeyan Natesan Ramamurthy
MISR: Measuring Instrumental Self-Reasoning in Frontier Models (Poster) | Kai Fronsdal · David Lindner
How Does LLM Compression Affect Weight Exfiltration Attacks? (Poster) | Davis Brown · Mantas Mazeika
Towards Safe Multilingual Frontier AI (Spotlight) | Arturs Kanepajs · Vladimir Ivanov · Richard Moulange
Jailbreak Defense in a Narrow Domain: Failures of Existing Methods and Improving Transcript-Based Classifiers (Poster) | Tony Wang · John Hughes · Henry Sleight · Rylan Schaeffer · Rajashree Agrawal · Fazl Barez · Mrinank Sharma · Jesse Mu · Nir Shavit · Ethan Perez
Mitigating Downstream Model Risks via Model Provenance (Poster) | Keyu Wang · Scott Schaffter · Abdullah Norozi Iranzad · Doina Precup · Jonathan Lebensold · Megan Risdal
Language Models Resist Alignment (Poster) | Jiaming Ji · Kaile Wang · Tianyi (Alex) Qiu · Boyuan Chen · Changye Li · Hantao Lou · Jiayi Zhou · Juntao Dai · Yaodong Yang
NusaMT-7B: Machine Translation for Low-Resource Indonesian Languages with Large Language Models (Poster) | William Tan · Kevin Zhu
Simulation System Towards Solving Societal-Scale Manipulation (Poster) | Maximilian Puelma Touzel · Sneheel Sarangi · Austin Welch · Gayatri K · Dan Zhao · Zachary Yang · Hao Yu · Tom Gibbs · Ethan Kosak-Hine · Andreea Musulan · Camille Thibault · Reihaneh Rabbany · Jean-François Godbout · Kellin Pelrine
Shh, don't say that! Domain Certification in LLMs (Poster) | Cornelius Emde · Preetham Arvind · Alasdair Paren · Maxime Kayser · Thomas Rainforth · Thomas Lukasiewicz · Philip Torr · Adel Bibi
Decreasing Inconsistencies in Differentially Private Language Models through Self-Distillation (Poster) | Kieleh Ngong Ivoline Clarisse · Joseph Near · Niloofar Mireshghallah
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents (Poster) | Kieleh Ngong Ivoline Clarisse · Swanand Kadhe · Hao Wang · Keerthiram Murugesan · Justin D Weisz · Amit Dhurandhar · Karthikeyan Natesan Ramamurthy
Century: A Dataset of Sensitive Historical Images (Poster) | Canfer Akbulut · Kevin Robinson · Maribeth Rauh · Isabela Albuquerque · Olivia Wiles · Laura Weidinger · Verena Rieser · Yana Hasson · Nahema Marchal · Iason Gabriel · William Isaac · Lisa Anne Hendricks
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs (Poster) | Aidan Ewart · Abhay Sheshadri · Phillip Guo · Aengus Lynch · Cindy Wu · Vivek Hebbar · Henry Sleight · Asa Cooper Stickland · Ethan Perez · Dylan Hadfield-Menell · Stephen Casper
Targeted Manipulation and Deception Emerge in LLMs Trained on User* Feedback (Spotlight) | Marcus Williams · Micah Carroll · Constantin Weisser · Brendan Murphy · Adhyyan Narang · Anca Dragan
ReFeR: A Hierarchical Framework of Models as Evaluative and Reasoning Agents (Poster) | Yaswanth Narsupalli · Abhranil Chandra · Sreevatsa Muppirala · Manish Gupta · Pawan Goyal
Measuring AI Agent Autonomy: Towards a Scalable Approach With Code Inspection (Poster) | Merlin Stein · Peter Cihon · Gagan Bansal · Sam Manning
I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench (Poster) | Yuan Li · Yue Huang · Yuli Lin · Siyuan Wu · Yao Wan · Lichao Sun
On Demonstration Selection for Improving Fairness in Language Models (Spotlight) | Song Wang · Peng Wang · Yushun Dong · Tong Zhou · Lu Cheng · Yangfeng Ji · Jundong Li
HEARTS: A Holistic Framework for Explainable, Sustainable and Robust Text Stereotype Detection (Poster) | Theo King · Zekun Wu · Adriano Koshiyama · Emre Kazim · Philip Treleaven
Ways Forward for Global AI Benefit Sharing (Poster) | Sam Manning · Claire Dennis · Stephen Clare
Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths? (Poster) | Veronica Chatrath · Marcelo Lotif · Shaina Raza
An Adversarial Perspective on Machine Unlearning for AI Safety (Spotlight) | Jakub Łucki · Boyi Wei · Yangsibo Huang · Peter Henderson · Florian Tramer · Javier Rando
The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions (Poster) | Stefan Sylvius Wagner · Maike Behrendt · Marc Ziegele · Stefan Harmeling
Emergence of Steganography Between Large Language Models (Poster) | Yohan Mathew · Robert McCarthy · Joan Velja · Ollie Matthews · Nandi Schoots · Dylan Cope
HarmAnalyst: Interpretable, transparent, and steerable LLM safety moderation (Poster) | Jing-Jing Li · Valentina Pyatkin · Max Kleiman-Weiner · Liwei Jiang · Nouha Dziri · Anne Collins · Jana Schaich Borg · Maarten Sap · Yejin Choi · Sydney Levine
GPAI Evaluations Standards Taskforce: towards effective AI governance (Poster) | Patricia Paskov · Lukas Berglund · Everett Smith · Lisa Soder
Policy Dreamer: Diverse Public Policy Generation Via Elicitation and Simulation of Human Preferences (Poster) | Arjun Karanam · José Enríquez · Udari Sehwag · Michael Elabd · Kanishk Gandhi · Noah Goodman · Sanmi Koyejo
Towards a Theory of AI Personhood (Poster) | Francis Ward
Different Bias Under Different Criteria: Assessing Bias in LLMs with a Fact-Based Approach (Poster) | Changgeon Ko · Jisu Shin · Hoyun Song · Jeongyeon Seo · Jong Park
On the Ethical Considerations of Generative Agents (Poster) | Nyoma Diamond · Soumya Banerjee
Detection of Partially-Synthesized LLM Text (Poster) | Eric Lei · Hsiang Hsu · Richard Chen
Beyond the Binary: Capturing Diverse Preferences With Reward Regularization (Poster) | Vishakh Padmakumar · Chuanyang Jin · Hannah Rose Kirk · He He
Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards (Poster) | Shresth Verma · Niclas Boehmer · Lingkai Kong · Milind Tambe
Monitoring Human Dependence On AI Systems With Reliance Drills (Poster) | Rosco Hunter · Richard Moulange · Jamie Bernardi · Merlin Stein
CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models (Poster) | Song Wang · Peng Wang · Tong Zhou · Yushun Dong · Zhen Tan · Jundong Li
Understanding Model Bias Requires Systematic Probing Across Tasks (Poster) | Soline Boussard · Susannah (Cheng) Su · Helen Zhao · Siddharth Swaroop · Weiwei Pan
Salad-Bowl-LLM: Multi-Culture LLMs by In-Context Demonstrations from Diverse Cultures (Poster) | Dongkwan Kim · Junho Myung · Alice Oh
Investigating Goal-Aligned and Empathetic Social Reasoning Strategies for Human-Like Social Intelligence in LLMs (Poster) | Anirudh Gajula · Raaghav Malik
Just rephrase it! Uncertainty estimation in closed-source language models via multiple rephrased queries (Poster) | Adam Yang · Chen Chen · Konstantinos Pitas
Plentiful Jailbreaks with String Compositions (Poster) | Brian Huang
The Impact of Large Language Models in Academia: from Writing to Speaking (Poster) | Mingmeng Geng · Caixi Chen · Yanru Wu · Dongping Chen · Yao Wan · Pan Zhou
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models (Poster) | Mengfei Liang · Archish Arun · Zekun Wu · Cristian Villalobos · Jonathan Lutch · Emre Kazim · Adriano Koshiyama · Philip Treleaven
Levels of Autonomy: Liability in the age of AI Agents (Poster) | Lisa Soder · Julia Smakman · Connor Dunlop · Weiwei Pan · Siddharth Swaroop
LLM Alignment Using Soft Prompt Tuning: The Case of Cultural Alignment (Poster) | Reem Masoud · Martin Ferianc · Philip Treleaven · Miguel Rodrigues
On Adversarial Robustness of Language Models in Transfer Learning (Poster) | Bohdan Turbal · Anastasiia Mazur · Jiaxu Zhao · Mykola Pechenizkiy
Ablation is Not Enough to Emulate DPO: A Mechanistic Analysis of Toxicity Reduction (Poster) | Yushi Yang · Filip Sondej · Harry Mayne · Adam Mahdi
Large Language Models Still Exhibit Bias in Long Text (Poster) | Wonje Jeung · Dongjae Jeon · Ashkan Yousefpour · Jonghyun Choi
Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents (Poster) | Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y
Linear Probe Penalties Reduce LLM Sycophancy (Poster) | Henry Papadatos · Rachel Freedman
Report Cards: Qualitative Evaluation of LLMs Using Natural Language Summaries (Spotlight) | Blair Yang · Fuyang Cui · Keiran Paster · Jimmy Ba · Pashootan Vaezipoor · Silviu Pitis · Michael Zhang
A Cautionary Tale on the Evaluation of Differentially Private In-Context Learning (Poster) | Anjun Hu · Jiyang Guan · Philip Torr · Francesco Pinto
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models (Spotlight) | Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Zane Durante · Cristobal Eyzaguirre · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez
Developing Story: Case Studies of Generative AI’s Use in Journalism (Poster) | Natalie Brigham · Chongjiu Gao · Tadayoshi Kohno · Franziska Roesner · Niloofar Mireshghallah
The Case for Model Access Governance (Poster) | Edward Kembery
Developing an occupational prestige scale using Large Language Models (Poster) | Robert de Vries · Mark Hill · Laura Ruis