Towards Safe & Trustworthy Agents

Workshop

Alexander Pan · Kimin Lee · Bo Li · Karthik Narasimhan · Dawn Song · Isabelle Barrass

West Ballroom C

Sun 15 Dec, 9 a.m. PST

Foundation models are increasingly being augmented with new modalities and access to a variety of tools and software. Systems that can act more autonomously have been created by assembling agent architectures or scaffolds that include basic forms of planning and memory, or multi-agent architectures. As these systems become more agentic, they could unlock a wider range of beneficial use cases, but they also introduce new challenges in ensuring that such systems are trustworthy. Interactions between different autonomous systems create a further set of issues around multi-agent safety. The scope and complexity of potential impacts from agentic systems mean that there is a need for proactive approaches to identifying and managing their risks. Our workshop will surface and operationalize these questions into concrete research agendas.

Timezone: America/Los_Angeles

Schedule

Sun 9:00 a.m. - 9:10 a.m.	Opening Remarks
Sun 9:10 a.m. - 9:40 a.m.	Invited Talk 1: João F. Henriques (Research Fellow, Royal Academy of Engineering)
Sun 9:40 a.m. - 10:10 a.m.	Invited Talk 2: David Bau (Assistant Professor, Northeastern)
Sun 10:10 a.m. - 10:50 a.m.	Contributed Talks
Sun 10:50 a.m. - 11:15 a.m.	Coffee Break
Sun 11:15 a.m. - 11:45 a.m.	Invited Talk 3: Been Kim (Senior Staff Research Scientist, Google DeepMind)	Zaina Shaik
Sun 11:45 a.m. - 12:30 p.m.	Live Poster Session 1
Sun 12:30 p.m. - 1:30 p.m.	Lunch
Sun 1:30 p.m. - 2:00 p.m.	Invited Talk 4: David Krueger (Associate Professor, Cambridge)
Sun 2:00 p.m. - 2:30 p.m.	Invited Talk 5: Daniel Kang (Associate Professor, UIUC)
Sun 2:30 p.m. - 3:00 p.m.	Invited Talk 6: Yu Su (Distinguished Associate Professor, Ohio State)
Sun 3:00 p.m. - 3:45 p.m.	Live Poster Session 2
Sun 3:45 p.m. - 4:00 p.m.	Coffee Break
Sun 4:00 p.m. - 4:55 p.m.	Panel Discussion and Reflection
Sun 4:55 p.m. - 5:00 p.m.	Closing Remarks
-	Characterizing Context Memorization and Hallucination of Language Models (Poster)	James Flemings · Wanrong Zhang · Bo Jiang · Zafar Takhirov · Murali Annavaram
-	Position: AI Agents & Liability -- Mapping Insights from ML and HCI Research to Policy (Poster)	Weiwei Pan · Siddharth Swaroop · Julia Smakman · Connor Dunlop · Lisa Soder
-	Towards Measuring Goal-Directedness in AI Systems (Poster)	Dylan Xu · Juan-Pablo Rivera
-	Levels of Autonomy: Liability in the age of AI Agents (Poster)	Julia Smakman · Lisa Soder · Connor Dunlop · Weiwei Pan · Siddharth Swaroop
-	Getting By Goal Misgeneralization With a Little Help From a Mentor (Poster)	Tu Trinh · Mohamad Hosein Danesh · Khanh Nguyen · Benjamin Plaut
-	AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing (Poster)	Ana Nunez · Nafis Tanveer Islam · Sumit Jha · Peyman Najafirad
-	Towards Deliberating Agents: Evaluating the Ability of Large Language Models to Deliberate (Poster)	Arjun Karanam · Farnaz Jahanbakhsh · Sanmi Koyejo
-	Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents (Poster)	Giorgio Piatti · Zhijing Jin · Max Kleiman-Weiner · Bernhard Schölkopf · Mrinmaya Sachan · Rada Mihalcea
-	Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents (Poster)	Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y
-	Semantically Safe Robot Manipulation: From Semantic Scene Understanding to Motion Safeguards (Poster)	Lukas Brunke · Yanni Zhang · Ralf Römer · Jack Naimer · Nikola Staykov · SiQi Zhou · Angela Schoellig
-	Sandbag Detection through Model Impairment (Poster)	Cameron Tice · Philipp Kreer · Nathan Helm-Burger · Prithviraj Singh Shahani · Fedor Ryzhenkov · Teun van der Weij · Felix Hofstätter · Jacob Haimes
-	Measuring Implicit Bias in Explicitly Unbiased Large Language Models (Poster)	Xuechunzi Bai · Angelina Wang · Ilia Sucholutsky · Tom Griffiths
-	Modelling the oversight of deceptive interpretability agents (Poster)	Simon Lermen · Mateusz Dziemian
-	AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents (Oral)	Zhe Su · Xuhui Zhou · Sanketh Rangreji · Anubha Kabra · Julia Mendelsohn · Faeze Brahman · Maarten Sap
-	Trustworthy Conceptual Explanations for Neural Networks in Robot Decision-Making (Poster)	Som Sagar · Aditya Taparia · Harsh Mankodiya · Pranav Bidare · Yifan Zhou · Ransalu Senanayake
-	Failures to Find Transferable Image Jailbreaks Between Vision-Language Models (Poster)	Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Zane Durante · Cristobal Eyzaguirre · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez
-	RED – Robust Environmental Design (Poster)	Jinghan Yang
-	Keep on Swimming: Real Attackers Only Need Partial Knowledge of a Multi-Model System (Poster)	Julian Collado · Kevin Stangl
-	Simulation System Towards Solving Societal-Scale Manipulation (Oral)	Sneheel Sarangi · Maximilian Puelma Touzel · Austin Welch · Gayatri K · Dan Zhao · Zachary Yang · Hao Yu · Ethan Kosak-Hine · Tom Gibbs · Andreea Musulan · Camille Thibault · Reihaneh Rabbany · Jean-François Godbout · Kellin Pelrine
-	Emergence of Steganography Between Large Language Models (Poster)	Yohan Mathew · Joan Velja · Ollie Matthews · Robert McCarthy · Dylan Cope · Nandi Schoots
-	Strategic Collusion of LLM Agents: Market Division in Multi-Commodity Competitions (Poster)	Ryan Lin · Siddhartha Ojha · Kevin Cai · Maxwell Chen
-	Targeted Manipulation and Deception Emerge in LLMs Trained on User Feedback (Oral)	Marcus Williams · Micah Carroll · Constantin Weisser · Adhyyan Narang · Brendan Murphy · Anca Dragan
-	Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent (Poster)	Fatemeh Haji · Mazal Bethany · Maryam Tabar · Cho-Yu Chiang · Anthony Rios · Peyman Najafirad
-	Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference (Poster)	Anton Xue · Avishree Khare · Rajeev Alur · Surbhi Goel · Eric Wong
-	AI Sandbagging: Language Models can Selectively Underperform on Evaluations (Poster)	Teun van der Weij · Felix Hofstätter · Oliver Jaffe · Samuel Brown · Francis Ward
-	HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions (Poster)	Xuhui Zhou · Hyunwoo Kim · Faeze Brahman · Liwei Jiang · Hao Zhu · Ximing Lu · Frank F. Xu · Bill Yuchen Lin · Niloofar Mireshghallah · Ronan Le Bras · Maarten Sap
-	Neural Interactive Proofs (Poster)	Lewis Hammond · Sam Adam-Day
-	Lost in Translation: Jail Breaking Gemini and Revealing Biases in Large Language Models via Translation (Poster)	Ezgi Korkmaz
-	PolicyLR: An LLM compiler for Logic-based Representation for Privacy Policies (Poster)	Ashish Hooda · Rishabh Khandelwal · Prasad Chalasani · Kassem Fawaz · Somesh Jha
-	INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness (Poster)	Hung Le · Yingbo Zhou · Caiming Xiong · Silvio Savarese · Doyen Sahoo
-	Algorithmic Oversight for Deceptive Reasoning (Poster)	Ege Onur Taga · Mingchen Li · Yongqi Chen · Samet Oymak
-	C-MCTS: Safe Planning with Monte Carlo Tree Search (Poster)	Dinesh Parthasarathy · Georgios Kontes · Axel Plinge · Christopher Mutschler
-	Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale (Oral)	Rogerio Bonatti · Dan Zhao · Sara Abdali · Yinheng Li · Yadong Lu · Justin Wagle · Kazuhito Koishida · Arthur Bucker · Lawrence Jang · Dillon Dupont · Zheng Hui
-	The Elicitation Game: Stress-Testing Capability Elicitation Techniques (Poster)	Felix Hofstätter · Jayden Teoh · Teun van der Weij · Francis Ward