Workshop
Towards Safe & Trustworthy Agents
Alexander Pan · Kimin Lee · Bo Li · Karthik Narasimhan · Dawn Song · Isabelle Barrass
West Ballroom C
Sun 15 Dec, 9 a.m. PST
Foundation models are increasingly being augmented with new modalities and access to a variety of tools and software. Systems that can act more autonomously have been created by assembling agent architectures or scaffolds that include basic forms of planning and memory, or multi-agent architectures. Making these systems more agentic could unlock a wider range of beneficial use cases, but it also introduces new challenges in ensuring that such systems are trustworthy. Interactions between different autonomous systems create a further set of issues around multi-agent safety. The scope and complexity of potential impacts from agentic systems mean that proactive approaches to identifying and managing their risks are needed. Our workshop will surface these questions and operationalize them into concrete research agendas.
Schedule
Sun 9:00 a.m. - 9:10 a.m. | Opening Remark (Opening)
Sun 9:10 a.m. - 9:40 a.m. | Invited Talk 1: João F. Henriques (Research Fellow, Royal Academy of Engineering)
Sun 9:40 a.m. - 10:10 a.m. | Invited Talk 2: David Bau (Assistant Professor, Northeastern)
Sun 10:10 a.m. - 10:50 a.m. | Contributed Talks (Contributed Talk)
Sun 10:50 a.m. - 11:15 a.m. | Coffee Break
Sun 11:15 a.m. - 11:45 a.m. | Invited Talk 3: Been Kim (Senior Staff Research Scientist, Google DeepMind) | Zaina Shaik
Sun 11:45 a.m. - 12:30 p.m. | Live Poster Session 1
Sun 12:30 p.m. - 1:30 p.m. | Lunch
Sun 1:30 p.m. - 2:00 p.m. | Invited Talk 4: David Krueger (Associate Professor, Cambridge)
Sun 2:00 p.m. - 2:30 p.m. | Invited Talk 5: Daniel Kang (Associate Professor, UIUC)
Sun 2:30 p.m. - 3:00 p.m. | Invited Talk 6: Yu Su (Distinguished Associate Professor, Ohio State)
Sun 3:00 p.m. - 3:45 p.m. | Live Poster Session 2
Sun 3:45 p.m. - 4:00 p.m. | Coffee Break
Sun 4:00 p.m. - 4:55 p.m. | Panel Discussion and Reflection
Sun 4:55 p.m. - 5:00 p.m. | Closing Remark
Characterizing Context Memorization and Hallucination of Language Models (Poster) | James Flemings · Wanrong Zhang · Bo Jiang · Zafar Takhirov · Murali Annavaram
Position: AI Agents & Liability -- Mapping Insights from ML and HCI Research to Policy (Poster) | Weiwei Pan · Siddharth Swaroop · Julia Smakman · Connor Dunlop · Lisa Soder
Towards Measuring Goal-Directedness in AI Systems (Poster) | Dylan Xu · Juan-Pablo Rivera
Levels of Autonomy: Liability in the age of AI Agents (Poster) | Julia Smakman · Lisa Soder · Connor Dunlop · Weiwei Pan · Siddharth Swaroop
Getting By Goal Misgeneralization With a Little Help From a Mentor (Poster) | Tu Trinh · Mohamad Hosein Danesh · Khanh Nguyen · Benjamin Plaut
AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing (Poster) | Ana Nunez · Nafis Tanveer Islam · Sumit Jha · peyman najafirad
Towards Deliberating Agents: Evaluating the Ability of Large Language Models to Deliberate (Poster) | Arjun Karanam · Farnaz Jahanbakhsh · Sanmi Koyejo
Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents (Poster) | Giorgio Piatti · Zhijing Jin · Max Kleiman-Weiner · Bernhard Schölkopf · Mrinmaya Sachan · Rada Mihalcea
Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents (Poster) | Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y
Semantically Safe Robot Manipulation: From Semantic Scene Understanding to Motion Safeguards (Poster) | Lukas Brunke · Yanni Zhang · Ralf Römer · Jack Naimer · Nikola Staykov · SiQi Zhou · Angela Schoellig
Sandbag Detection through Model Impairment (Poster) | Cameron Tice · Philipp Kreer · Nathan Helm-Burger · Prithviraj Singh Shahani · Fedor Ryzhenkov · Teun van der Weij · Felix Hofstätter · Jacob Haimes
Measuring Implicit Bias in Explicitly Unbiased Large Language Models (Poster) | Xuechunzi Bai · Angelina Wang · Ilia Sucholutsky · Tom Griffiths
Modelling the oversight of deceptive interpretability agents (Poster) | Simon Lermen · Mateusz Dziemian
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents (Oral) | Zhe Su · Xuhui Zhou · Sanketh Rangreji · Anubha Kabra · Julia Mendelsohn · Faeze Brahman · Maarten Sap
Trustworthy Conceptual Explanations for Neural Networks in Robot Decision-Making (Poster) | Som Sagar · Aditya Taparia · Harsh Mankodiya · Pranav Bidare · Yifan Zhou · Ransalu Senanayake
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models (Poster) | Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Zane Durante · Cristobal Eyzaguirre · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez
RED – Robust Environmental Design (Poster) | Jinghan Yang
Keep on Swimming: Real Attackers Only Need Partial Knowledge of a Multi-Model System (Poster) | Julian Collado · Kevin Stangl
Simulation System Towards Solving Societal-Scale Manipulation (Oral) | Sneheel Sarangi · Maximilian Puelma Touzel · Austin Welch · Gayatri K · Dan Zhao · Zachary Yang · Hao Yu · Ethan Kosak-Hine · Tom Gibbs · Andreea Musulan · Camille Thibault · Reihaneh Rabbany · Jean-François Godbout · Kellin Pelrine
Emergence of Steganography Between Large Language Models (Poster) | Yohan Mathew · Joan Velja · Ollie Matthews · Robert McCarthy · Dylan Cope · Nandi Schoots
Strategic Collusion of LLM Agents: Market Division in Multi-Commodity Competitions (Poster) | Ryan Lin · Siddhartha Ojha · Kevin Cai · Maxwell Chen
Targeted Manipulation and Deception Emerge in LLMs Trained on User* Feedback (Oral) | Marcus Williams · Micah Carroll · Constantin Weisser · Adhyyan Narang · Brendan Murphy · Anca Dragan
Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent (Poster) | Fatemeh Haji · Mazal Bethany · Maryam Tabar · Cho-Yu Chiang · Anthony Rios · peyman najafirad
Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference (Poster) | Anton Xue · Avishree Khare · Rajeev Alur · Surbhi Goel · Eric Wong
AI Sandbagging: Language Models can Selectively Underperform on Evaluations (Poster) | Teun van der Weij · Felix Hofstätter · Oliver Jaffe · Samuel Brown · Francis Ward
HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions (Poster) | Xuhui Zhou · Hyunwoo Kim · Faeze Brahman · Liwei Jiang · Hao Zhu · Ximing Lu · Frank F. Xu · Bill Yuchen Lin · Niloofar Mireshghallah · Ronan Le Bras · Maarten Sap
Neural Interactive Proofs (Poster) | Lewis Hammond · Sam Adam-Day
Lost in Translation: Jail Breaking Gemini and Revealing Biases in Large Language Models via Translation (Poster) | Ezgi Korkmaz
PolicyLR: An LLM compiler for Logic-based Representation for Privacy Policies (Poster) | Ashish Hooda · Rishabh Khandelwal · Prasad Chalasani · Kassem Fawaz · Somesh Jha
INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness (Poster) | Hung Le · Yingbo Zhou · Caiming Xiong · Silvio Savarese · Doyen Sahoo
Algorithmic Oversight for Deceptive Reasoning (Poster) | Ege Onur Taga · Mingchen Li · Yongqi Chen · Samet Oymak
C-MCTS: Safe Planning with Monte Carlo Tree Search (Poster) | Dinesh Parthasarathy · Georgios Kontes · Axel Plinge · Christopher Mutschler
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale (Oral) | Rogerio Bonatti · Dan Zhao · Sara Abdali · Yinheng Li · Yadong Lu · Justin Wagle · Kazuhito Koishida · Arthur Bucker · Lawrence Jang · Dillon Dupont · Zheng Hui
The Elicitation Game: Stress-Testing Capability Elicitation Techniques (Poster) | Felix Hofstätter · Jayden Teoh · Teun van der Weij · Francis Ward