Workshop on Machine Learning Safety
Dan Hendrycks · Victoria Krakovna · Dawn Song · Jacob Steinhardt · Nicholas Carlini
Virtual
Fri 9 Dec, 7 a.m. PST
Designing systems to operate safely in real-world settings is a topic of growing interest in machine learning. As ML becomes more capable and widespread, long-term and long-tail safety risks will grow in importance. To make the adoption of ML more beneficial, various aspects of safety engineering and oversight need to be proactively addressed by the research community. This workshop will bring together researchers from machine learning communities to focus on research topics in Robustness, Monitoring, Alignment, and Systemic Safety.
* Robustness is designing systems to be reliable in the face of adversaries and highly unusual situations.
* Monitoring is detecting anomalies and malicious use, and discovering unintended model functionality.
* Alignment is building models that represent and safely optimize difficult-to-specify human values.
* Systemic Safety is using ML to address broader risks related to how ML systems are handled, such as defending against cyberattacks, facilitating cooperation, or improving the decision-making of public servants.
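As a concrete instance of the Monitoring theme, one common anomaly-detection baseline scores each input by its maximum softmax probability (MSP) and flags low-confidence inputs as potentially out-of-distribution. The sketch below is purely illustrative: the logits and the threshold value are hypothetical, not taken from any paper at the workshop.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msp_score(logits):
    # Maximum softmax probability: lower values suggest the
    # classifier is less confident, a crude anomaly signal.
    return softmax(logits).max(axis=-1)

def flag_anomalies(logits, threshold=0.5):
    # Flag inputs whose confidence falls below the threshold.
    return msp_score(logits) < threshold

# Toy usage: a confidently classified input vs. a near-uniform one.
confident = np.array([[10.0, 0.0, 0.0]])
uncertain = np.array([[0.1, 0.0, 0.05]])
print(flag_anomalies(confident))  # high confidence, not flagged
print(flag_anomalies(uncertain))  # low confidence, flagged
```

The threshold would in practice be calibrated on held-out in-distribution data; MSP is only a baseline, and many of the monitoring posters below study stronger detectors.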
Schedule
Fri 7:00 a.m. - 7:10 a.m. | Opening Remarks
Fri 7:10 a.m. - 7:40 a.m. | Sharon Li: How to Handle Distributional Shifts? Challenges, Research Progress and Future Directions
Fri 7:40 a.m. - 8:25 a.m. | Morning Poster Session
Fri 8:25 a.m. - 8:45 a.m. | Coffee Break
Fri 8:45 a.m. - 9:15 a.m. | Bo Li: Trustworthy Machine Learning via Learning with Reasoning
Fri 9:15 a.m. - 10:00 a.m. | Afternoon Poster Session
Fri 10:00 a.m. - 10:45 a.m. | Lunch
Fri 10:45 a.m. - 11:15 a.m. | Dorsa Sadigh: Aligning Robot Representations with Humans
Fri 11:15 a.m. - 11:45 a.m. | David Krueger: Sources of Specification Failure
Fri 11:45 a.m. - 12:00 p.m. | Coffee Break
Fri 12:00 p.m. - 12:30 p.m. | David Bau: Direct model editing: a framework for understanding model knowledge
Fri 12:30 p.m. - 1:00 p.m. | Sam Bowman: What's the deal with AI safety?
Fri 1:00 p.m. - 1:55 p.m. | Live Panel Discussion with the Invited Speakers
Fri 1:55 p.m. - 2:00 p.m. | Closing Remarks
- Formalizing the Problem of Side Effect Regularization (Poster) | Alex Turner · Aseem Saxena · Prasad Tadepalli
- Measuring Robustness with Black-Box Adversarial Attack using Reinforcement Learning (Poster) | Soumyendu Sarkar · Sajad Mousavi · Ashwin Ramesh Babu · Vineet Gundecha · Sahand Ghorbanpour · Alexander Shmakov
- Investigating causal understanding in LLMs (Poster) | Marius Hobbhahn · Tom Lieberum · David Seiler
- Reflection Mechanisms as an Alignment Target: A Survey (Poster) | Marius Hobbhahn · Eric Landgrebe · Elizabeth Barnes
- Interpolating Compressed Parameter Subspaces (Poster) | Siddhartha Datta · Nigel Shadbolt
- Probabilistically Robust PAC Learning (Poster) | Vinod Raman · Ambuj Tewari · Unique Subedi
- Multiple Remote Adversarial Patches: Generating Patches based on Diffusion Models for Object Detection using CNNs (Poster) | Kento Oonishi · Tsunato Nakai · Daisuke Suzuki
- Misspecification in Inverse Reinforcement Learning (Poster) | Joar Skalse · Alessandro Abate
- Red-Teaming the Stable Diffusion Safety Filter (Poster) | Javier Rando · Daniel Paleka · David Lindner · Lennart Heim · Florian Tramer
- Tracking the Risk of Machine Learning Systems with Partial Monitoring (Poster) | Maxime Heuillet · Audrey Durand
- The Reward Hypothesis is False (Poster) | Joar Skalse · Alessandro Abate
- Training Time Adversarial Attack Aiming the Vulnerability of Continual Learning (Poster) | Gyojin Han · Jaehyun Choi · HyeongGwon Hong · Junmo Kim
- Measuring Reliability of Large Language Models through Semantic Consistency (Poster) | Harsh Raj · Domenic Rosati · Subhabrata Majumdar
- CUDA: Curriculum of Data Augmentation for Long-tailed Recognition (Poster) | Sumyeong Ahn · Jongwoo Ko · Se-Young Yun
- Certified defences hurt generalisation (Poster) | Piersilvio De Bartolomeis · Jacob Clarysse · Fanny Yang · Amartya Sanyal
- Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning (Poster) | Olivia Wiles · Isabela Albuquerque · Sven Gowal
- Context-Adaptive Deep Neural Networks via Bridge-Mode Connectivity (Poster) | Nathan Drenkow · Alvin Tan · Clayton Ashcraft · Kiran Karra
- Constraining Low-level Representations to Define Effective Confidence Scores (Poster) | Joao Monteiro · Pau Rodriguez · Pierre-Andre Noel · Issam Hadj Laradji · David Vázquez
- On the Robustness of Safe Reinforcement Learning under Observational Perturbations (Poster) | Zuxin Liu · Zijian Guo · Zhepeng Cen · Huan Zhang · Jie Tan · Bo Li · Ding Zhao
- Improving Zero-shot Generalization and Robustness of Multi-modal Models (Poster) | Yunhao Ge · Jie Ren · Ming-Hsuan Yang · Yuxiao Wang · Andrew Gallagher · Hartwig Adam · Laurent Itti · Balaji Lakshminarayanan · Jiaping Zhao
- Disclosing the Biases in Large Language Models via Reward Structured Questions (Poster) | Ezgi Korkmaz
- Dynamic Stochastic Ensemble with Adversarial Robust Lottery Ticket Subnetworks (Poster) | Qi Peng · Wenlin Liu · Qin RuoXi · Libin Hou · Bin Yan · Linyuan Wang
- Exploring Transformer Backbones for Heterogeneous Treatment Effect Estimation (Poster) | Yifan Zhang · Hanlin Zhang · Zachary Lipton · Li Erran Li · Eric Xing
- Bandits with Costly Reward Observations (Poster) | Aaron Tucker · Caleb Biddulph · Claire Wang · Thorsten Joachims
- RobustAugMix: Joint Optimization of Natural and Adversarial Robustness (Poster) | Josue Martinez-Martinez · Olivia Brown
- Pre-training Robust Feature Extractor Against Clean-label Data Poisoning Attacks (Poster) | Ting Zhou · Hanshu Yan · Lei Liu · Jingfeng Zhang · Bo Han
- MoAT: Meta-Evaluation of Anti-Malware Trustworthiness (Poster) | Sharon Lin · Marc Fyrbiak · Christof Paar
- Cold Posteriors through PAC-Bayes (Poster) | Konstantinos Pitas · Julyan Arbel
- How Sure to Be Safe? Difficulty, Confidence and Negative Side Effects (Poster) | John Burden · José Hernández-Orallo · Sean O hEigeartaigh
- Towards Defining Deception in Structural Causal Games (Poster) | Francis Ward
- System Safety Engineering for Social and Ethical ML Risks: A Case Study (Poster) | Edgar Jatho · Logan Mailloux · Shalaleh Rismani · Eugene Williams · Joshua Kroll
- Quantifying Misalignment Between Agents (Poster) | Aidan Kierans · Hananel Hazan · Shiri Dori-Hacohen
- Lower Bounds on 0-1 Loss for Multi-class Classification with a Test-time Attacker (Poster) | Sihui Dai · Wenxin Ding · Arjun Nitin Bhagoji · Daniel Cullina · Prateek Mittal · Ben Zhao
- HEAT: Hybrid Energy Based Model in the Feature Space for Out-of-Distribution Detection (Poster) | Marc Lafon · Clément Rambour · Nicolas Thome
- Fine-grain Inference on Out-of-Distribution Data with Hierarchical Classification (Poster) | Randolph Linderman · Jingyang Zhang · Nathan Inkawhich · Hai Li · Yiran Chen
- Indiscriminate Data Poisoning Attacks on Neural Networks (Poster) | Yiwei Lu · Gautam Kamath · Yaoliang Yu
- Mitigating Lies in Vision-Language Models (Poster) | Junbo Li · Xianhang Li · Cihang Xie
- Risk-aware Bayesian Reinforcement Learning for Cautious Exploration (Poster) | Rohan Mitta · Hosein Hasanbeig · Daniel Kroening · Alessandro Abate
- The Expertise Problem: Learning from Specialized Feedback (Poster) | Oliver Daniels-Koch · Rachel Freedman
- Cryptographic Auditing for Collaborative Learning (Poster) | Hidde Lycklama · Nicolas Küchler · Alexander Viand · Emanuel Opel · Lukas Burkhalter · Anwar Hithnawi
- Certifiable Metric One Class Learning with adversarially trained Lipschitz Classifier (Poster) | Louis Béthune · Mathieu Serrurier
- An Adversarial Robustness Perspective on the Topology of Neural Networks (Poster) | Morgane Goibert · Elvis Dohmatob · Thomas Ricatte
- Falsehoods that ML researchers believe about OOD detection (Poster) | Andi Zhang · Damon Wischik
- Ignore Previous Prompt: Attack Techniques For Language Models (Poster) | Fabio Perez · Ian Ribeiro
- Towards Adversarial Purification using Denoising AutoEncoders (Poster) | Dvij Kalaria · Aritra Hazra · Partha Chakrabarti
- Continual Poisoning of Generative Models to Promote Catastrophic Forgetting (Poster) | Siteng Kang · Xinhua Zhang
- Adversarial Attacks on Transformers-Based Malware Detectors (Poster) | Yash Jakhotiya · Heramb Patil · Jugal Rawlani
- A Cooperative Reinforcement Learning Environment for Detecting and Penalizing Betrayal (Poster) | Nikiforos Pittaras
- REAP: A Large-Scale Realistic Adversarial Patch Benchmark (Poster) | Nabeel Hingun · Chawin Sitawarin · Jerry Li · David Wagner
- Adversarial Policies Beat Professional-Level Go AIs (Poster) | Tony Wang · Adam Gleave · Nora Belrose · Tom Tseng · Michael Dennis · Yawen Duan · Viktor Pogrebniak · Joseph Miller · Sergey Levine · Stuart Russell
- A Deep Dive into Dataset Imbalance and Bias in Face Identification (Poster) | Valeriia Cherepanova · Steven Reich · Samuel Dooley · Hossein Souri · John Dickerson · Micah Goldblum · Tom Goldstein
- Part-Based Models Improve Adversarial Robustness (Poster) | Chawin Sitawarin · Kornrapat Pongmala · Yizheng Chen · Nicholas Carlini · David Wagner
- Smoothed-SGDmax: A Stability-Inspired Algorithm to Improve Adversarial Generalization (Poster) | Jiancong Xiao · Jiawei Zhang · Zhiquan Luo · Asuman Ozdaglar
- Hidden Poison: Machine unlearning enables camouflaged poisoning attacks (Poster) | Jimmy Di · Jack Douglas · Jayadev Acharya · Gautam Kamath · Ayush Sekhari
- DrML: Diagnosing and Rectifying Vision Models using Language (Poster) | Yuhui Zhang · Jeff Z. HaoChen · Shih-Cheng Huang · Kuan-Chieh Wang · James Zou · Serena Yeung
- Deceiving the CKA Similarity Measure in Deep Learning (Poster) | MohammadReza Davari · Stefan Horoi · Amine Natik · Guillaume Lajoie · Guy Wolf · Eugene Belilovsky
- A Mechanistic Lens on Mode Connectivity (Poster) | Ekdeep S Lubana · Eric Bigelow · Robert Dick · David Krueger · Hidenori Tanaka
- Visual Prompting for Adversarial Robustness (Poster) | Aochuan Chen · Peter Lorenz · Yuguang Yao · Pin-Yu Chen · Sijia Liu
- Identification of the Adversary from a Single Adversarial Example (Poster) | Minhao Cheng · Rui Min
- Mitigating Dataset Bias by Using Per-sample Gradient (Poster) | Sumyeong Ahn · SeongYoon Kim · Se-Young Yun
- A General Framework for Safe Decision Making: A Convex Duality Approach (Poster) | Martino Bernasconi · Federico Cacciamani · Nicola Gatti · Francesco Trovò
- A Unifying Framework for Online Safe Optimization (Poster) | Matteo Castiglioni · Andrea Celli · Alberto Marchesi · Giulia Romano · Nicola Gatti
- Targeted Adversarial Self-Supervised Learning (Poster) | Minseon Kim · Hyeonjeong Ha · Sooel Son · Sung Ju Hwang
- Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries (Poster) | Yuxin Wen · Arpit Bansal · Hamid Kazemi · Eitan Borgnia · Micah Goldblum · Jonas Geiping · Tom Goldstein
- Can Large Language Models Truly Follow your Instructions? (Poster) | Joel Jang · Seonghyeon Ye · Minjoon Seo
- Broken Neural Scaling Laws (Poster) | Ethan Caballero · Kshitij Gupta · Irina Rish · David Krueger
- Do Domain Generalization Methods Generalize Well? (Poster) | Akshay Mehra · Bhavya Kailkhura · Pin-Yu Chen · Jihun Hamm
- What You See is What You Get: Principled Deep Learning via Distributional Generalization (Poster) | Bogdan Kulynych · Yao-Yuan Yang · Yaodong Yu · Jaroslaw Blasiok · Preetum Nakkiran
- Adversarial poisoning attacks on reinforcement learning-driven energy pricing (Poster) | Sam Gunn · Doseok Jang · Orr Paradise · Lucas Spangher · Costas J Spanos
- OOD Detection with Class Ratio Estimation (Poster) | Mingtian Zhang · Andi Zhang · Tim Xiao · Yitong Sun · Steven McDonagh
- Alignment as a Dynamic Process (Poster) | Paul de Font-Reaulx
- The Use of Non-epistemic Values to Account for Bias in Automated Decision Making (Poster) | Jesse Hoey · Gabrielle Chan · Mathieu Doucet · Christopher Risi · Freya Zhang
- Few-Shot Transferable Robust Representation Learning via Bilevel Attacks (Poster) | Minseon Kim · Hyeonjeong Ha · Sung Ju Hwang
- What 'Out-of-distribution' Is and Is Not (Poster) | Sebastian Farquhar · Yarin Gal
- Adversarial Robustness of Deep Inverse Reinforcement Learning (Poster) | Ezgi Korkmaz
- Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation (Poster) | Lorenz Kuhn · Yarin Gal · Sebastian Farquhar
- On Outlier Exposure with Generative Models (Poster) | Konstantin Kirchheim · Frank Ortmeier
- An Efficient Framework for Monitoring Subgroup Performance of Machine Learning Systems (Poster) | Huong Ha
- Spectral Robustness Analysis of Deep Imitation Learning (Poster) | Ezgi Korkmaz
- Interpretable Reward Learning via Differentiable Decision Trees (Poster) | Akansha Kalra · Daniel S. Brown
- Steering Large Language Models using APE (Poster) | Yongchao Zhou · Andrei Muresanu · Ziwen Han · Keiran Paster · Silviu Pitis · Harris Chan · Jimmy Ba
- A Multi-Level Framework for the AI Alignment Problem (Poster) | Betty L Hou · Brian Green
- Error Resilient Deep Neural Networks using Neuron Gradient Statistics (Poster) | Chandramouli Amarnath · Abhijit Chatterjee · Kwondo Ma · Mohamed Mejri
- Aligning Robot Representations with Humans (Poster) | Andreea Bobu · Andi Peng · Pulkit Agrawal · Julie A Shah · Anca Dragan
- Deep Reinforcement Learning Policies Learn Shared Adversarial Directions Across MDPs (Poster) | Ezgi Korkmaz
- Instance-Aware Observer Network for Out-of-Distribution Object Segmentation (Poster) | Victor Besnier · Andrei Bursuc · Alexandre Briot · David Picard
- A general framework for reward function distances (Poster) | Erik Jenner · Joar Skalse · Adam Gleave
- Certifiable Robustness Against Patch Attacks Using an ERM Oracle (Poster) | Kevin Stangl · Avrim Blum · Omar Montasser · Saba Ahmadi
- On the Adversarial Robustness of Vision Transformers (Poster) | Rulin Shao · Zhouxing Shi · Jinfeng Yi · Pin-Yu Chen · Cho-Jui Hsieh
- Unified Probabilistic Neural Architecture and Weight Ensembling Improves Model Robustness (Poster) | Sumegha Premchandar · Sanket Jantre · Prasanna Balaprakash · Sandeep Madireddy
- All’s Well That Ends Well: Avoiding Side Effects with Distance-Impact Penalties (Poster) | Charlie Griffin · Joar Skalse · Lewis Hammond · Alessandro Abate
- System III: Learning with Domain Knowledge for Safety Constraints (Poster) | Fazl Barez · Hosein Hasanbeig · Alessandro Abate
- Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning? (Poster) | Gunshi Gupta · Tim G. J. Rudner · Rowan McAllister · Adrien Gaidon · Yarin Gal
- Boundary Adversarial Examples Against Adversarial Overfitting (Poster) | Muhammad Zaid Hameed · Beat Buesser
- Two-Turn Debate Does Not Help Humans Answer Hard Reading Comprehension Questions (Poster) | Alicia Parrish · Harsh Trivedi · Nikita Nangia · Jason Phang · Vishakh Padmakumar · Amanpreet Singh Saimbhi · Samuel Bowman
- Panning for Gold in Federated Learning: Targeted Text Extraction under Arbitrarily Large-Scale Aggregation (Poster) | Hong-Min Chu · Jonas Geiping · Liam Fowl · Micah Goldblum · Tom Goldstein
- Despite "super-human" performance, current LLMs are unsuited for decisions about ethics and safety (Poster) | Josh Albrecht · Ellie Kitanidis · Abraham Fetterman
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small (Poster) | Kevin Wang · Alexandre Variengien · Arthur Conmy · Buck Shlegeris · Jacob Steinhardt
- From plane crashes to algorithmic harm: applicability of safety engineering frameworks for responsible ML (Poster) | Shalaleh Rismani · Renee Shelby · Andrew Smart · Edgar Jatho · Joshua Kroll · AJung Moon · Negar Rostamzadeh
- Best of Both Worlds: Towards Adversarial Robustness with Transduction and Rejection (Poster) | Nils Palumbo · Yang Guo · Xi Wu · Jiefeng Chen · Yingyu Liang · Somesh Jha
- c-MBA: Adversarial Attack for Cooperative MARL Using Learned Dynamics Model (Poster) | Nhan H Pham · Lam Nguyen · Jie Chen · Thanh Lam Hoang · Subhro Das · Lily Weng
- Adversarial Attacks on Feature Visualization Methods (Poster) | Michael Eickenberg · Eugene Belilovsky · Jonathan Marty
- Embedding Reliability: On the Predictability of Downstream Performance (Poster) | Shervin Ardeshir · Navid Azizan
- On The Fragility of Learned Reward Functions (Poster) | Lev McKinney · Yawen Duan · David Krueger · Adam Gleave
- Decepticons: Corrupted Transformers Breach Privacy in Federated Learning for Language Models (Poster) | Liam Fowl · Jonas Geiping · Steven Reich · Yuxin Wen · Wojciech Czaja · Micah Goldblum · Tom Goldstein
- Adversarial Robustness for Tabular Data through Cost and Utility Awareness (Poster) | Klim Kireev · Bogdan Kulynych · Carmela Troncoso
- Epistemic Side Effects & Avoiding Them (Sometimes) (Poster) | Toryn Klassen · Parand Alizadeh Alamdari · Sheila McIlraith
- Improving the Robustness of Conditional Language Models by Detecting and Removing Input Noise (Poster) | Kundan Krishna · Yao Zhao · Jie Ren · Balaji Lakshminarayanan · Jiaming Luo · Mohammad Saleh · Peter Liu
- Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks (Poster) | Stephen Casper · Kaivalya Hariharan · Dylan Hadfield-Menell
- On Representation Learning Under Class Imbalance (Poster) | Ravid Shwartz-Ziv · Micah Goldblum · Yucen Li · C. Bayan Bruss · Andrew Gordon Wilson
- Neural Autoregressive Refinement for Self-Supervised Anomaly Detection in Accelerator Physics (Poster) | Jiaxin Zhang
- Robust Representation Learning for Group Shifts and Adversarial Examples (Poster) | Ming-Chang Chiu · Xuezhe Ma
- DP-InstaHide: Data Augmentations Provably Enhance Guarantees Against Dataset Manipulations (Poster) | Eitan Borgnia · Jonas Geiping · Valeriia Cherepanova · Liam Fowl · Arjun Gupta · Amin Ghiasi · Furong Huang · Micah Goldblum · Tom Goldstein
- Geometric attacks on batch normalization (Poster) | Amur Ghose · Apurv Gupta · Yaoliang Yu · Pascal Poupart
- Improving Adversarial Robustness via Joint Classification and Multiple Explicit Detection Classes (Poster) | Sina Baharlouei · Fatemeh Sheikholeslami · Meisam Razaviyayn · J. Zico Kolter
- On the Abilities of Mathematical Extrapolation with Implicit Models (Poster) | Juliette Decugis · Alicia Tsai · Ashwin Ganesh · Max Emerling · Laurent El Ghaoui
- Netflix and Forget: Fast Severance From Memorizing Training Data in Recommendations (Poster) | Xinlei Xu · Jiankai Sun · Xin Yang · Yuanshun Yao · Chong Wang
- Evaluating Worst Case Adversarial Weather Perturbations Robustness (Poster) | Yihan Wang · Yunhao Ba · Howard Zhang · Huan Zhang · Achuta Kadambi · Stefano Soatto · Alex Wong · Cho-Jui Hsieh
- Introspection, Updatability, and Uncertainty Quantification with Transformers: Concrete Methods for AI Safety (Poster) | Allen Schmaltz · Danielle Rasooly
- BAAT: Towards Sample-specific Backdoor Attack with Clean Labels (Poster) | Yiming Li · Mingyan Zhu · Chengxiao Luo · Haiqing Weng · Yong Jiang · Tao Wei · Shu-Tao Xia
- Avoiding Calvinist Decision Traps using Structural Causal Models (Poster) | Arvind Raghavan
- Out-Of-Distribution Detection Is Not All You Need (Poster) | Joris Guerin · Kevin Delmas · Raul S Ferreira · Jérémie Guiochet
- Revisiting Robustness in Graph Machine Learning (Poster) | Lukas Gosch · Daniel Sturm · Simon Geisler · Stephan Günnemann
- Deep Reinforcement Learning Policies in the Frequency Domain (Poster) | Ezgi Korkmaz
- Assistance with large language models (Poster) | Dmitrii Krasheninnikov · Egor Krasheninnikov · David Krueger
- Policy Resilience to Environment Poisoning Attack on Reinforcement Learning (Poster) | Hang Xu · Zinovi Rabinovich
- Image recognition time for humans predicts adversarial vulnerability for models (Poster) | David Mayo · Jesse Cummings · Xinyu Lin · Boris Katz · Andrei Barbu
- Rational Multi-Objective Agents Must Admit Non-Markov Reward Representations (Poster) | Silviu Pitis · Duncan Bailey · Jimmy Ba
- Runtime Monitors for Operational Design Domains of Black-Box ML-Models (Poster) | Hazem Torfah · Sanjit A. Seshia
- Unifying Grokking and Double Descent (Poster) | Xander Davies · Lauro Langosco · David Krueger
- Assessing Robustness of Image Recognition Models to Changes in the Computational Environment (Poster) | Nikolaos Louloudakis · Perry Gibson · José Cano · Ajitha Rajan
- Fake It Until You Make It: Towards Accurate Near-Distribution Novelty Detection (Poster) | Hossein Mirzaei · Mohammadreza Salehi · Sajjad Shahabi · Efstratios Gavves · Cees Snoek · Mohammad Sabokrou · Mohammad Hossein Rohban
- Revisiting Hyperparameter Tuning with Differential Privacy (Poster) | Youlong Ding · Xueyang Wu