Workshop
Backdoors in Deep Learning: The Good, the Bad, and the Ugly
Khoa D Doan · Aniruddha Saha · Anh Tran · Yingjie Lao · Kok-Seng Wong · Ang Li · Haripriya Harikumar · Eugene Bagdasaryan · Micah Goldblum · Tom Goldstein
Room 203 - 205
Fri 15 Dec, 7 a.m. PST
Deep neural networks (DNNs) are revolutionizing almost every AI domain and have become the core of many modern AI systems. While they outperform classical methods, DNNs also face new security problems, such as adversarial and backdoor attacks, that are hard to discover and resolve because of their black-box nature. Backdoor attacks in particular are a recent threat, first described in 2017, that has quickly gained attention in the research community. The number of backdoor-related papers grew from 21 to around 110 in a single year (2019-2020), and in 2022 alone more than 200 papers on backdoor learning were published, reflecting strong research interest in this area.

Backdoor attacks are possible because of insecure model pretraining and outsourcing practices. Given the complexity and enormous cost of collecting data and training models, many individuals and companies simply adopt models or training data from third parties. Malicious third parties can insert backdoors into their models or poison their released data before delivering them to victims in order to gain illicit benefits. This threat seriously undermines the safety and trustworthiness of AI development, and many recent studies on backdoor attacks and defenses aim to address this critical vulnerability.

While most works treat the backdoor as "evil", some studies exploit it for good purposes. A popular approach is to use a backdoor as a watermark to detect illegal use of commercial data or models; a few works employ the backdoor as a trapdoor for adversarial defense. Understanding how backdoors operate also deepens our understanding of how deep learning models work.

This workshop is designed to provide a comprehensive view of the current state of backdoor research. We also aim to raise the AI community's awareness of this important security problem and motivate researchers to build safe and trustworthy AI systems.
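To make the poisoning threat model described above concrete, below is a minimal, hypothetical sketch of a classic trigger-based (BadNets-style) dirty-label attack: a small patch is stamped onto a fraction of the training images, which are relabeled to an attacker-chosen class, so that a model trained on the data associates the patch with that class. The function name, patch size, poison rate, and target label are illustrative choices and are not taken from any paper in the program.

```python
import numpy as np

def poison_dataset(images, labels, target_label=0, poison_rate=0.05,
                   patch_size=3, patch_value=255, seed=0):
    """Illustrative BadNets-style dirty-label poisoning.

    Stamps a small square trigger into a random subset of training images
    and relabels those images to the attacker's target class.
    `images` is an (N, H, W, C) uint8 array, `labels` an (N,) int array.
    """
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()
    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Place the trigger patch in the bottom-right corner of each chosen image.
    images[idx, -patch_size:, -patch_size:, :] = patch_value
    # Dirty-label poisoning: flip the label to the attacker's target class.
    labels[idx] = target_label
    return images, labels, idx

# Usage on random stand-in data; a real attack would poison a benchmark
# such as CIFAR-10 before the victim trains on it.
imgs = np.random.randint(0, 256, size=(1000, 32, 32, 3), dtype=np.uint8)
lbls = np.random.randint(0, 10, size=1000)
poisoned_imgs, poisoned_lbls, poisoned_idx = poison_dataset(imgs, lbls)
print(f"Poisoned {len(poisoned_idx)} of {len(imgs)} images")
```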
Schedule
Fri 7:00 a.m. - 7:30 a.m. | A Blessing in Disguise: Backdoor Attacks as Watermarks for Dataset Copyright Protection (Invited Talk) | Yiming Li
Fri 7:30 a.m. - 8:00 a.m. | Recent Advances in Backdoor Defense and Benchmark (Invited Talk) | Baoyuan Wu
Fri 8:00 a.m. - 8:30 a.m. | COFFEE BREAK
Fri 8:30 a.m. - 9:00 a.m. | Invited Talk | Jonas Geiping
Fri 9:00 a.m. - 9:15 a.m. | Effective Backdoor Mitigation Depends on the Pre-training Objective (Oral) | Sahil Verma · Gantavya Bhatt · Soumye Singhal · Arnav Das · Chirag Shah · John Dickerson · Jeff A Bilmes
Fri 9:15 a.m. - 9:45 a.m. | Universal Jailbreak Backdoors from Poisoned Human Feedback (Invited Talk) | Florian Tramer
Fri 9:45 a.m. - 11:00 a.m. | LUNCH BREAK
Fri 11:00 a.m. - 11:15 a.m. | VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models (Oral) | Sheng-Yen Chou · Pin-Yu Chen · Tsung-Yi Ho
Fri 11:15 a.m. - 11:30 a.m. | The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline (Oral) | Haonan Wang · Qianli Shen · Yao Tong · Yang Zhang · Kenji Kawaguchi
Fri 11:30 a.m. - 12:00 p.m. | Is This Model Mine? On Stealing and Defending Machine Learning Models (Invited Talk) | Adam Dziedzic
Fri 12:00 p.m. - 12:30 p.m. | Invited Talk | Ruoxi Jia
Fri 12:30 p.m. - 1:00 p.m. | COFFEE BREAK
Fri 1:00 p.m. - 1:45 p.m. | On the Limitation of Backdoor Detection Methods (Poster) | Georg Pichler · Marco Romanelli · Divya Prakash Manivannan · Prashanth Krishnamurthy · Farshad Khorrami · Siddharth Garg
Fri 1:00 p.m. - 1:45 p.m. | How to Remove Backdoors in Diffusion Models? (Poster) | Shengwei An · Sheng-Yen Chou · Kaiyuan Zhang · Qiuling Xu · Guanhong Tao · Guangyu Shen · Siyuan Cheng · Shiqing Ma · Pin-Yu Chen · Tsung-Yi Ho · Xiangyu Zhang
Fri 1:00 p.m. - 1:45 p.m. | Adversarial Robustness Unhardening via Backdoor Attacks in Federated Learning (Poster) | Taejin Kim · Jiarui Li · Nikhil Madaan · Shubhranshu Singh · Carlee Joe-Wong
Fri 1:00 p.m. - 1:45 p.m. | How to Backdoor HyperNetwork in Personalized Federated Learning? (Poster) | Phung Lai · Hai Phan · Issa Khalil · Abdallah Khreishah · Xintao Wu
Fri 1:00 p.m. - 1:45 p.m. | Universal Trojan Signatures in Reinforcement Learning (Poster) | Manoj Acharya · Weichao Zhou · Anirban Roy · Xiao Lin · Wenchao Li · Susmit Jha
Fri 1:00 p.m. - 1:45 p.m. | Analyzing and Editing Inner Mechanisms of Backdoored Language Models (Poster) | Max Lamparth · Ann-Katrin Reuel
Fri 1:00 p.m. - 1:45 p.m. | Detecting Backdoors with Meta-Models (Poster) | Lauro Langosco · Neel Alex · William Baker · David Quarel · Herbie Bradley · David Krueger
Fri 1:00 p.m. - 1:45 p.m. | Benchmark Probing: Investigating Data Leakage in Large Language Models (Poster) | Chunyuan Deng · Yilun Zhao · Xiangru Tang · Mark Gerstein · Arman Cohan
Fri 1:00 p.m. - 1:45 p.m. | Leveraging Diffusion-Based Image Variations for Robust Training on Poisoned Data (Poster) | Lukas Struppek · Martin Bernhard Hentschel · Clifton Poth · Dominik Hintersdorf · Kristian Kersting
Fri 1:00 p.m. - 1:45 p.m. | $D^3$: Detoxing Deep Learning Dataset (Poster) | Lu Yan · Siyuan Cheng · Guangyu Shen · Guanhong Tao · Xuan Chen · Kaiyuan Zhang · Yunshu Mao · Xiangyu Zhang
Fri 1:00 p.m. - 1:45 p.m. | Defending Our Privacy With Backdoors (Poster) | Dominik Hintersdorf · Lukas Struppek · Daniel Neider · Kristian Kersting
Fri 1:00 p.m. - 1:45 p.m. | Clean-label Backdoor Attacks by Selectively Poisoning with Limited Information from Target Class (Poster) | Nguyen Hung-Quang · Ngoc-Hieu Nguyen · The Anh Ta · Thanh Nguyen-Tang · Hoang Thanh-Tung · Khoa D Doan
Fri 1:00 p.m. - 1:45 p.m. | BadFusion: 2D-Oriented Backdoor Attacks against 3D Object Detection (Poster) | Saket Sanjeev Chaturvedi · Lan Zhang · Wenbin Zhang · Pan He · Xiaoyong Yuan
Fri 1:00 p.m. - 1:45 p.m. | Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks (Poster) | Shuli Jiang · Swanand Kadhe · Yi Zhou · Ling Cai · Nathalie Baracaldo
Fri 1:00 p.m. - 1:45 p.m. | From Trojan Horses to Castle Walls: Unveiling Bilateral Backdoor Effects in Diffusion Models (Poster) | Zhuoshi Pan · Yuguang Yao · Gaowen Liu · Bingquan Shen · H. Vicky Zhao · Ramana Kompella · Sijia Liu
Fri 1:45 p.m. - 2:00 p.m. | Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection (Oral) | Jun Yan · Vikas Yadav · Shiyang Li · Lichang Chen · Zheng Tang · Hai Wang · Vijay Srinivasan · Xiang Ren · Hongxia Jin
Fri 2:00 p.m. - 2:15 p.m. | BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models (Oral) | Zhen Xiang · Fengqing Jiang · Zidi Xiong · Bhaskar Ramasubramanian · Radha Poovendran · Bo Li
Fri 2:15 p.m. - 2:45 p.m. | Decoding Backdoors in LLMs and Their Implications (Invited Talk) | Bo Li
Fri 2:45 p.m. - | PANEL DISCUSSION