Poster
in
Workshop: Backdoors in Deep Learning: The Good, the Bad, and the Ugly
From Trojan Horses to Castle Walls: Unveiling Bilateral Backdoor Effects in Diffusion Models
Zhuoshi Pan · Yuguang Yao · Gaowen Liu · Bingquan Shen · H. Vicky Zhao · Ramana Kompella · Sijia Liu
While state-of-the-art diffusion models (DMs) excel in image generation, concerns regarding their security persist. Earlier research highlighted DMs' vulnerability to backdoor attacks, but these studies placed stricter requirements than conventional methods like 'BadNets' in image classification. This is because the former necessitates modifications to the diffusion sampling and training procedures. Unlike the prior work, we investigate whether generating backdoor attacks in DMs can be as simple as BadNets, i.e., by only contaminating the training dataset without tampering the original diffusion process. In this more realistic backdoor setting, we uncover bilateral backdoor effects that not only serve an adversarial purpose (compromising the functionality of DMs) but also offer a defensive advantage (which can be leveraged for backdoor defense). On one hand, a BadNets-like backdoor attack remains effective in DMs for producing incorrect images that do not align with the intended text conditions. On the other hand, backdoored DMs exhibit an increased ratio of backdoor triggers, a phenomenon referred as 'trigger amplification', among the generated images. We show that the latter insight can be utilized to improve the existing backdoor detectors for the detection of backdoor-poisoned data points. Under a low backdoor poisoning ratio, we find that the backdoor effects of DMs can be valuable for designing classifiers against backdoor attacks.