Poster in Workshop: Backdoors in Deep Learning: The Good, the Bad, and the Ugly
$D^3$: Detoxing Deep Learning Dataset
Lu Yan · Siyuan Cheng · Guangyu Shen · Guanhong Tao · Xuan Chen · Kaiyuan Zhang · Yunshu Mao · Xiangyu Zhang
Abstract:
Data poisoning is a prominent threat to Deep Learning applications. In a backdoor attack, training samples are poisoned with a specific input pattern or transformation, called a trigger, such that the trained model misclassifies inputs in the presence of the trigger. Despite a broad spectrum of defense techniques against data poisoning and backdoor attacks, these defenses are often outpaced by the increasing complexity and sophistication of attacks. In response to this growing threat, this paper introduces $D^3$, a novel dataset detoxification technique that leverages differential analysis to extract triggers from compromised test samples captured in the wild. Specifically, we formulate poison extraction as a constrained optimization problem and solve it with iterative gradient descent under semantic restrictions. Upon successful extraction, $D^3$ augments the dataset by applying the extracted poison to clean validation samples and builds a classifier to separate clean and poisoned training samples. This post-mortem approach provides a robust complement to existing defenses, particularly when they fail to detect complex, stealthy poisoning attacks. $D^3$ is evaluated on 42 poisoned datasets covering 18 different types of poisons, including subtle clean-label poisoning, dynamic attacks, and input-aware attacks. It achieves over 95\% precision and 95\% recall on average, substantially outperforming the state-of-the-art.
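The sketch below illustrates the two steps the abstract describes: recovering a trigger by constrained gradient descent, then using the recovered trigger to build a separator for training samples. It is a minimal illustration assuming a PyTorch image classifier; the function names (`extract_trigger`, `separate_training_samples`), the L1 mask penalty standing in for the paper's semantic restrictions, and the logits-based logistic separator are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch of trigger extraction + clean/poison separation.
# Assumptions (not from the paper): a PyTorch classifier `model`, an L1 mask
# penalty as the "semantic restriction", and a logistic separator on logits.
import torch
import torch.nn.functional as F


def extract_trigger(model, wild_poisoned_x, clean_val_x,
                    steps=500, lr=0.1, mask_weight=1e-2):
    """Recover a (mask, pattern) trigger by constrained gradient descent.

    The misclassified label observed on the wild poisoned sample becomes the
    optimization target; the trigger is optimized so that stamping it onto
    clean validation images reproduces that misclassification, while an L1
    penalty keeps the mask small (a stand-in for semantic restrictions).
    """
    model.eval()
    with torch.no_grad():
        target_label = model(wild_poisoned_x).argmax(dim=1).item()

    _, c, h, w = clean_val_x.shape
    mask_logit = torch.zeros(1, 1, h, w, requires_grad=True)
    pattern = torch.zeros(1, c, h, w, requires_grad=True)
    opt = torch.optim.Adam([mask_logit, pattern], lr=lr)
    target = torch.full((clean_val_x.size(0),), target_label, dtype=torch.long)

    for _ in range(steps):
        mask = torch.sigmoid(mask_logit)
        patched = (1 - mask) * clean_val_x + mask * torch.sigmoid(pattern)
        loss = F.cross_entropy(model(patched), target) + mask_weight * mask.mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask_logit).detach(), torch.sigmoid(pattern).detach()


def separate_training_samples(model, mask, pattern, clean_val_x, train_x):
    """Fit a linear separator on logits of clean vs. trigger-stamped
    validation images, then score training samples with it."""
    model.eval()
    with torch.no_grad():
        patched_val = (1 - mask) * clean_val_x + mask * pattern
        feats = torch.cat([model(clean_val_x), model(patched_val)])
        labels = torch.cat([torch.zeros(len(clean_val_x)),
                            torch.ones(len(clean_val_x))])

    # Simple logistic-regression separator over the model's logits.
    clf = torch.nn.Linear(feats.size(1), 1)
    opt = torch.optim.Adam(clf.parameters(), lr=1e-2)
    for _ in range(200):
        loss = F.binary_cross_entropy_with_logits(clf(feats).squeeze(1), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        scores = torch.sigmoid(clf(model(train_x)).squeeze(1))
    return scores  # higher score => more likely poisoned


# Example usage with a toy model and random data (purely illustrative):
# model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
# wild = torch.rand(1, 3, 32, 32)
# clean_val = torch.rand(64, 3, 32, 32)
# mask, pattern = extract_trigger(model, wild, clean_val)
# scores = separate_training_samples(model, mask, pattern, clean_val,
#                                    torch.rand(128, 3, 32, 32))
```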