Poster
in
Workshop: Workshop on Machine Learning Safety
Towards Adversarial Purification using Denoising AutoEncoders
Dvij Kalaria · Aritra Hazra · Partha Chakrabarti
With the rapid advancement and increased use of deep learning models in image identification, security becomes a major concern to their deployment in safety-critical systems. The deep learning architectures are often susceptible to adversarial attacks which are often obtained by making subtle perturbations to normal images, which are mostly imperceptible to humans, but can seriously confuse the state-of-the-art machine learning models. We propose a framework, named APuDAE, leveraging Denoising AutoEncoders (DAEs) to purify these samples by using them in an adaptive way and thus improve the classification accuracy of the target classifier networks. We also show how using DAEs adaptively instead directly, improves classification accuracy further and is more robust to the possibility of designing adaptive attacks to fool them. We demonstrate our results over MNIST, CIFAR-10, ImageNet dataset and show how our framework APuDAE provides comparable and in most cases better performance to the baseline methods in purifying adversaries.