Poster in Workshop: 3rd Workshop on New Frontiers in Adversarial Machine Learning (AdvML-Frontiers)
Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness
Stanislav Fort · Balaji Lakshminarayanan
Keywords: [ Adversarial Robustness ]
Adversarial examples pose a significant challenge to the robustness, reliability, and alignment of deep neural networks. We propose a novel, easy-to-use approach to achieving high-quality, adversarially robust representations through multi-resolution input representations and dynamic self-ensembling of intermediate-layer predictions. We demonstrate that intermediate-layer predictions exhibit inherent robustness to adversarial attacks crafted to fool the full classifier, and we propose a robust aggregation mechanism based on a Vickrey auction, which we call CrossMax, to dynamically ensemble them.

By combining multi-resolution inputs and robust ensembling, we achieve significant adversarial robustness on the CIFAR-10 and CIFAR-100 datasets without any adversarial training or extra data, reaching an adversarial accuracy of approximately 72% (CIFAR-10) and 48% (CIFAR-100) on the RobustBench AutoAttack suite (L_\infty = 8/255) with a fine-tuned, ImageNet-pretrained ResNet152. This result is comparable to the top three models on CIFAR-10 and represents a +5% gain over the best current dedicated approach on CIFAR-100. Adding simple adversarial training on top, we achieve ≈78% on CIFAR-10 and ≈51% on CIFAR-100, improving the state of the art by 5% and 9%, respectively, with greater gains on the more challenging dataset.

We validate our approach through extensive experiments and provide insights into the interplay between adversarial robustness and the hierarchical nature of deep representations. Additionally, we show that simple gradient-based attacks against our model produce human-interpretable images of the target classes as well as interpretable image changes. As a byproduct, using our multi-resolution prior, we turn pre-trained classifiers and CLIP models into controllable image generators and develop successful transferable attacks on large vision-language models.
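The abstract does not spell out how the multi-resolution input is constructed. As a rough illustration only, one way to build such a representation is to downsample an image to several resolutions, upsample each copy back to the original size, and stack the copies along the channel dimension before feeding them to a classifier. The function name, the resolution set, and the interpolation mode below are assumptions for the sketch, not the authors' exact recipe.

```python
import torch
import torch.nn.functional as F

def multi_resolution_input(image: torch.Tensor,
                           resolutions=(32, 16, 8, 4)) -> torch.Tensor:
    """Illustrative sketch: stack several blurred (down/up-sampled) copies
    of an image along the channel dimension.

    image: (C, H, W) tensor, e.g. a CIFAR-10 image with C=3, H=W=32.
    Returns a (len(resolutions) * C, H, W) tensor.
    """
    _, h, w = image.shape
    copies = []
    for r in resolutions:
        # Downsample to resolution r, then upsample back to (H, W).
        low = F.interpolate(image.unsqueeze(0), size=(r, r),
                            mode="bilinear", align_corners=False)
        back = F.interpolate(low, size=(h, w),
                             mode="bilinear", align_corners=False)
        copies.append(back.squeeze(0))
    # Concatenate all resolution copies along the channel axis.
    return torch.cat(copies, dim=0)
```

A classifier consuming such an input would simply have its first convolution widened to accept len(resolutions) * C channels instead of C.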
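Likewise, the abstract only names CrossMax as a Vickrey-auction-inspired aggregation of per-layer predictions; the exact normalization and aggregation steps are not given here. The sketch below is one plausible reading, assuming per-predictor and per-class max-subtraction followed by scoring each class by its k-th highest value across ensemble members (the "second price" intuition); the function name and default k are assumptions.

```python
import torch

def crossmax_aggregate(logits: torch.Tensor, k: int = 2) -> torch.Tensor:
    """Illustrative robust-aggregation sketch over an ensemble of predictors.

    logits: (n_predictors, n_classes) tensor, e.g. the logits produced by
    classifiers attached to different intermediate layers.
    Returns an (n_classes,) vector of aggregated class scores.
    """
    # Subtract each predictor's own top logit so no single member can
    # dominate by emitting very large values.
    z = logits - logits.max(dim=1, keepdim=True).values
    # Subtract each class's top logit across predictors so no class can
    # win on the strength of a single outlier member.
    z = z - z.max(dim=0, keepdim=True).values
    # Vickrey-auction-style scoring: rank classes by their k-th highest
    # (here 2nd highest) value across predictors rather than the maximum.
    return torch.topk(z, k=k, dim=0).values[-1]

# Usage example with 5 hypothetical ensemble members and 10 classes:
scores = crossmax_aggregate(torch.randn(5, 10))
prediction = scores.argmax().item()
```

Requiring agreement from at least the k-th strongest member is what makes the aggregation harder to fool: an attack must shift several intermediate-layer predictions at once rather than a single output head.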