Poster in Workshop: Workshop on Machine Learning Safety
Certifiable Robustness Against Patch Attacks Using an ERM Oracle
Kevin Stangl · Avrim Blum · Omar Montasser · Saba Ahmadi
Consider patch attacks, in which an adversary manipulates a test image with a patch at test time in order to induce a targeted misclassification. We consider a recent defense against patch attacks, Patch-Cleanser (Xiang et al., 2022). The Patch-Cleanser algorithm requires the prediction model to have a “two-mask correctness” property, meaning that the model should correctly classify any image when any two blank masks replace portions of the image. To this end, Xiang et al. (2022) train a prediction model to be robust to two-mask operations by augmenting the training set, applying pairs of masks at random locations of training images, and then performing empirical risk minimization (ERM) on the augmented dataset. However, in the non-realizable setting, where no predictor is perfectly correct on all two-mask operations of all images, we exhibit an example where ERM fails. To overcome this challenge, we propose a different algorithm, based on prior work by Feige et al. (2015a), that provably learns a predictor robust to all two-mask operations using an ERM oracle.
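The two ingredients of the baseline described above, the two-mask operation with random-pair augmentation and the two-mask correctness check, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: images are represented as nested lists of floats, masks are squares filled with the blank value 0.0, and the mask set is a hypothetical grid of square masks with a fixed stride. All function names (`apply_mask`, `two_mask`, `mask_set`, `augment_with_two_masks`, `is_two_mask_correct`, `two_mask_robust_error`) are hypothetical. The sketch does not implement the proposed ERM-oracle algorithm based on Feige et al. (2015a).

```python
import itertools
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical representations chosen for illustration only.
Image = List[List[float]]
Mask = Tuple[int, int, int]  # (row, col, size) of a square blank region


def apply_mask(image: Image, mask: Mask, blank: float = 0.0) -> Image:
    """Return a copy of `image` with the square region given by `mask` set to `blank`."""
    row, col, size = mask
    out = [list(r) for r in image]  # copy so the input image is not mutated
    for i in range(row, min(row + size, len(out))):
        for j in range(col, min(col + size, len(out[i]))):
            out[i][j] = blank
    return out


def two_mask(image: Image, m1: Mask, m2: Mask) -> Image:
    """Two-mask operation: replace two portions of the image with blank masks."""
    return apply_mask(apply_mask(image, m1), m2)


def mask_set(height: int, width: int, size: int, stride: int) -> List[Mask]:
    """Assumed mask set: square masks of side `size` on a grid with the given stride."""
    rows = range(0, max(height - size, 0) + 1, stride)
    cols = range(0, max(width - size, 0) + 1, stride)
    return [(r, c, size) for r in rows for c in cols]


def augment_with_two_masks(data: Sequence[Tuple[Image, int]], masks: List[Mask],
                           copies_per_image: int, seed: int = 0) -> List[Tuple[Image, int]]:
    """Baseline augmentation (assumed form): append masked copies that keep the
    original label, each using a pair of masks drawn uniformly at random."""
    rng = random.Random(seed)
    augmented = list(data)
    for image, label in data:
        for _ in range(copies_per_image):
            m1, m2 = rng.choice(masks), rng.choice(masks)
            augmented.append((two_mask(image, m1, m2), label))
    return augmented


def is_two_mask_correct(model: Callable[[Image], int], image: Image,
                        label: int, masks: List[Mask]) -> bool:
    """True iff `model` is correct under every pair of masks (order does not
    matter, and a pair may repeat the same mask)."""
    return all(model(two_mask(image, m1, m2)) == label
               for m1, m2 in itertools.combinations_with_replacement(masks, 2))


def two_mask_robust_error(model: Callable[[Image], int],
                          data: Sequence[Tuple[Image, int]], masks: List[Mask]) -> float:
    """Fraction of examples on which some two-mask operation causes a mistake."""
    failures = sum(not is_two_mask_correct(model, x, y, masks) for x, y in data)
    return failures / len(data)


if __name__ == "__main__":
    masks = mask_set(4, 4, size=2, stride=2)
    img = [[1.0] * 4 for _ in range(4)]
    model_sum = lambda x: int(sum(map(sum, x)) > 0)      # looks at every pixel
    model_corner = lambda x: int(x[0][0] > 0)            # looks at one pixel only
    print(is_two_mask_correct(model_sum, img, 1, masks))     # True
    print(is_two_mask_correct(model_corner, img, 1, masks))  # False
```

Note the design gap this makes visible: `augment_with_two_masks` draws only `copies_per_image` random pairs per image, while `is_two_mask_correct` quantifies over every pair in the mask set, which is the property Patch-Cleanser requires.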