NeurIPS Poster Convergence of Adversarial Training in Overparametrized Neural Networks

Poster

Convergence of Adversarial Training in Overparametrized Neural Networks

Ruiqi Gao · Tianle Cai · Haochuan Li · Cho-Jui Hsieh · Liwei Wang · Jason Lee

East Exhibition Hall B, C #115

Keywords: [ Algorithms ] [ Adversarial Learning ] [ Applications -> Privacy, Anonymity, and Security; Optimization -> Non-Convex Optimization; Theory; Theory ] [ Hardness of Learni ]

[ Abstract ]

Abstract: Neural networks are vulnerable to adversarial examples, i.e. inputs that are imperceptibly perturbed from natural data and yet incorrectly classified by the network. Adversarial training \cite{madry2017towards}, a heuristic form of robust optimization that alternates between minimization and maximization steps, has proven to be among the most successful methods to train networks to be robust against a pre-defined family of perturbations. This paper provides a partial answer to the success of adversarial training, by showing that it converges to a network where the surrogate loss with respect to the the attack algorithm is within $\epsilon$ of the optimal robust loss. Then we show that the optimal robust loss is also close to zero, hence adversarial training finds a robust classifier. The analysis technique leverages recent work on the analysis of neural networks via Neural Tangent Kernel (NTK), combined with motivation from online-learning when the maximization is solved by a heuristic, and the expressiveness of the NTK kernel in the $\ell_\infty$-norm. In addition, we also prove that robust interpolation requires more model capacity, supporting the evidence that adversarial training requires wider networks.

Live content is unavailable. Log in and register to view live content