Invited Talk in Workshop: Mathematics of Modern Machine Learning (M3L)
Flat Minima and Generalization: from Matrix Sensing to Neural Networks
Maryam Fazel
When do overparameterized neural networks avoid overfitting and generalize to unseen data? Empirical evidence suggests that the shape of the training loss function near the solution matters: minima where the loss is “flatter” tend to lead to better generalization. Yet rigorously quantifying and analyzing flatness, even in simple models, has remained elusive.
In this talk, we examine overparameterized nonconvex models such as low-rank matrix recovery, matrix completion, robust PCA, and a 2-layer neural network as test cases. We show that under standard statistical assumptions, “flat” minima (minima with the smallest local average curvature, measured by the trace of the Hessian matrix) provably generalize in all these cases. These algorithm-agnostic results suggest a theoretical basis for favoring methods that bias iterates towards flat solutions, and help inform the design of better training algorithms.
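As a minimal schematic of the flatness criterion (the loss $L$, parameters $\theta$, and the matrix-sensing instance below are our notation and assumptions, not taken from the abstract), the “flat” minima in question are the global minimizers with the smallest trace of the Hessian:
\[
\theta^{\star} \in \operatorname*{arg\,min}_{\theta \,:\, L(\theta) = \min_{\theta'} L(\theta')} \; \operatorname{tr}\!\big(\nabla^{2} L(\theta)\big).
\]
In the matrix sensing test case, for example, one standard (assumed, not necessarily the talk's exact) formulation of the loss is
\[
L(U) = \frac{1}{2m} \sum_{i=1}^{m} \big(\langle A_i, U U^{\top} \rangle - y_i\big)^{2},
\]
with sensing matrices $A_i$, observations $y_i$, and an overparameterized factor $U$; the claim described above is that, under standard statistical assumptions, global minimizers of $L$ that also minimize $\operatorname{tr}(\nabla^{2} L)$ provably generalize.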