Invited Talk in Workshop: Mathematics of Modern Machine Learning (M3L)
Flat Minima and Generalization: from Matrix Sensing to Neural Networks
Maryam Fazel
When do overparameterized neural networks avoid overfitting and generalize to unseen data? Empirical evidence suggests that the shape of the training loss function near the solution matters: minima where the loss is “flatter” tend to lead to better generalization. Yet rigorously quantifying and analyzing flatness, even in simple models, has remained elusive.
In this talk, we examine overparameterized nonconvex models such as low-rank matrix recovery, matrix completion, robust PCA, and a 2-layer neural network as test cases. We show that under standard statistical assumptions, “flat” minima (minima with the smallest local average curvature, measured by the trace of the Hessian matrix) provably generalize in all these cases. These algorithm-agnostic results suggest a theoretical basis for favoring methods that bias iterates towards flat solutions, and help inform the design of better training algorithms.
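As a minimal schematic of the flatness criterion (the loss $L$, parameters $\theta$, and the matrix-sensing instance below are our notation and assumptions, not taken from the abstract), the “flat” minima in question are the global minimizers with the smallest trace of the Hessian:
\[
\theta^{\star} \in \operatorname*{arg\,min}_{\theta \,:\, L(\theta) = \min_{\theta'} L(\theta')} \; \operatorname{tr}\!\big(\nabla^{2} L(\theta)\big).
\]
In the matrix sensing test case, for example, one standard (assumed, not necessarily the talk's exact) formulation of the loss is
\[
L(U) = \frac{1}{2m} \sum_{i=1}^{m} \big(\langle A_i, U U^{\top} \rangle - y_i\big)^{2},
\]
with sensing matrices $A_i$, observations $y_i$, and an overparameterized factor $U$; the claim described above is that, under standard statistical assumptions, global minimizers of $L$ that also minimize $\operatorname{tr}(\nabla^{2} L)$ provably generalize.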