Poster
in
Workshop: OPT 2023: Optimization for Machine Learning
Noise Injection Irons Out Local Minima and Saddle Points
Konstantin Mishchenko · Sebastian Stich
Abstract:
Non-convex optimization problems are ubiquitous in machine learning, especially in deep learning. It has been observed in practice that injecting artificial noise into stochastic gradient descent (SGD) can sometimes improve training and generalization performance. In this work, we formalize noise injection as a smoothing operator and review and derive convergence guarantees for SGD under smoothing. We empirically find that Gaussian smoothing works well for training two-layer neural networks, but these findings do not translate to deeper networks. We would like to use this contribution to stimulate a discussion in the community on further investigating the impact of noise in training machine learning models.
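To illustrate the idea of noise injection as smoothing, below is a minimal sketch of SGD where the stochastic gradient is evaluated at a Gaussian-perturbed iterate, which (under regularity conditions) gives an unbiased gradient estimate of the Gaussian-smoothed objective. This is an assumption-laden toy example on a linear least-squares problem, not the authors' exact algorithm or experimental setup; all names and hyperparameters are illustrative.

```python
import numpy as np

def loss(w, X, y):
    # Toy objective: mean squared error of a linear model.
    return 0.5 * np.mean((X @ w - y) ** 2)

def grad(w, X, y):
    # Gradient of the MSE loss above.
    return X.T @ (X @ w - y) / len(y)

def smoothed_sgd(X, y, sigma=0.1, lr=0.05, epochs=20, batch=8, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), n // batch):
            # Noise injection as smoothing: evaluate the minibatch gradient
            # at a perturbed point w + sigma * u with u ~ N(0, I). In
            # expectation over u, this is the gradient of the smoothed loss
            # f_sigma(w) = E_u[f(w + sigma * u)].
            u = rng.standard_normal(d)
            g = grad(w + sigma * u, X[idx], y[idx])
            w -= lr * g
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((256, 5))
    w_true = rng.standard_normal(5)
    y = X @ w_true + 0.1 * rng.standard_normal(256)
    w_hat = smoothed_sgd(X, y)
    print("final loss:", loss(w_hat, X, y))
```

Setting sigma=0 recovers plain minibatch SGD, which makes the smoothing perturbation easy to ablate in experiments of this kind.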