Poster
in
Workshop: OPT 2021: Optimization for Machine Learning
Escaping Local Minima With Stochastic Noise
Harshvardhan Harshvardhan · Sebastian Stich
With the recent advancements in Deep Learning, non-convex problems have received wide spread attention. However, while stochastic gradient descent (SGD) can often successfully optimize these non-convex models in practice, the theoretical analysis---mapping SGD to Gradient Descent (GD) in expectation---cannot explain global convergence, as GD gets trapped in local minima and stationary points. We introduce a novel proof framework to analyze stochastic methods on a class of structured non-convex functions. We prove that SGD converges linearly to approximate global minima on non-convex functions that are `close' to a convex-like (strongly convex or P\L) function when its iterates are perturbed with stochastic noise of appropriate strength (smoothing). Our analysis applies to a strictly larger class of functions than studied in prior work. We demonstrate that special cases of smoothing are equivalent to vanilla SGD in expectation, which explains why SGD can escape local minima in practice.