Poster
in
Workshop: Heavy Tails in ML: Structure, Stability, Dynamics
High Probability Guarantees for Random Reshuffling
Hengxu Yu · Xiao Li
Keywords: [ high-probability sample complexity ] [ stopping criterion ] [ shuffled SGD ] [ Random Reshuffling ] [ last iterate result ]
We study the probabilistic behavior of stochastic gradient descent with random reshuffling (RR) on nonconvex problems. We prove that the same complexity as in the in-expectation case (up to a logarithmic term) also holds with high probability, which characterizes the performance of RR over a single run rather than an average over infinitely many realizations. Our analysis imposes no additional assumptions on the stochastic gradient errors, so heavy tails are admitted. This is in contrast to high probability analyses of SGD that rely on sub-Gaussian stochastic gradient errors or on techniques such as clipping and momentum. Furthermore, leveraging the established high probability error bounds, we propose a simple stopping criterion for RR that incurs little computational cost. We prove that the function value strictly decreases with high probability before the stopping criterion is triggered, ensuring that the criterion will indeed be activated. Finally, a "last iterate" result is established for the iterate returned under this stopping criterion. We believe that our new developments for RR serve as a stepping stone toward more refined high probability analyses of its performance.
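To make the setting concrete, the following is a minimal sketch of SGD with random reshuffling on a toy least-squares problem: each epoch draws a fresh permutation of the data and visits every sample exactly once. The stopping rule shown here (a full-gradient norm check with a hypothetical tolerance) is only an illustrative stand-in, not the low-cost criterion proposed in the paper; the step size and epoch count are likewise illustrative choices.

```python
import numpy as np

def random_reshuffling(A, b, lr=0.02, epochs=500, tol=1e-6, seed=0):
    """SGD with random reshuffling on f(x) = (1/2n) * sum_i (a_i^T x - b_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        # Random reshuffling: sample without replacement by drawing a
        # new permutation each epoch and sweeping through it.
        for i in rng.permutation(n):
            grad_i = (A[i] @ x - b[i]) * A[i]  # gradient of the i-th component
            x -= lr * grad_i
        # Illustrative stopping check on the full gradient norm; the
        # paper's criterion is designed to avoid such extra full passes.
        full_grad = A.T @ (A @ x - b) / n
        if np.linalg.norm(full_grad) < tol:
            break
    return x

# Toy consistent linear system: the minimizer is x_star exactly.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3))
x_star = np.array([1.0, -2.0, 0.5])
b = A @ x_star
x_hat = random_reshuffling(A, b)
print(np.allclose(x_hat, x_star, atol=1e-3))
```

Note that each epoch processes all n samples in a random order, in contrast to with-replacement SGD, which draws each index independently.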