

Poster in Workshop: Optimization for ML Workshop

On Convergence of SGD with Adaptive Clipping

Egor Shulgin · Peter Richtarik


Abstract:

Stochastic Gradient Descent (SGD) with gradient clipping has emerged as a powerful technique for stabilizing neural network training and enabling differentially private optimization. While constant clipping has been extensively studied, adaptive methods such as quantile clipping have shown empirical success without a thorough theoretical understanding. This paper provides the first comprehensive analysis of SGD with gradient quantile clipping (QC-SGD). We demonstrate that QC-SGD suffers from a bias problem similar to that of constant-threshold clipped SGD, but show that it can be mitigated through a carefully designed quantile and step-size schedule. Our analysis reveals a critical interplay between quantile values and step sizes. We establish theoretical foundations for this widely used heuristic and identify open problems to guide future research.
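To make the quantile-clipping idea concrete, below is a minimal sketch of a single QC-SGD-style step. It is not the paper's algorithm or schedule: the function name quantile_clip_sgd_step, the stochastic-gradient oracle grad_fn, and the running buffer recent_norms are illustrative assumptions, and the quantile/step-size schedule analyzed in the paper is not reproduced here.

    import numpy as np

    def quantile_clip_sgd_step(params, grad_fn, batch, recent_norms,
                               step_size=0.01, quantile=0.9):
        # One SGD step where the clipping threshold is the empirical
        # `quantile` of recently observed stochastic gradient norms,
        # rather than a fixed constant (illustrative sketch only).
        g = grad_fn(params, batch)            # stochastic gradient
        g_norm = np.linalg.norm(g)

        # Track recent gradient norms to estimate the adaptive threshold.
        recent_norms.append(g_norm)
        threshold = np.quantile(recent_norms, quantile)

        # Clip: rescale the gradient if its norm exceeds the threshold.
        if g_norm > threshold:
            g = g * (threshold / g_norm)

        return params - step_size * g, recent_norms

In this sketch the threshold adapts to the observed gradient-norm distribution, which is the mechanism whose bias and schedule requirements the abstract discusses.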
