Poster in Workshop: OPT 2023: Optimization for Machine Learning
A Predicting Clipping Asynchronous Stochastic Gradient Descent Method in Distributed Learning
Haoxiang Wang · Zhanhong Jiang · Chao Liu · Soumik Sarkar · Dongxiang Jiang · Young Lee
In this paper, we propose a new algorithm, termed Predicting Clipping Asynchronous Stochastic Gradient Descent (PC-ASGD), to address the issues of staleness and time delay in asynchronous distributed learning settings. Specifically, PC-ASGD has two steps: the predicting step leverages gradient prediction via Taylor expansion to reduce the staleness of the outdated weights, while the clipping step selectively drops the outdated weights to alleviate their negative effects. A tradeoff parameter is introduced to balance the effects of these two steps. We theoretically establish the convergence rate of the proposed algorithm with a constant step size for smooth nonconvex objective functions, accounting for the effects of delay. For empirical validation, we demonstrate the performance of the algorithm with two deep neural network architectures on two benchmark datasets.
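To make the two-step structure concrete, the following is a minimal single-process sketch of the idea described above, not the authors' exact update rule. The illustrative quadratic objective, the function name pc_asgd_step, the tradeoff parameter theta, the simulated delay, and the finite-difference approximation of the Taylor-expansion correction are all assumptions introduced for this example.

```python
import numpy as np

# Hypothetical quadratic objective f(w) = 0.5 * ||A w - b||^2, used only for illustration.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

def grad(w):
    """Gradient of the illustrative objective at w."""
    return A.T @ (A @ w - b)

def pc_asgd_step(w, w_stale, lr=0.01, theta=0.5):
    """One PC-ASGD-style update (sketch under assumptions, not the paper's exact rule).

    w       : current (up-to-date) weights
    w_stale : outdated weights received with delay
    theta   : assumed tradeoff parameter blending the predicting and clipping steps
    """
    g_fresh = grad(w)

    # Predicting step: correct the stale gradient with a first-order Taylor
    # expansion around the stale weights; the Hessian-vector product is
    # approximated by a finite difference (an assumption for this sketch).
    g_stale = grad(w_stale)
    delta = w - w_stale
    eps = 1e-6
    hvp = (grad(w_stale + eps * delta) - g_stale) / eps  # ~ H(w_stale) @ delta
    g_predicted = g_stale + hvp

    # Clipping step: drop the outdated contribution and keep only the fresh gradient.
    g_clipped = g_fresh

    # Tradeoff parameter balances the predicting and clipping steps.
    g = theta * g_predicted + (1.0 - theta) * g_clipped
    return w - lr * g

# Toy run: the "stale" copy lags a few iterations behind the current weights.
w = np.zeros(5)
history = [w.copy()]
for t in range(200):
    w_stale = history[max(0, t - 3)]  # simulate a fixed delay of 3 steps
    w = pc_asgd_step(w, w_stale)
    history.append(w.copy())

print("final loss:", 0.5 * np.linalg.norm(A @ w - b) ** 2)
```

In this sketch, theta = 1 recovers a pure prediction-based update and theta = 0 discards the stale information entirely; the paper's tradeoff parameter is presented as balancing these two effects.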