Poster in Workshop: OPT 2021: Optimization for Machine Learning
Adam vs. SGD: Closing the generalization gap on image classification
Aman Gupta · Rohan Ramanath · Jun Shi · Sathiya Keerthi
Adam is an adaptive optimizer for training deep neural networks that has been widely used across a variety of applications. However, on image classification problems, its generalization performance is significantly worse than that of stochastic gradient descent (SGD). By tuning several of Adam's inner hyperparameters, it is possible to lift its performance and close this gap, but this makes using Adam computationally expensive. In this paper, we use a new training approach based on layer-wise weight normalization (LAWN) to substantially improve Adam's performance and close the gap with SGD. LAWN also helps reduce the impact of batch size on Adam's performance. With speed intact and performance vastly improved, the Adam-LAWN combination becomes an attractive optimizer for use in image classification.
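The abstract does not spell out how layer-wise weight normalization is applied, so the following is only a minimal sketch of the general idea of combining Adam with per-layer weight-norm constraints; it is not the authors' LAWN implementation. The helper name `lawn_project`, the choice of constraining only weight matrices, and the single-step usage are illustrative assumptions.

```python
# Sketch: Adam update followed by a layer-wise weight-norm projection.
# Assumption (not from the paper): each layer's weight norm is rescaled
# back to the value it had when the constraint was switched on.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Record each weight matrix's norm at the moment the constraint begins.
frozen_norms = {
    name: p.detach().norm().item()
    for name, p in model.named_parameters()
    if p.dim() > 1  # constrain weight matrices, leave biases free (assumption)
}

def lawn_project(model, frozen_norms, eps=1e-12):
    """Rescale each constrained layer so its weight norm matches the frozen value."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in frozen_norms:
                p.mul_(frozen_norms[name] / (p.norm() + eps))

# One hypothetical training step: standard Adam step, then the projection.
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
lawn_project(model, frozen_norms)
```

Because the projection only rescales weights after the optimizer step, it adds negligible overhead to Adam, which is consistent with the abstract's claim that speed is preserved.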