Poster
MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence
Ionut-Vlad Modoranu · Mher Safaryan · Grigory Malinovsky · Eldar Kurtić · Thomas Robert · Peter Richtarik · Dan Alistarh
We propose a new variant of the Adam optimizer, called MicroAdam, that specifically minimizes memory overhead while maintaining theoretical convergence guarantees. This is achieved by compressing the gradient information before it is fed into the optimizer state, and controlling the resulting compression error via a new instance of the classical error feedback mechanism from distributed optimization, in which the error correction information is itself compressed. We show that the resulting approach maintains convergence guarantees competitive with those of AMSGrad, while providing strong practical performance. Specifically, we provide an efficient GPU implementation and show that, on both BERT- and LLaMA-family models, MicroAdam achieves practical convergence competitive with that of the uncompressed Adam baseline, at considerably lower memory usage.
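To make the mechanism described above concrete, the following is a minimal conceptual sketch of an Adam-style step that feeds only a compressed gradient into the optimizer state and carries a compressed error-feedback buffer across steps. It is not the authors' algorithm or implementation: the compression operators (top-k sparsification, crude uniform quantization), function names, and hyperparameters are illustrative assumptions, and the sketch keeps dense moments for brevity, whereas the point of MicroAdam is to reduce the optimizer's memory footprint.

```python
# Hypothetical sketch of compressed-gradient Adam with compressed error feedback.
# All names and compression choices here are illustrative, not from the paper.
import numpy as np

def top_k_compress(x, k):
    """Keep the k largest-magnitude entries of x, zero out the rest."""
    out = np.zeros_like(x)
    if k <= 0:
        return out
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def quantize(x, levels=256):
    """Crude uniform quantization standing in for compressing the error buffer."""
    scale = np.max(np.abs(x))
    if scale == 0:
        return x.copy()
    step = 2 * scale / (levels - 1)
    return np.round(x / step) * step

def adam_step_with_compressed_error_feedback(param, grad, state,
                                             lr=1e-3, betas=(0.9, 0.999),
                                             eps=1e-8, k_frac=0.01):
    """One step: only a compressed gradient reaches the Adam moments,
    and the discarded residual is itself compressed before being stored."""
    m, v, err, t = state["m"], state["v"], state["err"], state["t"] + 1
    beta1, beta2 = betas

    # Error feedback: add the residual carried over from previous steps.
    corrected = grad + err
    k = max(1, int(k_frac * corrected.size))
    compressed = top_k_compress(corrected, k)

    # Store only a compressed version of what compression discarded,
    # so the error-correction buffer itself stays cheap.
    err = quantize(corrected - compressed)

    # Standard Adam moment updates, driven by the compressed gradient.
    m = beta1 * m + (1 - beta1) * compressed
    v = beta2 * v + (1 - beta2) * compressed ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)

    state.update(m=m, v=v, err=err, t=t)
    return param, state

if __name__ == "__main__":
    # Toy usage on a random quadratic-like gradient.
    rng = np.random.default_rng(0)
    p = rng.normal(size=1000)
    state = dict(m=np.zeros_like(p), v=np.zeros_like(p),
                 err=np.zeros_like(p), t=0)
    for _ in range(10):
        g = p + rng.normal(scale=0.01, size=p.shape)  # gradient of 0.5*||p||^2 plus noise
        p, state = adam_step_with_compressed_error_feedback(p, g, state)
```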