Poster
Communication Compression for Decentralized Training
Hanlin Tang · Shaoduo Gan · Ce Zhang · Tong Zhang · Ji Liu
Room 210 #74
Keywords: [ Optimization ] [ Non-Convex Optimization ]
Abstract:
Optimizing distributed learning systems is an art of balancing between computation and communication. There have been two lines of research that try to deal with slower networks: {\em communication compression} for low-bandwidth networks, and {\em decentralization} for high-latency networks. In this paper, we explore a natural question: {\em can the combination of both techniques lead to a system that is robust to both bandwidth and latency?} Although the system implication of such a combination is straightforward, the underlying theoretical principle and algorithm design are challenging: unlike in centralized algorithms, simply compressing the information exchanged within the decentralized network, even in an unbiased stochastic way, accumulates error and causes divergence. In this paper, we develop a framework of quantized, decentralized training and propose two different strategies, which we call {\em extrapolation compression} and {\em difference compression}. We analyze both algorithms and prove that both converge at the rate of $O(1/\sqrt{nT})$, where $n$ is the number of workers and $T$ is the number of iterations, matching the convergence rate of full-precision, centralized training. We validate our algorithms empirically and find that the proposed algorithms significantly outperform the best of the merely decentralized and merely quantized algorithms for networks with {\em both} high latency and low bandwidth.
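To make the difference-compression idea concrete, below is a minimal sketch of quantized, decentralized SGD in which workers exchange quantized {\em differences} between their current model and a replica held by the receivers, rather than the models themselves. This is not the authors' implementation: the ring topology, the stochastic-rounding quantizer, the toy quadratic objectives, and names such as `quantize` and `x_hat` are illustrative assumptions.

```python
# Minimal sketch of decentralized SGD with difference compression
# (illustrative only; not the authors' implementation).
import numpy as np

rng = np.random.default_rng(0)
n, d, T, lr = 4, 10, 200, 0.05        # workers, dimension, iterations, step size

# Doubly stochastic mixing matrix for a ring: average self and two neighbors.
W = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i, i + 1):
        W[i, j % n] = 1.0 / 3.0

targets = rng.normal(size=(n, d))     # each worker's local optimum (toy quadratic loss)
x = np.zeros((n, d))                  # local models
x_hat = np.zeros((n, d))              # replicas of the models kept at the receivers

def quantize(v, levels=16):
    """Unbiased stochastic rounding onto a uniform grid (toy quantizer)."""
    scale = np.max(np.abs(v)) + 1e-12
    g = v / scale * levels
    low = np.floor(g)
    return (low + (rng.random(v.shape) < g - low)) / levels * scale

for t in range(T):
    grads = x - targets               # gradient of 0.5 * ||x_i - target_i||^2
    # Gossip averaging uses the replicas, not the exact neighbor models.
    x_new = W @ x_hat - lr * grads
    # Each worker sends only the quantized *difference* from its replica;
    # receivers add it in, so replicas track the models instead of letting
    # compression error pile up across iterations.
    for i in range(n):
        x_hat[i] += quantize(x_new[i] - x_hat[i])
    x = x_new

print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
print("distance to average target:", np.linalg.norm(x.mean(axis=0) - targets.mean(axis=0)))
```

Intuitively, as the workers approach consensus the transmitted differences shrink, so the injected quantization noise shrinks with them rather than accumulating, which is what the naive scheme of directly quantizing the exchanged models fails to achieve.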