Communication-Adaptive Stochastic Gradient Methods for Distributed Learning

This paper develops communication-efficient algorithms for distributed learning by generalizing the recent method of lazily aggregated gradient (LAG) to handle stochastic gradients, which justifies the name of the new method, LASG. While LAG is effective at reducing communication without sacrificing the rate of convergence, we show that it works only with deterministic gradients. We introduce new rules and analysis for LASG that are tailored to stochastic gradients, so it effectively saves downloads, uploads, or both for distributed stochastic gradient descent. LASG achieves impressive empirical performance, typically saving total communication by an order of magnitude. LASG can also be combined with gradient quantization to obtain further savings.
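To illustrate the lazy-aggregation idea behind LASG, the following is a minimal sketch, not the paper's exact rules: each worker uploads a fresh stochastic gradient only when it has drifted sufficiently from the copy the server already holds; otherwise the server reuses the stale gradient. The quadratic objective, the particular skip test, and the constants (`lr`, `skip_const`, the window of recent iterate moves) are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of lazily aggregated stochastic gradients on a synthetic
# least-squares problem. All problem data and rule constants are assumptions.
import numpy as np

rng = np.random.default_rng(0)
dim, n_workers, n_samples = 10, 5, 200
lr, skip_const = 0.05, 1.0

# Synthetic least-squares data, split across workers.
A = rng.normal(size=(n_workers, n_samples, dim))
x_true = rng.normal(size=dim)
b = A @ x_true + 0.1 * rng.normal(size=(n_workers, n_samples))

def stochastic_grad(m, x, batch=32):
    """Mini-batch gradient of worker m's local least-squares loss."""
    idx = rng.choice(n_samples, size=batch, replace=False)
    Am, bm = A[m, idx], b[m, idx]
    return Am.T @ (Am @ x - bm) / batch

x = np.zeros(dim)
stale = np.zeros((n_workers, dim))   # server's copy of each worker's last upload
recent_moves = []                    # recent ||x^{k+1} - x^k||^2 values for the skip test
uploads = 0

for k in range(300):
    for m in range(n_workers):
        g = stochastic_grad(m, x)
        # Hypothetical skip rule: upload only if the new gradient has drifted
        # more than the recent progress of the iterates suggests it should.
        drift = np.sum((g - stale[m]) ** 2)
        budget = skip_const / lr**2 * np.mean(recent_moves) if recent_moves else 0.0
        if drift >= budget or k == 0:
            stale[m] = g            # worker communicates; server refreshes its copy
            uploads += 1
    x_new = x - lr * stale.mean(axis=0)
    recent_moves = (recent_moves + [np.sum((x_new - x) ** 2)])[-10:]
    x = x_new

print(f"uploads: {uploads} / {300 * n_workers} possible, "
      f"error: {np.linalg.norm(x - x_true):.3f}")
```

Running the sketch typically shows many rounds in which some workers skip uploading while the iterates still approach the least-squares solution, which is the qualitative behavior the abstract describes; the actual LASG rules and their convergence guarantees are given in the paper.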