Improved Communication Lower Bounds for Distributed Optimisation

Motivated by the interest in communication-efficient methods for distributed machine learning, we consider the communication complexity of minimising a sum of $d$-dimensional functions $\sum_{i = 1}^N f_i (x)$, where each function $f_i$ is held by one of the $N$ machines. Such tasks arise naturally in large-scale optimisation, where a standard solution is to apply variants of (stochastic) gradient descent. As our main result, we show that $\Omega( Nd \log d / \varepsilon)$ bits in total need to be communicated between the machines to find an additive $\varepsilon$-approximation to the minimum of $\sum_{i = 1}^N f_i (x)$. The result holds for deterministic algorithms, and for randomised algorithms under some restrictions on the parameter values. Importantly, our lower bounds require no assumptions on the structure of the algorithm, and are matched within constant factors for strongly convex objectives by a new variant of quantised gradient descent. The lower bounds are obtained by bringing over tools from communication complexity to distributed optimisation, an approach we hope will find further use in the future.
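
To make the setting concrete, the following is a minimal sketch of distributed gradient descent with stochastically quantised gradient exchange, in the spirit of the quantised gradient descent variant mentioned in the abstract. The quantiser, its number of levels, the step size, and the toy quadratic objectives are illustrative assumptions, not the scheme analysed in the paper.

```python
import numpy as np

def quantize(v, levels=16):
    """Unbiased stochastic uniform quantisation of a vector onto `levels`
    grid points per coordinate (hypothetical parameters for illustration)."""
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    scaled = np.abs(v) / norm * (levels - 1)
    lower = np.floor(scaled)
    # Randomised rounding keeps the quantiser unbiased in expectation.
    rounded = lower + (np.random.rand(*v.shape) < (scaled - lower))
    return np.sign(v) * rounded / (levels - 1) * norm

def distributed_quantized_gd(grads, x0, lr=0.1, steps=100):
    """Each machine i evaluates grad f_i at the current iterate, sends a
    quantised gradient, and the coordinator averages them and takes a step."""
    x = x0.copy()
    for _ in range(steps):
        g = np.mean([quantize(grad(x)) for grad in grads], axis=0)
        x -= lr * g
    return x

# Toy usage: N quadratic objectives f_i(x) = 0.5 * ||x - c_i||^2,
# whose sum is minimised at the mean of the centres c_i.
rng = np.random.default_rng(0)
centres = rng.normal(size=(5, 10))            # N = 5 machines, d = 10
grads = [lambda x, c=c: x - c for c in centres]
x_hat = distributed_quantized_gd(grads, np.zeros(10))
print(np.linalg.norm(x_hat - centres.mean(axis=0)))  # small residual error
```

In this sketch the only inter-machine traffic per round is the quantised gradients, so the total communication is governed by the number of rounds times the bits per quantised vector, which is the quantity the lower bound constrains.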
