Linear Convergent Decentralized Optimization with Compression

Communication compression has been widely adopted to speed up large-scale distributed optimization. However, most existing decentralized algorithms with compression are unsatisfactory in terms of convergence rate and stability. In this paper, we identify two key obstacles in the algorithm design -- data heterogeneity and compression error. Explicitly overcoming these obstacles leads to a novel decentralized algorithm named LEAD, the first \underline{L}in\underline{EA}r convergent \underline{D}ecentralized algorithm with communication compression. Our theory characterizes the coupled dynamics of inexact model propagation and the optimization process, and we provide the first consensus error bound that does not assume bounded gradients. Experiments validate our theoretical analysis and show that the proposed algorithm achieves state-of-the-art computation and communication efficiency.
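
To make the role of communication compression concrete, below is a minimal, generic sketch of one compressed gossip-averaging round. It is not the paper's LEAD algorithm: the top-k compressor, the ring mixing matrix, and all function names are illustrative assumptions, used only to show where compression error enters a decentralized update.

```python
# Generic sketch of compressed decentralized averaging -- NOT LEAD.
import numpy as np


def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of x and zero the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out


def compressed_gossip_step(models, W, k):
    """One gossip round in which nodes exchange compressed models.

    models: list of parameter vectors, one per node
    W:      doubly stochastic mixing matrix (n x n)
    k:      number of coordinates each node transmits
    """
    n = len(models)
    # Each node broadcasts only a compressed version of its parameters.
    compressed = [top_k(m, k) for m in models]
    new_models = []
    for i in range(n):
        # Mixing the compressed neighbor models introduces the compression
        # error (models[j] - compressed[j]) that the abstract refers to.
        mixed = sum(W[i, j] * compressed[j] for j in range(n))
        new_models.append(mixed)
    return new_models


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 4, 10
    models = [rng.standard_normal(d) for _ in range(n)]
    # Ring topology: each node averages with its two neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] = 0.25
        W[i, (i + 1) % n] = 0.25
    models = compressed_gossip_step(models, W, k=3)
    print(np.round(models[0], 3))
```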
