Communication Compression for Decentralized Nonconvex Optimization

This paper considers decentralized nonconvex optimization in which the cost functions are distributed over the agents. Since agents iteratively communicate with their neighbors, information compression is a key tool for reducing the heavy communication load of decentralized algorithms, and we propose three decentralized primal–dual algorithms with compressed communication. The first two algorithms are applicable to a general class of compressors with bounded relative compression error, and the third algorithm is suitable for two general classes of compressors with bounded absolute compression error. We show that the proposed decentralized algorithms with compressed communication have convergence properties comparable to those of state-of-the-art algorithms without communication compression. Specifically, they find first-order stationary points at a sublinear rate O(1/T), where T is the total number of iterations, when each local cost function is smooth, and they find global optima at a linear rate under the additional condition that the global cost function satisfies the Polyak–Łojasiewicz condition. Numerical simulations are provided to illustrate the theoretical results.
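To make the two compressor classes concrete, the sketch below shows two standard examples often used in this literature: top-k sparsification, which satisfies a bounded relative compression error of the form ||C(x) - x||^2 <= (1 - k/d)||x||^2, and deterministic uniform quantization, whose error is bounded in absolute terms independently of ||x||. This is a minimal illustration of the compressor types named in the abstract, not the paper's specific algorithms or assumptions; the function names are chosen for illustration only.

```python
import numpy as np


def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Top-k sparsification: keep the k largest-magnitude entries.

    A standard compressor with bounded *relative* compression error:
        ||top_k(x) - x||^2 <= (1 - k/d) * ||x||^2,   d = x.size.
    """
    d = x.size
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), d - k)[-k:]  # indices of k largest entries
    out[idx] = x[idx]
    return out


def uniform_quantizer(x: np.ndarray, step: float = 0.1) -> np.ndarray:
    """Deterministic rounding to a grid of width `step`.

    A standard compressor with bounded *absolute* compression error:
        ||Q(x) - x||_inf <= step / 2,  independent of ||x||.
    """
    return step * np.round(x / step)


# Example: compress a local iterate before transmitting it to neighbors.
x = np.random.randn(10)
print(top_k(x, k=3))            # sparse message: only 3 nonzero entries
print(uniform_quantizer(x))     # quantized message: finite set of values
```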
