Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks

We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network whose links are allowed to change in time. We solve two fundamental problems for this task. First, we establish the first lower bounds on the number of decentralized communication rounds and the number of local computations required to find an $\epsilon$-accurate solution. Second, we design two optimal algorithms that attain these lower bounds: (i) a variant of the recently proposed algorithm ADOM (Kovalev et al., 2021), enhanced via a multi-consensus subroutine, which is optimal in the case when access to the dual gradients is assumed, and (ii) a novel algorithm, called ADOM+, which is optimal in the case when access to the primal gradients is assumed. We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
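For concreteness, the problem class the abstract refers to can be written in the standard decentralized form below. The notation ($f_i$, $L$, $\mu$, $\kappa$) is the conventional one for this literature and is our assumption here, not quoted from the paper itself:

```latex
% Decentralized finite-sum minimization: node i holds f_i and can only
% exchange information with its current neighbors in a time-varying graph.
\min_{x \in \mathbb{R}^d} \; f(x) \coloneqq \frac{1}{n} \sum_{i=1}^{n} f_i(x),
\qquad \text{each } f_i \text{ is } L\text{-smooth and } \mu\text{-strongly convex.}
% An \epsilon-accurate solution is any point \hat{x} with
%   f(\hat{x}) - f(x^\star) \le \epsilon,
% where x^\star is the (unique) minimizer.
```

Lower and upper complexity bounds for this class are typically stated in terms of the condition number $\kappa = L/\mu$ and a connectivity parameter of the (time-varying) communication graph, which is why the abstract counts communication rounds and local computations separately.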

[1] L. Zadeh, et al., "Time-Varying Networks, I," Proceedings of the IRE, 1961.

[2] Marc Teboulle, et al., "An $O(1/k)$ Gradient Method for Network Resource Allocation Problems," IEEE Transactions on Control of Network Systems, 2014.

[3] Alexander Gasnikov, et al., "ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks," ICML, 2021.

[4] A. Gasnikov, et al., "Decentralized and Parallelized Primal and Dual Accelerated Methods for Stochastic Convex Programming Problems," arXiv:1904.09015, 2019.

[5] Dmitry Kovalev, et al., "Optimal and Practical Algorithms for Smooth and Strongly Convex Decentralized Optimization," NeurIPS, 2020.

[6] Zhouchen Lin, et al., "A Sharp Convergence Rate Analysis for Distributed Accelerated Gradient Methods," arXiv:1810.01053, 2018.

[7] Wei Shi, et al., "Achieving Geometric Convergence for Distributed Optimization Over Time-Varying Graphs," SIAM Journal on Optimization, 2016.

[8] Martin Jaggi, et al., "Error Feedback Fixes SignSGD and other Gradient Compression Schemes," ICML, 2019.

[9] Eduard A. Gorbunov, et al., "Linearly Converging Error Compensated SGD," NeurIPS, 2020.

[10] Yurii Nesterov, "Introductory Lectures on Convex Optimization: A Basic Course," Applied Optimization, 2014.

[11] Robert Nowak, et al., "Distributed Optimization in Sensor Networks," Third International Symposium on Information Processing in Sensor Networks (IPSN), 2004.

[12] Joakim Jaldén, et al., "PANDA: A Dual Linearly Converging Method for Distributed Optimization Over Time-Varying Undirected Graphs," IEEE Conference on Decision and Control (CDC), 2018.

[13] Peter Richtárik, et al., "Federated Learning: Strategies for Improving Communication Efficiency," arXiv, 2016.

[14] Laurent Condat, et al., "Proximal Splitting Algorithms: A Tour of Recent Advances, with New Twists," 2020.

[15] Wei Shi, et al., "Push–Pull Gradient Methods for Distributed Optimization in Networks," IEEE Transactions on Automatic Control, 2021.

[16] Anit Kumar Sahu, et al., "Federated Learning: Challenges, Methods, and Future Directions," IEEE Signal Processing Magazine, 2019.

[17] Laurent Massoulié, et al., "Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks," ICML, 2017.

[18] Laurent Condat, et al., "An Optimal Algorithm for Strongly Convex Minimization under Affine Constraints," 2021.

[19] Na Li, et al., "Accelerated Distributed Nesterov Gradient Descent," IEEE Transactions on Automatic Control, 2017.

[20] Bart De Schutter, et al., "Accelerated Gradient Methods and Dual Decomposition in Distributed Model Predictive Control," Automatica, 2013.

[21] Sebastian U. Stich, et al., "The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication," arXiv:1909.05350, 2019.

[22] A. Gasnikov, et al., "Towards Accelerated Rates for Distributed Optimization over Time-Varying Networks," OPTIMA, 2020.

[23] Blaise Agüera y Arcas, et al., "Communication-Efficient Learning of Deep Networks from Decentralized Data," AISTATS, 2016.

[24] 丸山徹 (T. Maruyama), "On a Few Recent Developments in Convex Analysis" (in Japanese), 1977.

[25] Ufuk Topcu, et al., "Optimal Decentralized Protocol for Electric Vehicle Charging," IEEE Transactions on Power Systems, 2011.

[26] Chih-Jen Lin, et al., "LIBSVM: A Library for Support Vector Machines," ACM Transactions on Intelligent Systems and Technology, 2011.

[27] Le Song, et al., "Estimating Time-Varying Networks," ISMB, 2008.

[28] Georgios B. Giannakis, et al., "Distributed Spectrum Sensing for Cognitive Radio Networks by Exploiting Sparsity," IEEE Transactions on Signal Processing, 2010.

[29] Laurent Condat, et al., "Dualize, Split, Randomize: Fast Nonsmooth Optimization Algorithms," arXiv, 2020.

[30] Haishan Ye, et al., "Multi-consensus Decentralized Accelerated Gradient Descent," arXiv, 2020.

[31] Zhouchen Lin, et al., "Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization," arXiv, 2021.