Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization

Decentralized optimization over time-varying graphs has become increasingly common in modern machine learning, where massive data are stored on millions of mobile devices, as in federated learning. This paper revisits the widely used accelerated gradient tracking and extends it to time-varying graphs. We prove $O\big((\frac{\gamma}{1-\sigma_\gamma})^{2}\sqrt{\frac{L}{\epsilon}}\big)$ and $O\big((\frac{\gamma}{1-\sigma_\gamma})^{1.5}\sqrt{\frac{L}{\mu}}\log\frac{1}{\epsilon}\big)$ complexities for the practical single-loop accelerated gradient tracking over time-varying graphs when the problems are nonstrongly convex and strongly convex, respectively, where $\gamma$ and $\sigma_\gamma$ are two common constants characterizing the network connectivity, $\epsilon$ is the desired precision, and $L$ and $\mu$ are the smoothness and strong convexity constants, respectively. Our complexities improve significantly over the $O(\frac{1}{\epsilon^{5/7}})$ and $O\big((\frac{L}{\mu})^{5/7}\frac{1}{(1-\sigma)^{1.5}}\log\frac{1}{\epsilon}\big)$ complexities, respectively, which were proved in the original literature only for static graphs, where $\frac{1}{1-\sigma}$ equals $\frac{\gamma}{1-\sigma_\gamma}$ when the network is time-invariant. When combined with a multiple consensus subroutine, the dependence on the network connectivity constants can be further improved to $O(1)$ and $O(\frac{\gamma}{1-\sigma_\gamma})$ for the computation and communication complexities, respectively. When the network is static, employing Chebyshev acceleration makes our complexities exactly match the lower bounds, without hiding any poly-logarithmic factor, for both nonstrongly convex and strongly convex problems.
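To make the gradient-tracking idea concrete, here is a minimal pure-Python sketch, under stated assumptions, of gradient tracking with a Nesterov-style momentum (extrapolation) step over a time-varying graph. It is not the paper's algorithm: the paper's parameter schedules, its multiple consensus subroutine, and Chebyshev acceleration are all omitted. The helper names (`ring_mixing_matrix`, `mix`, `momentum_gradient_tracking`), the scalar quadratic local objectives, the two alternating 4-node cycles that stand in for the time-varying graph, and the step size `eta=0.01` and momentum `beta=0.5` are hypothetical illustrative choices. Each node mixes its extrapolated iterate with its neighbors through a doubly stochastic $W_k$ and steps along a tracker $s$ whose network average always equals the average of the local gradients.

```python
def ring_mixing_matrix(order):
    """Symmetric, doubly stochastic gossip matrix W for the cycle visiting `order`.

    Metropolis weights: every node on a cycle (n >= 3) has degree 2, so each
    edge weight and each self-weight equals 1/3.
    """
    n = len(order)
    w = [[0.0] * n for _ in range(n)]
    for k in range(n):
        i, j = order[k], order[(k + 1) % n]
        w[i][j] = w[j][i] = 1.0 / 3.0
    for i in range(n):
        w[i][i] = 1.0 - sum(w[i][j] for j in range(n) if j != i)
    return w


def mix(w, v):
    """One gossip round: return the matrix-vector product W v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def momentum_gradient_tracking(a, b, schedule, eta, beta, iters):
    """Hypothetical sketch: gradient tracking plus a momentum/extrapolation step.

    Minimizes f(x) = (1/n) * sum_i 0.5 * a[i] * (x - b[i])**2, where node i
    only knows (a[i], b[i]) and talks to neighbors through W_k = schedule(k).
    """
    n = len(a)

    def grad(y):  # stacked local gradients, one entry per node
        return [a[i] * (y[i] - b[i]) for i in range(n)]

    x = [0.0] * n
    y = list(x)
    s = grad(y)  # tracker starts at the local gradients
    for k in range(iters):
        w = schedule(k)  # the graph may change at every iteration
        x_next = [wy - eta * si for wy, si in zip(mix(w, y), s)]
        y_next = [xn + beta * (xn - xc) for xn, xc in zip(x_next, x)]
        # s tracks the average gradient: mix, then add the local gradient change
        s = [ws + gn - go for ws, gn, go in zip(mix(w, s), grad(y_next), grad(y))]
        x, y = x_next, y_next
    return x, s, grad(y)


# Hypothetical example problem and parameters (not from the paper).
a = [1.0, 1.25, 1.5, 1.75]
b = [4.0, -2.0, 1.0, 3.0]
graphs = [ring_mixing_matrix([0, 1, 2, 3]), ring_mixing_matrix([0, 2, 1, 3])]
x, s, g = momentum_gradient_tracking(
    a, b, schedule=lambda k: graphs[k % 2], eta=0.01, beta=0.5, iters=3000
)
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)  # = 1.5
print(x_star, max(abs(xi - x_star) for xi in x))
```

Setting `beta=0.0` reduces the recursion to plain gradient tracking, i.e., the DIGing-style update $x_{k+1} = W_k x_k - \eta s_k$, $s_{k+1} = W_k s_k + \nabla F(x_{k+1}) - \nabla F(x_k)$. Because every $W_k$ is doubly stochastic, mixing preserves the average of $s$, which is what lets each node follow the global gradient even though the graph changes from one iteration to the next.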
