Decentralized Accelerated Gradient Methods With Increasing Penalty Parameters

In this article, we study the communication and (sub)gradient computation costs in distributed optimization. We present two algorithms based on the framework of the accelerated penalty method with increasing penalty parameters. Our first algorithm is for smooth distributed optimization; it attains the near-optimal $O(\sqrt{\frac{L}{\epsilon (1-\sigma_2(W))}}\log \frac{1}{\epsilon })$ communication complexity and the optimal $O(\sqrt{\frac{L}{\epsilon }})$ gradient computation complexity for $L$-smooth convex problems, where $\sigma_2(W)$ denotes the second largest singular value of the weight matrix $W$ associated with the network and $\epsilon$ is the target accuracy. When the problem is $\mu$-strongly convex and $L$-smooth, our algorithm has the near-optimal $O(\sqrt{\frac{L}{\mu (1-\sigma_2(W))}}\log ^2\frac{1}{\epsilon })$ communication complexity and the optimal $O(\sqrt{\frac{L}{\mu }}\log \frac{1}{\epsilon })$ gradient computation complexity. Our communication complexities exceed the lower bounds only by a factor of $\log \frac{1}{\epsilon }$. Our second algorithm is designed for nonsmooth distributed optimization; it achieves both the optimal $O(\frac{1}{\epsilon \sqrt{1-\sigma_2(W)}})$ communication complexity and the optimal $O(\frac{1}{\epsilon ^2})$ subgradient computation complexity, matching the lower bounds for nonsmooth distributed optimization.
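To make the penalty framework concrete, the following is a minimal NumPy sketch of an accelerated penalty scheme for decentralized consensus optimization, not the paper's exact algorithm or parameter schedule. The consensus constraint is relaxed into a quadratic penalty $\frac{\beta_k}{2}x^{\top}(I-W)x$ whose gradient requires only one round of neighbor communication per iteration, and $\beta_k$ grows with the iteration counter. The function name `accelerated_penalty`, the schedule $\beta_k=\beta_0(k+1)$, and the extrapolation weight $\theta_k=\frac{k}{k+3}$ are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

# Illustrative sketch (not the authors' exact method).
# Problem: min_x sum_i f_i(x), relaxed over per-node copies x_1..x_n to
#   min_{x_1..x_n} sum_i f_i(x_i) + (beta_k / 2) * x^T (I - W) x,
# where W is a symmetric doubly stochastic weight matrix and the penalty
# parameter beta_k increases across iterations.

def accelerated_penalty(grad_fs, W, x0, L, n_iters=500, beta0=1.0):
    """grad_fs: list of per-node gradient oracles, grad_fs[i](x_i) -> ndarray.
    W: (n, n) symmetric doubly stochastic weight matrix.
    x0: (n, d) initial iterates, one row per node.
    L: smoothness constant of the local objectives (assumed known)."""
    n, d = x0.shape
    x_prev = x0.copy()
    x = x0.copy()
    I = np.eye(n)
    for k in range(n_iters):
        beta = beta0 * (k + 1)        # increasing penalty parameter
        theta = k / (k + 3)           # Nesterov extrapolation weight
        y = x + theta * (x - x_prev)  # momentum step
        # Gradient of the penalized objective at y: per-node gradients plus
        # beta * (I - W) y; the latter needs one neighbor-communication round.
        g_local = np.stack([grad_fs[i](y[i]) for i in range(n)])
        g_penalty = beta * (I - W) @ y
        # Step size 1 / (L + beta * lambda_max(I - W)); lambda_max <= 2 here.
        step = 1.0 / (L + 2.0 * beta)
        x_prev, x = x, y - step * (g_local + g_penalty)
    return x.mean(axis=0)             # consensus estimate
```

For example, with quadratic local objectives $f_i(x)=\frac{1}{2}\Vert x-a_i\Vert^2$ one would pass `grad_fs = [lambda x, a=a: x - a for a in A]` with `L = 1`, and the iterates approach the network-wide average of the $a_i$; the nonsmooth variant in the paper replaces the gradient oracle with a subgradient oracle within the same penalty framework.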
