Decentralized Accelerated Gradient Methods With Increasing Penalty Parameters

In this article, we study the communication and (sub)gradient computation costs in distributed optimization. We present two algorithms based on the framework of the accelerated penalty method with increasing penalty parameters. Our first algorithm is for smooth distributed optimization, and it attains the near-optimal <inline-formula><tex-math notation="LaTeX">$O(\sqrt{\frac{L}{\epsilon (1-\sigma _2(W))}}\log \frac{1}{\epsilon })$</tex-math></inline-formula> communication complexity and the optimal <inline-formula><tex-math notation="LaTeX">$O(\sqrt{\frac{L}{\epsilon }})$</tex-math></inline-formula> gradient computation complexity for <inline-formula><tex-math notation="LaTeX">$L$</tex-math></inline-formula>-smooth convex problems, where <inline-formula><tex-math notation="LaTeX">$\sigma _2(W)$</tex-math></inline-formula> denotes the second largest singular value of the weight matrix <inline-formula><tex-math notation="LaTeX">$W$</tex-math></inline-formula> associated with the network, and <inline-formula><tex-math notation="LaTeX">$\epsilon$</tex-math></inline-formula> is the target accuracy. When the problem is <inline-formula><tex-math notation="LaTeX">$\mu$</tex-math></inline-formula>-strongly convex and <inline-formula><tex-math notation="LaTeX">$L$</tex-math></inline-formula>-smooth, our algorithm has the near-optimal <inline-formula><tex-math notation="LaTeX">$O(\sqrt{\frac{L}{\mu (1-\sigma _2(W))}}\log ^2\frac{1}{\epsilon })$</tex-math></inline-formula> communication complexity and the optimal <inline-formula><tex-math notation="LaTeX">$O(\sqrt{\frac{L}{\mu }}\log \frac{1}{\epsilon })$</tex-math></inline-formula> gradient computation complexity. Our communication complexities exceed the corresponding lower bounds only by a factor of <inline-formula><tex-math notation="LaTeX">$\log \frac{1}{\epsilon }$</tex-math></inline-formula>.
Our second algorithm is designed for nonsmooth distributed optimization, and it achieves both the optimal <inline-formula><tex-math notation="LaTeX">$O(\frac{1}{\epsilon \sqrt{1-\sigma _2(W)}})$</tex-math></inline-formula> communication complexity and the optimal <inline-formula><tex-math notation="LaTeX">$O(\frac{1}{\epsilon ^2})$</tex-math></inline-formula> subgradient computation complexity, matching the lower bounds for nonsmooth distributed optimization.
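To make the penalty framework concrete, the following is a minimal toy sketch, not the paper's exact algorithm: Nesterov's accelerated gradient applied to a consensus problem penalized by the graph matrix I - W, with the penalty parameter doubled across outer stages. The 4-node ring network, the quadratic local losses, and all parameter values below are hypothetical choices for illustration only.

```python
import numpy as np

# Toy accelerated penalty method for decentralized consensus:
#   min_x  sum_i f_i(x_i) + (beta/2) * x^T (I - W) x,
# where f_i(x) = 0.5 * a_i * (x - b_i)^2 are hypothetical local losses
# and beta is increased geometrically across outer stages.
rng = np.random.default_rng(0)
n = 4
a = rng.uniform(1.0, 2.0, n)        # local curvatures
b = rng.uniform(-1.0, 1.0, n)       # local minimizers
x_star = a @ b / a.sum()            # minimizer of sum_i f_i (with consensus)

# Symmetric, doubly stochastic weights on a 4-node ring (Metropolis-style).
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0
    W[i, i] = 1.0 / 3.0
L_pen = np.eye(n) - W               # positive semidefinite graph penalty

x = np.zeros(n)
for stage in range(8):
    beta = 2.0 ** stage             # increasing penalty parameter
    Lip = a.max() + 2.0 * beta      # smoothness bound of penalized objective
    y, x_prev = x.copy(), x.copy()
    for k in range(200):            # inner accelerated (Nesterov) loop
        grad = a * (y - b) + beta * (L_pen @ y)
        x_new = y - grad / Lip
        y = x_new + k / (k + 3) * (x_new - x_prev)
        x_prev = x_new
    x = x_prev                      # warm start the next stage

consensus_err = np.max(np.abs(x - x.mean()))   # deviation from consensus
opt_err = abs(x.mean() - x_star)               # distance to the optimum
```

As the penalty grows, the iterates are driven toward the consensus subspace (the null space of I - W) while the inner accelerated loop keeps the per-stage gradient cost low; in the toy run above both `consensus_err` and `opt_err` shrink with the stage count.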
