Accelerated Distributed Nesterov Gradient Descent

This paper considers the distributed optimization problem over a network, where the objective is to optimize a global function formed by a sum of local functions, using only local computation and communication. We develop an accelerated distributed Nesterov gradient descent method. When the objective function is convex and $L$-smooth, we show that it achieves a $O(\frac{1}{t^{1.4-\epsilon}})$ convergence rate for all $\epsilon \in (0, 1.4)$. We also show the convergence rate can be improved to $O(\frac{1}{t^2})$ if the objective function is a composition of a linear map and a strongly convex and smooth function. When the objective function is $\mu$-strongly convex and $L$-smooth, we show that it achieves a linear convergence rate of $O([1 - C(\frac{\mu}{L})^{5/7}]^t)$, where $\frac{L}{\mu}$ is the condition number of the objective, and $C > 0$ is some constant that does not depend on $\frac{L}{\mu}$.
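
To give a feel for what the exponent 5/7 in the linear rate means, the following is a minimal sketch that counts how many iterations a contraction factor of the form $1 - C(\mu/L)^p$ needs to shrink an error by a given factor. It compares $p = 5/7$ with two standard centralized reference points: $p = 1$ (the dependence typical of plain gradient descent) and $p = 1/2$ (the dependence of centralized Nesterov acceleration). The function names, the tolerance, and the choice $C = 1$ are illustrative assumptions, not values from the paper; the actual constant $C$ is not specified here, so the counts are only indicative of scaling.

```python
import math


def iterations_to_tolerance(contraction: float, tol: float) -> int:
    """Smallest integer t such that (1 - contraction)**t <= tol."""
    if not (0.0 < contraction < 1.0 and 0.0 < tol < 1.0):
        raise ValueError("contraction and tol must lie strictly in (0, 1)")
    return math.ceil(math.log(tol) / math.log(1.0 - contraction))


def compare_rates(kappa: float, tol: float = 1e-6, C: float = 1.0) -> dict:
    """Iteration counts for contraction factors 1 - C * (mu/L)**p.

    kappa is the condition number L/mu. C = 1.0 is a placeholder
    assumption; the paper only states that C > 0 is independent of L/mu.
    """
    ratio = 1.0 / kappa  # mu / L
    exponents = {"exp_1": 1.0, "exp_5_7": 5.0 / 7.0, "exp_1_2": 0.5}
    return {
        name: iterations_to_tolerance(C * ratio**p, tol)
        for name, p in exponents.items()
    }


if __name__ == "__main__":
    for kappa in (10.0, 100.0, 1000.0):
        print(kappa, compare_rates(kappa))
```

Under these assumptions, the count for exponent 5/7 falls between the other two for any condition number greater than 1, and the gap to the exponent-1 case widens as $L/\mu$ grows.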
