Fast Distributed Gradient Methods

We study distributed optimization problems in which N nodes minimize the sum of their individual costs over a common vector variable. The costs are convex and have Lipschitz continuous gradients (with constant L) that are also bounded. We propose two fast distributed gradient algorithms based on the centralized Nesterov gradient algorithm and establish their convergence rates in terms of the per-node communications K and the per-node gradient evaluations k. Our first method, Distributed Nesterov Gradient, achieves rates O(log K / K) and O(log k / k). Our second method, Distributed Nesterov Gradient with Consensus iterations, assumes that all nodes know L and μ(W), the second largest singular value of the N × N doubly stochastic weight matrix W. It achieves rates O(1/K^(2-ξ)) and O(1/k^2), with ξ > 0 arbitrarily small. Further, for both methods we give the explicit dependence of the convergence constants on N and W. Simulation examples illustrate our findings.
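To make the setting concrete, the following is a minimal sketch of a distributed Nesterov-style gradient iteration, not a definitive implementation of either method. The abstract does not give the update rule or its parameters, so several choices here are assumptions made only for illustration: the diminishing step size c/(k+1), the momentum weight k/(k+3), the scalar variable, the Huber costs (convex, with Lipschitz and bounded gradients), the 4-node ring weight matrix `W_RING`, and all function names.

```python
# Sketch of a distributed Nesterov-style gradient iteration (pure Python).
# ASSUMPTIONS, not taken from the abstract: step size c/(k+1), momentum
# k/(k+3), scalar variable, Huber costs, and the ring weights below.

HUBER_DELTA = 5.0  # hypothetical threshold; it bounds every gradient by 5

# Symmetric, doubly stochastic weights for a 4-node ring (illustrative).
W_RING = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]


def huber_grad(residual, delta=HUBER_DELTA):
    """Gradient of the Huber loss: the residual clipped to [-delta, delta]."""
    return max(-delta, min(delta, residual))


def distributed_nesterov_gradient(targets, W, c=0.5, iters=2000):
    """Node i holds the cost huber(x - targets[i]); return all node estimates."""
    n = len(targets)
    x = [0.0] * n  # main iterates, one per node
    y = [0.0] * n  # auxiliary (extrapolated) iterates, one per node
    for k in range(iters):
        alpha = c / (k + 1)  # assumed diminishing step size
        beta = k / (k + 3)   # assumed Nesterov-type momentum weight
        x_new = []
        for i in range(n):
            # One communication round: weighted average of neighbors' y.
            mixed = sum(W[i][j] * y[j] for j in range(n))
            # One local gradient evaluation at the node's own y.
            grad = huber_grad(y[i] - targets[i])
            x_new.append(mixed - alpha * grad)
        # Momentum extrapolation using the previous main iterate.
        y = [x_new[i] + beta * (x_new[i] - x[i]) for i in range(n)]
        x = x_new
    return x


if __name__ == "__main__":
    estimates = distributed_nesterov_gradient([0.0, 0.5, 1.0, 1.5], W_RING)
    print(estimates)
```

In this sketch every iteration uses one exchange with neighbors and one local gradient evaluation, so the communication count and the gradient-evaluation count coincide. For the example targets, the minimizer of the summed cost is their mean, 0.75, which gives a simple reference point for checking that the node estimates agree.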
