Distributed Nesterov gradient methods for random networks: Convergence in probability and convergence rates

We consider distributed optimization where N nodes in a generic, connected network minimize the sum of their individual, locally known, convex costs. The existing literature proposes distributed gradient-like methods that are attractive due to computationally cheap iterations and provable resilience to random inter-node communication failures, but such methods have slow theoretical and empirical convergence rates. Building on centralized Nesterov gradient methods, we propose accelerated distributed gradient-like methods and establish that they achieve strictly faster rates than existing distributed methods, while maintaining cheap iterations and resilience to random communication failures. Specifically, for convex, differentiable local costs with Lipschitz continuous and bounded derivatives, we establish convergence in probability and convergence rates in expectation and in second moment, with respect to the cost function optimality.
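To make the accelerated consensus-plus-gradient structure concrete, below is a minimal Python sketch of a distributed Nesterov-like gradient iteration over a random network. The quadratic local costs, ring topology, link-failure probability, and step-size constant are illustrative assumptions, not the paper's setup; the sketch only mirrors the general pattern of mixing neighbors' iterates with random weights, taking a local gradient step with a diminishing step size, and adding a Nesterov momentum term.

```python
# Minimal sketch (assumed setup, not the authors' exact algorithm) of a
# distributed Nesterov-like gradient iteration over a random network.
import numpy as np

rng = np.random.default_rng(0)
N = 10                      # number of nodes
d = 3                       # dimension of the decision variable

# Local costs f_i(x) = 0.5 * ||A_i x - b_i||^2 (convex, Lipschitz gradient).
A = rng.standard_normal((N, 5, d))
b = rng.standard_normal((N, 5))

def grad(i, x):
    """Gradient of node i's local cost at x."""
    return A[i].T @ (A[i] @ x - b[i])

# Underlying graph: a ring; each link fails independently at each iteration.
edges = [(i, (i + 1) % N) for i in range(N)]
p_fail = 0.3

def random_weights():
    """Doubly stochastic mixing matrix built from the links that survive
    this iteration (Metropolis-style weights on a ring)."""
    W = np.eye(N)
    for (i, j) in edges:
        if rng.random() > p_fail:         # link is up this iteration
            w = 1.0 / 3.0
            W[i, j] += w; W[j, i] += w
            W[i, i] -= w; W[j, j] -= w
    return W

# Nesterov-like iteration with random consensus weights W(k):
#   x_i^{k+1} = sum_j W_ij(k) y_j^k - alpha_k * grad f_i(y_i^k)
#   y_i^{k+1} = x_i^{k+1} + (k / (k + 3)) * (x_i^{k+1} - x_i^k)
x = np.zeros((N, d))
y = x.copy()
c = 0.05                                  # step-size constant (assumed)
for k in range(2000):
    alpha = c / (k + 1)                   # diminishing step size
    W = random_weights()
    g = np.stack([grad(i, y[i]) for i in range(N)])
    x_new = W @ y - alpha * g             # consensus + local gradient step
    y = x_new + (k / (k + 3)) * (x_new - x)   # momentum extrapolation
    x = x_new

# All nodes should end up close to the minimizer of the aggregate cost.
A_all = A.reshape(-1, d); b_all = b.reshape(-1)
x_star, *_ = np.linalg.lstsq(A_all, b_all, rcond=None)
print("max node error:", np.abs(x - x_star).max())
```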
