Distributed Nesterov-like gradient algorithms

In classical, centralized optimization, the Nesterov gradient algorithm reduces the number of iterations needed to produce an ε-accurate solution (in terms of the cost function) from O(1/ε) for the ordinary gradient method to O(1/√ε). This improvement is achieved on a class of convex functions with Lipschitz continuous first derivative, and it comes at a very small additional computational cost per iteration. In this paper, we consider distributed optimization, where nodes in the network cooperatively minimize the sum of their private costs subject to a global constraint. To solve this problem, recent literature proposes distributed (sub)gradient algorithms, which are attractive due to their computationally inexpensive iterations but converge slowly: the ε error is achieved in O(1/ε²) iterations. Here, building on the Nesterov gradient algorithm, we present a distributed, constant step size, Nesterov-like gradient algorithm that converges much faster than existing distributed (sub)gradient methods, with zero additional communications and very small additional computations per iteration k. We show that our algorithm converges to a solution neighborhood such that, for a convex compact constraint set and an optimized step size, the convergence time is O(1/ε). We achieve this on a class of convex, coercive, continuously differentiable private costs with Lipschitz first derivative. We derive our algorithm through a useful penalty-based reformulation of the original problem that uses the network's Laplacian matrix (referred to as the clone problem); the proposed method is precisely the Nesterov gradient algorithm applied to the clone problem. Finally, we illustrate the performance of our algorithm on distributed learning of a classifier via logistic loss.
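As an illustration only, the idea of applying a Nesterov-type gradient step to a Laplacian-penalized clone problem can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's exact algorithm or parameter choices: the ring topology, the scalar quadratic private costs f_i(x) = (x - a_i)²/2, the penalty weight rho, the box constraint, the (k - 1)/(k + 2) momentum weights, and all function names are hypothetical choices made for the example. Each node i keeps its own copy x_i, and the penalty (rho/2) xᵀLx pulls the copies together.

```python
import numpy as np

def ring_laplacian(n):
    """Graph Laplacian of an undirected ring with n >= 3 nodes (assumed topology)."""
    lap = 2.0 * np.eye(n)
    for i in range(n):
        lap[i, (i + 1) % n] -= 1.0
        lap[i, (i - 1) % n] -= 1.0
    return lap

def nesterov_on_clone_problem(grad, lap, lipschitz, rho, box, num_iters):
    """Projected Nesterov gradient on  sum_i f_i(x_i) + (rho/2) x^T L x.

    grad(y) returns the stacked private gradients [f_1'(y_1), ..., f_n'(y_n)].
    Each node's copy is kept in the interval [-box, box] (assumed constraint set).
    """
    n = lap.shape[0]
    # Constant step size 1 / (Lipschitz constant of the penalized gradient).
    # Simplification: lambda_max(L) is computed centrally here.
    alpha = 1.0 / (lipschitz + rho * np.linalg.eigvalsh(lap).max())
    x_prev = np.zeros(n)
    y = np.zeros(n)
    for k in range(1, num_iters + 1):
        # (lap @ y)_i = sum over neighbours j of (y_i - y_j), so node i only
        # needs the current y_j of its neighbours: one exchange per iteration.
        g = grad(y) + rho * (lap @ y)
        x = np.clip(y - alpha * g, -box, box)      # local projected gradient step
        y = x + (k - 1) / (k + 2) * (x - x_prev)   # local momentum step, no extra communication
        x_prev = x
    return x_prev

# Toy instance: five nodes, private costs f_i(x) = 0.5 * (x - a_i)^2.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
lap = ring_laplacian(5)
rho = 10.0
x = nesterov_on_clone_problem(lambda y: y - a, lap, lipschitz=1.0,
                              rho=rho, box=10.0, num_iters=5000)
# Closed-form minimizer of the clone problem for this quadratic toy case.
x_star = np.linalg.solve(np.eye(5) + rho * lap, a)
```

In this toy case the node copies settle near, but not exactly at, the common minimizer mean(a) of the summed costs, which matches the abstract's statement that convergence is to a solution neighborhood; a larger rho tightens the agreement between nodes at the price of a smaller step size.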
