Harnessing Smoothness to Accelerate Distributed Optimization

There has been growing interest in the distributed optimization problem over a network, where the objective is to optimize a global function formed as a sum of local functions using only local computation and communication. The literature has developed consensus-based distributed (sub)gradient descent (DGD) methods and shown that they achieve the same convergence rate, $O(\frac{\log t}{\sqrt{t}})$, as centralized (sub)gradient descent (CGD) when the objective is convex but possibly nonsmooth. However, when the objective is convex and smooth, it is unclear, within the DGD framework, how to harness the smoothness to obtain a faster rate comparable to CGD's. In this paper, we propose a distributed algorithm that, despite using the same amount of communication per iteration as DGD, effectively harnesses the smoothness of the objective and converges to the optimum at a rate of $O(\frac{1}{t})$. If the objective is further strongly convex, our algorithm converges linearly. Both rates match those of CGD. The key step in our algorithm is a novel gradient estimation scheme that uses history information to achieve a fast and accurate estimate of the average gradient. To motivate the necessity of history information, we also show that without it, no algorithm in a class of DGD-like distributed methods can achieve a linear convergence rate, even when the objective is strongly convex and smooth.
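For intuition, the following is a minimal simulation sketch of the gradient-tracking update associated with this line of work: each agent keeps, alongside its iterate, an auxiliary variable that tracks the average gradient and is corrected each round by the difference of successive local gradients. The quadratic local objectives, the ring topology, the lazy-Metropolis weights, and the step size below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Assumed setup: n agents on a ring, each with a local quadratic
# f_i(x) = 0.5 * a_i * (x - b_i)^2, so grad f_i(x) = a_i * (x - b_i).
rng = np.random.default_rng(0)
n = 10
a = rng.uniform(1.0, 2.0, n)          # local curvatures
b = rng.uniform(-1.0, 1.0, n)         # local minimizers
x_star = np.sum(a * b) / np.sum(a)    # minimizer of the average objective

def grad(x):
    """Stacked local gradients: grad(x)[i] = a_i * (x[i] - b_i)."""
    return a * (x - b)

# Doubly stochastic mixing matrix for a ring (lazy Metropolis weights).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

eta = 0.1            # constant step size (illustrative choice)
x = np.zeros(n)      # agents' iterates
s = grad(x)          # gradient trackers, initialized to the local gradients

for t in range(500):
    g_old = grad(x)
    # Consensus step on the iterates, then descend along the tracked gradient.
    x = W @ x - eta * s
    # Tracking step: mix neighbors' trackers, then add the change
    # in the local gradient (this is the use of history information).
    s = W @ s + grad(x) - g_old

print("max distance to optimum:", np.max(np.abs(x - x_star)))
```

Because $W$ is doubly stochastic and $s$ is initialized to the local gradients, the tracking step preserves the invariant $\frac{1}{n}\sum_i s_i(t) = \frac{1}{n}\sum_i \nabla f_i(x_i(t))$, so each agent's tracker becomes an increasingly accurate estimate of the average gradient as the iterates reach consensus; this is how reusing past gradients enables a constant step size and the faster rates discussed above.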
