On the Convergence Rate of Distributed Gradient Methods for Finite-Sum Optimization under Communication Delays

Motivated by applications in machine learning and statistics, we study distributed optimization problems over a network of processors, where the goal is to minimize a global objective given by a sum of local functions. Because the data sets involved are large, both the data and the computation must be distributed across the processors, which calls for distributed algorithms. In this paper, we consider a popular distributed gradient-based consensus algorithm that requires only local computation and communication. An important problem in this area is to analyze the convergence rate of such algorithms in the presence of communication delays, which are inevitable in distributed systems. We prove the convergence of the gradient-based consensus algorithm under uniform, but possibly arbitrarily large, communication delays between the processors. Moreover, we obtain an upper bound on the convergence rate of the algorithm as a function of the network size, the network topology, and the inter-processor communication delays.
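To make the setting concrete, the following is a minimal sketch of a distributed gradient/consensus update with a uniform communication delay. It assumes local quadratic losses, a 5-node ring with uniform mixing weights, a fixed delay tau, and a constant step size alpha; these choices are illustrative and are not the paper's exact algorithm or parameters.

```python
import numpy as np

# Illustrative sketch (not the paper's exact scheme): n agents minimize
# sum_i f_i(x) with f_i(x) = 0.5 * (x - b_i)^2, whose minimizer is mean(b).
# Each agent mixes its own current state with tau-step-delayed neighbor
# states over a ring, then takes a local gradient step.

n, tau, alpha, iters = 5, 3, 0.05, 2000
rng = np.random.default_rng(0)
b = rng.normal(size=n)                 # local data; global optimum is b.mean()

# Doubly stochastic ring: weight 1/3 on self and on each of the two neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, [(i - 1) % n, i, (i + 1) % n]] = 1.0 / 3.0
W_self = np.diag(np.diag(W))           # own (undelayed) contribution
W_nbr = W - W_self                     # neighbor (delayed) contribution

x = np.zeros(n)
buffer = [x.copy() for _ in range(tau + 1)]  # buffer[0] is the state tau steps ago

for k in range(iters):
    delayed = buffer[0]                # neighbor information arrives tau steps late
    grad = x - b                       # gradient of each local quadratic
    x = W_self @ x + W_nbr @ delayed - alpha * grad
    buffer.append(x.copy())
    buffer.pop(0)

# With a constant step size the iterates only settle in an O(alpha) neighborhood
# of the minimizer; a diminishing step size would give exact convergence.
print("distance to optimum:", np.max(np.abs(x - b.mean())))
```

In this toy model, the delay only slows the spread of information between agents; how much it slows the convergence rate, as a function of the network size, topology, and delay, is exactly the question the paper addresses.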
