On the Convergence Rate of Distributed Gradient Methods for Finite-Sum Optimization under Communication Delays

Motivated by applications in machine learning and statistics, we study distributed optimization problems over a network of processors, where the goal is to optimize a global objective composed of a sum of local functions. In these problems, the large scale of the data sets forces the data and computation to be distributed over multiple processors, creating the need for distributed algorithms. In this paper, we consider a popular distributed gradient-based consensus algorithm that requires only local computation and communication. An important problem in this area is to analyze the convergence rate of such algorithms in the presence of the communication delays that are inevitable in distributed systems. We prove that the gradient-based consensus algorithm converges in the presence of uniform, but possibly arbitrarily large, communication delays between the processors. Moreover, we obtain an upper bound on the convergence rate of the algorithm as a function of the network size, the network topology, and the inter-processor communication delays.
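To make the kind of update rule studied here concrete, the minimal sketch below simulates a delayed distributed-gradient step of the form x_i(k+1) = sum_j w_ij x_j(k - tau) - alpha_k * grad f_i(x_i(k)) on a ring of processors with quadratic local objectives. The mixing weights, step-size schedule, delay handling, and local objectives are illustrative assumptions, not the exact model analyzed in the paper.

```python
import numpy as np

# Toy simulation of a delayed distributed-gradient consensus update
# (a sketch under assumed weights, step sizes, and local objectives;
# not the paper's exact algorithm or analysis setting).

n, tau, T = 5, 3, 2000          # processors, uniform delay, iterations
b = np.linspace(-1.0, 1.0, n)   # local quadratics f_i(x) = 0.5 * (x - b_i)^2
x_star = b.mean()               # minimizer of the global sum of the f_i

# Doubly stochastic mixing matrix for a ring: self weight plus two neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

# history holds the last tau+1 iterate vectors; the consensus step reads
# neighbor values that are tau iterations old.
history = [np.zeros(n) for _ in range(tau + 1)]
for k in range(T):
    x_now = history[-1]
    x_delayed = history[-(tau + 1)]          # stale neighbor information
    grad = x_now - b                         # local gradients of the quadratics
    alpha = 1.0 / (k + 10)                   # diminishing step size
    x_next = W @ x_delayed - alpha * grad    # consensus on delayed values + local gradient step
    history.append(x_next)
    history = history[-(tau + 1):]           # keep only the last tau+1 iterates

print("final disagreement across processors:", np.ptp(history[-1]))
print("max distance to the global minimizer:", np.abs(history[-1] - x_star).max())
```

Running the sketch shows the local iterates agreeing and approaching the global minimizer despite the stale neighbor information; larger values of tau slow this down, which is the qualitative effect the paper quantifies.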
