Decentralized Asynchronous Stochastic Gradient Descent: Convergence Rate Analysis

Decentralized algorithms for multi-agent networks have attracted considerable research interest. Stochastic gradient descent and its variants are widely used to develop such algorithms. This paper considers a stochastic gradient descent algorithm in which a randomly selected node carries out each update. The stringent computational and communication requirements of the synchronous framework are overcome by proposing an asynchronous variant that allows updates to be carried out using delayed gradients. The performance of the proposed algorithm is analyzed by developing non-asymptotic bounds on the optimality gap as a function of the number of iterations, for various diminishing step-size rules. The bounds indicate that the effect of asynchrony on the convergence rate is minimal. The theoretical findings are further illustrated by solving a distributed estimation problem over a large network. We conclude by comparing the performance of the proposed algorithm with that of the classical cyclic incremental algorithm.
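As a rough illustration of the scheme sketched above, the following Python snippet runs an asynchronous SGD loop in which a randomly selected node applies a gradient evaluated at a delayed (stale) iterate, with a diminishing step size. The least-squares local losses, the delay bound `max_delay`, and the specific 1/k step-size rule are illustrative assumptions for a distributed-estimation-style setup, not the exact algorithm or analysis of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N nodes, each holding a local least-squares loss
# f_i(x) = 0.5 * ||A_i x - b_i||^2; the network minimizes their sum.
N, d = 20, 5
A = [rng.standard_normal((10, d)) for _ in range(N)]
x_true = rng.standard_normal(d)
b = [Ai @ x_true + 0.1 * rng.standard_normal(10) for Ai in A]

# Minimizer of the global objective, used only to report the final gap.
x_star = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]

def local_grad(i, x):
    """Gradient of node i's local loss at the point x."""
    return A[i].T @ (A[i] @ x - b[i])

max_delay = 5            # assumed bound on gradient staleness
K = 20000                # number of iterations
x = np.zeros(d)
history = [x.copy()]     # recent iterates, used to emulate delayed gradients

for k in range(1, K + 1):
    i = rng.integers(N)                              # randomly selected node
    delay = int(rng.integers(len(history)))          # staleness of this update
    x_stale = history[-1 - delay]                    # delayed iterate seen by node i
    x = x - (1.0 / k) * local_grad(i, x_stale)       # diminishing 1/k step size (assumed)
    history.append(x.copy())
    if len(history) > max_delay + 1:                 # keep only the delay window
        history.pop(0)

print("distance to the global minimizer:", np.linalg.norm(x - x_star))
```

With a diminishing step size, the iterates still approach the global minimizer despite the stale gradients, which is the qualitative behavior the bounds in the paper quantify.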
