Distributed stochastic gradient tracking methods

In this paper, we study the problem of distributed multi-agent optimization over a network, where each agent possesses a local cost function that is smooth and strongly convex. The global objective is to find a common solution that minimizes the average of all cost functions. Assuming agents only have access to unbiased estimates of the gradients of their local cost functions, we consider a distributed stochastic gradient tracking method (DSGT) and a gossip-like stochastic gradient tracking method (GSGT). We show that, in expectation, the iterates generated by each agent are attracted to a neighborhood of the optimal solution, where they accumulate exponentially fast under a constant stepsize choice. Under DSGT, the limiting (expected) error bound on the distance of the iterates from the optimal solution decreases with the network size n, which is comparable to the performance of a centralized stochastic gradient algorithm. Moreover, we show that when the network is well-connected, GSGT incurs lower communication cost than DSGT while maintaining a similar computational cost. A numerical example further demonstrates the effectiveness of the proposed methods.
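To make the gradient tracking idea concrete, the following is a minimal sketch of a DSGT-style iteration on a toy distributed least-squares problem: each agent mixes its iterate with its neighbors' and takes a step along a local tracker of the average stochastic gradient. The ring topology, Metropolis-style mixing weights, toy data, noise model, and stepsize below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 8, 5                      # number of agents, problem dimension
A = rng.normal(size=(n, 20, d))  # each agent holds its own data block
b = rng.normal(size=(n, 20))

def stochastic_grad(i, x):
    """Unbiased gradient estimate of agent i's local least-squares cost."""
    j = rng.integers(A.shape[1])             # sample one local data point
    return A[i, j] * (A[i, j] @ x - b[i, j])

# Doubly stochastic mixing matrix for a ring graph (lazy Metropolis weights).
W = np.eye(n) * 0.5
for i in range(n):
    W[i, (i - 1) % n] += 0.25
    W[i, (i + 1) % n] += 0.25

alpha = 0.01                     # constant stepsize
x = np.zeros((n, d))             # local iterates, one row per agent
g = np.array([stochastic_grad(i, x[i]) for i in range(n)])
y = g.copy()                     # gradient trackers, initialized at local gradients

for _ in range(2000):
    x_new = W @ (x - alpha * y)                                      # consensus + gradient step
    g_new = np.array([stochastic_grad(i, x_new[i]) for i in range(n)])
    y = W @ y + g_new - g                                            # track the average gradient
    x, g = x_new, g_new

print("disagreement across agents:", np.linalg.norm(x - x.mean(axis=0)))
```

With a constant stepsize, the iterates in this sketch settle into a neighborhood of the minimizer whose size shrinks with the stepsize, mirroring the behavior described in the abstract.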
