Distributed stochastic gradient tracking methods

In this paper, we study the problem of distributed multi-agent optimization over a network, where each agent possesses a local cost function that is smooth and strongly convex. The global objective is to find a common solution that minimizes the average of all cost functions. Assuming agents only have access to unbiased estimates of the gradients of their local cost functions, we consider a distributed stochastic gradient tracking method (DSGT) and a gossip-like stochastic gradient tracking method (GSGT). We show that, in expectation, the iterates generated by each agent are attracted to a neighborhood of the optimal solution, where they accumulate exponentially fast under a constant stepsize choice. Under DSGT, the limiting (expected) error bound on the distance of the iterates from the optimal solution decreases with the network size n, which is comparable to the performance of a centralized stochastic gradient algorithm. Moreover, we show that when the network is well-connected, GSGT incurs lower communication cost than DSGT while maintaining a similar computational cost. A numerical example further demonstrates the effectiveness of the proposed methods.
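To make the gradient tracking idea concrete, the following is a minimal sketch of a DSGT-style iteration on a toy distributed least-squares problem: each agent mixes its iterate with its neighbors' and takes a step along a local tracker of the average stochastic gradient. The ring topology, Metropolis-style mixing weights, toy data, noise model, and stepsize below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 8, 5                      # number of agents, problem dimension
A = rng.normal(size=(n, 20, d))  # each agent holds its own data block
b = rng.normal(size=(n, 20))

def stochastic_grad(i, x):
    """Unbiased gradient estimate of agent i's local least-squares cost."""
    j = rng.integers(A.shape[1])             # sample one local data point
    return A[i, j] * (A[i, j] @ x - b[i, j])

# Doubly stochastic mixing matrix for a ring graph (lazy Metropolis weights).
W = np.eye(n) * 0.5
for i in range(n):
    W[i, (i - 1) % n] += 0.25
    W[i, (i + 1) % n] += 0.25

alpha = 0.01                     # constant stepsize
x = np.zeros((n, d))             # local iterates, one row per agent
g = np.array([stochastic_grad(i, x[i]) for i in range(n)])
y = g.copy()                     # gradient trackers, initialized at local gradients

for _ in range(2000):
    x_new = W @ (x - alpha * y)                                      # consensus + gradient step
    g_new = np.array([stochastic_grad(i, x_new[i]) for i in range(n)])
    y = W @ y + g_new - g                                            # track the average gradient
    x, g = x_new, g_new

print("disagreement across agents:", np.linalg.norm(x - x.mean(axis=0)))
```

With a constant stepsize, the iterates in this sketch settle into a neighborhood of the minimizer whose size shrinks with the stepsize, mirroring the behavior described in the abstract.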
