Momentum methods for stochastic optimization over time-varying directed networks

Abstract Decentralized optimization (DO) has recently received widespread attention. Because DO problems are typically large-scale, accelerating their solution has become an active research topic. However, the error introduced by each agent's lack of global information is the key obstacle preventing existing, carefully designed accelerated methods from being applied to DO directly. On the other hand, recent studies show that a family of accelerated methods can be derived from a unified momentum viewpoint. In this paper, we follow this methodology to design accelerated algorithms and adapt them to DO over time-varying directed networks. The main benefit is that, since the proposed algorithms are derived from momentum, they avoid elaborate hand-crafted iterative structures while inheriting the physical interpretability of momentum. Furthermore, we show that the proposed algorithms achieve sharper convergence rates than their competitors under the same conditions. Finally, experiments on several benchmark datasets confirm the competitiveness of our algorithms.
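To fix ideas, the following is a minimal sketch, not the paper's actual method, of how a momentum term can be grafted onto decentralized stochastic optimization over a time-varying directed network. It combines the classical heavy-ball momentum update with a push-sum-style gradient scheme; all function names, parameters, and the `grad` oracle are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only (not the paper's algorithm): stochastic
# gradient-push with a heavy-ball momentum buffer. Each agent i holds a
# push-sum numerator x[i] and weight w[i]; mixing_matrices[t] is a
# column-stochastic matrix encoding the directed topology at step t.

def heavy_ball_gradient_push(grad, n, d, mixing_matrices, steps,
                             lr=0.1, beta=0.9):
    """grad(i, z) returns a stochastic gradient of agent i's loss at z."""
    x = np.zeros((n, d))          # push-sum numerators, one row per agent
    w = np.ones(n)                # push-sum weights
    v = np.zeros((n, d))          # per-agent momentum buffers
    for t in range(steps):
        A = mixing_matrices[t]    # column-stochastic: A.sum(axis=0) == 1
        x = A @ x                 # push weighted copies along out-edges
        w = A @ w
        z = x / w[:, None]        # de-biased local estimates
        for i in range(n):
            g = grad(i, z[i])
            v[i] = beta * v[i] - lr * g   # heavy-ball momentum step
            x[i] += v[i]
    return (x / w[:, None]).mean(axis=0)
```

Column-stochastic mixing is a natural assumption here because it only requires each agent to know its own out-degree, which suits directed graphs whose edges change over time.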
