Decentralized Stochastic Non-Convex Optimization over Weakly Connected Time-Varying Digraphs

In this paper, we consider decentralized stochastic non-convex optimization over a class of weakly connected time-varying digraphs. First, we quantify the convergence behavior of the weight matrices associated with this type of digraph. By leveraging the perturbed push-sum protocol and gradient tracking techniques, we propose a decentralized stochastic algorithm that converges to first-order stationary points of non-convex problems with provable convergence rates. The digraph structure considered in this work generalizes existing settings, so the proposed algorithm applies to a broader range of practical decentralized learning scenarios. Numerical results demonstrate the strengths of our theory and the superiority of the proposed algorithm over existing counterparts in decentralized training problems.
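The abstract combines two ingredients: a perturbed push-sum protocol (column-stochastic mixing plus a scalar weight that corrects for unbalanced directed communication) and gradient tracking. The following is a minimal sketch of how these pieces fit together, under loudly stated assumptions: scalar quadratic local objectives f_i(x) = (x - b_i)^2 / 2, Gaussian-perturbed gradients as a stand-in for stochastic sampling, and a hypothetical time-varying digraph that alternates between two edge sets, neither strongly connected on its own, whose union is a directed ring. It follows a generic Push-DIGing-style update, not necessarily the exact algorithm or graph class analyzed in this paper; the function names `column_stochastic`, `matvec`, and `push_sum_gradient_tracking` and all parameter values are illustrative.

```python
import random

# Hypothetical illustration: generic push-sum + gradient tracking on scalar
# quadratics. This is NOT claimed to be the paper's exact algorithm.


def column_stochastic(n, edges):
    """Column-stochastic mixing matrix for a digraph given as (src, dst) pairs.

    Each node keeps a self-loop and splits its mass equally between itself and
    its out-neighbours, so every column sums to 1. C[i][j] is the share that
    node j pushes to node i.
    """
    out = {j: [j] for j in range(n)}
    for src, dst in edges:
        out[src].append(dst)
    C = [[0.0] * n for _ in range(n)]
    for j, targets in out.items():
        share = 1.0 / len(targets)
        for i in targets:
            C[i][j] += share
    return C


def matvec(C, z):
    """Square matrix-vector product C @ z using plain lists."""
    return [sum(C[i][j] * z[j] for j in range(len(z))) for i in range(len(z))]


def push_sum_gradient_tracking(b, graphs, alpha=0.05, steps=3000,
                               noise_std=0.0, seed=0):
    """Push-sum with gradient tracking on f_i(x) = 0.5 * (x - b_i) ** 2.

    graphs: list of edge lists, cycled through to form a time-varying digraph.
    Stochastic gradients are simulated (assumption) by adding Gaussian noise.
    Returns the final local estimates x_i = u_i / v_i.
    """
    rng = random.Random(seed)
    n = len(b)
    mats = [column_stochastic(n, e) for e in graphs]

    def stoch_grad(x):
        return [x[i] - b[i] + rng.gauss(0.0, noise_std) for i in range(n)]

    u = [0.0] * n            # push-sum numerators, perturbed by gradient steps
    v = [1.0] * n            # push-sum weights (mixed by the same matrices)
    x = [u[i] / v[i] for i in range(n)]
    g = stoch_grad(x)
    y = g[:]                 # gradient tracker, initialised to local gradients
    for k in range(steps):
        C = mats[k % len(mats)]
        # Perturbed push-sum: take a step along the tracked gradient, then mix.
        u = matvec(C, [u[i] - alpha * y[i] for i in range(n)])
        v = matvec(C, v)
        x = [u[i] / v[i] for i in range(n)]   # de-biased local estimates
        g_new = stoch_grad(x)
        # Gradient tracking: mix the tracker and add the local gradient change.
        y = [cy + gn - go for cy, gn, go in zip(matvec(C, y), g_new, g)]
        g = g_new
    return x


if __name__ == "__main__":
    b = [1.0, 2.0, 3.0, 6.0]
    # Neither graph alone is strongly connected; their union is 0->1->2->3->0.
    graphs = [[(0, 1), (2, 3)], [(1, 2), (3, 0)]]
    # With noise_std=0 every entry is expected to approach mean(b) = 3.0.
    print(push_sum_gradient_tracking(b, graphs))
```

Dividing the numerator u_i by the weight v_i is what removes the bias introduced by column-stochastic (rather than doubly stochastic) mixing. Because every column of C sums to one and y starts at the local gradients, the network-wide sum of y always equals the sum of the latest local stochastic gradients, which is the property gradient tracking relies on.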
