Asynchronous Decentralized Optimization in Directed Networks

A popular asynchronous protocol for decentralized optimization is randomized gossip, in which a randomly selected pair of neighbors concurrently updates via pairwise averaging. In practice, this pairwise coordination can create deadlocks and is vulnerable to information delays. It is also problematic when a node is unable to respond or has access only to its own privacy-preserving local dataset. To address these issues simultaneously, this paper proposes an asynchronous decentralized algorithm, APPG, with {\em directed} communication in which each node updates {\em asynchronously} and independently of every other node. If the local functions are strongly convex with Lipschitz-continuous gradients, every node of APPG converges to the same optimal solution at a rate of $O(\lambda^k)$, where $\lambda\in(0,1)$ and the virtual counter $k$ increases by 1 whenever any node updates. The superior performance of APPG is validated on a logistic regression problem against state-of-the-art methods, both in terms of linear speedup and in actual system implementations.
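
For concreteness, the snippet below is a minimal sketch of the randomized-gossip baseline described above, not of APPG itself; the function and variable names are illustrative. It makes explicit the pairwise synchronization (both endpoints of an activated edge must act at the same instant) that APPG's fully asynchronous, directed updates are designed to remove.

```python
import numpy as np

def randomized_gossip_average(values, edges, num_rounds=1000, seed=0):
    """Baseline randomized gossip: in each round one edge (i, j) is activated
    and its two endpoints replace their values with the pairwise average.
    Both endpoints must participate simultaneously, which is the source of
    the deadlock and delay issues discussed in the abstract."""
    rng = np.random.default_rng(seed)
    x = np.array(values, dtype=float)
    for _ in range(num_rounds):
        i, j = edges[rng.integers(len(edges))]  # randomly activate one edge
        avg = 0.5 * (x[i] + x[j])               # pairwise averaging step
        x[i] = x[j] = avg
    return x

# Usage: 5 nodes on a ring, each holding a scalar; repeated gossip drives
# all local values to the network-wide average.
ring_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(randomized_gossip_average([1.0, 2.0, 3.0, 4.0, 5.0], ring_edges))
```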
