Distributed mirror descent for stochastic learning over rate-limited networks

We present and analyze two algorithms — termed distributed stochastic approximation mirror descent (D-SAMD) and accelerated distributed stochastic approximation mirror descent (AD-SAMD)—for distributed, stochastic optimization from high-rate data streams over rate-limited networks. Devices contend with fast streaming rates by mini-batching samples in the data stream, and they collaborate via distributed consensus to compute variance-reduced averages of distributed subgradients. This induces a trade-off: Mini-batching slows down the effective streaming rate, but may also slow down convergence. We present two theoretical contributions that characterize this trade-off: (i) bounds on the convergence rates of D-SAMD and AD-SAMD, and (ii) sufficient conditions for order-optimum convergence of D-SAMD and AD-SAMD, in terms of the network size/topology and the ratio of the data streaming and communication rates. We find that AD-SAMD achieves order-optimum convergence in a larger regime than D-SAMD. We demonstrate the effectiveness of the proposed algorithms using numerical experiments.

[1]  José M. F. Moura,et al.  Linear Convergence Rate of a Class of Distributed Augmented Lagrangian Algorithms , 2013, IEEE Transactions on Automatic Control.

[2]  Martin J. Wainwright,et al.  Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling , 2010, IEEE Transactions on Automatic Control.

[3]  A. Juditsky,et al.  Solving variational inequalities with Stochastic Mirror-Prox algorithm , 2008, 0809.0815.

[4]  Stephen J. Wright,et al.  Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.

[5]  Aryan Mokhtari,et al.  DSA: Decentralized Double Stochastic Averaging Gradient Algorithm , 2015, J. Mach. Learn. Res..

[6]  Asuman E. Ozdaglar,et al.  Distributed Subgradient Methods for Multi-Agent Optimization , 2009, IEEE Transactions on Automatic Control.

[7]  Angelia Nedic,et al.  Distributed Stochastic Subgradient Projection Algorithms for Convex Optimization , 2008, J. Optim. Theory Appl..

[8]  Michael G. Rabbat,et al.  Push-Sum Distributed Dual Averaging for convex optimization , 2012, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).

[9]  Dimitris S. Papailiopoulos,et al.  Perturbed Iterate Analysis for Asynchronous Stochastic Optimization , 2015, SIAM J. Optim..

[10]  Stephen P. Boyd,et al.  A scheme for robust distributed sensor fusion based on average consensus , 2005, IPSN 2005. Fourth International Symposium on Information Processing in Sensor Networks, 2005..

[11]  Qing Ling,et al.  On the Linear Convergence of the ADMM in Decentralized Consensus Optimization , 2013, IEEE Transactions on Signal Processing.

[12]  Angelia Nedic,et al.  Distributed Asynchronous Constrained Stochastic Optimization , 2011, IEEE Journal of Selected Topics in Signal Processing.

[13]  Guanghui Lan,et al.  An optimal method for stochastic composite optimization , 2011, Mathematical Programming.