Distributed Online Optimization over a Heterogeneous Network with Any-Batch Mirror Descent

In distributed online optimization over a computing network with heterogeneous nodes, slow nodes can impede the progress of fast nodes, drastically slowing the overall convergence process. To address this issue, we consider a new algorithm, termed Distributed Any-Batch Mirror Descent (DABMD), which is based on distributed Mirror Descent but uses a fixed per-round computing time to limit how long fast nodes wait for information updates from slow nodes. DABMD is therefore characterized by minibatch sizes that vary across nodes. It is applicable to a broader range of problems than existing distributed online optimization methods, such as those based on dual averaging, and it accommodates time-varying network topologies. We study two versions of DABMD, depending on whether the computing nodes average their primal variables via a single consensus iteration or multiple consensus iterations. We show that both versions provide strong theoretical performance guarantees by deriving upper bounds on their expected dynamic regret that capture the variability in minibatch sizes. Our experimental results show a substantial reduction in cost and acceleration of convergence compared with the best known alternative.
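To make the any-batch mechanism concrete, below is a minimal single-node sketch of one round, under stated assumptions rather than the paper's actual implementation: gradients are accumulated for a fixed wall-clock budget (so the minibatch size varies with node speed), a single consensus iteration averages primal variables with neighbors, and the mirror descent step uses the negative-entropy mirror map over the probability simplex as one concrete Bregman choice. The names dabmd_round, grad_oracle, and the weight convention are illustrative assumptions.

```python
import time
import numpy as np

def dabmd_round(x, grad_oracle, neighbor_xs, weights, step_size, compute_time):
    """One illustrative round of the any-batch idea at a single node (sketch only).

    x            : current primal variable on the probability simplex
    grad_oracle  : callable returning one stochastic gradient at x
    neighbor_xs  : list of neighbors' primal variables
    weights      : consensus weights; weights[0] is the node's own weight
    step_size    : mirror descent step size
    compute_time : fixed per-round computing time (seconds)
    """
    # 1) Fixed per-round computing time: the minibatch size b depends on node speed.
    grad_sum, b = np.zeros_like(x), 0
    deadline = time.monotonic() + compute_time
    while time.monotonic() < deadline:
        grad_sum += grad_oracle(x)   # one stochastic gradient sample
        b += 1
    g = grad_sum / max(b, 1)

    # 2) Single consensus iteration: weighted average of primal variables.
    x_avg = weights[0] * x + sum(w * xn for w, xn in zip(weights[1:], neighbor_xs))

    # 3) Mirror descent step with the negative-entropy mirror map
    #    (multiplicative update, renormalized onto the simplex).
    x_new = x_avg * np.exp(-step_size * g)
    return x_new / x_new.sum(), b
```

In the multiple-consensus variant described in the abstract, step 2 would simply be repeated several times per round before the mirror descent update.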
