Stochastic Mirror Descent for Convex Optimization with Consensus Constraints

Mirror descent is known to be effective in applications where adapting the mirror map to the underlying geometry of the optimization problem is beneficial. However, the effect of mirror maps on the geometry of distributed optimization problems has not previously been addressed. In this paper we propose and study exact distributed mirror descent algorithms in continuous time under additive noise, and we identify the settings that enable linear convergence rates. Our analysis draws motivation from the augmented Lagrangian and its relation to gradient tracking. To further explore the benefits of mirror maps in a distributed setting, we present a preconditioned variant of our algorithm with an additional mirror map over the Lagrangian dual variables. This allows our method to adapt to the geometry of the consensus manifold and leads to faster convergence. We illustrate the performance of the algorithms in convex settings both with and without constraints, and we numerically explore their behavior in a non-convex application with neural networks.
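
To make the high-level description concrete, below is a minimal discrete-time sketch of distributed stochastic mirror descent with a consensus step, written by analogy to the abstract rather than taken from the paper: the negative-entropy mirror map, the doubly stochastic mixing matrix W, and all names and parameters (distributed_smd, lr, noise) are illustrative assumptions. The paper's actual method is a continuous-time primal-dual dynamic with Lagrangian dual variables, which this sketch does not reproduce.

```python
import numpy as np

def distributed_smd(grads, W, x0, steps=500, lr=0.05, noise=0.01, seed=0):
    """Discrete-time distributed stochastic mirror descent (illustrative).

    grads -- list of local gradient oracles, one per agent
    W     -- doubly stochastic mixing matrix of the communication graph
    x0    -- (n_agents, dim) initial iterates in the simplex interior
    The mirror map is the negative entropy, so the dual variables are
    z_i = log(x_i) and the inverse map is a softmax (exponentiated gradient).
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(steps):
        z = np.log(x)                      # map primal iterates to dual space
        g = np.stack([grads[i](x[i]) for i in range(len(grads))])
        g += noise * rng.standard_normal(g.shape)   # additive gradient noise
        z = W @ z - lr * g                 # consensus mixing + noisy mirror step
        x = np.exp(z)                      # inverse mirror map ...
        x /= x.sum(axis=1, keepdims=True)  # ... normalized onto the simplex
    return x.mean(axis=0)

# Toy usage: three agents jointly minimize a sum of quadratics over the simplex.
targets = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]])
grads = [lambda x, t=t: 2.0 * (x - t) for t in targets]
W = np.full((3, 3), 1.0 / 3.0)             # complete graph, uniform weights
x0 = np.full((3, 3), 1.0 / 3.0)
print(distributed_smd(grads, W, x0))       # approx. the average of the targets
```

Mixing the dual variables through W before the gradient step is one common discrete-time way to encode the consensus constraint; the preconditioned variant described in the abstract would additionally apply a second mirror map to the dual (consensus) variables.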
