Decentralized Local Stochastic Extra-Gradient for Variational Inequalities

We consider distributed stochastic variational inequalities (VIs) on unbounded domains where the problem data are heterogeneous (non-IID) and distributed across many devices. We make a very general assumption on the computational network that, in particular, covers fully decentralized computation over time-varying networks as well as the centralized topologies commonly used in Federated Learning. Moreover, multiple local updates can be performed on each worker to reduce the communication frequency between workers. We extend the stochastic extragradient method to this very general setting and theoretically analyze its convergence rate in the strongly monotone, monotone, and non-monotone (when a Minty solution exists) settings. The provided rates explicitly exhibit the dependence on network characteristics (e.g., mixing time), iteration counter, data heterogeneity, variance, number of devices, and other standard parameters. As a special case, our method and analysis apply to distributed stochastic saddle-point problems (SPPs), e.g., to the training of deep generative adversarial networks (GANs), for which decentralized training has been reported to be extremely challenging. In experiments on the decentralized training of GANs, we demonstrate the effectiveness of our proposed approach.
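To make the scheme concrete, the sketch below shows one plausible instantiation of a decentralized local stochastic extragradient loop: each worker runs a few extragradient steps on its own stochastic oracle, and the local copies are then averaged through a (possibly time-varying) mixing matrix. The oracle construction, step size, mixing matrix, and the toy bilinear saddle-point problem are illustrative assumptions, not the exact algorithm or parameters analyzed in the paper.

```python
import numpy as np

# Hypothetical sketch of a decentralized local stochastic extragradient round:
# each worker m keeps its own copy z[m] of the variable and a stochastic oracle
# for its local operator F_m; after `local_steps` extragradient updates the
# copies are mixed with a (possibly time-varying) doubly stochastic matrix W.

def local_extragradient_round(z, oracles, W, local_steps, step_size, rng):
    """One communication round: local extragradient steps followed by gossip averaging."""
    num_workers = z.shape[0]
    for _ in range(local_steps):
        for m in range(num_workers):
            g = oracles[m](z[m], rng)             # stochastic operator sample at z_m
            z_half = z[m] - step_size * g         # extrapolation (look-ahead) step
            g_half = oracles[m](z_half, rng)      # operator sample at the look-ahead point
            z[m] = z[m] - step_size * g_half      # actual update step
    return W @ z                                  # gossip: mix the workers' copies

# Toy usage on the bilinear saddle-point problem min_x max_y x*y, whose operator
# F(x, y) = (y, -x) is monotone with unique solution (0, 0).
def make_oracle(noise_std=0.1):
    def oracle(z, rng):
        x, y = z
        return np.array([y, -x]) + noise_std * rng.standard_normal(2)
    return oracle

rng = np.random.default_rng(0)
z = rng.standard_normal((2, 2))                   # 2 workers, variable (x, y) per worker
W = np.full((2, 2), 0.5)                          # mixing matrix of a fully connected pair
oracles = [make_oracle(), make_oracle()]
for _ in range(200):
    z = local_extragradient_round(z, oracles, W, local_steps=3, step_size=0.1, rng=rng)
print(z.mean(axis=0))                             # close to the solution (0, 0), up to noise
```

Here `local_steps` plays the role of the communication-reducing local updates mentioned above, and replacing the fixed matrix W with a sequence of mixing matrices would model a time-varying network.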
