Communication-Efficient Gradient Descent-Ascent Methods for Distributed Variational Inequalities: Unified Analysis and Local Updates

Distributed and federated learning algorithms and techniques have been developed primarily for minimization problems. However, with the growing importance of minimax optimization and variational inequality problems in machine learning, the need for efficient distributed/federated approaches to these problems is becoming more apparent. In this paper, we provide a unified convergence analysis of communication-efficient local training methods for distributed variational inequality problems (VIPs). Our approach is based on a general key assumption on the stochastic estimates that allows us to propose and analyze several novel local training algorithms under a single framework for solving a class of structured non-monotone VIPs. We present the first local gradient descent-ascent algorithms with provably improved communication complexity for solving distributed variational inequalities on heterogeneous data. The general algorithmic framework recovers state-of-the-art algorithms and their sharp convergence guarantees when the setting is specialized to minimization or minimax optimization problems. Finally, we demonstrate the strong performance of the proposed algorithms compared to state-of-the-art methods when solving federated minimax optimization problems.
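To make the local-training template concrete, below is a minimal sketch (not the paper's algorithm or notation) of generic local gradient descent-ascent, specialized to the minimax case min_x max_y (1/M) Σ_m f_m(x, y): each client performs several local descent-ascent steps between communication rounds, and the server averages the resulting iterates. All names here (local_gda, clients, grad_x, grad_y, local_steps) are illustrative assumptions, not the paper's API.

```python
# Minimal sketch of local gradient descent-ascent for a distributed
# min-max problem min_x max_y (1/M) sum_m f_m(x, y).
import numpy as np

def local_gda(clients, x0, y0, step, local_steps, rounds):
    """clients: list of (grad_x, grad_y) stochastic gradient oracles, one per device."""
    x, y = x0.copy(), y0.copy()
    for _ in range(rounds):                      # communication rounds
        xs, ys = [], []
        for grad_x, grad_y in clients:           # executed in parallel in practice
            xm, ym = x.copy(), y.copy()
            for _ in range(local_steps):         # local updates, no communication
                gx, gy = grad_x(xm, ym), grad_y(xm, ym)
                xm -= step * gx                  # descent step on the min variable
                ym += step * gy                  # ascent step on the max variable
            xs.append(xm)
            ys.append(ym)
        x = np.mean(xs, axis=0)                  # server averages local iterates
        y = np.mean(ys, axis=0)
    return x, y
```

In the paper's more general VIP setting, the pair of gradients would be replaced by each client's operator, and the unified framework covers other stochastic estimators of it; the sketch only illustrates the local-update/communication pattern.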
