Single-Call Stochastic Extragradient Methods for Structured Non-monotone Variational Inequalities: Improved Analysis under Weaker Conditions

Single-call stochastic extragradient methods, such as stochastic past extragradient (SPEG) and stochastic optimistic gradient (SOG), have attracted considerable interest in recent years and are among the most efficient algorithms for solving large-scale min-max optimization and variational inequality problems (VIPs) arising in various machine learning tasks. However, despite their popularity, current convergence analyses of SPEG and SOG require a bounded variance assumption. In addition, several important questions regarding the convergence properties of these methods remain open, including mini-batching, efficient step-size selection, and convergence guarantees under different sampling strategies. In this work, we address these questions and provide convergence guarantees for two large classes of structured non-monotone VIPs: (i) quasi-strongly monotone problems (a generalization of strongly monotone problems) and (ii) weak Minty variational inequalities (a generalization of monotone and Minty VIPs). We introduce the expected residual condition, explain its benefits, and show how it can be used to obtain convergence guarantees under a strictly weaker assumption than the previously used growth conditions, expected co-coercivity, or bounded variance assumptions. Equipped with this condition, we provide theoretical guarantees for the convergence of single-call extragradient methods under different step-size selections, including constant, decreasing, and step-size-switching rules. Furthermore, our convergence analysis holds under the arbitrary sampling paradigm, which includes importance sampling and various mini-batching strategies as special cases.

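To make the single-call structure concrete, below is a minimal sketch of the stochastic past extragradient (SPEG) update with a constant step size and uniform mini-batch sampling, as the method is commonly stated in the literature. The toy linear operator, its dimensions, the step size, and the batch size are illustrative assumptions for this sketch, not the paper's setting or experiments.

```python
# Minimal SPEG sketch on a toy stochastic operator F(x) = (1/n) * sum_i (A_i x + b_i).
# Assumptions (not from the paper): the toy problem below, gamma, iters, batch.
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 100                                   # dimension, number of samples
A = np.stack([np.eye(d) + 0.1 * rng.standard_normal((d, d)) for _ in range(n)])
b = rng.standard_normal((n, d))

def g(x, i):
    """Stochastic operator sample g(x; xi_i); the full operator is their average."""
    return A[i] @ x + b[i]

def speg(x0, gamma=0.01, iters=2000, batch=8):
    """SPEG with a constant step size: the extrapolation step re-uses the operator
    value computed at the previous iteration, so each iteration makes only one
    fresh (mini-batch) operator call."""
    x = x0.copy()
    g_prev = np.zeros_like(x)                    # past operator value (initialized to 0)
    for _ in range(iters):
        x_bar = x - gamma * g_prev               # extrapolation using the past sample
        idx = rng.choice(n, size=batch, replace=False)
        g_new = np.mean([g(x_bar, i) for i in idx], axis=0)  # single fresh call
        x = x - gamma * g_new                    # update step
        g_prev = g_new
    return x

x_approx = speg(np.zeros(d))
```

In contrast to the standard (two-call) stochastic extragradient method, which evaluates the operator at both the current iterate and the extrapolated point in every iteration, the past/optimistic variant above needs only one new operator evaluation per step, which is the source of its "single-call" name.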