On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging

We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) method with constant step size and proposing variations of the method that achieve favorable convergence. In sharp contrast with the basic SEG method, whose last iterate only contracts to a fixed neighborhood of the Nash equilibrium, SEG augmented with iteration averaging provably converges to the Nash equilibrium under the same standard settings, and the rate improves further when a scheduled restarting procedure is incorporated. In the interpolation setting, where the noise vanishes at the Nash equilibrium, we achieve an optimal convergence rate up to tight constants. We present numerical experiments that validate our theoretical findings and demonstrate the effectiveness of SEG when equipped with iteration averaging and restarting.
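
As a concrete illustration of the procedure described above, the following is a minimal Python sketch of same-sample SEG with constant step size, uniform iteration averaging, and scheduled restarting on a stochastic bilinear game. The noise model, step size, restart period, and function names are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def seg_restarted_averaging(A, x0, y0, step=0.1, noise_std=0.1,
                            iters=2000, restart_every=200, seed=0):
    """Same-sample SEG with constant step size, uniform iteration averaging,
    and scheduled restarts on the stochastic bilinear game min_x max_y x^T A y."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x0, dtype=float).copy(), np.asarray(y0, dtype=float).copy()
    x_bar, y_bar, count = x.copy(), y.copy(), 1
    for t in range(1, iters + 1):
        # One stochastic sample of the game matrix, reused for both the
        # extrapolation step and the update step (same-sample SEG).
        A_hat = A + noise_std * rng.standard_normal(A.shape)
        # Extrapolation (half) step.
        x_half = x - step * (A_hat @ y)
        y_half = y + step * (A_hat.T @ x)
        # Update step, evaluated at the extrapolated point.
        x = x - step * (A_hat @ y_half)
        y = y + step * (A_hat.T @ x_half)
        # Uniform running average of the iterates.
        x_bar += (x - x_bar) / (count + 1)
        y_bar += (y - y_bar) / (count + 1)
        count += 1
        # Scheduled restart: warm-start the iterate at the current average
        # and reset the averaging window.
        if t % restart_every == 0:
            x, y = x_bar.copy(), y_bar.copy()
            x_bar, y_bar, count = x.copy(), y.copy(), 1
    return x_bar, y_bar

# The unique Nash equilibrium of min_x max_y x^T A y is (0, 0).
A = np.array([[2.0, 0.0], [0.0, 1.0]])
x_bar, y_bar = seg_restarted_averaging(A, np.ones(2), np.ones(2))
print(np.linalg.norm(x_bar), np.linalg.norm(y_bar))  # both should be close to 0
```

In this sketch the restart reuses the averaged point as the next starting iterate, which is one natural way to realize "restarted iteration averaging"; the paper's scheduling of restart times may differ.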
