On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging

We study the stochastic bilinear minimax optimization problem, presenting an analysis of the Stochastic ExtraGradient (SEG) method with constant step size along with variations of the method that yield favorable convergence. We first note that the last iterate of the basic SEG method only contracts to a fixed neighborhood of the Nash equilibrium, independent of the step size. This contrasts sharply with standard minimization, where stochastic algorithms with a constant step size converge to a neighborhood whose radius vanishes in proportion to the square root of the step size. Under the same setting, however, we show that when augmented with iteration averaging, SEG provably converges to the Nash equilibrium, and that this convergence is further accelerated by incorporating a scheduled restarting procedure. In the interpolation setting, we achieve an optimal convergence rate up to tight constants. We present numerical experiments that validate our theoretical findings and demonstrate the effectiveness of the SEG method when equipped with iteration averaging and restarting.
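
To make the algorithmic ingredients concrete, the following minimal sketch implements SEG with a constant step size, uniform iteration averaging, and scheduled restarts on a synthetic stochastic bilinear game. The step size, restart period, problem dimensions, and all names are illustrative assumptions rather than the paper's exact specification or tuning.

```python
import numpy as np

# Sketch (under illustrative assumptions): SEG with iteration averaging and
# scheduled restarts for a stochastic bilinear game
#     min_x max_y  x^T A y,   A = (1/n) * sum_i A_i,
# where each oracle call samples a single A_i. The Nash equilibrium is (0, 0).

rng = np.random.default_rng(0)
n, d = 50, 10                                  # number of samples, dimension
A_samples = rng.standard_normal((n, d, d))     # stochastic components A_i

def stochastic_field(x, y, idx):
    """Stochastic monotone operator F_i(x, y) = (A_i y, -A_i^T x)."""
    Ai = A_samples[idx]
    return Ai @ y, -Ai.T @ x

def seg_restarted_averaging(eta=0.05, restart_period=200, num_restarts=10):
    x = rng.standard_normal(d)
    y = rng.standard_normal(d)
    for _ in range(num_restarts):
        x_avg, y_avg = np.zeros(d), np.zeros(d)
        for t in range(restart_period):
            # Extrapolation step with one stochastic sample.
            i = rng.integers(n)
            gx, gy = stochastic_field(x, y, i)
            x_half, y_half = x - eta * gx, y - eta * gy
            # Update step with an independent sample, evaluated at the
            # extrapolated point.
            j = rng.integers(n)
            gx, gy = stochastic_field(x_half, y_half, j)
            x, y = x - eta * gx, y - eta * gy
            # Uniform iteration averaging within the current restart window.
            x_avg += (x - x_avg) / (t + 1)
            y_avg += (y - y_avg) / (t + 1)
        # Scheduled restart: continue from the averaged iterate.
        x, y = x_avg, y_avg
    return x, y

x_out, y_out = seg_restarted_averaging()
print("distance to equilibrium:",
      np.sqrt(np.linalg.norm(x_out) ** 2 + np.linalg.norm(y_out) ** 2))
```

The restart step is the key difference from plain averaged SEG: rather than averaging over the entire history, the average is reset and the iterate re-initialized at the averaged point on a fixed schedule, which is the mechanism the abstract credits with the accelerated rate.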
