Global Convergence and Variance Reduction for a Class of Nonconvex-Nonconcave Minimax Problems

Nonconvex minimax problems arise frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning. Simple algorithms such as gradient descent ascent (GDA) are common practice for solving these nonconvex games and enjoy considerable empirical success. Yet it is known that vanilla GDA with a constant step size can diverge even in the convex-concave setting. In this work, we show that for a subclass of nonconvex-nonconcave objectives satisfying a so-called two-sided Polyak-Łojasiewicz inequality, the alternating gradient descent ascent (AGDA) algorithm converges globally at a linear rate and stochastic AGDA achieves a sublinear rate. We further develop a variance-reduced algorithm that attains a provably faster rate than AGDA when the problem has a finite-sum structure.
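To make the alternating update order concrete, below is a minimal sketch of AGDA on a toy nonconvex-nonconcave objective. The test function, step sizes, starting point, and iteration budget are illustrative assumptions, not taken from the paper; the function is only assumed here to be of the kind the two-sided PL condition is meant to cover.

import numpy as np

def f(x, y):
    # Toy nonconvex-nonconcave objective (illustrative assumption,
    # not the paper's experimental setup).
    return x**2 + 3*np.sin(x)**2*np.sin(y)**2 - 4*y**2 - 10*np.sin(y)**2

def grad_x(x, y):
    return 2*x + 6*np.sin(x)*np.cos(x)*np.sin(y)**2

def grad_y(x, y):
    return 6*np.sin(x)**2*np.sin(y)*np.cos(y) - 8*y - 20*np.sin(y)*np.cos(y)

def agda(x, y, tau1=0.05, tau2=0.05, iters=2000):
    # Alternating GDA: the ascent step uses the *freshly updated* x,
    # unlike simultaneous GDA, which evaluates both gradients at (x_t, y_t).
    for _ in range(iters):
        x = x - tau1 * grad_x(x, y)   # descent step on x at (x_t, y_t)
        y = y + tau2 * grad_y(x, y)   # ascent step on y at (x_{t+1}, y_t)
    return x, y

x_star, y_star = agda(x=1.0, y=1.0)
print(f"approximate saddle point: ({x_star:.4f}, {y_star:.4f})")

In the stochastic setting, grad_x and grad_y would be replaced by single-sample estimates; the variance-reduced variant in the abstract would additionally recenter those estimates SVRG-style around a periodically recomputed full gradient over the finite sum, which is what yields the faster rate. Both of those extensions are omitted from this sketch.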
