Faster Single-loop Algorithms for Minimax Optimization without Strong Concavity

Gradient descent ascent (GDA), the simplest single-loop algorithm for nonconvex minimax optimization, is widely used in practical applications such as generative adversarial networks (GANs) and adversarial training. Despite its desirable simplicity, recent work shows that GDA has inferior convergence rates in theory, even when the objective is assumed to be strongly concave in one variable. This paper establishes new convergence results for two alternative single-loop algorithms – alternating GDA and smoothed GDA – under the mild assumption that the objective satisfies the Polyak-Łojasiewicz (PL) condition with respect to one variable. We prove that, to find an ε-stationary point, (i) alternating GDA and its stochastic variant (without mini-batch) respectively require O(κ²ε⁻²) and O(κ⁴ε⁻⁴) iterations, while (ii) smoothed GDA and its stochastic variant (without mini-batch) respectively require O(κε⁻²) and O(κ²ε⁻⁴) iterations. The latter greatly improves over vanilla GDA and gives the hitherto best known complexity results among single-loop algorithms under similar settings. We further showcase the empirical efficiency of these algorithms in training GANs and robust nonlinear regression.
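
To make the difference between the two single-loop schemes concrete, the sketch below contrasts their update rules on a toy nonconvex-PL (in fact strongly concave in y) objective. This is a minimal illustration, not the paper's implementation: the objective f(x, y) = sin(x)·y − y²/2, the step sizes, the proximal weight p, and the averaging weight β are all placeholder choices made for the example.

```python
# Minimal sketch of the two single-loop updates on a toy minimax problem
#   min_x max_y f(x, y) = sin(x) * y - y^2 / 2,
# which is strongly concave (hence PL) in y and nonconvex in x.
# All hyperparameters below are illustrative, not the paper's settings.
import numpy as np

def grad_x(x, y):
    return np.cos(x) * y          # partial derivative of f w.r.t. x

def grad_y(x, y):
    return np.sin(x) - y          # partial derivative of f w.r.t. y

def alternating_gda(x, y, eta_x=0.02, eta_y=0.2, iters=2000):
    """Alternating GDA: the ascent step uses the freshly updated x,
    unlike simultaneous GDA, which uses the old iterate."""
    for _ in range(iters):
        x = x - eta_x * grad_x(x, y)
        y = y + eta_y * grad_y(x, y)   # evaluated at the new x
    return x, y

def smoothed_gda(x, y, eta_x=0.02, eta_y=0.2, p=1.0, beta=0.1, iters=2000):
    """Smoothed GDA sketch: descend on the regularized surrogate
    f(x, y) + (p/2) * ||x - z||^2 with an auxiliary anchor z,
    then move z slowly toward x via exponential averaging."""
    z = x
    for _ in range(iters):
        x = x - eta_x * (grad_x(x, y) + p * (x - z))
        y = y + eta_y * grad_y(x, y)
        z = z + beta * (x - z)         # slow update of the smoothing anchor
    return x, y

if __name__ == "__main__":
    x0, y0 = 1.5, -1.0
    print("Alternating GDA:", alternating_gda(x0, y0))
    print("Smoothed GDA:   ", smoothed_gda(x0, y0))
```

Both routines converge to an (approximately) stationary point of the primal function Φ(x) = max_y f(x, y) = sin²(x)/2 on this example; the smoothing anchor z is what distinguishes smoothed GDA and is the ingredient behind its improved κ-dependence in the stochastic setting.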
