A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

The nonconvex-concave min-max problem arises in many machine learning applications, including minimizing the pointwise maximum of a finite set of nonconvex functions and robust adversarial training of neural networks. A popular approach to solving this problem is the gradient descent-ascent (GDA) algorithm, which unfortunately can oscillate in the nonconvex setting. In this paper, we introduce a "smoothing" scheme that can be combined with GDA to stabilize the oscillation and ensure convergence to a stationary solution. We prove that the stabilized GDA algorithm achieves an $O(1/\epsilon^2)$ iteration complexity for minimizing the pointwise maximum of a finite collection of nonconvex functions, and an $O(1/\epsilon^4)$ iteration complexity for general nonconvex-concave problems. We also present extensions of this stabilized GDA algorithm to multi-block settings. To the best of our knowledge, this is the first algorithm to achieve an $O(1/\epsilon^2)$ iteration complexity for a class of nonconvex-concave problems. We illustrate the practical efficiency of the stabilized GDA algorithm on robust training.
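To make the smoothing idea concrete, below is a minimal sketch of a smoothed GDA iteration for min_x max_{y in Y} f(x, y). It assumes the exponential-averaging form of smoothing, in which an auxiliary iterate z tracks x and the descent step is taken on the regularized surrogate K(x, z; y) = f(x, y) + (p/2)||x - z||^2; the function names, step sizes c, alpha, beta, and regularization weight p are illustrative choices, not the parameter settings analyzed in the paper.

import numpy as np

def smoothed_gda(grad_x, grad_y, proj_y, x0, y0,
                 c=0.1, alpha=0.05, beta=0.1, p=1.0, iters=2000):
    """Sketch of smoothed GDA for min_x max_{y in Y} f(x, y).

    The auxiliary iterate z smooths the primal update: the descent step
    is taken on K(x, z; y) = f(x, y) + (p/2)*||x - z||^2, and z then
    moves a fraction beta toward the new x, which damps the oscillation
    that plain GDA exhibits.
    """
    x, y, z = x0.copy(), y0.copy(), x0.copy()
    for _ in range(iters):
        # Gradient descent on x for the smoothed surrogate K(x, z; y).
        x = x - c * (grad_x(x, y) + p * (x - z))
        # (Projected) gradient ascent on y for the concave side.
        y = proj_y(y + alpha * grad_y(x, y))
        # Smoothing step: z is an exponential average of the x iterates.
        z = z + beta * (x - z)
    return x, y

# Toy usage: f(x, y) = x * y with y constrained to [-1, 1]. Plain GDA
# spirals around the saddle point (0, 0); the smoothed iterates are
# driven toward it.
x_star, y_star = smoothed_gda(
    grad_x=lambda x, y: y,
    grad_y=lambda x, y: x,
    proj_y=lambda y: np.clip(y, -1.0, 1.0),
    x0=np.array([1.0]),
    y0=np.array([1.0]),
)

In the bilinear toy problem above, the proximal term p(x - z) makes each x-step contractive toward the slowly moving anchor z, which is what arrests the rotation of plain GDA.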
