A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

The nonconvex-concave min-max problem arises in many machine learning applications, including minimizing the pointwise maximum of a finite set of nonconvex functions and robust adversarial training of neural networks. A popular approach to solving this problem is the gradient descent-ascent (GDA) algorithm, which unfortunately can oscillate in the nonconvex setting. In this paper, we introduce a "smoothing" scheme that can be combined with GDA to stabilize the oscillation and ensure convergence to a stationary solution. We prove that the stabilized GDA algorithm achieves an $O(1/\epsilon^2)$ iteration complexity for minimizing the pointwise maximum of a finite collection of nonconvex functions, and an $O(1/\epsilon^4)$ iteration complexity for general nonconvex-concave problems. We also present extensions of this stabilized GDA algorithm to multi-block settings. To the best of our knowledge, this is the first algorithm to achieve an $O(1/\epsilon^2)$ iteration complexity for a class of nonconvex-concave problems. We illustrate the practical efficiency of the stabilized GDA algorithm on robust training.
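To make the smoothing idea concrete, below is a minimal sketch of a smoothed GDA iteration for min_x max_{y in Y} f(x, y). It assumes the exponential-averaging form of smoothing, in which an auxiliary iterate z tracks x and the descent step is taken on the regularized surrogate K(x, z; y) = f(x, y) + (p/2)||x - z||^2; the function names, step sizes c, alpha, beta, and regularization weight p are illustrative choices, not the parameter settings analyzed in the paper.

import numpy as np

def smoothed_gda(grad_x, grad_y, proj_y, x0, y0,
                 c=0.1, alpha=0.05, beta=0.1, p=1.0, iters=2000):
    """Sketch of smoothed GDA for min_x max_{y in Y} f(x, y).

    The auxiliary iterate z smooths the primal update: the descent step
    is taken on K(x, z; y) = f(x, y) + (p/2)*||x - z||^2, and z then
    moves a fraction beta toward the new x, which damps the oscillation
    that plain GDA exhibits.
    """
    x, y, z = x0.copy(), y0.copy(), x0.copy()
    for _ in range(iters):
        # Gradient descent on x for the smoothed surrogate K(x, z; y).
        x = x - c * (grad_x(x, y) + p * (x - z))
        # (Projected) gradient ascent on y for the concave side.
        y = proj_y(y + alpha * grad_y(x, y))
        # Smoothing step: z is an exponential average of the x iterates.
        z = z + beta * (x - z)
    return x, y

# Toy usage: f(x, y) = x * y with y constrained to [-1, 1]. Plain GDA
# spirals around the saddle point (0, 0); the smoothed iterates are
# driven toward it.
x_star, y_star = smoothed_gda(
    grad_x=lambda x, y: y,
    grad_y=lambda x, y: x,
    proj_y=lambda y: np.clip(y, -1.0, 1.0),
    x0=np.array([1.0]),
    y0=np.array([1.0]),
)

In the bilinear toy problem above, the proximal term p(x - z) makes each x-step contractive toward the slowly moving anchor z, which is what arrests the rotation of plain GDA.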
