Fast Objective and Duality Gap Convergence for Non-convex Strongly-concave Min-max Problems

This paper focuses on stochastic methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to their potential applications in deep learning (e.g., deep AUC maximization). However, most of the existing algorithms are slow in practice, and their analysis revolves around convergence to a nearly stationary point. We consider leveraging the Polyak-Łojasiewicz (PL) condition to design faster stochastic algorithms with stronger convergence guarantees. Although the PL condition has been utilized to design many stochastic minimization algorithms, its application to non-convex min-max optimization remains rare. In this paper, we propose and analyze proximal epoch-based methods, and establish fast convergence in terms of both {\bf the primal objective gap and the duality gap}. Our analysis is noteworthy in three aspects: (i) it is based on a novel Lyapunov function that consists of the primal objective gap and the duality gap of a regularized function; (ii) it only requires a weaker PL condition for establishing the primal objective convergence than that required for the duality gap convergence; (iii) it yields the optimal dependence on the accuracy level $\epsilon$, i.e., $O(1/\epsilon)$. We also make explicit the dependence on the problem parameters and explore regimes of the weak-convexity parameter that lead to an improved dependence on condition numbers. Experiments on deep AUC maximization demonstrate the effectiveness of our methods. Our method (MaxAUC) achieved an AUC of 0.922 on the private testing set of the {\bf CheXpert competition}.
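
For concreteness, the two convergence measures can be sketched in the standard non-convex strongly-concave setup; the symbols $f$, $P$, $\mathcal{Y}$, and $\mu$ below are generic notation for illustration, not necessarily the paper's own. The problem is
\[
\min_{x \in \mathbb{R}^d} \; P(x) := \max_{y \in \mathcal{Y}} f(x, y),
\]
where $f(x, \cdot)$ is strongly concave for every $x$. The {\bf primal objective gap} of a point $x$ is $P(x) - \min_{x'} P(x')$, while the {\bf duality gap} of a pair $(x, y)$ is $\max_{y' \in \mathcal{Y}} f(x, y') - \min_{x'} f(x', y)$; by weak duality the duality gap upper bounds the primal objective gap, so bounding it is the stronger of the two guarantees. Saying that $P$ satisfies the PL condition with parameter $\mu > 0$ means
\[
\frac{1}{2} \|\nabla P(x)\|^2 \;\ge\; \mu \big(P(x) - \min_{x'} P(x')\big) \qquad \text{for all } x,
\]
so driving the gradient norm to zero also drives the objective gap to zero, which is the sense in which the PL condition yields stronger guarantees than convergence to a nearly stationary point.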
