Nonsmooth Nonconvex-Nonconcave Minimax Optimization: Primal-Dual Balancing and Iteration Complexity Analysis

Nonconvex-nonconcave minimax optimization has gained widespread interest over the last decade. However, most existing works focus on variants of gradient descent-ascent (GDA) algorithms, which are only applicable to smooth nonconvex-concave settings. To address this limitation, we propose a novel algorithm named smoothed proximal linear descent-ascent (smoothed PLDA), which can effectively handle a broad range of structured nonsmooth nonconvex-nonconcave minimax problems. Specifically, we consider the setting where the primal function has a nonsmooth composite structure and the dual function possesses the Kurdyka-Łojasiewicz (KL) property with exponent $\theta \in [0,1)$. We introduce a novel convergence analysis framework for smoothed PLDA, the key components of which are our newly developed nonsmooth primal error bound and dual error bound. Using this framework, we show that smoothed PLDA can find both $\epsilon$-game-stationary points and $\epsilon$-optimization-stationary points of the problems of interest in $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$ iterations. Furthermore, when $\theta \in [0,\frac{1}{2}]$, smoothed PLDA achieves the optimal iteration complexity of $\mathcal{O}(\epsilon^{-2})$. To further demonstrate the effectiveness and wide applicability of our analysis framework, we show that a certain class of max-structured problems possesses the KL property with exponent $\theta=0$ under mild assumptions. As a by-product, we establish algorithm-independent quantitative relationships among various stationarity concepts, which may be of independent interest.
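To make the dependence on the KL exponent concrete, the stated iteration bound $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$ splits into two regimes; the display below simply evaluates the exponent, and the numerical instance $\theta = \frac{3}{4}$ is our own illustrative choice, not a case treated in the abstract.
\[
  \mathcal{O}\!\bigl(\epsilon^{-2\max\{2\theta,1\}}\bigr)
  =
  \begin{cases}
    \mathcal{O}(\epsilon^{-2}), & \theta \in [0,\tfrac{1}{2}], \text{ since } \max\{2\theta,1\}=1 \text{ (the optimal rate)},\\[3pt]
    \mathcal{O}(\epsilon^{-4\theta}), & \theta \in (\tfrac{1}{2},1), \text{ since } \max\{2\theta,1\}=2\theta.
  \end{cases}
\]
For instance, $\theta = \frac{3}{4}$ would give $\mathcal{O}(\epsilon^{-3})$ iterations, while the max-structured problems shown to satisfy the KL property with $\theta = 0$ fall in the first case and hence enjoy the optimal $\mathcal{O}(\epsilon^{-2})$ complexity.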
