AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax Optimization

In this paper, we propose a class of faster adaptive gradient descent ascent methods for solving nonconvex-strongly-concave minimax problems by using the unified adaptive matrices introduced in SUPER-ADAM [Huang et al., 2021]. Specifically, we propose a fast adaptive gradient descent ascent (AdaGDA) method based on the basic momentum technique, which reaches a low sample complexity of O(κ^4 ε^{-4}) for finding an ε-stationary point without large batches; this improves the existing result for adaptive minimax optimization methods by a factor of O(√κ). Moreover, we present an accelerated version of AdaGDA (VR-AdaGDA) based on the momentum-based variance-reduction technique, which achieves the best-known sample complexity of O(κ^3 ε^{-3}) for finding an ε-stationary point without large batches. Further, assuming a bounded Lipschitz parameter of the objective function, we prove that our VR-AdaGDA method reaches an even lower sample complexity of O(κ^{2.5} ε^{-3}) with mini-batch size O(κ). In particular, we provide an effective convergence analysis framework for our adaptive methods based on unified adaptive matrices, which covers almost all existing adaptive learning rates.
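
To make the overall structure concrete, below is a minimal, illustrative sketch of one momentum-based adaptive gradient descent ascent loop for min_x max_y f(x, y). It is not the paper's exact algorithm: the function name adagda_sketch, the hyperparameters eta, lam, alpha, rho, the sample callback, and the diagonal Adam-style choice of the adaptive matrix are assumptions made for illustration; the AdaGDA framework admits any adaptive matrix from the unified SUPER-ADAM family.

```python
import numpy as np

def adagda_sketch(grad_x, grad_y, x0, y0, T=1000, eta=0.01, lam=0.01,
                  alpha=0.9, rho=1e-3, sample=None):
    """Illustrative momentum-based adaptive GDA loop for min_x max_y f(x, y).

    grad_x, grad_y: callables returning stochastic gradients w.r.t. x and y.
    All hyperparameter names and the diagonal Adam-style adaptive matrix
    below are assumptions for illustration, not the paper's exact choices.
    """
    x = np.array(x0, dtype=float)
    y = np.array(y0, dtype=float)
    m_x = np.zeros_like(x)   # moving-average (momentum) gradient estimate in x
    m_y = np.zeros_like(y)   # moving-average (momentum) gradient estimate in y
    v = np.zeros_like(x)     # second-moment accumulator for the adaptive matrix
    for _ in range(T):
        xi = sample() if sample is not None else None   # draw a mini-batch / sample
        g_x = grad_x(x, y, xi)
        g_y = grad_y(x, y, xi)
        # basic momentum: exponential moving averages of the stochastic gradients
        m_x = (1.0 - alpha) * m_x + alpha * g_x
        m_y = (1.0 - alpha) * m_y + alpha * g_y
        # one common adaptive matrix: A_t = diag(sqrt(v_t)) + rho * I
        v = (1.0 - alpha) * v + alpha * g_x ** 2
        A = np.sqrt(v) + rho
        # preconditioned descent step on x, plain ascent step on y
        x = x - eta * m_x / A
        y = y + lam * m_y
    return x, y

if __name__ == "__main__":
    # toy check: f(x, y) = 0.5*||x||^2 + x @ y - 0.5*||y||^2 (strongly concave in y)
    gx = lambda x, y, xi: x + y          # gradient of f in x
    gy = lambda x, y, xi: x - y          # gradient of f in y
    x_out, y_out = adagda_sketch(gx, gy, np.ones(3), np.ones(3), T=5000)
    print(x_out, y_out)                  # both should approach the origin
```

The VR-AdaGDA variant described above would replace the two moving-average estimators with momentum-based variance-reduced estimators (in the spirit of [31]), while keeping the same preconditioned descent/ascent structure.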

[1] Mehrdad Mahdavi, et al. Distributionally Robust Federated Averaging, 2021, NeurIPS.

[2] Niao He, et al. Global Convergence and Variance Reduction for a Class of Nonconvex-Nonconcave Minimax Problems, 2020, NeurIPS.

[3] Ali Jadbabaie, et al. Complexity Lower Bounds for Nonconvex-Strongly-Concave Min-Max Optimization, 2021, NeurIPS.

[4] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.

[5] Ali Jadbabaie, et al. Robust Federated Learning: The Case of Affine Distribution Shifts, 2020, NeurIPS.

[6] J. Pei, et al. Accelerated Zeroth-Order Momentum Methods from Mini to Minimax Optimization, 2020, arXiv.

[7] Tong Zhang, et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator, 2018, NeurIPS.

[8] J. Pei, et al. Accelerated Zeroth-Order and First-Order Momentum Methods from Mini to Minimax Optimization, 2020, J. Mach. Learn. Res.

[9] Jason D. Lee, et al. Solving a Class of Non-Convex Min-Max Games Using Iterative First Order Methods, 2019, NeurIPS.

[10] Wei Liu, et al. Optimal Epoch Stochastic Gradient Descent Ascent Methods for Min-Max Optimization, 2020, NeurIPS.

[11] Yingbin Liang, et al. SpiderBoost and Momentum: Faster Variance Reduction Algorithms, 2019, NeurIPS.

[12] Michael I. Jordan, et al. On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems, 2019, ICML.

[13] Dan Boneh, et al. Ensemble Adversarial Training: Attacks and Defenses, 2017, ICLR.

[14] Feihu Huang, et al. Gradient Descent Ascent for Minimax Problems on Riemannian Manifolds, 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.

[16] Feihu Huang, et al. SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients, 2021, arXiv.

[17] Niao He, et al. A Catalyst Framework for Minimax Optimization, 2020, NeurIPS.

[18] Yurii Nesterov. Lectures on Convex Optimization, 2018.

[19] Saeed Ghadimi, et al. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization, 2013, Mathematical Programming.

[20] Zhaoran Wang, et al. Variance Reduced Policy Evaluation with Smooth Function Approximation, 2019, NeurIPS.

[21] Francesco Orabona, et al. On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes, 2018, AISTATS.

[22] Blaise Agüera y Arcas, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.

[23] J. Duncan, et al. AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients, 2020, NeurIPS.

[24] Tong Zhang, et al. Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems, 2020, NeurIPS.

[25] Niao He, et al. The Complexity of Nonconvex-Strongly-Concave Minimax Optimization, 2021, UAI.

[26] Feihu Huang, et al. BiAdam: Fast Adaptive Bilevel Optimization Methods, 2021, arXiv.

[27] Mingyi Hong, et al. On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization, 2018, ICLR.

[28] Xiaoxia Wu, et al. AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization, 2018, ICML.

[29] Michael I. Jordan, et al. Near-Optimal Algorithms for Minimax Optimization, 2020, COLT.

[30] Y. Censor, et al. Proximal minimization algorithm with D-functions, 1992.

[31] Francesco Orabona, et al. Momentum-Based Variance Reduction in Non-Convex SGD, 2019, NeurIPS.

[32] Mingrui Liu, et al. Weakly-convex–concave min–max optimization: provable algorithms and applications in machine learning, 2018, Optim. Methods Softw.

[33] Rong Jin, et al. On Stochastic Moving-Average Estimators for Non-Convex Optimization, 2021, arXiv.

[34] Y. Censor, et al. An iterative row-action method for interval convex programming, 1981.

[35] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.