Sharp Analysis of Epoch Stochastic Gradient Descent Ascent Methods for Min-Max Optimization

The epoch gradient descent method (Epoch-GD) proposed by Hazan and Kale (2011) was a breakthrough for stochastic strongly convex minimization, achieving the optimal O(1/T) convergence rate for the objective gap with T iterative updates. However, its extension to stochastic min-max problems that are strongly convex in the primal variable and strongly concave in the dual variable has remained open, and it has been unclear whether a fast O(1/T) rate for the duality gap is achievable in this setting. Although some recent studies have proposed stochastic algorithms with fast convergence rates for min-max problems, they require additional assumptions on the problem, such as smoothness or a bilinear structure. In this paper, we bridge this gap by providing a sharp analysis of an epoch-wise stochastic gradient descent ascent method (Epoch-GDA) for solving strongly-convex-strongly-concave (SCSC) min-max problems, without imposing any additional assumptions on smoothness or the structure of the objective. To the best of our knowledge, our result is the first to show that Epoch-GDA achieves the fast O(1/T) rate for the duality gap of general SCSC min-max problems. We emphasize that generalizing Epoch-GD for strongly convex minimization to Epoch-GDA for SCSC min-max problems is non-trivial and requires a novel technical analysis. Moreover, the key lemma can also be used to prove the convergence of Epoch-GDA for weakly-convex-strongly-concave min-max problems, yielding the best-known complexity in that setting as well, again without smoothness or other structural conditions.
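For concreteness, recall that the duality gap at a point (x, y) is max_{y'} f(x, y') - min_{x'} f(x', y). Below is a minimal sketch of the epoch-wise scheme that Epoch-GDA instantiates, assuming the standard Epoch-GD template: a fixed step size within each epoch, with the next epoch restarted from the averaged iterates while the step size is halved and the epoch length doubled. The function epoch_gda, its default parameters, and the toy SCSC objective are illustrative assumptions for this sketch, not the paper's exact pseudocode or step-size constants.

```python
import numpy as np

def epoch_gda(grad_x, grad_y, x0, y0, eta0=0.5, T0=8, num_epochs=10,
              proj_x=None, proj_y=None):
    """Epoch-wise stochastic gradient descent ascent (illustrative sketch).

    grad_x, grad_y: stochastic gradient oracles w.r.t. the primal variable x
    (minimized) and the dual variable y (maximized).
    """
    proj_x = proj_x or (lambda v: v)   # identity projection if the domain is unconstrained
    proj_y = proj_y or (lambda v: v)
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    eta, T = eta0, T0
    for _ in range(num_epochs):
        xs, ys = [], []
        for _ in range(T):
            gx, gy = grad_x(x, y), grad_y(x, y)   # one stochastic gradient pair
            x = proj_x(x - eta * gx)              # descent step on x
            y = proj_y(y + eta * gy)              # ascent step on y
            xs.append(x.copy())
            ys.append(y.copy())
        # Restart the next epoch from the averaged iterates,
        # halving the step size and doubling the epoch length.
        x, y = np.mean(xs, axis=0), np.mean(ys, axis=0)
        eta, T = eta / 2.0, T * 2
    return x, y

# Toy SCSC instance: f(x, y) = 0.5*||x||^2 + x'y - 0.5*||y||^2 with saddle point (0, 0);
# additive Gaussian noise stands in for the stochastic gradient oracle.
gx = lambda x, y: x + y + 0.05 * np.random.randn(*x.shape)
gy = lambda x, y: x - y + 0.05 * np.random.randn(*y.shape)
x_hat, y_hat = epoch_gda(gx, gy, x0=np.ones(3), y0=-np.ones(3))
print(x_hat, y_hat)  # both should be close to the zero vector
```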

[1] Mingyi Hong, et al. Decomposing Linearly Constrained Nonconvex Problems by a Proximal Primal Dual Approach: Algorithms, Convergence, and Applications, 2016, arXiv.

[2] Mingyi Hong, et al. Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solutions for Nonconvex Distributed Optimization, 2018, arXiv.

[3] Jason D. Lee, et al. Solving a Class of Non-Convex Min-Max Games Using Iterative First Order Methods, 2019, NeurIPS.

[4] Guanghui Lan, et al. Randomized First-Order Methods for Saddle Point Optimization, 2014, arXiv:1409.8625.

[5] Francis R. Bach, et al. Stochastic Variance Reduction Methods for Saddle-Point Problems, 2016, NIPS.

[6] Lin Xiao, et al. Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms, 2017, ICML.

[7] Mingrui Liu, et al. Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate, 2018, ICML.

[8] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.

[9] Yunmei Chen, et al. Optimal Primal-Dual Methods for a Class of Saddle Point Problems, 2013, SIAM J. Optim.

[10] Siwei Lyu, et al. Stochastic Online AUC Maximization, 2016, NIPS.

[11] John C. Duchi, et al. Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences, 2016, NIPS.

[12] Martin J. Wainwright, et al. Information-theoretic lower bounds on the oracle complexity of convex optimization, 2009, NIPS.

[13] Michael I. Jordan, et al. On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems, 2019, ICML.

[14] N. S. Aybat, et al. A Primal-Dual Algorithm with Line Search for General Convex-Concave Saddle Point Problems, 2020, SIAM J. Optim.

[15] A. Juditsky, et al. Solving variational inequalities with Stochastic Mirror-Prox algorithm, 2008, arXiv:0809.0815.

[16] John C. Duchi, et al. Variance-based Regularization with Convex Objectives, 2016, NIPS.

[17] Mingyi Hong, et al. Perturbed proximal primal–dual algorithm for nonconvex nonsmooth optimization, 2019, Math. Program.

[18] Tianbao Yang, et al. Stochastic Primal-Dual Algorithms with Faster Convergence than O(1/√T) for Problems without Bilinear Structure, 2019, arXiv.

[19] Yuchen Zhang, et al. Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization, 2014, ICML.

[20] Mingrui Liu, et al. Stochastic AUC Maximization with Deep Neural Networks, 2019, ICLR.

[21] Shiqian Ma, et al. Stochastic Primal-Dual Method for Empirical Risk Minimization with O(1) Per-Iteration Complexity, 2018, NeurIPS.

[22] Le Thi Khanh Hien, et al. A primal-dual smoothing gap reduction framework for strongly convex-generally concave saddle point problems, 2017.

[23] Alexander Shapiro, et al. Validation analysis of mirror descent stochastic approximation method, 2012, Math. Program.

[24] Yongxin Chen, et al. Hybrid Block Successive Approximation for One-Sided Non-Convex Min-Max Problems: Algorithms and Applications, 2019, IEEE Transactions on Signal Processing.

[25] Wei Hu, et al. Linear Convergence of the Primal-Dual Gradient Method for Convex-Concave Saddle Point Problems without Strong Convexity, 2018, AISTATS.

[26] Haishan Ye, et al. Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems, 2020, NeurIPS.

[27] Renbo Zhao. Optimal Algorithms for Stochastic Three-Composite Convex-Concave Saddle Point Problems, 2019, arXiv:1903.01687.

[28] Elad Hazan, et al. An optimal algorithm for stochastic strongly-convex optimization, 2010, arXiv:1006.2425.

[29] L. Hien, et al. An Inexact Primal-Dual Smoothing Framework for Large-Scale Non-Bilinear Saddle Point Problems, 2017, J. Optim. Theory Appl.

[30] Yurii Nesterov, et al. Excessive Gap Technique in Nonsmooth Convex Minimization, 2005, SIAM J. Optim.

[31] Tony Jebara, et al. Frank-Wolfe Algorithms for Saddle Point Problems, 2016, AISTATS.

[32] Antonin Chambolle, et al. A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging, 2011, Journal of Mathematical Imaging and Vision.

[33] Alexander Shapiro, et al. Stochastic Approximation approach to Stochastic Programming, 2013.

[34] Léon Bottou, et al. Wasserstein Generative Adversarial Networks, 2017, ICML.

[35] N. S. Aybat, et al. A Primal-Dual Algorithm for General Convex-Concave Saddle Point Problems, 2018, arXiv:1803.01401.

[36] Bao-Gang Hu, et al. Learning with Average Top-k Loss, 2017, NIPS.

[37] Tianbao Yang, et al. An efficient primal dual prox method for non-smooth optimization, 2014, Machine Learning.

[38] Shai Shalev-Shwartz, et al. Stochastic dual coordinate ascent methods for regularized loss, 2012, J. Mach. Learn. Res.

[39] Nathan Srebro, et al. Lower Bounds for Non-Convex Stochastic Optimization, 2019, arXiv.

[40] Panayotis Mertikopoulos, et al. On the convergence of single-call stochastic extra-gradient methods, 2019, NeurIPS.

[41] Mingrui Liu, et al. Non-Convex Min-Max Optimization: Provable Algorithms and Applications in Machine Learning, 2018, arXiv.

[42] Tianbao Yang, et al. Doubly Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization with Factorized Data, 2015, arXiv.