Paper Information - Online and Bandit Algorithms for Nonstationary Stochastic Saddle-Point Optimization

Saddle-point optimization problems are an important class of optimization problems with applications to game theory, multi-agent reinforcement learning, and machine learning. Most of the rich literature on saddle-point optimization has focused on the offline setting. In this paper, we study nonstationary versions of stochastic, smooth, strongly-convex and strongly-concave saddle-point optimization problems, in both the online (or first-order) and multi-point bandit (or zeroth-order) settings. We first propose natural notions of regret for such nonstationary saddle-point optimization problems. We then analyze extragradient and Frank-Wolfe algorithms, for the unconstrained and constrained settings respectively, on this class of problems. We establish sub-linear regret bounds on the proposed notions of regret in both the online and bandit settings.
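To make the unconstrained online setting concrete, the following is a minimal sketch of an extragradient update tracking the saddle point of a slowly drifting objective. Everything specific here is an assumption for illustration and is not taken from the paper: the quadratic test function f_t(x, y) = (mu/2)||x - a_t||^2 + x^T B y - (mu/2)||y - b||^2, the linear drift of a_t, the Gaussian gradient noise, the fixed step size eta, and the helper names `grad`, `saddle_point`, and `run_extragradient`. The quantity it records is the distance to the current saddle point, a simple tracking proxy rather than the regret notions the paper proposes.

```python
import numpy as np


def grad(x, y, a, b, B, mu):
    """Gradients of f(x, y) = mu/2 ||x-a||^2 + x^T B y - mu/2 ||y-b||^2."""
    gx = mu * (x - a) + B @ y
    gy = B.T @ x - mu * (y - b)
    return gx, gy


def saddle_point(a, b, B, mu):
    """Solve the first-order conditions grad_x f = 0, grad_y f = 0 exactly."""
    n, m = B.shape
    K = np.block([[mu * np.eye(n), B], [B.T, -mu * np.eye(m)]])
    rhs = np.concatenate([mu * a, -mu * b])
    z = np.linalg.solve(K, rhs)
    return z[:n], z[n:]


def run_extragradient(T=200, eta=0.1, drift=0.01, sigma=0.0, mu=1.0, seed=0):
    """Play T rounds; return the distance to each round's saddle point."""
    rng = np.random.default_rng(seed)
    n = m = 2
    B = 0.5 * rng.standard_normal((n, m))
    a0, b = np.ones(n), -np.ones(m)
    x, y = np.zeros(n), np.zeros(m)
    dist = np.empty(T)
    for t in range(T):
        a_t = a0 + drift * t  # assumed nonstationarity: the minimizer drifts
        xs, ys = saddle_point(a_t, b, B, mu)
        dist[t] = np.sqrt(np.sum((x - xs) ** 2) + np.sum((y - ys) ** 2))

        # First oracle call: noisy gradient of f_t at the current iterate.
        gx, gy = grad(x, y, a_t, b, B, mu)
        gx = gx + sigma * rng.standard_normal(n)
        gy = gy + sigma * rng.standard_normal(m)
        x_half, y_half = x - eta * gx, y + eta * gy  # extrapolation step

        # Second oracle call: noisy gradient of f_t at the extrapolated point.
        gx, gy = grad(x_half, y_half, a_t, b, B, mu)
        gx = gx + sigma * rng.standard_normal(n)
        gy = gy + sigma * rng.standard_normal(m)
        x, y = x - eta * gx, y + eta * gy  # update from the original iterate
    return dist


if __name__ == "__main__":
    d = run_extragradient(T=200, eta=0.1, drift=0.01, sigma=0.05)
    print("mean tracking error:", d.mean())
```

Setting `drift=0.0` and `sigma=0.0` reduces this to the classical offline extragradient method on a fixed strongly-convex, strongly-concave problem, which is a convenient sanity check before turning the drift and noise back on.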
