Online and Bandit Algorithms for Nonstationary Stochastic Saddle-Point Optimization

Saddle-point optimization problems are an important class of optimization problems with applications to game theory, multi-agent reinforcement learning, and machine learning. Most of the rich literature on saddle-point optimization has focused on the offline setting. In this paper, we study nonstationary versions of stochastic, smooth, strongly-convex and strongly-concave saddle-point optimization problems, in both the online (or first-order) and multi-point bandit (or zeroth-order) settings. We first propose natural notions of regret for such nonstationary saddle-point optimization problems. We then analyze extragradient and Frank-Wolfe algorithms, for the unconstrained and constrained settings respectively, on this class of problems. We establish sub-linear regret bounds on the proposed notions of regret in both the online and bandit settings.
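
To make the unconstrained online setting concrete, the following is a minimal sketch of an extragradient update tracking a slowly drifting saddle point. Everything specific in it is an assumption made for illustration and is not taken from the paper: the scalar objective f_t(x, y) = 0.5 (x - a_t)^2 + x y - 0.5 (y - b_t)^2, the linear drift of a_t, the Gaussian gradient noise, the step size, and the function names. The tracked quantity is the distance to the per-round saddle point, used here as a simple proxy rather than the regret notions proposed in the paper.

```python
import math
import random


def grad(x, y, a, b):
    """Partial derivatives (df/dx, df/dy) of the assumed objective
    f(x, y) = 0.5*(x - a)**2 + x*y - 0.5*(y - b)**2,
    which is strongly convex in x and strongly concave in y."""
    return (x - a) + y, x - (y - b)


def saddle_point(a, b):
    """Closed-form saddle point of the assumed objective."""
    return (a - b) / 2.0, (a + b) / 2.0


def extragradient_tracking(T=200, eta=0.3, drift=0.01, noise=0.0, seed=0):
    """Run T rounds of extragradient on a drifting objective.

    Returns the list of distances between the iterate after round t
    and the saddle point of the round-t objective.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    errors = []
    for t in range(T):
        # Hypothetical nonstationarity: a_t drifts linearly, b_t is fixed.
        a, b = 1.0 + drift * t, -1.0

        # Extrapolation step using a (possibly noisy) gradient at (x, y).
        gx, gy = grad(x, y, a, b)
        gx += rng.gauss(0.0, noise)
        gy += rng.gauss(0.0, noise)
        x_half = x - eta * gx   # descent in x
        y_half = y + eta * gy   # ascent in y

        # Update step using a fresh gradient at the extrapolated point.
        gx, gy = grad(x_half, y_half, a, b)
        gx += rng.gauss(0.0, noise)
        gy += rng.gauss(0.0, noise)
        x = x - eta * gx
        y = y + eta * gy

        xs, ys = saddle_point(a, b)
        errors.append(math.hypot(x - xs, y - ys))
    return errors


if __name__ == "__main__":
    errs = extragradient_tracking(T=200, eta=0.3, drift=0.01, noise=0.05)
    print("final tracking error:", errs[-1])
```

Setting `drift=0.0` and `noise=0.0` reduces the sketch to the classical offline extragradient method on a fixed objective, which is a convenient sanity check before introducing nonstationarity or noise.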
