Generalization Bounds for Stochastic Saddle Point Problems

This paper studies generalization bounds for the empirical saddle point (ESP) solution to stochastic saddle point (SSP) problems. For SSP problems with Lipschitz continuous and strongly convex-strongly concave objective functions, we establish an $\mathcal{O}(1/n)$ generalization bound via a uniform stability argument. We also provide generalization bounds under a variety of weaker assumptions, including the cases without strong convexity and without bounded domains. We illustrate our results with two examples: batch policy learning in Markov decision processes, and mixed-strategy Nash equilibrium estimation for stochastic games. In each of these examples, we show that a regularized ESP solution enjoys a near-optimal sample complexity. To the best of our knowledge, this is the first set of results on the generalization theory of ESP.
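To make the setting concrete, the SSP problem is $\min_x \max_y \mathbb{E}_{\xi}[f(x, y; \xi)]$, and the ESP solution is the saddle point of the empirical average $\frac{1}{n}\sum_{i=1}^n f(x, y; \xi_i)$ over $n$ i.i.d. samples. The sketch below is illustrative, not the paper's construction: it picks a hypothetical strongly convex-strongly concave objective $f(x, y; \xi) = \frac{\lambda}{2}\|x\|^2 + y^\top(A(\xi)x - b(\xi)) - \frac{\lambda}{2}\|y\|^2$ with random data `A`, `b` and solves the resulting empirical saddle point with plain gradient descent-ascent (a generic solver, chosen only for simplicity).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 1.0  # n samples, dimension d, regularization lam (all illustrative)

# Hypothetical data model for f(x, y; xi_i): random coupling matrices and offsets
A = rng.normal(size=(n, d, d))
b = rng.normal(size=(n, d))
A_bar, b_bar = A.mean(axis=0), b.mean(axis=0)  # empirical averages

def grad_x(x, y):
    # gradient in x of the empirical objective (1/n) * sum_i f(x, y; xi_i)
    return lam * x + A_bar.T @ y

def grad_y(x, y):
    # gradient in y of the empirical objective
    return A_bar @ x - b_bar - lam * y

# Gradient descent-ascent: descend in x, ascend in y
x, y = np.zeros(d), np.zeros(d)
eta = 0.1
for _ in range(2000):
    gx, gy = grad_x(x, y), grad_y(x, y)
    x, y = x - eta * gx, y + eta * gy

# At the empirical saddle point both partial gradients vanish
print(np.linalg.norm(grad_x(x, y)), np.linalg.norm(grad_y(x, y)))
```

Because the objective is $\lambda$-strongly convex in $x$ and $\lambda$-strongly concave in $y$, the empirical saddle point is unique and gradient descent-ascent with a small step size converges to it; the paper's generalization bounds then control how far this empirical solution is from optimal for the population SSP problem.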
