Fast Epigraphical Projection-based Incremental Algorithms for Wasserstein Distributionally Robust Support Vector Machine

Wasserstein Distributionally Robust Optimization (DRO) is concerned with finding decisions that perform well on data drawn from the worst-case probability distribution within a Wasserstein ball centered at a certain nominal distribution. In recent years, it has been shown that various DRO formulations of learning models admit tractable convex reformulations. However, most existing works propose to solve these convex reformulations with general-purpose solvers, which are not well suited to large-scale problems. In this paper, we focus on a family of Wasserstein distributionally robust support vector machine (DRSVM) problems and propose two novel epigraphical projection-based incremental algorithms to solve them. The updates in each iteration of these algorithms can be computed in a highly efficient manner. Moreover, we show that the DRSVM problems considered in this paper satisfy a Hölderian growth condition with explicitly determined growth exponents. Consequently, we are able to establish the convergence rates of the proposed incremental algorithms. Our numerical results indicate that the proposed methods are orders of magnitude faster than the state-of-the-art, and the performance gap grows considerably as the problem size increases.
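
To make the notion of an epigraphical projection concrete, the following is a minimal sketch rather than the algorithm of the paper. The abstract does not say which norm or which constraint set the proposed methods project onto, so the choice of the Euclidean norm here is an assumption, and the function name project_onto_l2_epigraph is hypothetical. Under that assumption, projecting a point (w, lam) onto the epigraph {(u, t) : ||u||_2 <= t} has a well-known closed form (it is the projection onto the second-order cone), which is the kind of inexpensive per-iteration update the abstract alludes to.

```python
import numpy as np


def project_onto_l2_epigraph(w, lam):
    """Euclidean projection of (w, lam) onto {(u, t) : ||u||_2 <= t}.

    Illustrative assumption: the l2 norm is used; the source abstract
    does not specify the norm. The function name is hypothetical.
    """
    w = np.asarray(w, dtype=float)
    norm_w = np.linalg.norm(w)

    # Case 1: the point already lies in the epigraph, so it is its own projection.
    if norm_w <= lam:
        return w.copy(), float(lam)

    # Case 2: the point lies in the polar cone, so it projects to the origin.
    if norm_w <= -lam:
        return np.zeros_like(w), 0.0

    # Case 3: otherwise the projection lies on the boundary of the cone.
    # Here norm_w > |lam| >= 0, so the division below is safe.
    scale = 0.5 * (norm_w + lam)
    return scale * w / norm_w, scale


if __name__ == "__main__":
    u, t = project_onto_l2_epigraph([3.0, 4.0], 1.0)
    print(u, t)
```

An incremental method in the spirit described above would interleave a cheap step on one loss component with a projection of this kind; how the paper actually combines the two is not stated in the abstract.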
