Kernel Distributionally Robust Optimization

This paper investigates in depth how kernel methods can immunize optimization solutions against distributional ambiguity. We propose kernel distributionally robust optimization (K-DRO), drawing on insights from robust optimization theory and functional analysis. Our method uses reproducing kernel Hilbert spaces (RKHSs) to construct ambiguity sets, and it can be reformulated as a tractable program using the conic duality of moment problems and an extension of the RKHS representer theorem. Our analysis shows that universal RKHSs are large enough for K-DRO to be effective. The paper provides both theoretical analyses that extend the robustness properties of kernel methods and practical algorithms that apply to general optimization problems, not only to kernelized models.
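To make the idea of an RKHS-based ambiguity set concrete, the sketch below shows one common construction: a ball around the empirical distribution measured in maximum mean discrepancy (MMD), the RKHS distance between kernel mean embeddings. The abstract does not say which ambiguity sets the paper uses, so the MMD ball, the Gaussian kernel, and the names `gaussian_kernel`, `mmd_squared`, and `in_mmd_ball` are all illustrative assumptions rather than the authors' formulation.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average of k(x, y) over all pairs (x, y) from the two samples."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


def in_mmd_ball(candidate, reference, radius, bandwidth=1.0):
    """Hypothetical membership test: is `candidate` within an MMD ball of
    the given radius around the empirical distribution of `reference`?"""
    # Clip at zero because rounding can make the estimate slightly negative.
    return max(mmd_squared(candidate, reference, bandwidth), 0.0) <= radius ** 2


# Samples are lists of points; each point is a tuple of floats.
reference = [(0.0,), (1.0,)]
shifted = [(5.0,), (6.0,)]
print(in_mmd_ball(reference, reference, radius=0.1))  # same sample: inside
print(in_mmd_ball(shifted, reference, radius=0.1))    # far-away sample: outside
```

A robust formulation would then take the worst case of the expected loss over every distribution inside such a ball; the sketch covers only the membership test, not the reformulation into a tractable program.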
