A Distributionally Robust Boosting Algorithm

Distributionally Robust Optimization (DRO) has been shown to provide a flexible framework for decision making under uncertainty and for statistical estimation. For example, recent works in DRO have shown that popular statistical estimators can be interpreted as the solutions of suitably formulated data-driven DRO problems. In turn, this connection can be used to select tuning parameters optimally through a principled approach informed by robustness considerations. This paper contributes to this growing literature connecting DRO and statistics by showing how boosting algorithms can be studied via DRO. We propose a boosting-type algorithm, named DRO-Boosting, as a procedure to solve our DRO formulation. Our DRO-Boosting algorithm recovers Adaptive Boosting (AdaBoost) as a special case, thus showing that AdaBoost is effectively solving a DRO problem. We apply our algorithm to a financial dataset on credit card default payment prediction. Our approach compares favorably to alternative boosting methods that are widely used in practice.
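To give some intuition for why AdaBoost can be read as a DRO procedure, the following minimal sketch compares two reweightings of a small sample. It is an illustration under stated assumptions, not the DRO-Boosting algorithm itself: the abstract does not specify the uncertainty set, so the sketch assumes a Kullback-Leibler penalty around the empirical distribution, for which the worst-case distribution is the well-known exponential tilting q_i proportional to exp(loss_i / temperature). The function names, the `temperature` parameter, and the margin values are all hypothetical.

```python
import math

def adaboost_weights(margins):
    """Normalized AdaBoost example weights, w_i proportional to exp(-margin_i),
    where margin_i = y_i * F(x_i) for the current ensemble F."""
    raw = [math.exp(-m) for m in margins]
    total = sum(raw)
    return [r / total for r in raw]

def kl_tilted_weights(losses, temperature=1.0):
    """Worst-case weights for a KL-penalized problem around the uniform
    empirical distribution: q_i proportional to exp(loss_i / temperature).
    (Assumed formulation, chosen for illustration only.)"""
    shift = max(losses)  # subtract the max for numerical stability
    raw = [math.exp((l - shift) / temperature) for l in losses]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical margins y_i * F(x_i) for four training points.
margins = [2.0, 0.5, -1.0, -0.2]

# Taking the loss to be the negative margin makes the two schemes coincide.
losses = [-m for m in margins]

w_ada = adaboost_weights(margins)
w_dro = kl_tilted_weights(losses, temperature=1.0)

for i, (a, d) in enumerate(zip(w_ada, w_dro)):
    print(f"point {i}: AdaBoost weight = {a:.4f}, tilted weight = {d:.4f}")
```

With the loss set to the negative margin and the temperature set to one, the adversarial reweighting and the AdaBoost weights agree, and the most badly misclassified point receives the largest weight. Other choices of divergence or temperature would give different reweightings.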
