Fast Rates for Contextual Linear Optimization

Incorporating side observations in decision making can reduce uncertainty and boost performance, but it also requires that we tackle a potentially complex predictive relationship. Although one may use off-the-shelf machine learning methods to separately learn a predictive model and plug it in, a variety of recent methods instead integrate estimation and optimization by fitting the model to directly optimize downstream decision performance. Surprisingly, in the case of contextual linear optimization, we show that the naïve plug-in approach actually achieves regret convergence rates that are significantly faster than methods that directly optimize downstream decision performance. We show this by leveraging the fact that specific problem instances do not have arbitrarily bad near-dual-degeneracy. Although there are other pros and cons to consider, as we discuss and illustrate numerically, our results highlight a nuanced landscape for the enterprise of integrating estimation and optimization. Our results are overall positive for practice: predictive models are easy and fast to train using existing tools, are simple to interpret, and, as we show, lead to decisions that perform very well. This paper was accepted by Hamid Nazerzadeh, data science.
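
To make the plug-in approach concrete, below is a minimal sketch of the two-step pipeline for contextual linear optimization: first fit a predictive model for the cost vector given the context, then solve the linear program with the predicted costs plugged in. The synthetic data, the ordinary-least-squares regressor, and the polytope constraints (A, b) are all illustrative assumptions, not the paper's setup or code.

```python
# A minimal, hypothetical sketch of the naive plug-in approach for
# contextual linear optimization: min_{z in Z} E[c | x]^T z, where Z is
# a polytope and E[c | x] is replaced by a fitted predictive model.
import numpy as np
from scipy.optimize import linprog
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical training data: contexts X (n x p) and observed cost
# vectors C (n x d), generated from an assumed linear model plus noise.
n, p, d = 500, 5, 10
X = rng.normal(size=(n, p))
W = rng.normal(size=(p, d))
C = X @ W + 0.1 * rng.normal(size=(n, d))

# Step 1 (estimation): fit any off-the-shelf regressor for E[c | x];
# here, ordinary least squares with multi-output targets.
model = LinearRegression().fit(X, C)

# Step 2 (optimization): for a new context, plug the predicted cost
# vector into the LP over an assumed feasible polytope
# Z = {z : A z <= b, z >= 0} (entries of A positive, so Z is bounded).
A = rng.uniform(size=(3, d))
b = np.ones(3)

def plug_in_decision(x_new):
    """Return the plug-in decision for a single context vector."""
    c_hat = model.predict(x_new.reshape(1, -1)).ravel()
    res = linprog(c_hat, A_ub=A, b_ub=b, bounds=[(0, None)] * d)
    return res.x

# Example usage on a fresh context draw.
z_star = plug_in_decision(rng.normal(size=p))
print(z_star)
```

The end-to-end alternatives that the paper contrasts with this pipeline would instead choose the model parameters to minimize the downstream decision cost directly; the sketch above only shows the separate estimate-then-optimize baseline that the paper's regret analysis favors.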
