On Data-Driven Prescriptive Analytics with Side Information: A Regularized Nadaraya-Watson Approach

We consider generic stochastic optimization problems in the presence of side information that enables better-informed decisions. The side information consists of observable exogenous covariates that alter the conditional probability distribution of the random problem parameters. A decision maker who adapts her decisions to the observed side information solves an optimization problem whose objective function is the conditional expectation of the random cost. If the joint probability distribution is unknown, the conditional expectation can be approximated in a data-driven manner using Nadaraya-Watson (NW) kernel regression. While this approximation scheme has been applied successfully to diverse decision problems under uncertainty, it is largely unknown whether it provides any reasonable out-of-sample performance guarantees. In this paper, we establish such guarantees for generic problems by leveraging techniques from moderate deviations theory. Our analysis motivates a variance-based regularization scheme which, in general, leads to a non-convex optimization problem. We adopt ideas from distributionally robust optimization to obtain tractable formulations. We present numerical experiments for newsvendor and wind energy commitment problems to highlight the effectiveness of our regularization scheme.
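
To make the NW approximation concrete, the following is a minimal sketch for a toy newsvendor problem, not the formulation developed in the paper. It assumes a Gaussian kernel, a hypothetical bandwidth, illustrative price and cost parameters, and a simple penalty equal to `lam` times the NW-weighted standard deviation of the cost. The paper's exact regularizer and its tractable reformulation via distributionally robust optimization are not reproduced here; a grid search over candidate order quantities stands in for them. All function and parameter names are illustrative.

```python
import numpy as np

def nw_weights(X, x0, bandwidth):
    """Normalized Gaussian-kernel weights of the samples X around the query x0."""
    d2 = np.sum((X - x0) ** 2, axis=1)
    k = np.exp(-0.5 * d2 / bandwidth ** 2)
    return k / k.sum()

def newsvendor_cost(q, demand, price=5.0, cost=3.0):
    """Purchase cost minus sales revenue for order quantity q (assumed parameters)."""
    return cost * q - price * np.minimum(q, demand)

def regularized_nw_decision(X, Y, x0, candidates, bandwidth=0.5, lam=0.0):
    """Grid search for the order quantity minimizing the NW-weighted mean cost
    plus lam times the NW-weighted standard deviation (an assumed penalty form)."""
    w = nw_weights(X, x0, bandwidth)
    best_q, best_val = None, np.inf
    for q in candidates:
        c = newsvendor_cost(q, Y)
        mean = np.dot(w, c)               # NW estimate of the conditional expected cost
        var = np.dot(w, (c - mean) ** 2)  # NW-weighted conditional variance of the cost
        val = mean + lam * np.sqrt(var)
        if val < best_val:
            best_q, best_val = q, val
    return best_q, best_val

# Synthetic data: demand depends linearly on a scalar covariate (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
Y = 10.0 + 5.0 * X[:, 0] + rng.normal(0.0, 1.0, size=200)

x0 = np.array([0.5])
candidates = np.linspace(5.0, 20.0, 151)
q_plain, val_plain = regularized_nw_decision(X, Y, x0, candidates, lam=0.0)
q_reg, val_reg = regularized_nw_decision(X, Y, x0, candidates, lam=1.0)
print("unregularized order:", q_plain, "regularized order:", q_reg)
```

Setting `lam=0.0` recovers the plain NW approximation of the conditional expected cost. A positive `lam` penalizes decisions whose estimated cost varies strongly across the samples that are close to the observed covariate.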
