Data-Driven Sample Average Approximation with Covariate Information

We consider optimization models for decision-making in which parameters within the model are uncertain, but predictions of these parameters can be made using available covariate information. We study a data-driven setting in which we have observations of the uncertain parameters together with concurrently observed covariates. Given a new covariate observation, the goal is to choose a decision that minimizes the expected cost conditioned on this observation. We investigate three data-driven frameworks that integrate a machine learning prediction model within a stochastic programming sample average approximation (SAA) for approximating the solution to this problem. Two of these SAA frameworks are new and use out-of-sample residuals of leave-one-out prediction models for scenario generation. The frameworks we investigate are flexible and accommodate parametric, nonparametric, and semiparametric regression techniques. We derive conditions on the data generation process, the prediction model, and the stochastic program under which solutions of these data-driven SAAs are consistent and asymptotically optimal, and we also derive convergence rates and finite-sample guarantees. Computational experiments validate our theoretical results, demonstrate the potential advantages of our data-driven formulations over existing approaches (even when the prediction model is misspecified), and illustrate the benefits of our new formulations in the limited-data regime.
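To make the scenario-generation idea concrete, the sketch below illustrates one of the leave-one-out (jackknife-style) SAA frameworks on a toy newsvendor problem. This is a minimal illustration under stated assumptions, not the paper's implementation: the linear prediction model, the synthetic data, and the cost parameters `h` and `b` are all assumptions chosen for brevity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative synthetic data: covariates X and an uncertain parameter (demand) y.
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = 10 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)

# Step 1: fit a prediction model of y given X (linear here; any regression
# technique could be substituted).
model = LinearRegression().fit(X, y)

# Step 2: form leave-one-out (jackknife) residuals -- refit with point i held
# out and record the out-of-sample prediction error at point i.
loo_residuals = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    fit_i = LinearRegression().fit(X[mask], y[mask])
    loo_residuals[i] = y[i] - fit_i.predict(X[i:i + 1])[0]

# Step 3: given a new covariate observation x_new, generate scenarios by
# adding the residuals to the point prediction, then solve the SAA over them.
x_new = rng.normal(size=(1, p))
scenarios = model.predict(x_new)[0] + loo_residuals

# Toy second stage: a newsvendor with holding cost h and backorder cost b,
# whose SAA-optimal order quantity is the b / (b + h) quantile of the scenarios.
h, b = 1.0, 4.0
z_saa = np.quantile(scenarios, b / (b + h))
print(f"data-driven SAA order quantity: {z_saa:.3f}")
```

Swapping `LinearRegression` for another regression estimator (e.g., the lasso or k-nearest neighbors) leaves the scenario-generation and SAA steps unchanged, which reflects the flexibility across parametric, nonparametric, and semiparametric techniques described above.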
