Optimization of Tree Ensembles

Tree ensemble models, such as random forests and boosted trees, are among the most widely used and practically successful predictive models in applied machine learning and business analytics. Although such models have traditionally been used to make predictions from exogenous, uncontrollable independent variables, they are increasingly being applied in settings where the independent variables are controllable and are also decision variables. In this paper, we study the problem of tree ensemble optimization: given a tree ensemble that predicts some dependent variable using controllable independent variables, how should we set these variables so as to maximize the predicted value? We formulate this problem as a mixed-integer optimization (MIO) problem. We theoretically examine the strength of our formulation, provide a hierarchy of approximate formulations with bounds on approximation quality, and exploit the structure of the problem to develop two large-scale solution methods, one based on Benders decomposition and one based on iteratively generating tree split constraints. We test our methodology on real data sets, including two case studies in drug design and customized pricing, and show that it can efficiently solve large-scale instances to near or full optimality while outperforming solutions obtained by heuristic approaches. In the drug design case, we show how our approach can identify compounds that efficiently trade off predicted performance and novelty with respect to existing, known compounds. In the customized pricing case, we show how our approach can efficiently determine optimal store-level prices under a random forest model that delivers excellent predictive accuracy.
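
To make the formulation concrete, the sketch below encodes a toy instance of this kind of MIO in Python using the PuLP modeling library. It is a minimal illustration under assumed notation, not the paper's exact formulation: a two-tree ensemble over a single numeric feature, with one binary variable per candidate split and one per leaf, constraints requiring exactly one active leaf per tree and consistency between leaf activation and the splits, and a monotonicity constraint linking the ordered split indicators. All tree structures, leaf values, and variable names here are hypothetical.

```python
# A minimal sketch (not the paper's exact formulation) of tree ensemble
# optimization as a mixed-integer program, using the PuLP modeling library.
# Toy ensemble of two depth-1 regression trees over one numeric feature x,
# with candidate split points 3.0 and 5.0. All names and values are
# illustrative assumptions.
import pulp

# Tree 0 splits on "x <= 3.0"; tree 1 splits on "x <= 5.0".
# leaf_value[t][l] is the prediction of leaf l of tree t (0: left, 1: right).
leaf_value = [[1.0, 4.0],   # tree 0
              [2.5, 3.5]]   # tree 1

prob = pulp.LpProblem("tree_ensemble_opt", pulp.LpMaximize)

# z[k] = 1 if the chosen x satisfies "x <= split_k" (splits sorted ascending).
z = [pulp.LpVariable(f"z_{k}", cat="Binary") for k in range(2)]
# y[t][l] = 1 if the solution falls in leaf l of tree t.
y = [[pulp.LpVariable(f"y_{t}_{l}", cat="Binary") for l in range(2)]
     for t in range(2)]

# Objective: average predicted value across the ensemble.
prob += pulp.lpSum(leaf_value[t][l] * y[t][l]
                   for t in range(2) for l in range(2)) / 2.0

for t in range(2):
    prob += pulp.lpSum(y[t]) == 1     # exactly one active leaf per tree
    prob += y[t][0] <= z[t]           # left leaf only if x <= split_t
    prob += y[t][1] <= 1 - z[t]       # right leaf only if x > split_t

# Monotonicity of ordered splits: if x <= 3.0 then certainly x <= 5.0.
prob += z[0] <= z[1]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("objective:", pulp.value(prob.objective))
```

Solving this toy instance sets both split indicators to zero (i.e., x > 5.0), activating the right leaf of each tree for an average prediction of 3.75. The paper's formulation generalizes this pattern to many features, deeper trees, and large ensembles, which is what motivates the Benders decomposition and split constraint generation methods described above.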
