Random Forest Prediction Intervals

Abstract

Random forests are among the most popular machine learning techniques for prediction problems. When using random forests to predict a quantitative response, an important but often overlooked challenge is the determination of prediction intervals that will contain an unobserved response value with a specified probability. We propose new random forest prediction intervals that are based on the empirical distribution of out-of-bag prediction errors. These intervals can be obtained as a by-product of a single random forest. Under regularity conditions, we prove that the proposed intervals have asymptotically correct coverage rates. Simulation studies and analysis of 60 real datasets are used to compare the finite-sample properties of the proposed intervals with quantile regression forests and recently proposed split conformal intervals. The results indicate that intervals constructed with our proposed method tend to be narrower than those of competing methods while still maintaining marginal coverage rates approximately equal to nominal levels.
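
As a rough illustration of the out-of-bag (OOB) idea described above, the sketch below fits a single random forest, collects the OOB prediction errors on the training data, and shifts the point predictions by the empirical alpha/2 and 1 - alpha/2 quantiles of those errors to form an interval. It assumes scikit-learn's RandomForestRegressor and simulated data purely for illustration; the paper's exact construction and its theoretical coverage guarantees may differ in details.

```python
# Minimal sketch of an OOB-error prediction interval (illustrative only;
# not necessarily the authors' exact construction).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Simulated data stands in for a real application.
X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a single random forest and keep out-of-bag predictions as a by-product.
rf = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X_train, y_train)

# Empirical distribution of OOB prediction errors on the training set.
oob_errors = y_train - rf.oob_prediction_

# For nominal coverage 1 - alpha, shift each point prediction by the
# alpha/2 and 1 - alpha/2 quantiles of the OOB errors.
alpha = 0.05
lo, hi = np.quantile(oob_errors, [alpha / 2, 1 - alpha / 2])
pred = rf.predict(X_test)
lower, upper = pred + lo, pred + hi

coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage: {coverage:.3f}, mean width: {np.mean(upper - lower):.2f}")
```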
