Random forest (RF) methodology is a nonparametric approach to prediction problems. A standard practice is to grow one global RF and use it to predict all test cases of interest. In this article, we instead propose growing a different RF for each test case, which we call case-specific random forests (CSRFs). In contrast to the bagging procedure used to build a standard RF, the CSRF algorithm draws weighted bootstrap resamples to create individual trees, assigning larger weights a priori to training cases in close proximity to the test case of interest. We discuss tuning methods to avoid overfitting. Both simulation and real-data examples show that the weighted bootstrap resampling used in CSRF construction can improve predictions for specific cases. We also propose a new case-specific variable importance (CSVI) measure for comparing the relative importance of predictor variables in predicting a particular case. The idea of building a predictor case-specifically may also generalize to other areas.
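To make the resampling idea concrete, here is a minimal sketch of a CSRF-style prediction for a single test case. The abstract does not specify how proximity or the weighting scheme is computed, so the details below are assumptions for illustration: proximities come from a preliminary standard RF (the fraction of trees in which a training case falls in the same leaf as the test case), and resampling weights blend those proximities with uniform weights via a hypothetical tuning parameter `alpha`, where `alpha = 0` recovers ordinary bagging.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

def csrf_predict(X_train, y_train, x_test, n_trees=100, alpha=0.5, seed=0):
    """Sketch of a case-specific random forest prediction for one test case.

    Assumptions (not from the abstract): proximity is measured with a
    preliminary standard RF, and `alpha` in [0, 1] tunes how strongly the
    bootstrap weights favor training cases near the test case.
    """
    rng = np.random.default_rng(seed)
    n = len(y_train)
    x_test = np.asarray(x_test).reshape(1, -1)

    # Step 1: preliminary standard RF to measure proximity to the test case:
    # the fraction of trees in which each training case shares a leaf with it.
    pre = RandomForestRegressor(n_estimators=100, random_state=seed)
    pre.fit(X_train, y_train)
    train_leaves = pre.apply(X_train)      # shape (n, n_estimators)
    test_leaves = pre.apply(x_test)[0]     # shape (n_estimators,)
    proximity = (train_leaves == test_leaves).mean(axis=1)

    # Step 2: a priori weights favor training cases close to the test case;
    # alpha = 0 gives uniform weights, i.e. ordinary bagging.
    if proximity.sum() == 0:
        weights = np.full(n, 1.0 / n)
    else:
        weights = (1 - alpha) / n + alpha * proximity / proximity.sum()

    # Step 3: grow each tree on a weighted bootstrap resample and average
    # the individual tree predictions at the test case.
    preds = []
    for _ in range(n_trees):
        idx = rng.choice(n, size=n, replace=True, p=weights)
        tree = DecisionTreeRegressor(
            max_features="sqrt",
            random_state=int(rng.integers(1 << 31)),
        )
        tree.fit(X_train[idx], y_train[idx])
        preds.append(tree.predict(x_test)[0])
    return float(np.mean(preds))
```

The only change from standard RF construction is Step 2: replacing the uniform bootstrap distribution with case-specific weights concentrates each tree's training data near the region of the predictor space occupied by the test case.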