Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction

Random forests, ensembles of equally weighted regression trees, are frequently used to design predictive models. In this article, we extend the methodology by representing the regression trees as probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows analytical computation of confidence intervals (CIs), and tree weight optimization is expected to provide tighter CIs with comparable mean error. We approach the prediction of the ensemble of probabilistic trees from two perspectives: as a mixture distribution and as a weighted sum of correlated random variables. We apply our methodology to the drug sensitivity prediction problem on synthetic data and the Cancer Cell Line Encyclopedia dataset, and illustrate that tree weights can be selected to reduce the average CI length without increasing mean error.
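The two perspectives named in the abstract can be illustrated with standard moment formulas. The following is a minimal sketch, not the authors' implementation: it assumes each tree's prediction at a query point is summarized by a mean and a variance, and that the weighted sum is approximately Gaussian when a CI is formed. The function names (`weighted_sum_moments`, `mixture_moments`, `normal_ci`) and all numbers are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def weighted_sum_moments(weights, means, cov):
    """Mean and variance of sum_i w_i * Y_i for correlated tree outputs Y_i.

    cov[i][j] is the covariance between trees i and j (assumed known here).
    """
    n = len(weights)
    mean = sum(w * m for w, m in zip(weights, means))
    var = sum(weights[i] * weights[j] * cov[i][j]
              for i in range(n) for j in range(n))
    return mean, var


def mixture_moments(weights, means, variances):
    """Mean and variance of a mixture whose component weights sum to 1."""
    mean = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(w * (v + m * m)
                        for w, m, v in zip(weights, means, variances))
    return mean, second_moment - mean * mean


def normal_ci(mean, var, level=0.95):
    """Two-sided CI under a Gaussian assumption (an assumption of this sketch)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(var)
    return mean - half_width, mean + half_width


# Hypothetical three-tree ensemble at a single query point.
means = [0.40, 0.50, 0.60]
cov = [[0.040, 0.010, 0.005],
       [0.010, 0.020, 0.004],
       [0.005, 0.004, 0.090]]

equal = [1 / 3, 1 / 3, 1 / 3]
reweighted = [0.30, 0.60, 0.10]  # illustrative weights, not an optimized solution

for name, w in (("equal", equal), ("reweighted", reweighted)):
    m, v = weighted_sum_moments(w, means, cov)
    lo, hi = normal_ci(m, v)
    print(f"{name}: mean={m:.3f}, CI length={hi - lo:.3f}")
```

In this toy setting, shifting weight toward the lower-variance tree shortens the interval, which is the kind of effect the abstract attributes to weight selection. Under the mixture view, the predictive distribution is generally not Gaussian, so `mixture_moments` gives only its first two moments; a CI would then come from the mixture's quantiles rather than from `normal_ci`.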
