Estimating class probabilities in random forests

For both single probability estimation trees (PETs) and ensembles of such trees, commonly employed class probability estimates correct the observed relative class frequencies in each leaf to avoid anomalies caused by small sample sizes. The effect of such corrections in random forests of PETs is investigated, and the use of the relative class frequency is compared to two corrected estimates, the Laplace estimate and the m-estimate. An experiment with 34 datasets from the UCI repository shows that estimating class probabilities using the relative class frequency clearly outperforms using either the Laplace estimate or the m-estimate with respect to accuracy, area under the ROC curve (AUC), and Brier score. Hence, in contrast to what is commonly employed for PETs and ensembles of PETs, these results strongly suggest that a non-corrected probability estimate should be used in random forests of PETs. The experiment further shows that learning random forests of PETs using the relative class frequency significantly outperforms learning random forests of classification trees (i.e., trees for which only an unweighted vote on the most probable class is counted) with respect to both accuracy and AUC, but that the latter clearly outperforms the former with respect to the Brier score.
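For reference, the three leaf-level probability estimates compared here are commonly defined as follows; this is a standard formulation, and the particular prior and value of m used in the paper's experiments are not restated in the abstract, so they should be read as assumptions. For a leaf containing $n$ training examples, $n_i$ of which belong to class $i$, with $C$ classes in total:

$$
p_i^{\mathrm{freq}} = \frac{n_i}{n}, \qquad
p_i^{\mathrm{Laplace}} = \frac{n_i + 1}{n + C}, \qquad
p_i^{m} = \frac{n_i + m\,\pi_i}{n + m},
$$

where $\pi_i$ is a prior probability for class $i$ (often the overall relative class frequency in the training set) and $m$ is a parameter controlling the strength of the correction. Both corrected estimates shrink small-leaf estimates toward a uniform or prior distribution, which is exactly the kind of correction whose effect in random forests is evaluated here.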
