Probabilistic Random Forests: Predicting Data Point Specific Misclassification Probabilities; CU-CS-954-03

Recently proposed classification algorithms give estimates or worst-case bounds for the probability of misclassification [Lanckriet et al., 2002; Breiman, 2001]. These accuracy estimates apply uniformly to all future predictions, even though some predictions are more likely to be correct than others. This paper introduces Probabilistic Random Forests (PRF), which build on two existing algorithms, Minimax Probability Machine Classification and Random Forests, and give data-point-dependent estimates of misclassification probabilities for binary classification. A PRF model outputs both a classification and an estimate of the probability that this classification is wrong for the given data point. PRF thus makes it possible to assess the risk of misclassification one prediction at a time, without detailed distribution assumptions or density estimation. Experiments show that PRFs give good estimates of the error probability for each classification.
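
To make the notion of a distribution-free worst-case bound concrete, the following is a minimal sketch that is not taken from the paper. It computes the one-dimensional, one-sided Chebyshev (Cantelli) bound, the kind of guarantee that needs only a mean and a standard deviation and no density estimate. The function name and arguments are hypothetical, and the sketch does not implement PRF or its data-point-specific estimates.

```python
def worst_case_error_bound(mean, std, threshold):
    """Cantelli upper bound on the probability that a random variable with
    the given mean and standard deviation falls on the far side of
    `threshold`, valid for any distribution with those two moments.

    Illustration only: hypothetical helper, not the PRF procedure.
    """
    if std <= 0:
        raise ValueError("std must be positive")
    # Distance from the mean to the threshold, in standard deviations.
    d = abs(threshold - mean) / std
    # One-sided Chebyshev inequality: P(crossing) <= 1 / (1 + d^2).
    return 1.0 / (1.0 + d * d)


if __name__ == "__main__":
    # A decision threshold three standard deviations from the class mean.
    print(worst_case_error_bound(mean=0.0, std=1.0, threshold=3.0))
```

A bound of this kind is the same for every point drawn from the class, which is the limitation the abstract describes: it says nothing about whether one particular prediction is riskier than another.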

[1]  Michael I. Jordan, et al.  A Robust Minimax Approach to Classification, 2003, J. Mach. Learn. Res.

[2]  Neil D. Lawrence, et al.  Fast Sparse Gaussian Process Methods: The Informative Vector Machine, 2002, NIPS.

[3]  Isabelle Guyon, et al.  Comparison of classifier methods: a case study in handwritten digit recognition, 1994, Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing.

[4]  T. W. Anderson, et al.  Classification into two Multivariate Normal Distributions with Different Covariance Matrices, 1962.

[5]  Leo Breiman, et al.  Random Forests, 2001, Machine Learning.

[6]  Michael E. Tipping.  Sparse Bayesian Learning and the Relevance Vector Machine, 2001.

[7]  Catherine Blake, et al.  UCI Repository of machine learning databases, 1998.

[8]  Yoav Freund, et al.  A Short Introduction to Boosting, 1999.

[9]  Vladimir N. Vapnik, et al.  The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.