Adaptive random forest — How many “experts” to ask before making a decision?

How many people should you ask if you are unsure of your way? We answer this question for Random Forest classification. The presented method is based on the statistical formulation of confidence intervals and conjugate priors for binomial as well as multinomial distributions. We derive decision rules that speed up classification by exploiting the fact that many samples can be assigned to a class unambiguously, so their class can often be decided before all trees have been evaluated. Results on test data are provided, and we highlight the applicability of our method to a wide range of problems. The approach introduces only one non-heuristic parameter, which trades off accuracy against speed without any retraining of the classifier. The proposed method automatically adapts to the difficulty of the test data and makes classification significantly faster without degrading accuracy.
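The idea of asking only as many trees as needed can be illustrated with a minimal sketch for the two-class case. It assumes each tree's vote is a Bernoulli draw with unknown probability p of voting for class 1, places a uniform Beta(1, 1) conjugate prior on p, and stops evaluating trees once the posterior probability that p lies above (or below) 0.5 reaches 1 - alpha. The function names `adaptive_vote` and `prob_majority_is_one`, the uniform prior, and the threshold `alpha` are illustrative assumptions, not the authors' exact decision rule, and the multinomial case is not covered.

```python
import math
from typing import Callable, Sequence, Tuple


def prob_majority_is_one(k: int, t: int) -> float:
    """Posterior P(p > 0.5) after k of t trees voted for class 1.

    Assumption: uniform Beta(1, 1) prior on p, so the posterior is
    Beta(1 + k, 1 + t - k). For integer Beta parameters,
    P(p > 0.5) equals P(Binomial(t + 1, 0.5) <= k), computed exactly here.
    """
    n = t + 1
    return sum(math.comb(n, j) for j in range(k + 1)) / 2 ** n


def adaptive_vote(trees: Sequence[Callable[[object], int]], x: object,
                  alpha: float = 0.05) -> Tuple[int, int]:
    """Evaluate trees one at a time and stop once the majority is settled.

    Each tree is a hypothetical callable returning 0 or 1 for sample x.
    Returns (predicted_class, number_of_trees_evaluated).
    """
    k = 0  # votes for class 1 so far
    for t, tree in enumerate(trees, start=1):
        k += tree(x)
        p_one = prob_majority_is_one(k, t)
        if p_one >= 1 - alpha:   # confident the majority is class 1
            return 1, t
        if p_one <= alpha:       # confident the majority is class 0
            return 0, t
    # Every tree was evaluated without reaching the threshold:
    # fall back to a plain majority vote (ties go to class 0).
    return int(2 * k > len(trees)), len(trees)
```

For example, with 100 trees that all vote for class 1 and `alpha = 0.05`, the loop stops after 4 trees (`adaptive_vote([lambda x: 1] * 100, None)` returns `(1, 4)`), whereas a sample on which the trees split evenly uses the whole forest. Lowering `alpha` demands more agreement before stopping, trading speed for accuracy without retraining.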
