Random forests: from early developments to recent advancements

Ensemble classification is a data mining approach in which a number of classifiers work together to identify the class label of unlabeled instances. Random forest (RF) is an ensemble classification approach that has demonstrated high accuracy and superiority. RF has recently received considerable attention from the research community, with the common goal of further boosting its performance. In this paper, we look at the development of RF from its birth to the present. The main aim is to describe the research done to date and to identify potential future developments of RF. Our approach in this review is to take a historical view of the development of this notably successful classification technique. We start with developments that preceded Breiman's introduction of the technique in 2001 and from which RF borrowed some of its components. We then turn to the main technique proposed by Breiman. A number of developments that enhance the original technique are then presented and summarized. Successful applications of RF are discussed, followed by a discussion of possible directions for future research.
