Decision forest: Twenty years of research

Abstract: A decision tree is a predictive model that recursively partitions the covariate space into subspaces such that each subspace constitutes the basis for a different prediction function. Decision trees can be used for various learning tasks, including classification, regression, and survival analysis. Owing to their unique benefits, decision trees have become one of the most powerful and popular approaches in data science. A decision forest aims to improve the predictive performance of a single decision tree by training multiple trees and combining their predictions. This paper provides an introduction to the subject, explaining how a decision forest can be created and when it is most valuable. In addition, we review popular methods for generating the forest, fusing the individual trees' outputs, and thinning large decision forests.
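To make the two ingredients named in the abstract concrete, generation (training multiple trees on resampled data) and fusion (combining the trees' outputs), here is a minimal sketch of a bagged decision forest with majority voting. It assumes scikit-learn and NumPy are available; the dataset (Iris) and the forest size (25 trees) are arbitrary illustrative choices, not values prescribed by the paper.

```python
# Minimal decision-forest sketch: bagging for generation, majority vote for fusion.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(25):  # illustrative forest size
    # Generation: each tree is trained on a bootstrap sample of the training set.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    trees.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# Fusion: combine the individual trees' outputs by majority vote per test sample.
votes = np.stack([t.predict(X_test) for t in trees])  # shape: (n_trees, n_samples)
y_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

print("forest accuracy:", (y_pred == y_test).mean())
```

Replacing the bootstrap sampling with random feature subsets, or the vote with weighted averaging, yields other forest variants of the kind surveyed in the paper.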
