Forest CERN: A New Decision Forest Building Technique

Ongoing research continues to propose more accurate decision forest building techniques. In this paper, we propose a new decision forest building technique called “Forest by Continuously Excluding Root Node (Forest CERN)”. The key feature of the proposed technique is that it strives to exclude attributes that appeared in the root nodes of previous trees by imposing penalties on them, discouraging their reappearance in a number of subsequent trees. Penalties are gradually lifted so that penalized attributes become eligible again after a while. In addition, the technique uses bootstrap samples to generate a predefined number of trees. The goal of the proposed algorithm is to maximize tree diversity without impeding individual tree accuracy. We present detailed experimental results on fifteen widely used data sets from the UCI Machine Learning Repository. The results indicate the effectiveness of the proposed technique in most cases.
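To make the root-exclusion idea concrete, the sketch below implements a simplified variant in Python. It is a minimal illustration under stated assumptions, not the paper’s exact algorithm: the paper imposes penalties on the merit of recent root attributes, whereas this sketch approximates that by excluding a root attribute outright for a fixed number of rounds (the hypothetical `penalty_duration` parameter) before lifting the penalty, and scikit-learn’s DecisionTreeClassifier stands in for the paper’s base tree learner. All function names here are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def forest_cern_sketch(X, y, n_trees=10, penalty_duration=3, seed=0):
    """Simplified Forest-CERN-style ensemble: each tree's root attribute is
    temporarily excluded from the next few trees, then allowed to reappear."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    penalty = np.zeros(n_features, dtype=int)  # rounds left before a feature may reappear
    forest = []
    for _ in range(n_trees):
        allowed = np.flatnonzero(penalty == 0)
        if allowed.size == 0:                   # safety net: lift everything if all are penalized
            penalty[:] = 0
            allowed = np.arange(n_features)
        idx = rng.integers(0, n_samples, n_samples)   # bootstrap sample, as in the paper
        tree = DecisionTreeClassifier(random_state=0)
        tree.fit(X[np.ix_(idx, allowed)], y[idx])
        penalty = np.maximum(penalty - 1, 0)          # gradually lift earlier penalties
        root_local = tree.tree_.feature[0]            # attribute used at the root split
        if root_local >= 0:                           # -2 would mean a single-leaf tree
            penalty[allowed[root_local]] = penalty_duration
        forest.append((tree, allowed))
    return forest

def forest_predict(forest, X):
    """Majority vote over the ensemble."""
    votes = np.stack([tree.predict(X[:, cols]) for tree, cols in forest])
    return np.apply_along_axis(
        lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)

if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    forest = forest_cern_sketch(X, y)
    print("training accuracy:", (forest_predict(forest, X) == y).mean())
```

Decrementing the penalty counters before assigning a new one means a root attribute sits out for `penalty_duration` rounds and then automatically re-enters the candidate pool, mirroring the paper’s gradual lifting of penalties.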
