Effective Learning and Classification using Random Forest Algorithm

Random Forest is a supervised machine learning algorithm. In the data mining domain, machine learning algorithms are used extensively to analyze data and to generate predictions from it. As an ensemble method, Random Forest builds multiple decision trees as base classifiers and combines their outputs by majority voting. The strength of the individual trees and the correlation among them are the key factors that determine the generalization error of a Random Forest classifier. In terms of accuracy, Random Forest classifiers are on par with established ensemble techniques such as bagging and boosting. This research work attempts to improve the performance of Random Forest classifiers with respect to accuracy and to the time required for learning and classification. To achieve this, five new approaches are proposed. The empirical analysis and the outcomes of the experiments carried out in this work lead to effective learning and classification with the Random Forest algorithm.
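
The sketch below is a minimal, illustrative example of the workflow summarized above, using scikit-learn's RandomForestClassifier rather than any of the five approaches proposed in this work; the dataset (load_breast_cancer), the hyperparameters, and the train/test split are assumptions chosen only for demonstration. As background for the strength/correlation remark, Breiman (2001) bounds the generalization error of a Random Forest by ρ̄(1 − s²)/s², where s is the strength of the individual trees and ρ̄ their mean correlation.

```python
# Illustrative sketch only: a standard Random Forest training/classification run,
# not the specific approaches proposed in this work. Dataset, hyperparameters,
# and the split are assumptions made for demonstration purposes.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Benchmark dataset used purely as an example.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Each of the n_estimators decision trees is grown on a bootstrap sample and
# considers a random feature subset at every split; the ensemble prediction is
# obtained by majority voting over the base trees.
forest = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=42
)
forest.fit(X_train, y_train)

y_pred = forest.predict(X_test)
print(f"Test accuracy: {accuracy_score(y_test, y_pred):.3f}")
```

Increasing n_estimators generally lowers the variance of the ensemble at the cost of learning and classification time, which is exactly the accuracy/time trade-off this work targets.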
