Dynamic Nonparametric Random Forest Using Covariance

As a representative ensemble machine learning method, the Random Forest (RF) algorithm has been widely used in diverse applications owing to its fast learning speed and high classification accuracy. Research on RF falls into two categories: improving classification accuracy and reducing the number of trees in a forest. However, most work on improving RF performance has focused on classification accuracy; only a few papers have addressed reducing the number of trees in a forest. In this paper, we propose a new Covariance-based Dynamic RF algorithm, called C-DRF. Compared to previous work, the proposed C-DRF algorithm reduces the number of trees while preserving sufficient classification accuracy. Specifically, by computing the covariance between the number of trees in a forest and the F-measure at each iteration, the proposed algorithm determines whether to add more trees to the forest. To evaluate the performance of the proposed C-DRF algorithm, we compared its learning time, test time, and memory usage with those of the original RF algorithm on datasets from different application areas. At the same or higher classification accuracy, the proposed C-DRF algorithm improves on the original RF algorithm by as much as 58.68% in learning time, 47.91% in test time, and 68.06% in memory usage on average. As a practical application, we also show that the proposed C-DRF algorithm is more efficient than state-of-the-art RF algorithms in the Network Intrusion Detection (NID) domain.
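To make the covariance-based stopping rule concrete, the following is a minimal sketch of one plausible growth loop, assuming scikit-learn's RandomForestClassifier with warm_start for incremental tree addition and the macro F-measure as the quality score; the step size, tree cap, and covariance threshold eps are illustrative assumptions, not values taken from the paper.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

def c_drf_fit(X_train, y_train, X_val, y_val,
              step=10, max_trees=500, eps=1e-4):
    """Sketch of a covariance-based dynamic forest (hypothetical helper).

    Grows the forest in increments of `step` trees and stops once the
    covariance between the tree count and the validation F-measure
    falls to `eps` or below, i.e. adding trees no longer pays off.
    """
    forest = RandomForestClassifier(n_estimators=step, warm_start=True)
    sizes, scores = [], []

    while forest.n_estimators <= max_trees:
        forest.fit(X_train, y_train)  # warm_start: only new trees are fit
        f1 = f1_score(y_val, forest.predict(X_val), average="macro")
        sizes.append(forest.n_estimators)
        scores.append(f1)

        # Covariance between forest size and F-measure over the history
        # so far; a non-positive value suggests growth has stopped helping.
        if len(sizes) >= 2 and np.cov(sizes, scores)[0, 1] <= eps:
            break
        forest.n_estimators += step  # request more trees on the next fit

    return forest

Because the loop stops as soon as the size/F-measure covariance flattens, the returned forest typically holds far fewer trees than a fixed-size RF, which is the source of the learning-time, test-time, and memory savings reported above.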
