An ensemble method

This paper presents an ensemble method called EKNN-RF. Its base classifiers use an enhanced KNN algorithm in which the optimal number of nearest neighbors and the distance function are selected on a validation set, so that these parameters better reflect the distribution of the real data. The feature set of each base classifier is obtained by bootstrap sampling from the original feature set, with features of higher importance assigned greater weight. The training set of each base classifier is likewise obtained by bootstrap sampling from the original training set, restricted to the newly generated feature set. Finally, the base classifiers vote to determine the classification result. Experimental results show that, compared with AdaBoost, Naive Bayes, Random Forest, DCT-KNN [1], LMKNN+DWKNN [2], W-KNN [3], dwh-KNN [4] and LI-KNN [5], the ensemble method EKNN-RF has certain advantages and achieves higher classification accuracy on some datasets.
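The sketch below illustrates the EKNN-RF procedure as described above; it is not the authors' implementation. The class name EKNNRF, the candidate k values and distance metrics, the 70/30 validation split, and the use of scikit-learn's KNeighborsClassifier are assumptions made for illustration only.

```python
# Minimal sketch of the EKNN-RF idea (assumptions noted in the lead-in above).
import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split


class EKNNRF:
    def __init__(self, n_estimators=10, candidate_ks=(1, 3, 5, 7, 9),
                 candidate_metrics=("euclidean", "manhattan", "chebyshev"),
                 random_state=0):
        self.n_estimators = n_estimators
        self.candidate_ks = candidate_ks            # assumed candidate k values
        self.candidate_metrics = candidate_metrics  # assumed candidate distance functions
        self.rng = np.random.default_rng(random_state)

    def _tune_knn(self, X_tr, y_tr, X_val, y_val):
        # Enhanced KNN step: pick the (k, metric) pair with the best
        # validation accuracy, so the parameters reflect the data distribution.
        best_clf, best_acc = None, -1.0
        for k in self.candidate_ks:
            for metric in self.candidate_metrics:
                clf = KNeighborsClassifier(n_neighbors=k, metric=metric)
                acc = clf.fit(X_tr, y_tr).score(X_val, y_val)
                if acc > best_acc:
                    best_clf, best_acc = clf, acc
        return best_clf

    def fit(self, X, y, feature_importance=None):
        n_samples, n_features = X.shape
        # Feature importance biases the bootstrap over features; uniform if absent.
        if feature_importance is None:
            feature_importance = np.ones(n_features)
        p = feature_importance / feature_importance.sum()
        self.members_ = []
        for _ in range(self.n_estimators):
            # Importance-weighted bootstrap over features, plain bootstrap over samples.
            feats = np.unique(self.rng.choice(n_features, size=n_features, p=p))
            rows = self.rng.choice(n_samples, size=n_samples, replace=True)
            X_b, y_b = X[rows][:, feats], y[rows]
            X_tr, X_val, y_tr, y_val = train_test_split(
                X_b, y_b, test_size=0.3, random_state=0)
            self.members_.append((feats, self._tune_knn(X_tr, y_tr, X_val, y_val)))
        return self

    def predict(self, X):
        # Majority vote over the base classifiers.
        votes = np.array([clf.predict(X[:, feats]) for feats, clf in self.members_])
        return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])
```

With uniform feature importance the feature bootstrap reduces to ordinary random subspace sampling; supplying importance scores (for example, from a preliminary Random Forest) biases the sampled subspaces toward more informative features, as the abstract describes.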

[1] Le Zhang, et al. An ensemble of decision trees with random vector functional link networks for multi-class classification. Appl. Soft Comput., 2017.

[2] Zhi-Hua Zhou, et al. Ensemble Methods: Foundations and Algorithms. 2012.

[3] Thomas G. Dietterich. Multiple Classifier Systems. Lecture Notes in Computer Science, 2000.

[4] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting. EuroCOLT, 1997.

[5] Dunja Mladenic, et al. Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification. International Journal of Machine Learning and Cybernetics, 2011.

[6] Leo Breiman, et al. Random Forests. Machine Learning, 2001.

[7] Friedhelm Schwenker, et al. Ensemble Methods: Foundations and Algorithms [Book Review]. IEEE Computational Intelligence Magazine, 2013.

[8] Li Zhang, et al. Weighted-KNN and its application on UCI. International Conference on Informatics and Analytics, 2015.

[9] Geoff Holmes, et al. New ensemble methods for evolving data streams. KDD, 2009.

[10] Thomas G. Dietterich. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. Machine Learning, 2000.

[11] Yang Song, et al. IKNN: Informative K-Nearest Neighbor Pattern Classification. PKDD, 2007.

[12] Hyun-Chul Kim, et al. Constructing support vector machine ensemble. Pattern Recognit., 2003.

[13] Jianping Fan, et al. Least squares kernel ensemble regression in Reproducing Kernel Hilbert Space. Neurocomputing, 2018.

[14] Jie Huang, et al. An Improved kNN Based on Class Contribution and Feature Weighting. 2018 10th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), 2018.

[15] Leo Breiman, et al. Bagging Predictors. Machine Learning, 1996.

[16] Nenad Tomašev, et al. Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification. 2014.

[17] D. Opitz, et al. Popular Ensemble Methods: An Empirical Study. J. Artif. Intell. Res., 1999.

[18] Ana I. González Acuña. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, Boosting, and Randomization. 2012.

[19] O. S. Sitompul, et al. Improving the accuracy of k-nearest neighbor using local mean based and distance weight. 2018.

[20] Kaoru Ota, et al. Ensemble Classification for Skewed Data Streams Based on Neural Network. Int. J. Uncertain. Fuzziness Knowl. Based Syst., 2018.

[21] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting. EuroCOLT, 1995.