Evolutionary Optimization on k-Nearest Neighbors Classifier for Imbalanced Datasets

Classification is a central task in machine learning. Common classification algorithms include decision trees, k-nearest neighbors (kNN), and support vector machines. The kNN algorithm is simple yet useful in practice. However, its performance can be strongly affected by properties of the dataset, such as class imbalance and noisy features. To address these problems and improve classification performance, this study proposes optimizing the distance function and the class-voting weights of kNN with a genetic algorithm. The proposed method is called evolutionary optimized feature and class weights kNN (FCWkNN). The performance of FCWkNN is examined on different datasets. Experimental results show that jointly optimizing feature weights and class weights can improve kNN on imbalanced datasets.
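
The idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the chromosome encoding, the fitness function, or the genetic operators. The sketch assumes a real-valued chromosome that concatenates per-feature weights (used in a weighted Euclidean distance) with per-class weights (used to scale neighbor votes), leave-one-out balanced accuracy as fitness, and truncation selection with uniform crossover and Gaussian mutation. All function names and parameter values are hypothetical.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, X_query, feat_w, class_w, k=3,
                         exclude_self=False):
    """kNN with feature-weighted distances and class-weighted votes."""
    # Squared weighted Euclidean distance for every (query, train) pair.
    diff = X_query[:, None, :] - X_train[None, :, :]
    dist = np.sum(feat_w * diff ** 2, axis=2)
    if exclude_self:
        # Leave-one-out: a point must not vote for itself.
        np.fill_diagonal(dist, np.inf)
    neighbor_labels = y_train[np.argsort(dist, axis=1)[:, :k]]
    votes = np.zeros((len(X_query), len(class_w)))
    for c in range(len(class_w)):
        # Each neighbor of class c contributes class_w[c] to that class.
        votes[:, c] = class_w[c] * np.sum(neighbor_labels == c, axis=1)
    return np.argmax(votes, axis=1)

def balanced_accuracy(y_true, y_pred, n_classes):
    """Mean per-class recall; assumes every class occurs in y_true."""
    return float(np.mean([np.mean(y_pred[y_true == c] == c)
                          for c in range(n_classes)]))

def fitness(chrom, X, y, n_classes, k):
    """Assumed fitness: leave-one-out balanced accuracy on the training set."""
    d = X.shape[1]
    pred = weighted_knn_predict(X, y, X, chrom[:d], chrom[d:], k,
                                exclude_self=True)
    return balanced_accuracy(y, pred, n_classes)

def evolve_weights(X, y, n_classes, k=3, pop_size=20, generations=15, seed=0):
    """Toy genetic algorithm over [feature weights | class weights] in [0, 1]."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pop = rng.random((pop_size, d + n_classes))
    pop[0] = 1.0  # include plain (unweighted) kNN as one starting individual
    for _ in range(generations):
        scores = np.array([fitness(ch, X, y, n_classes, k) for ch in pop])
        # Truncation selection: the better half survives unchanged (elitism).
        elite = pop[np.argsort(scores)[::-1][: pop_size // 2]]
        n_children = pop_size - len(elite)
        parents_a = elite[rng.integers(len(elite), size=n_children)]
        parents_b = elite[rng.integers(len(elite), size=n_children)]
        # Uniform crossover followed by Gaussian mutation, clipped to [0, 1].
        mask = rng.random(parents_a.shape) < 0.5
        children = np.where(mask, parents_a, parents_b)
        children = np.clip(children + rng.normal(0.0, 0.1, children.shape),
                           0.0, 1.0)
        pop = np.vstack([elite, children])
    scores = np.array([fitness(ch, X, y, n_classes, k) for ch in pop])
    best = pop[np.argmax(scores)]
    return best[:d], best[d:], float(scores.max())
```

Because the unweighted configuration is seeded into the initial population and the best individuals are carried over unchanged, the returned training fitness in this sketch cannot fall below that of plain kNN. Whether the gain carries over to unseen data has to be checked on a held-out set.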
