An Improved k-Nearest Neighbor Classification Using Genetic Algorithm

k-Nearest Neighbor (KNN) is one of the most popular algorithms for pattern recognition, and many researchers have reported that it performs very well in experiments on different data sets. The traditional KNN text classification algorithm nevertheless has three limitations: (i) high computational cost, because all the training samples are used for classification, (ii) performance that depends solely on the training set, and (iii) no weight difference between samples. To overcome these limitations, this paper proposes an improved version of KNN in which a Genetic Algorithm (GA) is combined with KNN to improve its classification performance. Instead of considering all the training samples and then taking the k nearest neighbors, the GA is employed to select k neighbors directly, and distances are then calculated to classify the test samples. Before classification, a reduced feature set is obtained using a novel method that hybridizes rough set theory with Bee Colony Optimization (BCO), as discussed in our earlier work. The performance of the proposed method is compared with that of the traditional KNN, CART and SVM classifiers.
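
The idea of letting a GA pick the k neighbors directly can be illustrated with a minimal Python sketch. The abstract does not specify the encoding, fitness function, or genetic operators, so everything below is an assumption made for illustration: each chromosome is a list of k training-sample indices, fitness is the total Euclidean distance from the query to those samples (lower is better), selection is simple truncation with elitism, and the function name `ga_knn_predict` and its parameters are hypothetical.

```python
import random
from collections import Counter
from math import dist


def ga_knn_predict(train_X, train_y, query, k=3, pop_size=20,
                   generations=30, mutation_rate=0.1, seed=0):
    """Classify `query` by majority vote over k training samples chosen by a GA.

    Illustrative sketch only; requires len(train_X) >= k and pop_size >= 4.
    """
    rng = random.Random(seed)
    n = len(train_X)

    def fitness(chrom):
        # Assumed fitness: summed distance from the query to the k chosen
        # samples (lower is better).
        return sum(dist(train_X[i], query) for i in chrom)

    # Each chromosome holds k distinct training-sample indices.
    population = [rng.sample(range(n), k) for _ in range(pop_size)]

    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]   # keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            pool = list(set(a) | set(b))          # crossover: merge parents' indices
            child = rng.sample(pool, k)
            if rng.random() < mutation_rate:      # mutation: swap in an unused index
                unused = [i for i in range(n) if i not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            children.append(child)
        population = survivors + children

    best = min(population, key=fitness)
    votes = Counter(train_y[i] for i in best)     # majority vote over chosen neighbors
    return votes.most_common(1)[0][0]
```

In this sketch each generation evaluates only `pop_size * k` distances rather than one per training sample, which is one way to read the abstract's claim of reduced computation. The neighbors it finds are approximate, not guaranteed to be the true k nearest.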
