Survey of Nearest Neighbor Condensing Techniques

The nearest neighbor rule assigns an unknown element to the category of its nearest known neighbors. This technique is effective in many fields, such as event recognition, text categorization, and object recognition. Its prime advantage is its simplicity, but its main drawback is its computational cost on large training sets. The research community has addressed this drawback as the problem of prototype selection, and several so-called condensing techniques have been proposed to solve it. Condensing algorithms attempt to determine a significantly reduced set of prototypes while keeping the performance of the 1-NN rule on this set close to that reached on the complete training set. In this paper we survey several condensing KNN techniques: CNN, RNN, FCNN, Drop1-5, DEL, IKNN, TRKNN and CBP. All of these techniques improve computational efficiency, but none of them can guarantee the minimality of its resulting set. One possibility is therefore to hybridize them with other algorithms, called modern heuristics or metaheuristics, which can further improve the solution. The metaheuristics with proven results in attribute selection are principally genetic algorithms and tabu search. We also shed light in this paper on some recent techniques following this template.
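To make the condensing idea concrete, the following is a minimal sketch of Hart's CNN (Condensed Nearest Neighbor) procedure, the oldest of the techniques surveyed here: starting from a single stored prototype, it repeatedly adds any training sample that the current subset misclassifies under the 1-NN rule, until a full pass adds nothing. The function names (`nearest_label`, `cnn_condense`) and the representation of samples as `(point, label)` pairs are illustrative choices, not part of any reference implementation.

```python
import math

def nearest_label(point, prototypes):
    """Return the label of the prototype closest to `point` (1-NN rule)."""
    best = min(prototypes, key=lambda proto: math.dist(point, proto[0]))
    return best[1]

def cnn_condense(training_set):
    """Sketch of Hart's CNN condensing rule.

    Builds a subset (the "store") that is consistent with the training
    set: 1-NN over the store classifies every training sample correctly.
    """
    store = [training_set[0]]          # seed the store with one sample
    changed = True
    while changed:                     # repeat passes until stable
        changed = False
        for point, label in training_set:
            if nearest_label(point, store) != label:
                store.append((point, label))   # keep misclassified samples
                changed = True
    return store

# Two well-separated classes: the condensed store stays small but
# remains consistent with the full training set.
data = ([((float(i), 0.0), "a") for i in range(10)]
        + [((float(i), 10.0), "b") for i in range(10)])
store = cnn_condense(data)
```

On this toy data the store ends up far smaller than the training set, while every training sample is still classified correctly by 1-NN over the store, which is exactly the consistency property the abstract describes. Note that, as the abstract points out, nothing guarantees this store is a minimum consistent subset.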
