A Novel Template Reduction Approach for the $K$-Nearest Neighbor Method

The K-nearest neighbor (KNN) rule is one of the most widely used pattern classification algorithms. For large data sets, the computational cost of classifying patterns with KNN can be prohibitive. One way to alleviate this problem is condensing: removing patterns that add computational burden without contributing to classification accuracy. In this brief, we propose a new condensing algorithm. The proposed idea is based on defining the so-called chain, a sequence of nearest neighbors from alternating classes. We argue that patterns further down the chain are close to the classification boundary and, based on that, we set a cutoff for the patterns we keep in the training set. Experiments show that the proposed approach effectively reduces the number of prototypes while maintaining the same level of classification accuracy as traditional KNN. Moreover, it is a simple and fast condensing algorithm.
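
The chain idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the exact cutoff rule, so the ratio test with the parameter `alpha`, the function names `build_chain` and `condense`, the Euclidean distance, and the two-class setting are all assumptions made for illustration.

```python
import numpy as np


def build_chain(X, y, start, max_len=50):
    """Return the indices of the chain that starts at pattern `start`.

    Each step moves to the nearest pattern whose class differs from the
    current pattern's class, so classes alternate along the chain
    (two-class data assumed in this sketch).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    chain = [start]
    current = start
    while len(chain) < max_len:
        dists = np.linalg.norm(X - X[current], axis=1)
        dists[y == y[current]] = np.inf  # exclude same-class patterns (and self)
        nxt = int(np.argmin(dists))
        if not np.isfinite(dists[nxt]):
            break  # no pattern of another class exists
        if len(chain) >= 2 and nxt == chain[-2]:
            break  # the chain now bounces between two mutual nearest neighbors
        chain.append(nxt)
        current = nxt
    return chain


def condense(X, y, alpha=1.5):
    """Return indices of the patterns kept by a hypothetical cutoff rule.

    Assumed rule (not taken from the abstract): drop a pattern when the
    first link of its chain is more than `alpha` times longer than the
    second link, i.e. the pattern lies far from the class boundary
    compared with the patterns further down its chain.
    """
    X = np.asarray(X, dtype=float)
    keep = []
    for i in range(len(X)):
        chain = build_chain(X, y, i)
        if len(chain) < 3:
            keep.append(i)  # chain too short to compare links: keep the pattern
            continue
        d0 = np.linalg.norm(X[chain[1]] - X[chain[0]])
        d1 = np.linalg.norm(X[chain[2]] - X[chain[1]])
        if d0 <= alpha * d1:
            keep.append(i)
    return np.array(keep, dtype=int)
```

For example, with one-dimensional patterns at 0, 3 (class 0) and 4, 10 (class 1), the chain started at 0 first jumps to 4 and then to 3, so the two patterns adjacent to the boundary (3 and 4) are retained and the two outer ones are dropped when `alpha=1.5`. A larger `alpha` keeps more patterns; the value that preserves KNN accuracy would have to be tuned on the data at hand.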
