An improvement to k-nearest neighbor classifier

Non-parametric methods such as the nearest neighbor classifier (NNC) and its variants, such as the k-nearest neighbor classifier (k-NNC), are simple to use and often show good performance in practice. k-NNC stores all training patterns and searches them to find the k nearest neighbors of a given test pattern. Some fundamental improvements to k-NNC are (i) the weighted k-nearest neighbor classifier (wk-NNC), where each neighbor is assigned a weight that is used in the classification, and (ii) the use of a bootstrapped training set in place of the given training set. Hamamoto et al. [9] proposed a bootstrapping method in which each training pattern is replaced by a weighted mean of a few of its neighbors from its own class. It is shown to improve the classification accuracy in most cases. The time to create the bootstrapped set is O(n^2), where n is the number of training patterns. This paper presents a novel improvement to k-NNC called the k-Nearest Neighbor Mean Classifier (k-NNMC). k-NNMC finds the k nearest neighbors of the test pattern within each class of training patterns separately and computes the mean of these k neighbors for each class. The test pattern is assigned to the class of the nearest mean. Experiments on several standard data sets show that the proposed classifier achieves better classification accuracy than k-NNC, wk-NNC, and k-NNC using Hamamoto's bootstrapped training set. Further, unlike Hamamoto's method, the proposed method has no design phase, and it is suitable for parallel implementations that can easily be coupled with any indexing and space reduction methods. It is therefore suitable for data mining applications.
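The k-NNMC decision rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `knnmc_predict`, the use of Euclidean distance, the tie-breaking by input order, and the toy data are all assumptions made for the example. If a class has fewer than k patterns, the sketch simply averages all of them.

```python
import math
from collections import defaultdict


def knnmc_predict(train, query, k):
    """Classify `query` by the nearest class-wise mean of its k nearest neighbors.

    train: list of (feature_vector, label) pairs; query: a feature vector.
    Euclidean distance is an assumption; the abstract does not fix a metric.
    """
    # Group training patterns by class label.
    by_class = defaultdict(list)
    for x, label in train:
        by_class[label].append(x)

    best_label, best_dist = None, math.inf
    for label, patterns in by_class.items():
        # k nearest neighbors of the query, searched within this class only.
        neighbors = sorted(patterns, key=lambda x: math.dist(x, query))[:k]
        # Component-wise mean of those neighbors.
        mean = [sum(col) / len(neighbors) for col in zip(*neighbors)]
        # Keep the class whose local mean lies closest to the query.
        d = math.dist(mean, query)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Hypothetical two-class toy data, for illustration only.
train = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]
print(knnmc_predict(train, (0.5, 0.5), k=2))
print(knnmc_predict(train, (5.5, 5.2), k=2))
```

Because each class is searched independently, the per-class loop has no shared state apart from the final comparison, which is consistent with the abstract's remark that the method suits parallel implementations.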

[1] Catherine Blake, et al. UCI Repository of machine learning databases, 1998.

[2] A. Richard Newton, et al. Sketched symbol recognition using Zernike moments, 2004, Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004).

[3] M. Narasimha Murty, et al. An Efficient Parzen-Window Based Network Intrusion Detector Using a Pattern Synthesis Technique, 2005, PReMI.

[4] Sahibsingh A. Dudani. The Distance-Weighted k-Nearest-Neighbor Rule, 1976, IEEE Transactions on Systems, Man, and Cybernetics.

[5] David G. Stork, et al. Pattern Classification (2nd ed.), 1999.

[6] M. Narasimha Murty, et al. Bootstrapping for efficient handwritten digit recognition, 2001, Pattern Recognit.

[7] David G. Stork, et al. Pattern Classification, 1973.

[8] David H. Wolpert, et al. The Relationship Between PAC, the Statistical Physics Framework, the Bayesian Framework, and the VC Framework, 1995.

[9] Yoshihiko Hamamoto, et al. A Bootstrap Technique for Nearest Neighbor Classifier Design, 1997, IEEE Trans. Pattern Anal. Mach. Intell.

[10] Belur V. Dasarathy, et al. Nearest neighbor (NN) norms: NN pattern classification techniques, 1991.

[11] J. L. Hodges, et al. Discriminatory Analysis - Nonparametric Discrimination: Small Sample Performance, 1952.

[12] J. L. Hodges, et al. Discriminatory Analysis - Nonparametric Discrimination: Consistency Properties, 1989.

[13] M. Narasimha Murty, et al. Partition based pattern synthesis technique with efficient algorithms for nearest neighbor classification, 2006, Pattern Recognit. Lett.

[14] E. Parzen. On Estimation of a Probability Density Function and Mode, 1962.

[15] P. Viswanath, et al. Rough-fuzzy weighted k-nearest leader classifier for large data sets, 2009, Pattern Recognit.

[16] Rabab Kreidieh Ward, et al. Vector Quantization Technique for Nonparametric Classifier Design, 1993, IEEE Trans. Pattern Anal. Mach. Intell.

[17] J. Flusser, et al. Moments and Moment Invariants in Pattern Recognition, 2009.

[18] M. Narasimha Murty, et al. An incremental data mining algorithm for compact realization of prototypes, 2001, Pattern Recognit.

[19] Antonin Guttman, et al. R-trees: a dynamic index structure for spatial searching, 1984, SIGMOD '84.

[20] Tian Zhang, et al. BIRCH: an efficient data clustering method for very large databases, 1996, SIGMOD '96.

[21] T. Ravindra Babu, et al. Comparison of genetic algorithm based prototype selection schemes, 2001, Pattern Recognit.

[22] Shigeo Abe. Pattern Classification, 2001, Springer London.

[23] Jian Pei, et al. Mining frequent patterns without candidate generation, 2000, SIGMOD '00.

[24] Peter E. Hart, et al. Nearest neighbor pattern classification, 1967, IEEE Trans. Inf. Theory.

[25] Belur V. Dasarathy, et al. Data mining tasks and methods: Classification: nearest-neighbor approaches, 2002.