Fast Nearest Neighbor Classification using Class-Based Clustering

The nearest neighbor rule (NNR) is a parameter-free classifier that is easy to implement, simple to operate, and highly accurate. However, it is time- and memory-consuming on large datasets. This study proposes a parameter-free method to accelerate NNR. The method employs a class-based clustering algorithm to divide the training data into several clusters whose members all belong to the same class. Cluster representatives are then extracted from the cluster border data, identified via the nearest neighbors between clusters of different classes. Because the representatives are border points rather than cluster centers, removing a cluster's interior data does not affect prediction accuracy. In the prediction phase, the nearest neighbor search area is narrowed down according to the distance between a test instance and its nearest cluster, which speeds up prediction. The performance of the proposed method was evaluated against NNR, k-NNR, and LIBSVM on five benchmark datasets. Experimental results show that the proposed parameter-free classification algorithm is easy to operate and balances speed and accuracy well.
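To make the train-then-predict flow concrete, the following is a minimal sketch in Python/NumPy under stated assumptions: Euclidean distance, one class-pure cluster per class (standing in for the paper's class-based clustering, which may produce several clusters per class), and a simple centroid-based narrowing rule with an illustrative slack factor. The function names (class_based_clusters, border_representatives, predict) and these specific choices are ours for illustration, not the paper's exact algorithm.

```python
import numpy as np

def class_based_clusters(X, y):
    """Partition the training set into class-pure clusters.
    Simplifying assumption: one cluster per class label."""
    return {c: X[y == c] for c in np.unique(y)}

def border_representatives(clusters):
    """Keep only border points of each cluster: points that are the
    nearest neighbor of some point in a different-class cluster."""
    reps = {}
    for c, Xc in clusters.items():
        keep = np.zeros(len(Xc), dtype=bool)
        for c2, Xo in clusters.items():
            if c2 == c:
                continue
            # pairwise distances from every foreign point to this cluster
            d = np.linalg.norm(Xo[:, None, :] - Xc[None, :, :], axis=2)
            keep[np.unique(d.argmin(axis=1))] = True
        reps[c] = Xc[keep]  # interior points are discarded
    return reps

def predict(reps, x, slack=1.5):
    """1-NN over border representatives only. The search is narrowed to
    clusters whose centroid is not much farther than the nearest one
    (one plausible reading of the paper's distance-based narrowing;
    the slack factor is an illustrative assumption)."""
    centroids = {c: R.mean(axis=0) for c, R in reps.items()}
    dist = {c: np.linalg.norm(x - m) for c, m in centroids.items()}
    dmin = min(dist.values())
    candidates = [c for c in reps if dist[c] <= slack * dmin]
    best, best_d = None, np.inf
    for c in candidates:
        dd = np.linalg.norm(reps[c] - x, axis=1).min()
        if dd < best_d:
            best, best_d = c, dd
    return best

# Toy usage on two synthetic Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
reps = border_representatives(class_based_clusters(X, y))
print(predict(reps, np.array([3.5, 3.5])))  # likely prints 1
```

Since clusters are class-pure, the prediction is simply the label of the cluster owning the nearest border representative; the speedup comes from both the reduced prototype set and the narrowed cluster candidate list.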
