FHC: an adaptive fast hybrid method for k-NN classification

A popular and easy-to-implement classifier is the k-Nearest Neighbor (k-NN) classifier. However, sequentially searching for nearest neighbors in large datasets leads to inefficient classification because of the high computational cost involved. This paper presents an adaptive, hybrid, cluster-based method for speeding up the k-NN classifier. The proposed method reduces the computational cost as much as possible while keeping classification accuracy high. It is based on the well-known k-means clustering algorithm and consists of two main parts: (i) a preprocessing algorithm that builds a two-level, cluster-based data structure, and (ii) a hybrid classifier that classifies new items by accessing either the first or the second level of the data structure. The proposed approach was tested on seven real-life datasets, and the experimental measurements were statistically validated with the Wilcoxon signed-ranks test. The results show that the proposed classification method can be used either to achieve high accuracy at a slightly higher cost or to reduce the cost to a minimum at a slightly lower accuracy.
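To make the two-part design concrete, the following is a minimal sketch of how such a two-level structure and hybrid classifier could look, assuming a homogeneity-based switching rule between the two levels. The function names (`build_structure`, `hybrid_classify`), the parameters, and the switching rule are illustrative assumptions, not the paper's actual FHC algorithm.

```python
# Illustrative sketch only (not the paper's FHC method):
# first level  = k-means centroids, each with a majority label and a
#                homogeneity flag;
# second level = the training items belonging to each cluster.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def build_structure(X, y, n_clusters=10):
    """Preprocessing: build the two-level, cluster-based data structure.
    X is an (n, d) float array; y is an (n,) label array."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    structure = []
    for c in range(n_clusters):
        idx = np.where(km.labels_ == c)[0]
        if idx.size == 0:                        # skip rare empty clusters
            continue
        labels = y[idx]
        structure.append({
            "centroid": km.cluster_centers_[c],  # first level
            "majority": Counter(labels).most_common(1)[0][0],
            "homogeneous": np.unique(labels).size == 1,
            "X": X[idx],                         # second level: members
            "y": labels,
        })
    return structure

def hybrid_classify(structure, x, k=3):
    """Classify x via its nearest centroid; fall back to a local k-NN
    vote inside that cluster when the cluster is not homogeneous."""
    dists = [np.linalg.norm(x - e["centroid"]) for e in structure]
    entry = structure[int(np.argmin(dists))]
    if entry["homogeneous"]:                     # fast path: first level
        return entry["majority"]
    d = np.linalg.norm(entry["X"] - x, axis=1)   # slow path: second level
    nearest = entry["y"][np.argsort(d)[:k]]
    return Counter(nearest).most_common(1)[0][0]
```

Under these assumptions, a query landing in a homogeneous cluster costs only one scan over the centroids, while a query landing in a mixed cluster pays for an additional local k-NN search over that cluster's members alone, which mirrors the accuracy-versus-cost trade-off described in the abstract.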
