A Fast k-Nearest Neighbor Classifier Using Unsupervised Clustering

In this paper we propose a fast method for classifying patterns with a k-nearest neighbor (kNN) classifier. The kNN classifier is one of the most popular supervised classification strategies: it is easy to implement and easy to use. However, for large training sets the process can be time consuming, because each test sample must be compared against every training sample through a distance computation. Our goal is to provide a generic method that keeps the same classification strategy while considerably speeding up the distance calculations. First, the training data is clustered in an unsupervised manner, using the so-called “jump” method to find the cluster configuration that minimizes the intra-cluster dispersion. Once the clusters are defined, an iterative procedure selects a percentage of the data closest to the cluster centers and a percentage farthest from them, respectively. Besides some interesting properties revealed by varying these selection criteria, we demonstrated the efficiency of the method by reducing the classification time by up to 71% while keeping the classification performance in the same range.
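
The following is a minimal sketch of the reduction idea described above, not the authors' implementation. It assumes scikit-learn is available, replaces the “jump” method for choosing the number of clusters with a fixed n_clusters, and uses illustrative selection fractions (near_frac, far_frac) that are not taken from the paper.

```python
# Sketch: shrink a kNN training set by clustering it (unsupervised) and keeping,
# per cluster, only a fraction of samples closest to and farthest from the center.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def reduce_training_set(X, y, n_clusters=20, near_frac=0.2, far_frac=0.1):
    """Return a reduced (X, y) containing the near/far samples of each cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)        # unsupervised clustering
    dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    keep = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        order = members[np.argsort(dists[members])]              # ascending distance to center
        n_near = max(1, int(near_frac * len(members)))
        n_far = int(far_frac * len(members))
        keep.extend(order[:n_near])                              # closest to the center
        if n_far > 0:
            keep.extend(order[-n_far:])                          # farthest from the center
    keep = np.unique(keep)
    return X[keep], y[keep]

# The kNN classifier itself is unchanged; it is simply trained on the reduced set:
# X_red, y_red = reduce_training_set(X_train, y_train)
# clf = KNeighborsClassifier(n_neighbors=3).fit(X_red, y_red)
# y_pred = clf.predict(X_test)
```

Because the test samples are compared only against the retained subset, the per-query distance cost drops roughly in proportion to the fraction of training data kept.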
