Fast k-nearest neighbors search using modified principal axis search tree

The k-nearest neighbors (kNN) problem is to find the k points in a given data set that are nearest to a query point. Among the available methods, the principal axis search tree (PAT) algorithm performs well at finding the k nearest neighbors by combining the PAT structure with a node elimination criterion. In this paper, a novel kNN search algorithm is proposed that stores the projection values of all data points in the leaf nodes. If a leaf node in the PAT cannot be rejected by the node elimination criterion, the data points in that leaf node are further checked using their pre-stored projection values, so that more data points that cannot be among the k nearest neighbors are rejected without a full distance calculation. Experimental results show that, compared with the original PAT algorithm, the proposed method effectively reduces both the number of distance calculations and the computation time, especially for high-dimensional data sets or for search trees with a large number of data points in each leaf node.
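The abstract does not spell out the node elimination criterion or which axis the leaf projections are taken on, so the Python sketch below only illustrates the general idea under stated assumptions. It uses binary splits at the median projection onto each node's principal axis (the original PAT allows more children per node). It uses a simple lower bound, max(parent bound, projection gap to the split), as a stand-in for the PAT elimination criterion. Leaf projections are taken onto each leaf's own principal axis. The names `build`, `knn_search`, and `Node` are hypothetical. The point-level rejection relies on the fact that, for a unit vector a, |q·a - x·a| <= ||q - x|| (Cauchy-Schwarz). A data point whose stored projection differs from the query's projection by at least the current k-th nearest distance therefore cannot be among the k nearest neighbors.

```python
import heapq
import numpy as np

# Assumptions (not from the paper): binary median splits, a max-of-gaps node
# lower bound, and each leaf projecting its points onto its own principal axis.


def principal_axis(P):
    """Unit-length first principal axis of the rows of P (via SVD)."""
    _, _, vt = np.linalg.svd(P - P.mean(axis=0), full_matrices=False)
    return vt[0]


class Node:
    """Binary principal-axis tree node (simplified; PAT nodes may have more children)."""

    def __init__(self, idx, X, leaf_size):
        self.axis = principal_axis(X[idx])
        proj = X[idx] @ self.axis
        order = np.argsort(proj)
        self.is_leaf = len(idx) <= leaf_size
        if self.is_leaf:
            # Pre-store each point's projection value, sorted, for point-level rejection.
            self.idx = idx[order]
            self.proj = proj[order]
        else:
            mid = len(idx) // 2
            self.split = proj[order[mid]]  # left: proj <= split, right: proj >= split
            self.left = Node(idx[order[:mid]], X, leaf_size)
            self.right = Node(idx[order[mid:]], X, leaf_size)


def build(X, leaf_size=16):
    return Node(np.arange(len(X)), X, leaf_size)


def knn_search(root, X, q, k):
    heap = []  # max-heap via negated distances: (-dist, index)
    stats = {"dist_calcs": 0}

    def kth():
        # Current k-th smallest distance (infinite until k candidates exist).
        return -heap[0][0] if len(heap) == k else np.inf

    def consider(i):
        stats["dist_calcs"] += 1
        d = float(np.linalg.norm(X[i] - q))
        if len(heap) < k:
            heapq.heappush(heap, (-d, int(i)))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, int(i)))

    def visit(node, lb):
        if lb >= kth():  # node elimination: no point in this node can be closer
            return
        qp = float(q @ node.axis)
        if node.is_leaf:
            # Scan outward from the query's position in the sorted projections;
            # once the projection gap reaches kth(), the rest of that side is rejected.
            pos = int(np.searchsorted(node.proj, qp))
            for j in range(pos, len(node.proj)):
                if node.proj[j] - qp >= kth():
                    break
                consider(node.idx[j])
            for j in range(pos - 1, -1, -1):
                if qp - node.proj[j] >= kth():
                    break
                consider(node.idx[j])
            return
        gap = qp - node.split
        near, far = (node.right, node.left) if gap >= 0 else (node.left, node.right)
        visit(near, lb)
        visit(far, max(lb, abs(gap)))  # far side is at least |gap| away along the axis

    visit(root, 0.0)
    return sorted((-nd, i) for nd, i in heap), stats
```

For example, `root = build(X, leaf_size=32)` followed by `knn_search(root, X, q, k=5)` returns the (distance, index) pairs sorted by distance, together with a count of full distance calculations. Comparing that count against `len(X)` gives a rough sense of how many points the node elimination and leaf projection checks rejected. Keeping the leaf projections sorted lets each side of the scan stop at the first rejected point instead of testing every point in the leaf.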
