A simple algorithm for nearest neighbor search in high dimensions

The problem of finding the closest point in high-dimensional spaces is common in pattern recognition. Unfortunately, the complexity of most existing search algorithms, such as k-d tree and R-tree, grows exponentially with dimension, making them impractical for dimensionality above 15. In nearly all applications, the closest point is of interest only if it lies within a user-specified distance ε. We present a simple and practical algorithm to efficiently search for the nearest neighbor within Euclidean distance ε. The use of projection search combined with a novel data structure dramatically improves performance in high dimensions. A complexity analysis is presented which helps to automatically determine ε in structured problems. A comprehensive set of benchmarks clearly shows the superiority of the proposed algorithm for a variety of structured and unstructured search problems. Object recognition is demonstrated as an example application. The simplicity of the algorithm makes it possible to construct an inexpensive hardware search engine which can be 100 times faster than its software equivalent. A C++ implementation of our algorithm is available.
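To make the idea of an ε-bounded search by projection concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' data structure or their C++ implementation: it assumes that "projection search" can be approximated by keeping one sorted coordinate list per dimension, intersecting the slabs of half-width ε around the query, and then checking the surviving candidates exhaustively against the Euclidean distance. The class name `SlabIndex` and the method `nearest_within` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from math import dist, inf


class SlabIndex:
    """Hypothetical helper: one sorted coordinate list per dimension."""

    def __init__(self, points):
        self.points = [tuple(p) for p in points]
        dims = len(self.points[0])
        # For each dimension, keep the coordinates in sorted order together
        # with the index of the point each coordinate belongs to.
        self.sorted_axes = []
        for d in range(dims):
            order = sorted(range(len(self.points)),
                           key=lambda i: self.points[i][d])
            coords = [self.points[i][d] for i in order]
            self.sorted_axes.append((coords, order))

    def nearest_within(self, query, eps):
        """Return (index, distance) of the closest point within eps, or None."""
        candidates = None
        for d, (coords, order) in enumerate(self.sorted_axes):
            # Binary search for the slab [query[d] - eps, query[d] + eps].
            lo = bisect_left(coords, query[d] - eps)
            hi = bisect_right(coords, query[d] + eps)
            slab = set(order[lo:hi])
            # Intersecting the slabs leaves the points inside the
            # hypercube of side 2 * eps centered on the query.
            candidates = slab if candidates is None else candidates & slab
            if not candidates:
                return None
        # Exhaustive Euclidean check on the (hopefully small) candidate set.
        best, best_dist = None, inf
        for i in candidates:
            d_i = dist(self.points[i], query)
            if d_i <= eps and d_i < best_dist:
                best, best_dist = i, d_i
        return None if best is None else (best, best_dist)
```

A possible usage, with made-up data: `SlabIndex([(0, 0, 0), (1, 1, 1), (0.1, 0.0, 0.1)]).nearest_within((0.12, 0.0, 0.1), 0.5)` returns the index of the third point together with its distance, while a query with no point inside its ε-hypercube returns `None`. The set intersection here is chosen for brevity; the abstract indicates that the paper relies on a dedicated data structure for this step.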
