Efficient search for approximate nearest neighbor in high dimensional spaces

We address the problem of designing data structures that allow efficient search for approximate nearest neighbors. More specifically, given a database consisting of a set of vectors in some high-dimensional Euclidean space, we want to construct a space-efficient data structure that allows us, given a query vector, to find the closest or nearly closest vector in the database. We also address this problem when distances are measured by the L1 norm and in the Hamming cube. Significantly improving and extending recent results of Kleinberg, we construct data structures whose size is polynomial in the size of the database, together with search algorithms that run in time nearly linear or nearly quadratic in the dimension. (Depending on the case, the extra factors are polylogarithmic in the size of the database.)
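To make the problem statement concrete, the following minimal sketch shows the brute-force baseline that such data structures aim to beat, for the three distance measures mentioned above. It is not the construction described in this paper. The function names are hypothetical, and reading "nearly closest" as "within a factor (1 + eps) of the true nearest distance" is an assumption about the usual formalization, not something the abstract states.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean (L2) distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def l1(u: Vector, v: Vector) -> float:
    """L1 (Manhattan) distance."""
    return sum(abs(a - b) for a, b in zip(u, v))


def hamming(u: Vector, v: Vector) -> int:
    """Hamming distance: number of coordinates in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def nearest_neighbor(database: Sequence[Vector], query: Vector,
                     dist: Distance = euclidean) -> int:
    """Exact linear scan: returns the index of the closest database vector.

    Costs O(n * d) per query with no preprocessing; this is the baseline,
    not the sublinear-in-n search the abstract describes.
    """
    return min(range(len(database)), key=lambda i: dist(database[i], query))


def is_approximate_nearest(database: Sequence[Vector], query: Vector,
                           candidate: int, eps: float,
                           dist: Distance = euclidean) -> bool:
    """Check the assumed (1 + eps) criterion for a candidate index."""
    best = min(dist(p, query) for p in database)
    return dist(database[candidate], query) <= (1 + eps) * best


# Hypothetical toy data, for illustration only.
database = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
query = (0.9, 1.2)
closest = nearest_neighbor(database, query)
```

Under the assumed criterion, any index accepted by `is_approximate_nearest` is a valid answer to an approximate query, which is what gives a data structure room to answer faster than the linear scan.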

[1]  Noam Nisan, et al. Neighborhood preserving hashing and approximate queries, 1994, SODA '94.

[2]  David G. Lowe, et al. Shape indexing using approximate nearest-neighbour search in high-dimensional spaces, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3]  L. Devroye, et al. Nearest neighbor methods in discrimination, 1982, Classification, Pattern Recognition and Reduction of Dimensionality.

[4]  Arnold W. M. Smeulders, et al. Image Databases and Multi-Media Search, 1998, Image Databases and Multi-Media Search.

[5]  Piotr Indyk, et al. Approximate nearest neighbors: towards removing the curse of dimensionality, 1998, STOC '98.

[6]  W. B. Johnson, et al. Extensions of Lipschitz mappings into Hilbert space, 1984.

[7]  Ketan Mulmuley, et al. Computational geometry: an introduction through randomized algorithms, 1993.

[8]  Richard J. Lipton, et al. Multidimensional Searching Problems, 1976, SIAM J. Comput.

[9]  Alex Pentland, et al. Photobook: tools for content-based manipulation of image databases, 1994, Other Conferences.

[10]  Santosh S. Vempala, et al. Locality-preserving hashing in multidimensional spaces, 1997, STOC '97.

[11]  Andrew Chi-Chih Yao, et al. A general approach to d-dimensional geometric queries, 1985, STOC '85.

[12]  Vladimir Vapnik, A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[13]  Ori Sasson, et al. Non-Expansive Hashing, 1996, STOC '96.

[14]  Gerald Salton, et al. Automatic text processing, 1988.

[15]  Danny Dolev, et al. Finding the neighborhood of a query in a dictionary, 1993, The 2nd Israel Symposium on Theory and Computing Systems.

[16]  S. Meiser, et al. Point Location in Arrangements of Hyperplanes, 1993, Inf. Comput.

[17]  Eli Upfal, et al. Computing with unreliable information, 1990, STOC '90.

[18]  Kenneth L. Clarkson, et al. An algorithm for approximate closest-point queries, 1994, SCG '94.

[19]  Jon M. Kleinberg, et al. Two algorithms for nearest-neighbor search in high dimensions, 1997, STOC '97.

[20]  E. Kushilevitz, et al. Communication Complexity: Basics, 1996.

[21]  Sunil Arya, et al. An optimal algorithm for approximate nearest neighbor searching fixed dimensions, 1998, JACM.

[22]  Leonid A. Levin, et al. A hard-core predicate for all one-way functions, 1989, STOC '89.

[23]  Jiří Matoušek, et al. Reporting Points in Halfspaces, 1992, Comput. Geom.

[24]  T. Figiel, et al. The dimension of almost spherical sections of convex bodies, 1976.

[25]  Jiří Matoušek, et al. Ray Shooting and Parametric Search, 1993, SIAM J. Comput.

[26]  Richard O. Duda, et al. Pattern classification and scene analysis, 1974, A Wiley-Interscience publication.

[27]  Kenneth L. Clarkson, et al. A Randomized Algorithm for Closest-Point Queries, 1988, SIAM J. Comput.

[28]  Noga Alon, et al. The Probabilistic Method, 2015, Fundamentals of Ramsey Theory.

[29]  Peter E. Hart, et al. Nearest neighbor pattern classification, 1967, IEEE Trans. Inf. Theory.

[30]  Robert Tibshirani, et al. Discriminant Adaptive Nearest Neighbor Classification, 1995, IEEE Trans. Pattern Anal. Mach. Intell.

[31]  Dragutin Petkovic, et al. Query by Image and Video Content: The QBIC System, 1995, Computer.

[32]  Allen Gersho, et al. Vector quantization and signal compression, 1991, The Kluwer international series in engineering and computer science.

[33]  T. Landauer, et al. Indexing by Latent Semantic Analysis, 1990.