Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality

We present two algorithms for the approximate nearest neighbor problem in high dimensional spaces. For data sets of size n living in IR, the algorithms require space that is only polynomial in n and d, while achieving query times that are sub-linear in n and polynomial in d. We also show applications to other high-dimensional geometric problems, such as the approximate minimum spanning tree.

[1]  Michael Ian Shamos,et al.  Closest-point problems , 1975, 16th Annual Symposium on Foundations of Computer Science (sfcs 1975).

[2]  Robert E. Tarjan,et al.  Applications of a planar separator theorem , 1977, 18th Annual Symposium on Foundations of Computer Science (sfcs 1977).

[3]  W. B. Johnson,et al.  Extensions of Lipschitz mappings into Hilbert space , 1984 .

[4]  Kenneth L. Clarkson,et al.  A Randomized Algorithm for Closest-Point Queries , 1988, SIAM J. Comput..

[5]  Sanguthevar Rajasekaran,et al.  The light bulb problem , 1995, COLT '89.

[6]  G. Pisier The volume of convex bodies and Banach space geometry , 1989 .

[7]  S. Meiser,et al.  Point Location in Arrangements of Hyperplanes , 1993, Inf. Comput..

[8]  Isidore Rigoutsos,et al.  FLASH: a fast look-up algorithm for string homology , 1993, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Marshall W. Bern,et al.  Approximate Closest-Point Queries in High Dimensions , 1993, Inf. Process. Lett..

[10]  F. Frances Yao,et al.  Multi-index hashing for information retrieval , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.

[11]  S. Rao Kosaraju,et al.  A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields , 1995, JACM.

[12]  Nathan Linial,et al.  The geometry of graphs and some of its algorithmic applications , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.

[13]  Geoffrey Zweig,et al.  The bit vector intersection problem , 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[14]  Andrei Z. Broder,et al.  On the resemblance and containment of documents , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).

[15]  Timothy M. Chan Approximate Nearest Neighbor Queries Revisited , 1997, SCG '97.

[16]  Geoffrey Zweig,et al.  Syntactic Clustering of the Web , 1997, Comput. Networks.

[17]  Jon M. Kleinberg,et al.  Two algorithms for nearest-neighbor search in high dimensions , 1997, STOC '97.

[18]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[19]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching fixed dimensions , 1998, JACM.

[20]  Hans-Jörg Schek,et al.  A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces , 1998, VLDB.

[21]  Rafail Ostrovsky,et al.  Efficient search for approximate nearest neighbor in high dimensional spaces , 1998, STOC '98.

[22]  S. Muthukrishnan,et al.  Approximate nearest neighbors and sequence comparison with block operations , 2000, STOC '00.

[23]  Piotr Indyk,et al.  Scalable Techniques for Clustering the Web , 2000, WebDB.

[24]  Jeremy Buhler,et al.  Efficient large-scale sequence comparison by locality-sensitive hashing , 2001, Bioinform..

[25]  Graham Cormode,et al.  Permutation Editing and Matching via Embeddings , 2001, ICALP.

[26]  Jeremy Buhler,et al.  Finding motifs using random projections , 2001, RECOMB.

[27]  Sariel Har-Peled A replacement for Voronoi diagrams of near linear size , 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[28]  Sunil Arya,et al.  Linear-size approximate voronoi diagrams , 2002, SODA '02.

[29]  Yogish Sabharwal,et al.  Nearest Neighbors Search Using Point Location in Balls with Applications to Approximate Voronoi Decompositions , 2002, FSTTCS.

[30]  Graham Cormode,et al.  Sequence distance embeddings , 2003 .

[31]  Constance de Koning,et al.  Editors , 2003, Annals of Emergency Medicine.

[32]  Nicole Immorlica,et al.  Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.

[33]  Allan Borodin,et al.  Subquadratic Approximation Algorithms for Clustering Problems in High Dimensional Spaces , 2004, Machine Learning.

[34]  Piotr Indyk,et al.  Nearest Neighbors in High-Dimensional Spaces , 2004, Handbook of Discrete and Computational Geometry, 2nd Ed..

[35]  Subhash Khot,et al.  Nonembeddability theorems via Fourier analysis , 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).

[36]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[37]  Patrick Pantel,et al.  Randomized Algorithms and NLP: Using Locality Sensitive Hash Functions for High Speed Noun Clustering , 2005, ACL.

[38]  Robert Krauthgamer,et al.  Embedding the Ulam metric into l1 , 2006, Theory Comput..

[39]  Alexandr Andoni,et al.  On the Optimality of the Dimensionality Reduction Method , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[40]  Ting Chen,et al.  Scalable Partitioning and Exploration of Chemical Spaces Using Geometric Hashing , 2006, J. Chem. Inf. Model..

[41]  Rajeev Motwani,et al.  Lower bounds on locality sensitive hashing , 2005, SCG '06.

[42]  Alexandr Andoni,et al.  Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[43]  Alexandr Andoni,et al.  Earth mover distance over high-dimensional spaces , 2008, SODA '08.

[44]  Sunil Arya,et al.  Space-time tradeoffs for approximate nearest neighbor searching , 2009, JACM.

[45]  Bernard Chazelle,et al.  The Fast Johnson--Lindenstrauss Transform and Approximate Nearest Neighbors , 2009, SIAM J. Comput..

[46]  Bernard Chazelle,et al.  Faster dimension reduction , 2010, Commun. ACM.

[47]  Rina Panigrahy,et al.  Lower Bounds on Near Neighbor Search via Metric Expansion , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[48]  Sariel Har-Peled Geometric Approximation Algorithms , 2011 .

[49]  Huy L. Nguyen Approximate Nearest Neighbor Search in ℓp , 2013, ArXiv.

[50]  Xin-She Yang,et al.  Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.

[51]  Yi Wu,et al.  Optimal Lower Bounds for Locality-Sensitive Hashing (Except When q is Tiny) , 2014, TOCT.