Approximate Nearest Neighbor Search in High Dimensions

The nearest neighbor problem is defined as follows: given a set $P$ of $n$ points in some metric space $(X,D)$, build a data structure that, given any query point $q$, returns a point in $P$ that is closest to $q$ (its "nearest neighbor" in $P$). The data structure stores additional information about the set $P$, which is then used to find the nearest neighbor without computing all distances between $q$ and the points of $P$. The problem has a wide range of applications in machine learning, computer vision, databases, and other fields. To reduce the time needed to find nearest neighbors and the amount of memory used by the data structure, one can formulate the {\em approximate} nearest neighbor problem, where the goal is to return any point $p' \in P$ such that the distance from $q$ to $p'$ is at most $c \cdot \min_{p \in P} D(q,p)$, for some $c \geq 1$. Over the last two decades, many efficient solutions to this problem have been developed. In this article we survey these developments, as well as their connections to questions in geometric functional analysis and combinatorial geometry.
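
To make the two definitions concrete, the following minimal Python sketch states them as code. It is an illustration only, not one of the data structures surveyed here: it assumes points are coordinate tuples and takes $D$ to be the Euclidean distance (`math.dist`), and the function names `nearest_neighbor` and `is_c_approximate` are hypothetical. The exact search is the naive linear scan that computes all $n$ distances, which is precisely the cost an efficient data structure is meant to avoid.

```python
import math

def nearest_neighbor(points, q, dist=math.dist):
    """Exact nearest neighbor by linear scan (computes all n distances)."""
    return min(points, key=lambda p: dist(q, p))

def is_c_approximate(points, q, candidate, c, dist=math.dist):
    """Check the c-approximate condition: candidate is in P and
    D(q, candidate) <= c * min over p in P of D(q, p)."""
    if c < 1:
        raise ValueError("c must be at least 1")
    if candidate not in points:
        return False
    best = min(dist(q, p) for p in points)
    return dist(q, candidate) <= c * best

# Illustrative data (assumed for this example only).
P = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
q = (0.0, 1.0)

exact = nearest_neighbor(P, q)                      # the point (0.0, 0.0)
ok_loose = is_c_approximate(P, q, (3.0, 4.0), c=5)  # distance ~4.24 <= 5 * 1
ok_tight = is_c_approximate(P, q, (3.0, 4.0), c=2)  # distance ~4.24 >  2 * 1
```

With $c = 1$ the condition reduces to the exact problem; larger values of $c$ admit more points as valid answers, which is the slack that approximate data structures exploit.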
