Beyond Locality-Sensitive Hashing

We present a new data structure for the $c$-approximate near neighbor problem (ANN) in the Euclidean space. For $n$ points in $\mathbb{R}^d$, our algorithm achieves $O_c(n^{\rho} + d \log n)$ query time and $O_c(n^{1+\rho} + d \log n)$ space, where $\rho \le 7/(8c^2) + O(1/c^3) + o_c(1)$. This is the first improvement over the result by Andoni and Indyk (FOCS 2006) and the first data structure that bypasses a locality-sensitive hashing lower bound proved by O'Donnell, Wu and Zhou (ICS 2011). By a standard reduction we obtain a data structure for the Hamming space and the $\ell_1$ norm with $\rho \le 7/(8c) + O(1/c^{3/2}) + o_c(1)$, which is the first improvement over the result of Indyk and Motwani (STOC 1998).
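To give a feel for the size of the improvement, the following minimal sketch compares only the leading terms of the exponents. It assumes the earlier LSH-based exponents are 1/c^2 for the Euclidean case (Andoni and Indyk) and 1/c for the Hamming case (Indyk and Motwani); those baselines are not stated in the abstract itself. The function name `leading_exponents` is hypothetical, and all lower-order terms (the O(.) and o_c(1) parts) are ignored.

```python
def leading_exponents(c: float) -> dict:
    """Leading-order exponents rho for approximation factor c.

    Lower-order O(.) and o_c(1) terms are dropped, so these values are
    illustrative only and are not exact bounds.
    """
    return {
        # Leading terms of the bounds stated in the abstract.
        "euclidean_new": 7 / (8 * c**2),
        "hamming_new": 7 / (8 * c),
        # Assumed baselines from the earlier LSH results (not from the abstract).
        "euclidean_lsh": 1 / c**2,
        "hamming_lsh": 1 / c,
    }


if __name__ == "__main__":
    for c in (2.0, 4.0):
        print(c, leading_exponents(c))
```

Under these assumptions the leading term shrinks by the same factor of 7/8 in both settings, for every value of c.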

[1]  Nathan Linial,et al.  The geometry of graphs and some of its algorithmic applications , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.

[2]  Rina Panigrahy,et al.  Lower Bounds on Near Neighbor Search via Metric Expansion , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[3]  Kenneth L. Clarkson,et al.  A Randomized Algorithm for Closest-Point Queries , 1988, SIAM J. Comput..

[4]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[5]  James McNames,et al.  A Fast Nearest-Neighbor Algorithm Based on a Principal Axis Search Tree , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Alexandr Andoni,et al.  Nearest neighbor search : the old, the new, and the impossible , 2009 .

[7]  Isidore Rigoutsos,et al.  FLASH: a fast look-up algorithm for string homology , 1993, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Sanjoy Dasgupta,et al.  Which Spatial Partition Trees are Adaptive to Intrinsic Dimension? , 2009, UAI.

[9]  Rina Panigrahy,et al.  A Geometric Approach to Lower Bounds for Approximate Near-Neighbor Search and Partial Match , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[10]  Geoffrey Zweig,et al.  The Bit Vector Intersection Problem (Preliminary Version). , 1995, FOCS 1995.

[11]  Jiri Matousek,et al.  Lectures on discrete geometry , 2002, Graduate texts in mathematics.

[12]  W. B. Johnson,et al.  Extensions of Lipschitz mappings into Hilbert space , 1984 .

[13]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[14]  L. Lovász,et al.  Geometric Algorithms and Combinatorial Optimization , 1981 .

[15]  Jay Yagnik,et al.  The power of comparative reasoning , 2011, 2011 International Conference on Computer Vision.

[16]  Gregory Valiant,et al.  Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas , 2012, 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science.

[17]  Anupam Gupta,et al.  An elementary proof of a theorem of Johnson and Lindenstrauss , 2003 .

[18]  David R. Karger,et al.  Approximate graph coloring by semidefinite programming , 1998, JACM.

[19]  Sanjoy Dasgupta,et al.  An elementary proof of a theorem of Johnson and Lindenstrauss , 2003, Random Struct. Algorithms.

[20]  Yi Wu,et al.  Optimal Lower Bounds for Locality-Sensitive Hashing (Except When q is Tiny) , 2014, TOCT.

[21]  S. Meiser,et al.  Point Location in Arrangements of Hyperplanes , 1993, Inf. Comput..

[22]  Rajeev Motwani,et al.  Lower bounds on locality sensitive hashing , 2005, SCG '06.

[23]  F. Frances Yao,et al.  Multi-index hashing for information retrieval , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.

[24]  Kenneth L. Clarkson,et al.  Smaller core-sets for balls , 2003, SODA '03.

[25]  Piotr Indyk,et al.  Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality , 2012, Theory Comput..

[26]  Sanguthevar Rajasekaran,et al.  The light bulb problem , 1995, COLT '89.

[27]  Geoffrey Zweig,et al.  The bit vector intersection problem , 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[28]  Alexandr Andoni,et al.  Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[29]  Moshe Dubiner,et al.  Bucketing Coding and Information Theory for the Statistical High-Dimensional Nearest-Neighbor Problem , 2008, IEEE Transactions on Information Theory.