Hardness of approximate nearest neighbor search

We prove conditional near-quadratic running time lower bounds for approximate Bichromatic Closest Pair with Euclidean, Manhattan, Hamming, or edit distance. Specifically, unless the Strong Exponential Time Hypothesis (SETH) is false, for every δ>0 there exists a constant ε>0 such that computing a (1+ε)-approximation to the Bichromatic Closest Pair requires Ω(n^{2−δ}) time. In particular, this implies a near-linear query time lower bound for Approximate Nearest Neighbor search with polynomial preprocessing time. Our reduction uses the recently introduced Distributed PCP framework, but obtains improved efficiency using Algebraic Geometry (AG) codes. Efficient PCPs from AG codes have been constructed in other settings before, but our construction is the first to yield new hardness results.
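
For orientation, the problem can be illustrated with the naive quadratic baseline that the lower bound says cannot be substantially improved, even approximately. The following is a minimal sketch, not part of the paper's construction: it assumes the Hamming metric over equal-length strings, and the names `hamming`, `bichromatic_closest_pair`, and `is_valid_approximation` are hypothetical helpers introduced here for illustration only.

```python
from itertools import product


def hamming(u, v):
    """Hamming distance between two equal-length sequences."""
    return sum(a != b for a, b in zip(u, v))


def bichromatic_closest_pair(reds, blues, dist=hamming):
    """Brute-force Bichromatic Closest Pair.

    Compares every red point with every blue point, so it performs
    len(reds) * len(blues) distance evaluations (quadratic when both
    sets have n points). Returns (distance, red_point, blue_point),
    or None if either set is empty.
    """
    best = None
    for a, b in product(reds, blues):
        d = dist(a, b)
        if best is None or d < best[0]:
            best = (d, a, b)
    return best


def is_valid_approximation(candidate_dist, optimum_dist, eps):
    """Check whether a candidate pair's distance is a (1+eps)-approximation."""
    return candidate_dist <= (1 + eps) * optimum_dist


if __name__ == "__main__":
    reds = ["0000", "1111"]
    blues = ["0011", "1110"]
    print(bichromatic_closest_pair(reds, blues))
```

An algorithm solving the approximate problem may return any red-blue pair that passes `is_valid_approximation` against the true optimum; the result stated above says that, assuming SETH, even this relaxed task admits no algorithm running in time n^{2−δ} for a suitably small ε.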
