Approximate Nearest Neighbor under edit distance via product metrics

We present a data structure for the approximate nearest neighbor problem under the edit metric, where the edit distance between two strings is the minimum number of insertions, deletions, and character substitutions needed to transform one into the other. For any <i>l</i> ≥ 1 and any set of <i>n</i> strings of length <i>d</i>, the data structure reports a 3<sup><i>l</i></sup>-approximate nearest neighbor of any given query string <i>q</i> in <i>O</i>(<i>d</i>) time. The space requirement is roughly <i>O</i>(<i>n</i><sup><i>d</i><sup>1/(<i>l</i>+1)</sup></sup>), i.e., strongly subexponential in <i>d</i>. To our knowledge, this is the first data structure for this problem with both <i>o</i>(<i>n</i>) query time and storage subexponential in <i>d</i>.
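To make the edit metric concrete, here is a minimal Python sketch of the standard dynamic program for edit distance, together with a brute-force exact nearest neighbor scan. This is only an illustrative baseline, not the data structure described above: the scan compares the query against all <i>n</i> strings, which is the linear dependence on <i>n</i> that the data structure is meant to avoid. The function names `edit_distance` and `exact_nearest_neighbor` are hypothetical and chosen for this example.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def exact_nearest_neighbor(strings: Sequence[str], q: str) -> str:
    """Brute-force baseline (assumed for illustration): scan every string."""
    return min(strings, key=lambda s: edit_distance(s, q))
```

For example, `edit_distance("kitten", "sitting")` evaluates to 3 (two substitutions and one insertion). A <i>c</i>-approximate nearest neighbor is any string in the set whose distance to <i>q</i> is at most <i>c</i> times the distance returned by this exact scan.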
