Approximating Edit Distance within Constant Factor in Truly Sub-Quadratic Time

Edit distance is a measure of similarity between two strings, defined as the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. For strings of length n, the edit distance can be computed exactly by a dynamic programming algorithm that runs in quadratic time. Andoni, Krauthgamer and Onak (2010) [13] gave a nearly linear-time algorithm that approximates edit distance within a poly(log n) factor. In this paper, we provide an algorithm with running time Õ(n^{2-2/7}) that approximates the edit distance within a constant factor.
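For reference, the exact quadratic-time baseline mentioned above is the textbook dynamic program (in the spirit of Wagner and Fischer [6]). The sketch below is a minimal Python illustration of that baseline only. It is not the paper's constant-factor approximation algorithm, and the function name `edit_distance` is our own choice.

```python
def edit_distance(a: str, b: str) -> int:
    """Exact edit (Levenshtein) distance via the classic quadratic dynamic program.

    Only two rows of the (len(a)+1) x (len(b)+1) table are kept, so memory is
    O(len(b)) while time remains O(len(a) * len(b)).
    """
    # prev[j] = distance between the empty prefix of a and b[:j] (j insertions)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # cur[0] = distance between a[:i] and the empty prefix of b (i deletions)
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            )
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # 3
```

The table has (n+1)(n+1) cells for two strings of length n, so this baseline takes Θ(n^2) time. The bound Õ(n^{2-2/7}) = Õ(n^{12/7}), roughly Õ(n^{1.714}), is polynomially smaller.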

[1] Michal Koucký et al. Streaming Algorithms For Computing Edit Distance Without Exploiting Suffix Trees, 2016, ArXiv.

[2] Amir Abboud et al. Tight Hardness Results for LCS and Other Sequence Similarity Measures, 2015, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science.

[3] Alexandr Andoni et al. Near-optimal sublinear time algorithms for Ulam distance, 2010, SODA '10.

[4] Marvin Künnemann et al. Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping, 2015, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science.

[5] Esko Ukkonen et al. Algorithms for Approximate String Matching, 1985, Inf. Control.

[6] Michael J. Fischer et al. The String-to-String Correction Problem, 1974, JACM.

[7] Gad M. Landau et al. Incremental String Comparison, 1998, SIAM J. Comput.

[8] Funda Ergün et al. Oblivious string embeddings and edit distance approximations, 2006, SODA '06.

[9] Ronitt Rubinfeld et al. A sublinear algorithm for weakly approximating edit distance, 2003, STOC '03.

[10] Michael E. Saks et al. Accurate and Nearly Optimal Sublinear Approximations to Ulam Distance, 2017, SODA.

[11] Piotr Indyk et al. Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false), 2014, STOC.

[12] Robert Krauthgamer et al. Approximating edit distance efficiently, 2004, 45th Annual IEEE Symposium on Foundations of Computer Science.

[13] Alexandr Andoni et al. Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity, 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[14] Yuval Rabani et al. Improved lower bounds for embeddings into L1, 2006, SODA '06.

[15] Mike Paterson et al. A Faster Algorithm Computing String Edit Distances, 1980, J. Comput. Syst. Sci.

[16] Ryan Williams et al. Simulating branching programs with edit distance and friends: or: a polylog shaved is a lower bound made, 2015, STOC.

[17] Alexandr Andoni et al. Approximating Edit Distance in Near-Linear Time, 2012, SIAM J. Comput.

[18] Georgios P. Papamichail et al. Improved algorithms for approximate string matching (extended abstract), 2009, BMC Bioinformatics.

[19] Szymon Grabowski. New tabulation and sparse dynamic programming based techniques for sequence similarity problems, 2016, Discret. Appl. Math.

[20] Vladimir I. Levenshtein et al. Binary codes capable of correcting deletions, insertions, and reversals, 1965.

[21] Mohammad Ghodsi et al. Approximating Edit Distance in Truly Subquadratic Time: Quantum and MapReduce, 2018, SODA.

[22] Amir Abboud et al. Towards Hardness of Approximation for Polynomial Time Problems, 2017, ITCS.

[23] Michal Koucký et al. Streaming algorithms for embedding and computing edit distance in the low distance regime, 2016, STOC.