Efficient Low-Redundancy Codes for Correcting Multiple Deletions

We consider the problem of constructing binary codes that recover from $k$-bit deletions with efficient encoding and decoding, for a fixed $k$. The single-deletion case is well understood: the Varshamov–Tenengolts–Levenshtein code from 1965 gives an asymptotically optimal construction with $\approx 2^{n}/n$ codewords of length $n$, i.e., at most $\log n$ bits of redundancy. However, even for two deletions, there was no known explicit construction with redundancy less than $n^{\Omega(1)}$. For any fixed $k$, we construct a binary code with $c_{k} \log n$ redundancy that can be decoded from $k$ deletions in $O_{k}(n \log^{4} n)$ time. The coefficient $c_{k}$ can be taken to be $O(k^{2} \log k)$, which is only quadratically worse than the optimal, non-constructive bound of $O(k)$. We also indicate how to modify this code to allow for a combination of up to $k$ insertions and deletions.
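For illustration only, the single-deletion case mentioned above can be sketched in a few lines. The sketch uses the standard definition of the Varshamov–Tenengolts code, $\{x \in \{0,1\}^{n} : \sum_{i} i x_{i} \equiv a \pmod{n+1}\}$, together with the classical single-deletion decoding rule. The function names (`vt_syndrome`, `vt_codewords`, `vt_decode`) are hypothetical, the codeword enumeration is brute force and only suitable for small $n$, and none of this is the $k$-deletion construction of the paper.

```python
from itertools import product


def vt_syndrome(x):
    """Weighted checksum sum(i * x_i) mod (n + 1), with positions counted from 1."""
    return sum(i * b for i, b in enumerate(x, start=1)) % (len(x) + 1)


def vt_codewords(n, a=0):
    """Brute-force list of all length-n words with syndrome a (small n only)."""
    return [x for x in product((0, 1), repeat=n) if vt_syndrome(x) == a]


def vt_decode(y, n, a=0):
    """Recover the codeword of length n from y, which has one bit deleted."""
    y = tuple(y)
    assert len(y) == n - 1
    w = sum(y)  # number of ones that survived
    s = sum(i * b for i, b in enumerate(y, start=1))
    d = (a - s) % (n + 1)  # checksum deficiency caused by the deletion
    if d <= w:
        # A 0 was deleted; exactly d ones lie to its right.
        pos, ones = len(y), 0
        while ones < d:
            pos -= 1
            ones += y[pos]
        return y[:pos] + (0,) + y[pos:]
    # A 1 was deleted; exactly d - w - 1 zeros lie to its left.
    pos, zeros = 0, 0
    while zeros < d - w - 1:
        zeros += 1 - y[pos]
        pos += 1
    return y[:pos] + (1,) + y[pos:]


# Usage: delete the third bit of a word and recover it from its syndrome.
x = (1, 0, 0, 1, 0)
received = x[:2] + x[3:]
assert vt_decode(received, n=5, a=vt_syndrome(x)) == x
```

The decoder needs only the weight and the weighted checksum of the received word, which is why a single residue modulo $n+1$, about $\log n$ bits, is enough redundancy for one deletion.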
We also note that among linear codes capable of correcting $k$ deletions, the $(k+1)$-fold repetition code is essentially the best possible.
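As a minimal sketch of the repetition code mentioned above, the following assumes the straightforward encoder that writes every message bit $k+1$ times, and a run-based decoder. The names `rep_encode`, `rep_decode`, and `delete_positions` are hypothetical helpers introduced for this example. The decoder relies on the observation that at most $k$ deletions cannot erase a whole run or merge two runs, so a received run of length $L$ corresponds to $\lceil L/(k+1) \rceil$ original message bits.

```python
from itertools import combinations, groupby, product


def rep_encode(bits, k):
    """(k+1)-fold repetition: each message bit is written k + 1 times."""
    return [b for b in bits for _ in range(k + 1)]


def rep_decode(received, k):
    """Decode after at most k deletions by rounding each run length up."""
    out = []
    for bit, run in groupby(received):
        length = sum(1 for _ in run)
        # Ceiling division: a run of this length came from this many message bits.
        out.extend([bit] * (-(-length // (k + 1))))
    return out


def delete_positions(seq, positions):
    """Return seq with the given index positions removed (test helper)."""
    drop = set(positions)
    return [b for i, b in enumerate(seq) if i not in drop]


# Usage: k = 2, so every bit is tripled and any two deletions are tolerated.
message = [1, 0, 0, 1]
codeword = rep_encode(message, 2)
assert rep_decode(delete_positions(codeword, [0, 5]), 2) == message
```

The price is a codeword of length $(k+1)n$ for an $n$-bit message, in contrast to the $c_{k} \log n$ redundancy of the non-linear construction described in the abstract.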
