An Improved Bound on the Fraction of Correctable Deletions

We consider codes over fixed alphabets against worst-case symbol deletions. For any fixed $k \ge 2$, we construct a family of codes over an alphabet of size $k$ with positive rate that allow efficient recovery from a worst-case deletion fraction approaching $1 - 2/(k + \sqrt{k})$. In particular, for binary codes, we are able to recover a fraction of deletions approaching $1/(\sqrt{2} + 1) = \sqrt{2} - 1 \approx 0.414$. Previously, even non-constructively, the largest deletion fraction known to be correctable with positive rate was $1 - \Theta(1/\sqrt{k})$, and around 0.17 for the binary case. Our result pins down the largest fraction of correctable deletions for $k$-ary codes as $1 - \Theta(1/k)$, since $1 - 1/k$ is an upper bound even for the simpler model of erasures, where the locations of the missing symbols are known. Closing the gap between $\sqrt{2} - 1$ and $1/2$ for the limit of worst-case deletions correctable by binary codes remains a tantalizing open question.
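As a rough numerical illustration (not the authors' construction), the sketch below evaluates the achievable deletion fraction $1 - 2/(k + \sqrt{k})$ next to the erasure upper bound $1 - 1/k$ for a few alphabet sizes. It also includes a brute-force checker based on the standard criterion that a code of length-$n$ words corrects $t$ worst-case deletions if and only if every pair of distinct codewords has a longest common subsequence shorter than $n - t$. The function names are hypothetical helpers introduced only for this example.

```python
import math


def deletion_fraction_bound(k: int) -> float:
    """Deletion fraction 1 - 2/(k + sqrt(k)) stated in the abstract for alphabet size k."""
    return 1 - 2 / (k + math.sqrt(k))


def erasure_upper_bound(k: int) -> float:
    """1 - 1/k: upper bound on the correctable fraction even for erasures."""
    return 1 - 1 / k


def lcs_length(x: str, y: str) -> int:
    """Classic O(len(x) * len(y)) dynamic program for the longest common subsequence."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            # Extend a match diagonally, otherwise take the better of the two neighbors.
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def corrects_deletions(code: list[str], t: int) -> bool:
    """Hypothetical checker: a code of equal-length words corrects t worst-case
    deletions iff no two distinct codewords share a subsequence of length n - t."""
    n = len(code[0])
    return all(
        lcs_length(code[i], code[j]) < n - t
        for i in range(len(code))
        for j in range(i + 1, len(code))
    )


# Compare the achievable fraction with the erasure upper bound for a few alphabet sizes.
for k in (2, 3, 4, 16, 256):
    print(k, round(deletion_fraction_bound(k), 4), round(erasure_upper_bound(k), 4))
```

For $k = 2$ the first function returns $\sqrt{2} - 1 \approx 0.4142$, matching the binary figure above, and for every $k \ge 2$ it stays strictly below $1 - 1/k$. The checker is exponential-free but quadratic in the number of codewords, so it is only meant for toy codes.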
