Guess & Check Codes for Deletions, Insertions, and Synchronization

We consider the problem of constructing codes that can correct $\delta$ deletions occurring in an arbitrary binary string of length $n$ bits. Varshamov-Tenengolts (VT) codes, dating back to 1965, are zero-error single-deletion $(\delta=1)$ correcting codes with asymptotically optimal redundancy. Finding similar codes for $\delta \geq 2$ deletions remains an open problem. In this work, we relax the standard zero-error (i.e., worst-case) decoding requirement by assuming that the positions of the $\delta$ deletions (or insertions) are independent of the codeword. Our contribution is a new family of explicit codes, which we call Guess & Check (GC) codes, that can correct, with high probability, up to a constant number $\delta$ of deletions (or insertions). GC codes are systematic and have deterministic polynomial-time encoding and decoding algorithms. We also describe the application of GC codes to file synchronization.
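
To make the single-deletion baseline concrete, the following is a minimal sketch of the classical VT construction mentioned above, not of GC codes themselves. It assumes the standard definition: a binary word $x_1 \dots x_n$ belongs to the code with parameter $a$ when $\sum_i i \, x_i \equiv a \pmod{n+1}$. The function names, the choice $a = 0$, and the brute-force decoder are illustrative assumptions; practical VT decoders run in linear time.

```python
from itertools import product


def vt_syndrome(bits):
    """Return sum(i * x_i) mod (n + 1), with positions i counted from 1."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def vt_codebook(n, a=0):
    """Enumerate all length-n binary tuples whose syndrome equals a.

    Exhaustive enumeration is only practical for small n; it is used
    here purely for illustration.
    """
    return [w for w in product((0, 1), repeat=n) if vt_syndrome(w) == a]


def vt_decode_single_deletion(received, n, a=0):
    """Brute-force decoder (illustrative assumption, not the usual
    linear-time algorithm): try every way of re-inserting one bit into
    the length-(n-1) tuple `received` and keep the words with syndrome a.
    """
    candidates = set()
    for pos in range(n):
        for bit in (0, 1):
            word = received[:pos] + (bit,) + received[pos:]
            if vt_syndrome(word) == a:
                candidates.add(word)
    return candidates


if __name__ == "__main__":
    n = 6
    for codeword in vt_codebook(n):
        for i in range(n):
            received = codeword[:i] + codeword[i + 1:]
            # Zero-error property: exactly one candidate survives.
            assert vt_decode_single_deletion(received, n) == {codeword}
    print("every single deletion was corrected for n =", n)
```

Because the VT code is zero-error for $\delta = 1$, the candidate set always contains exactly one word, whichever bit is deleted. The setting studied here gives up this worst-case guarantee for $\delta \geq 2$ in exchange for correction with high probability.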
