Beyond Single-Deletion Correcting Codes: Substitutions and Transpositions

We consider the problem of designing low-redundancy codes in settings where one must correct deletions in conjunction with substitutions or adjacent transpositions, a combination of errors that is commonly observed in DNA-based data storage. One of the most basic versions of this problem was settled more than 50 years ago by Levenshtein, who proved that binary Varshamov-Tenengolts (VT) codes correct one arbitrary edit error, i.e., one deletion or one substitution, with nearly optimal redundancy. However, this approach fails to extend to many simple and natural variations of the binary single-edit error setting. In this work, we make progress on the code design problem above in three such variations:

• We construct linear-time encodable and decodable length-n non-binary codes correcting a single edit error with nearly optimal redundancy log n + O(log log n), providing an alternative, simpler proof of a result by Cai, Chee, Gabrys, Kiah, and Nguyen (IEEE Trans. Inf. Theory 2021). This is achieved by employing what we call weighted VT sketches, a notion that may be of independent interest.

• We construct linear-time encodable and list-decodable binary codes with list size 2 for one deletion and one substitution with redundancy 4 log n + O(log log n). This matches the existential bound up to an O(log log n) additive term.

• We show the existence of a binary code correcting one deletion or one adjacent transposition with nearly optimal redundancy log n + O(log log n).
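For readers unfamiliar with the classical starting point, the following is a minimal Python sketch of the binary Varshamov-Tenengolts syndrome and of single-deletion correction for the code VT_a(n), the set of length-n binary strings x with sum of i·x_i congruent to a modulo n + 1. It illustrates only Levenshtein's binary single-deletion setting, not the weighted VT sketches or the other constructions of this work. The function names `vt_syndrome` and `correct_single_deletion` are hypothetical, and the decoder is a brute-force search over reinsertions chosen for clarity rather than the linear-time procedure.

```python
def vt_syndrome(x, n=None):
    """Return sum(i * x_i) mod (n + 1), with positions i counted from 1."""
    n = len(x) if n is None else n
    return sum(i * bit for i, bit in enumerate(x, start=1)) % (n + 1)


def correct_single_deletion(y, n, a=0):
    """Return every length-n string in VT_a(n) obtainable by inserting one bit into y.

    Illustrative brute force (quadratic time): try each position and each bit,
    and keep the strings whose VT syndrome equals a. If y was produced by
    deleting one bit from a codeword of VT_a(n), the result has exactly one element.
    """
    if len(y) != n - 1:
        raise ValueError("y must have length n - 1")
    candidates = set()
    for pos in range(n):
        for bit in (0, 1):
            x = y[:pos] + (bit,) + y[pos:]
            if vt_syndrome(x, n) == a:
                candidates.add(x)
    return candidates


# Example: (1, 0, 0, 0, 0, 1) has syndrome (1 + 6) mod 7 = 0, so it lies in VT_0(6).
codeword = (1, 0, 0, 0, 0, 1)
received = codeword[:2] + codeword[3:]  # delete the third symbol
recovered = correct_single_deletion(received, n=6, a=0)
```

Because the syndrome takes one of n + 1 values, fixing it costs about log(n + 1) bits of redundancy, which is the "nearly optimal" benchmark the results above are measured against.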
