Coding for Insertions and Deletions

Thus far, the course has covered several problems with a common theme: protecting transmitted information against symbol erasures and symbol substitutions. Today’s lecture focuses on another type of error that arises in certain communication and storage applications: symbol insertions and symbol deletions. A symbol insertion adds a symbol to a string and shifts all subsequent symbols one position forward; a symbol deletion removes a symbol from a string without leaving any placeholder behind. For instance, deleting the symbol in the third position of the string abacbc yields abcbc, and inserting the symbol c in the second position of abacbc yields acbacbc. Symbol insertions and deletions, or synchronization errors for short, are prevalent in settings where the communicating parties have no means of staying in sync, and in applications involving DNA, such as the design of memories based on synthetic DNA strands. Note that, unlike symbol erasures and symbol substitutions (Hamming-type errors), insertions and deletions can also shift uncorrupted symbols away from their original positions. This extra complication makes coding for insertions and deletions a much harder problem. In fact, our understanding of insertion-deletion codes lags significantly behind our thorough understanding of error-correcting codes (ECCs) for Hamming-type errors.
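The two error types and the shifting effect described above can be illustrated with a minimal Python sketch. The function names (delete_at, insert_at, positional_mismatches) are hypothetical helpers introduced here for illustration, and positions are taken to be 1-indexed to match the example in the text.

```python
def delete_at(s: str, pos: int) -> str:
    """Delete the symbol at 1-indexed position `pos`; no placeholder is left."""
    if not 1 <= pos <= len(s):
        raise ValueError("position out of range")
    return s[:pos - 1] + s[pos:]


def insert_at(s: str, pos: int, symbol: str) -> str:
    """Insert `symbol` so that it occupies 1-indexed position `pos`.

    Every symbol previously at position `pos` or later shifts forward by one.
    """
    if not 1 <= pos <= len(s) + 1:
        raise ValueError("position out of range")
    return s[:pos - 1] + symbol + s[pos - 1:]


def positional_mismatches(sent: str, received: str) -> int:
    """Count positions where the two strings disagree.

    Only the overlapping prefix is compared (zip stops at the shorter string).
    """
    return sum(x != y for x, y in zip(sent, received))


sent = "abacbc"
after_deletion = delete_at(sent, 3)        # "abcbc"
after_insertion = insert_at(sent, 2, "c")  # "acbacbc"

# A single deletion misaligns every later symbol, so a position-by-position
# comparison reports several mismatches even though only one error occurred.
print(after_deletion, after_insertion, positional_mismatches(sent, after_deletion))
```

In this example a single deletion makes three of the five compared positions disagree with the original string, which is the sense in which a synchronization error disturbs symbols that were not themselves corrupted.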
