Synchronization Strings and Codes for Insertions and Deletions—A Survey

As early as the 1960s, Levenshtein and others studied error-correcting codes that protect against synchronization errors, such as symbol insertions and deletions. However, despite significant efforts, progress on designing such codes had lagged until recently, particularly compared to the detailed understanding of error-correcting codes for symbol substitution or erasure errors. This paper surveys the recent progress in designing efficient error-correcting codes over finite alphabets that can correct a constant fraction of worst-case insertions and deletions. Most state-of-the-art results for such codes rely on synchronization strings, simple yet powerful pseudo-random objects that have proven very effective for coping with synchronization errors in various settings. This survey also gives an overview of what is known about synchronization strings and discusses communication settings related to error-correcting codes in which synchronization strings have been applied.

Supported in part by NSF grants CCF-1527110, CCF-1618280, CCF-1814603, CCF-1910588, NSF CAREER award CCF-1750808, a Sloan Research Fellowship, and funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (ERC grant agreement 949272).

arXiv:2101.00711v1 [cs.IT] 3 Jan 2021
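To make the central object concrete, the following is a minimal brute-force sketch, not taken from this abstract. It assumes the standard definition from the synchronization-strings literature (Haeupler and Shahrasbi): a string S of length n is an ε-synchronization string if, for every 1 ≤ i < j < k ≤ n+1, the insertion/deletion edit distance between S[i, j) and S[j, k) exceeds (1 − ε)(k − i). Here that distance is computed as |x| + |y| − 2·LCS(x, y). The function names are hypothetical, and the check runs in roughly quintic time, so it is only meant for verifying tiny examples, not as a construction.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence (standard O(|x||y|) dynamic program)."""
    prev = [0] * (len(y) + 1)  # prev[j] = LCS of the processed prefix of x and y[:j]
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def insdel_distance(x: str, y: str) -> int:
    """Minimum number of symbol insertions and deletions turning x into y."""
    return len(x) + len(y) - 2 * lcs_length(x, y)


def is_eps_sync_string(s: str, eps: float) -> bool:
    """Brute-force check of the (assumed) epsilon-synchronization property.

    Uses 0-indexed, half-open intervals: for all 0 <= i < j < k <= n,
    require insdel_distance(s[i:j], s[j:k]) > (1 - eps) * (k - i).
    """
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n + 1):
                if insdel_distance(s[i:j], s[j:k]) <= (1 - eps) * (k - i):
                    return False
    return True


if __name__ == "__main__":
    print(is_eps_sync_string("abcd", 0.5))  # pairwise-distinct symbols
    print(is_eps_sync_string("abab", 0.5))  # adjacent repeated block "ab"
```

For example, a string whose symbols are pairwise distinct passes for any ε > 0, since any two adjacent substrings share no symbols, whereas a string containing two identical adjacent blocks such as "abab" fails because those blocks have distance 0.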
