Optimal Document Exchange and New Codes for Insertions and Deletions

We give the first communication-optimal document exchange protocol. For any n and k < n, our randomized scheme takes any n-bit file F and computes a Θ(k log(n/k))-bit summary from which one can reconstruct F, with high probability, given a related file F' with edit distance ED(F,F') ≤ k. The size of our summary is information-theoretically order optimal for all values of k, giving a randomized solution to a longstanding open question of [Orlitsky; FOCS'91]. It is also the first non-trivial solution for the interesting setting where a small constant fraction of symbols have been edited, producing an optimal summary of size O(H(δ)n) for k = δn, where H is the binary entropy function. This concludes a long line of increasingly better protocols, which produce larger summaries for sub-linear values of k and sub-polynomial failure probabilities. In particular, the recent breakthrough of [Belazzougui, Zhang; FOCS'16] assumes that k < n^ε, produces a summary of size O(k log² k + k log n), and succeeds with probability 1 − (k log n)^(−O(1)).

We also give an efficient derandomized document exchange protocol with summary size O(k log²(n/k)). For every k, this improves over the deterministic document exchange protocol of Belazzougui, which has summary size O(k² + k log² n). Our deterministic document exchange directly yields new efficient systematic error-correcting codes for insertions and deletions. These (binary) codes correct any δ fraction of adversarial insertions/deletions while having a rate of 1 − O(δ log²(1/δ)), improving over the codes of Guruswami and Li and of Haeupler, Shahrasbi, and Vitercik, which have rate 1 − Θ(√δ log^O(1)(1/δ)).
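To make the notion of a summary concrete, here is a toy sketch of the classical k = 1 case. It is not the protocol of this paper. Assuming F' is F with exactly one bit deleted, Alice can send the Varshamov-Tenengolts syndrome a = Σ i·F_i mod (n+1), i.e. ⌈log₂(n+1)⌉ bits, which matches Θ(k log(n/k)) for k = 1. Bob then recovers F with Levenshtein's single-deletion decoding rule (single-deletion codes are surveyed in [18]). The function names vt_summary and vt_reconstruct are hypothetical, and the sketch handles neither insertions nor more than one edit.

```python
from typing import List


def vt_summary(f: List[int]) -> int:
    """Alice: the VT syndrome sum_{i=1..n} i * f_i mod (n + 1), about log2(n + 1) bits."""
    n = len(f)
    return sum(i * bit for i, bit in enumerate(f, start=1)) % (n + 1)


def vt_reconstruct(a: int, g: List[int]) -> List[int]:
    """Bob: recover f from its syndrome a, assuming g is f with exactly one bit deleted."""
    n = len(g) + 1  # original length, known under the single-deletion assumption
    w = sum(g)  # number of ones in g
    s = sum(i * bit for i, bit in enumerate(g, start=1))
    d = (a - s) % (n + 1)  # how much the deletion lowered the weighted sum
    if d <= w:
        # A 0 was deleted: put it back where exactly d ones lie to its right.
        for p in range(len(g), -1, -1):
            if sum(g[p:]) == d:
                return g[:p] + [0] + g[p:]
    else:
        # A 1 was deleted: put it back where exactly d - w - 1 zeros lie to its left.
        zeros_left = d - w - 1
        for p in range(len(g) + 1):
            if g[:p].count(0) == zeros_left:
                return g[:p] + [1] + g[p:]
    raise ValueError("g is not a single deletion of a file with syndrome a")


if __name__ == "__main__":
    f = [1, 0, 1, 1]
    a = vt_summary(f)  # 3; one of n + 1 = 5 possible values
    print(vt_reconstruct(a, [1, 0, 1]))  # [1, 0, 1, 1]
```

The decoding step works because deleting a 0 lowers the weighted sum by the number of ones to its right (at most w), while deleting a 1 lowers it by w + 1 plus the number of zeros to its left (more than w). The remainder therefore tells Bob both which bit was lost and where a copy of it belongs.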

[1] Torsten Suel, et al. Improved single-round protocols for remote file synchronization, 2005, Proceedings IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies.

[2] Paul Mackerras, et al. The rsync algorithm, 1996.

[3] Bernhard Haeupler. Optimal Document Exchange and New Codes for Small Number of Insertions and Deletions, 2018, arXiv.

[4] Michal Koucký, et al. Streaming algorithms for embedding and computing edit distance in the low distance regime, 2016, STOC.

[5] Bernhard Haeupler, et al. Synchronization strings: explicit constructions, local decoding, and applications, 2017, STOC.

[6] Venkatesan Guruswami, et al. Deletion Codes in the High-Noise and High-Rate Regimes, 2014, IEEE Transactions on Information Theory.

[7] Djamal Belazzougui, et al. Efficient Deterministic Single Round Document Exchange for Edit Distance, 2015, arXiv.

[8] Moni Naor, et al. Small-bias probability spaces: efficient constructions and applications, 1990, STOC '90.

[9] Michael Mitzenmacher, et al. A Survey of Results for Deletion Channels and Related Synchronization Channels, 2008, SWAT.

[10] Esko Ukkonen, et al. Algorithms for Approximate String Matching, 1985, Inf. Control.

[11] Hossein Jowhari, et al. Efficient Communication Protocols for Deciding Edit Distance, 2012, ESA.

[12] Alon Orlitsky, et al. Interactive communication: balanced distributions, correlated files, and average-case complexity, 1991, Proceedings 32nd Annual Symposium on Foundations of Computer Science.

[13] Venkatesan Guruswami, et al. Efficient Low-Redundancy Codes for Correcting Multiple Deletions, 2015, IEEE Transactions on Information Theory.

[14] Venkatesan Guruswami, et al. Efficiently decodable insertion/deletion codes for high-noise and high-rate regimes, 2016, IEEE International Symposium on Information Theory (ISIT).

[15] Rafail Ostrovsky, et al. Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data, 2004, SIAM J. Comput.

[16] Bernhard Haeupler, et al. Synchronization Strings: Channel Simulations and Interactive Coding for Insertions and Deletions, 2017, ICALP.

[17] Venkatesan Guruswami, et al. An Improved Bound on the Fraction of Correctable Deletions, 2015, IEEE Transactions on Information Theory.

[18] N. J. A. Sloane, et al. On Single-Deletion-Correcting Codes, 2002, arXiv:math/0207197.

[19] David Zuckerman, et al. Asymptotically good codes correcting insertions, deletions, and transpositions, 1997, SODA '97.

[20] Bernhard Haeupler, et al. Synchronization strings: codes for insertions and deletions approaching the Singleton bound, 2017, STOC.

[21] Qin Zhang, et al. Edit Distance: Sketching, Streaming, and Document Exchange, 2016, IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[22] Zhengzhong Jin, et al. Deterministic Document Exchange Protocols, and Almost Optimal Binary Codes for Edit Errors, 2018, IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS).