Optimal Document Exchange and New Codes for Insertions and Deletions

We give the first communication-optimal document exchange protocol. For any n and k < n, our randomized scheme takes any n-bit file F and computes a Θ(k log(n/k))-bit summary from which one can reconstruct F, with high probability, given a related file F' with edit distance ED(F,F') ≤ k. The size of our summary is information-theoretically order-optimal for all values of k, giving a randomized solution to a longstanding open question of [Orlitsky; FOCS'91]. It is also the first non-trivial solution for the interesting setting where a small constant fraction of symbols have been edited, producing an optimal summary of size O(H(δ)n) for k = δn. This concludes a long series of better-and-better protocols which produce larger summaries for sub-linear values of k and sub-polynomial failure probabilities. In particular, the recent breakthrough of [Belazzougui, Zhang; FOCS'16] assumes that k < n^ε, produces a summary of size O(k log^2 k + k log n), and succeeds with probability 1 - (k log n)^(-O(1)). We also give an efficient derandomized document exchange protocol with summary size O(k log^2(n/k)). This improves, for any k, over a deterministic document exchange protocol by Belazzougui with summary size O(k^2 + k log^2 n). Our deterministic document exchange protocol directly provides new efficient systematic error-correcting codes for insertions and deletions. These (binary) codes correct any δ fraction of adversarial insertions/deletions while having a rate of 1 - O(δ log^2(1/δ)), and they improve over the codes of Guruswami and Li and of Haeupler, Shahrasbi and Vitercik, which have rate 1 - Θ(√δ log^O(1)(1/δ)).
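As a rough numerical illustration of why a summary of size O(H(δ)n) is consistent with the Θ(k log(n/k)) bound when k = δn, the following Python sketch compares the two quantities for a few values of δ. It is not part of the protocol: the function names and the chosen values of n and δ are assumptions made for illustration, all hidden constants are ignored, and H is taken to be the binary entropy function.

```python
import math


def binary_entropy(delta: float) -> float:
    """Binary entropy H(delta) in bits, for 0 < delta < 1."""
    return -delta * math.log2(delta) - (1 - delta) * math.log2(1 - delta)


def summary_bits_order(n: int, k: int) -> float:
    """k * log2(n/k): the order of the summary size, constants omitted."""
    return k * math.log2(n / k)


def compare(n: int, delta: float) -> tuple[float, float, float]:
    """Return (k log2(n/k), H(delta) * n, ratio of the two) for k = delta * n."""
    k = round(delta * n)
    edit_term = summary_bits_order(n, k)
    entropy_term = binary_entropy(k / n) * n
    return edit_term, entropy_term, entropy_term / edit_term


if __name__ == "__main__":
    n = 1_000_000  # illustrative file length in bits
    for delta in (0.001, 0.01, 0.1):
        edit_term, entropy_term, ratio = compare(n, delta)
        print(f"delta={delta}: k log(n/k)={edit_term:.0f}, "
              f"H(delta) n={entropy_term:.0f}, ratio={ratio:.3f}")
```

For δ ≤ 1/2 the ratio stays between 1 and a small constant, since H(δ) = δ log(1/δ) + (1-δ) log(1/(1-δ)) and the second term is at most δ log e. The two expressions therefore differ only by a constant factor in this regime.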

