Low-Complexity Interactive Algorithms for Synchronization From Deletions, Insertions, and Substitutions

Consider two remote nodes having binary sequences X and Y, respectively. Y is an edited version of X, where the editing involves random deletions, insertions, and substitutions, possibly in bursts. The goal is for the node with Y to reconstruct X with minimal exchange of information over a noiseless link. The communication is measured in terms of both the total number of bits exchanged and the number of interactive rounds of communication. This paper focuses on the setting where the number of edits is o(n/log n), where n is the length of X. We first consider the case where the edits are a mixture of insertions and deletions (indels), and propose an interactive synchronization algorithm with near-optimal communication rate and average computational complexity of O(n) arithmetic operations. The algorithm uses interaction to efficiently split the source sequence into substrings containing exactly one deletion or insertion. Each of these substrings is then synchronized using an optimal one-way algorithm based on the single-deletion correcting channel codes of Varshamov and Tenengolts. We then build on this synchronization algorithm in three different ways. First, it is modified to work with a single round of interaction. The reduction in the number of rounds comes at the expense of higher communication, which is quantified. Next, we present an extension to the practically important case where the insertions and deletions may occur in (potentially large) bursts. Finally, we show how to synchronize the sources to within a target Hamming distance. This feature can be used to differentiate between substitution and indel edits. In addition to theoretical performance bounds, we provide several validating simulation results for the proposed algorithms.
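To make the single-edit step concrete, the following is a minimal sketch of one-way synchronization from a single deletion using a Varshamov-Tenengolts (VT) checksum. It is not the paper's full algorithm: it assumes the interactive stage has already isolated a substring with exactly one deleted bit, and it uses the textbook VT decoding rule. The function names `vt_syndrome` and `recover_single_deletion` are illustrative and do not come from the paper.

```python
def vt_syndrome(x):
    """VT checksum of a bit list x of length n: sum of i * x_i (1-indexed) mod (n + 1)."""
    n = len(x)
    return sum(i * b for i, b in enumerate(x, start=1)) % (n + 1)


def recover_single_deletion(y, syndrome_x):
    """Rebuild x from y (x with exactly one bit deleted) and the VT checksum of x.

    Assumption: exactly one deletion occurred, so len(x) == len(y) + 1.
    """
    n = len(y) + 1                      # length of the original sequence x
    w = sum(y)                          # number of ones in y
    s = sum(i * b for i, b in enumerate(y, start=1)) % (n + 1)
    d = (syndrome_x - s) % (n + 1)      # checksum deficiency caused by the deletion

    if d <= w:
        # A 0 was deleted: reinsert it so that exactly d ones lie to its right.
        pos, ones = len(y), 0
        while ones < d:
            pos -= 1
            ones += y[pos]
        return y[:pos] + [0] + y[pos:]

    # A 1 was deleted: reinsert it so that exactly d - w - 1 zeros lie to its left.
    zeros_needed = d - w - 1
    pos, zeros = 0, 0
    while zeros < zeros_needed:
        zeros += 1 - y[pos]
        pos += 1
    return y[:pos] + [1] + y[pos:]


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0, 0, 1, 0]
    a = vt_syndrome(x)                  # the only information sent to the node holding y
    for i in range(len(x)):
        y = x[:i] + x[i + 1:]           # delete the bit at index i
        assert recover_single_deletion(y, a) == x
```

In this sketch the node with X sends only the checksum, which takes about log2(n + 1) bits for a substring of length n. The reinserted bit may land at a different index within the same run as the deleted one, but the reconstructed sequence is identical to X.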