Interactive low-complexity codes for synchronization from deletions and insertions

We study the problem of synchronizing two remotely located data sources that have become mis-synchronized through deletions and insertions. The problem is important because even a small number of synchronization errors can induce a large Hamming distance between the two sources. The goal is to synchronize the sources while making rate-efficient use of lossless bidirectional links between them. In this work, we focus on the following model. A binary sequence X of length n is edited to generate the sequence Y at the remote end, where the edits are random deletions and insertions, possibly occurring in small bursts. The problem is to synchronize Y with X with minimal exchange of information, measured by both the average communication rate and the average number of interactive rounds of communication. We focus on the case where the number of edits is much smaller than n, and propose an interactive algorithm that is computationally simple and has near-optimal communication complexity. Our algorithm works by efficiently splitting the source sequence into pieces, each containing either a single deletion/insertion or a single burst deletion/insertion. Each piece is then synchronized using an optimal one-way synchronization code based on the single-deletion-correcting channel codes of Varshamov and Tenengolts (VT codes).
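To illustrate the final step, the following is a minimal sketch of one-way synchronization for a piece containing a single deletion, assuming the standard VT checksum (the sum of i * x_i modulo n + 1, with positions numbered from 1) and the classical single-deletion decoding rule. The function names `checksum`, `vt_syndrome`, and `recover_single_deletion` are illustrative and do not come from the paper; the interactive splitting step and the insertion and burst cases are not shown.

```python
def checksum(bits, modulus):
    """Weighted sum of the bits (positions start at 1), reduced mod `modulus`."""
    return sum(i * b for i, b in enumerate(bits, start=1)) % modulus


def vt_syndrome(x):
    """VT syndrome of a length-n binary list: sum(i * x_i) mod (n + 1).

    This is the only information the side holding x needs to send.
    """
    return checksum(x, len(x) + 1)


def recover_single_deletion(y, syndrome):
    """Rebuild x from y (x with exactly one bit deleted) and x's VT syndrome."""
    n = len(y) + 1                      # length of the original sequence
    weight = sum(y)                     # number of ones in y
    deficiency = (syndrome - checksum(y, n + 1)) % (n + 1)

    if deficiency <= weight:
        # A 0 was deleted: reinsert it with `deficiency` ones to its right.
        pos, ones = len(y), 0
        while ones < deficiency:
            pos -= 1
            ones += y[pos]
        return y[:pos] + [0] + y[pos:]

    # A 1 was deleted: reinsert it with (deficiency - weight - 1) zeros to its left.
    zeros_needed = deficiency - weight - 1
    pos, zeros = 0, 0
    while zeros < zeros_needed:
        zeros += 1 - y[pos]
        pos += 1
    return y[:pos] + [1] + y[pos:]


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0, 0, 1]
    y = x[:2] + x[3:]                   # delete the bit at index 2
    print(recover_single_deletion(y, vt_syndrome(x)) == x)
```

In this sketch the side holding X transmits only the syndrome, a number between 0 and n, so the cost is about log2(n + 1) bits for the piece regardless of where the deletion occurred.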
