A practical framework for efficient file synchronization

Efficiently synchronizing remote copies of a file that has undergone insertions and deletions is an important problem with many applications, including data storage, file sharing, online editing, and cloud computing. Suppose that user A owns an original file X, and user B owns an edited file Y obtained from X through a series of insertions and deletions. In our recent work [1], [2], we developed the first low-complexity two-way protocol between users A and B for synchronization when insertions and deletions occur at a fixed rate. This protocol is order-wise optimal and achieves an exponentially low probability of reconstruction error. In this paper, we report further results, including implementation details of the synchronization protocol and comparisons with existing methods.
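As an illustration of the setting only, the following minimal sketch generates an edited file Y from an original file X. It assumes an i.i.d. per-symbol edit model over a binary alphabet. The function name apply_edits and the parameters p_del, p_ins, and alphabet are hypothetical choices for this example and are not taken from the protocol itself.

```python
import random

def apply_edits(x, p_del, p_ins, alphabet="01", seed=None):
    """Return an edited copy of x under an ASSUMED i.i.d. edit model.

    Each symbol of x is preceded by a random inserted symbol with
    probability p_ins, and is itself deleted with probability p_del.
    """
    rng = random.Random(seed)
    y = []
    for symbol in x:
        # Insertion: with probability p_ins, add a random symbol first.
        if rng.random() < p_ins:
            y.append(rng.choice(alphabet))
        # Deletion: with probability p_del, drop the original symbol.
        if rng.random() >= p_del:
            y.append(symbol)
    return "".join(y)

# User A holds X; user B holds Y, obtained from X at a fixed edit rate.
x = "".join(random.Random(0).choice("01") for _ in range(1000))
y = apply_edits(x, p_del=0.01, p_ins=0.01, seed=1)
```

With small p_del and p_ins, Y differs from X in only a few positions, which is the regime in which an interactive protocol can hope to exchange far fewer bits than retransmitting the whole file.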

[1] Uzi Vishkin, et al. Communication complexity of document exchange, 1999, SODA '00.

[2] Khaled A. S. Abdel-Ghaffar, et al. On Helberg's Generalization of the Levenshtein Code for Multiple Deletion/Insertion Error Correction, 2012, IEEE Transactions on Information Theory.

[3] Chuohao Yeo, et al. VSYNC: a novel video file synchronization protocol, 2008, ACM Multimedia.

[4] Vahid Tarokh, et al. A survey of error-correcting codes for channels with symbol synchronization errors, 2010, IEEE Communications Surveys & Tutorials.

[5] Kannan Ramchandran, et al. Interactive low-complexity codes for synchronization from deletions and insertions, 2010, 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[6] David Tse, et al. Information theory for DNA sequencing: Part I: A basic model, 2012, 2012 IEEE International Symposium on Information Theory Proceedings.

[7] Michael Mitzenmacher, et al. A Survey of Results for Deletion Channels and Related Synchronization Channels, 2008, SWAT.

[8] Lara Dolecek, et al. Repetition Error Correcting Sets: Explicit Constructions and Prefixing Methods, 2009, SIAM J. Discret. Math.

[9] Kannan Ramchandran, et al. Efficient file synchronization: A distributed source coding approach, 2011, 2011 IEEE International Symposium on Information Theory Proceedings.

[10] H. C. Ferreira, et al. On multiple insertion/deletion correcting codes, 1994, 2000 IEEE International Symposium on Information Theory (Cat. No.00CH37060).

[11] Lara Dolecek, et al. Synchronization from insertions and deletions under a non-binary, non-uniform source, 2013, 2013 IEEE International Symposium on Information Theory.

[12] Lara Dolecek, et al. Using Reed–Muller RM(1, m) Codes Over Channels With Synchronization and Substitution Errors, 2007, IEEE Transactions on Information Theory.

[13] Alon Orlitsky, et al. Practical protocols for interactive communication, 2001, Proceedings. 2001 IEEE International Symposium on Information Theory (IEEE Cat. No.01CH37252).

[14] Andrew Tridgell, et al. Efficient Algorithms for Sorting and Synchronization, 1999.

[15] Alon Orlitsky. Interactive Communication of Balanced Distributions and of Correlated Files, 1993, SIAM J. Discret. Math.

[16] G. Tenengolts, et al. Nonbinary codes, correcting single deletion or insertion, 1984, IEEE Trans. Inf. Theory.

[17] Vladimir I. Levenshtein, et al. Binary codes capable of correcting deletions, insertions, and reversals, 1965.

[18] Alexandre V. Evfimievski. A probabilistic algorithm for updating files over a communication link, 1998, SODA '98.

[19] Lara Dolecek, et al. A Deterministic Polynomial-Time Protocol for Synchronizing From Deletions, 2014, IEEE Transactions on Information Theory.

[20] Kannan Ramchandran, et al. Efficient interactive algorithms for file synchronization under general edits, 2013, 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton).