Set reconciliation with nearly optimal communication complexity

We consider the problem of efficiently reconciling two similar sets held by different hosts while minimizing the communication complexity. This type of problem arises naturally from gossip protocols used for the distribution of information, but has other applications as well. We describe an approach to such reconciliation based on the encoding of sets as polynomials. The resulting protocols exhibit tractable computational complexity and nearly optimal communication complexity. Moreover, these protocols can be adapted to work over a broadcast channel, allowing many clients to reconcile with one host based on a single broadcast.
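To illustrate the polynomial encoding, here is a minimal sketch under toy assumptions. It is not the authors' implementation, and every name and parameter in it is hypothetical. Elements are integers below `UNIVERSE = 8000`, and arithmetic is done in the prime field of size `P = 8191`. Both hosts agree in advance on sample points that lie above the universe, and the number of points is an assumed upper bound on the size of the symmetric difference. Host A sends |A| and the values of its characteristic polynomial χ_A(z) = ∏_{x∈A}(z − x) at those points. Host B divides these by χ_B at the same points, which cancels the common elements and leaves χ_{A∖B}/χ_{B∖A}. B then interpolates this rational function by solving a linear system, cancels any common factor left over when the bound is loose, and reads off the differences as roots. For brevity, roots are found by exhaustive search over the small universe; a real system would use polynomial factorization instead.

```python
P = 8191          # hypothetical prime modulus (2**13 - 1); toy field for the sketch
UNIVERSE = 8000   # hypothetical: elements lie in [0, UNIVERSE); sample points lie above

def inv(a):
    return pow(a, P - 2, P)  # Fermat inverse in the prime field

def char_poly_evals(s, points):
    """Evaluate chi_S(z) = prod_{x in S} (z - x) mod P at each sample point."""
    out = []
    for z in points:
        acc = 1
        for x in s:
            acc = acc * (z - x) % P
        out.append(acc)
    return out

def poly_eval(p, x):
    acc = 0
    for c in reversed(p):  # Horner's rule; p holds coefficients low-to-high
        acc = (acc * x + c) % P
    return acc

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def poly_divmod(a, b):
    """Polynomial long division mod P; returns (quotient, remainder)."""
    a, b = trim(list(a)), trim(b)
    q = [0] * max(len(a) - len(b) + 1, 1)
    k = inv(b[-1])
    while len(a) >= len(b) and any(a):
        shift, c = len(a) - len(b), a[-1] * k % P
        q[shift] = c
        for i, bi in enumerate(b):
            a[shift + i] = (a[shift + i] - c * bi) % P
        a = trim(a)
    return q, a

def poly_gcd(a, b):
    a, b = trim(a), trim(b)
    while any(b):
        a, b = b, poly_divmod(a, b)[1]
    k = inv(a[-1])
    return [c * k % P for c in a]  # monic gcd

def solve_mod(rows, n):
    """Gauss-Jordan elimination mod P on an augmented system; free variables -> 0."""
    rows, pivots, r = [row[:] for row in rows], [], 0
    for c in range(n):
        piv = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        k = inv(rows[r][c])
        rows[r] = [v * k % P for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                f = rows[i][c]
                rows[i] = [(u - f * v) % P for u, v in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    sol = [0] * n
    for i, c in enumerate(pivots):
        sol[c] = rows[i][n]
    return sol

def reconcile(a_size, a_evals, b, points):
    """Host B recovers (A - B, B - A) from |A| and chi_A at the shared sample points."""
    delta, m_bar = a_size - len(b), len(points)
    if (m_bar - delta) % 2:  # |A xor B| has the parity of delta, so one point is spare
        m_bar -= 1
    assert abs(delta) <= m_bar, "difference exceeds the assumed bound"
    d_a, d_b = (m_bar + delta) // 2, (m_bar - delta) // 2
    rows = []
    for z, fa, fb in zip(points[:m_bar], a_evals, char_poly_evals(b, points[:m_bar])):
        f = fa * inv(fb) % P  # chi_A(z) / chi_B(z) = chi_{A-B}(z) / chi_{B-A}(z)
        # Unknowns: non-leading coefficients of monic num (deg d_a) and den (deg d_b),
        # constrained by num(z) = f * den(z).
        row = [pow(z, j, P) for j in range(d_a)]
        row += [-f * pow(z, j, P) % P for j in range(d_b)]
        row.append((f * pow(z, d_b, P) - pow(z, d_a, P)) % P)
        rows.append(row)
    sol = solve_mod(rows, d_a + d_b)
    num, den = sol[:d_a] + [1], sol[d_a:] + [1]
    g = poly_gcd(num, den)  # cancel the common factor left when the bound is loose
    num, den = poly_divmod(num, g)[0], poly_divmod(den, g)[0]
    # Toy root finding by exhaustive search over the universe.
    only_a = {x for x in range(UNIVERSE) if poly_eval(num, x) == 0}
    only_b = {x for x in range(UNIVERSE) if poly_eval(den, x) == 0}
    return only_a, only_b

A = {1, 2, 9, 10, 12, 57}
B = {1, 2, 9, 10, 33}
points = [P - 1 - i for i in range(5)]  # 5 points: assumed bound on |A xor B|
only_a, only_b = reconcile(len(A), char_poly_evals(A, points), B, points)
print(sorted(only_a), sorted(only_b))  # [12, 57] [33]
```

Host A's message consists of one size plus one field element per sample point. Its length therefore depends on the bound on the symmetric difference, not on the sizes of the sets. This is the property behind the nearly optimal communication claim.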
