The complexity of object reconciliation, and open problems related to set difference and coding

We explore the connections between the classical problems of set difference and error-correcting codes, motivated by recent results on Invertible Bloom Filters (communication-efficient set difference) and Biff Codes (fast error-correction coding based on set difference). In particular, we seek to understand how these results generalize to settings where many parties communicate over a network represented by a graph and the goal is for the parties to reconcile the objects each of them owns, for some suitable definition of "reconcile". Our general framework encompasses standard problems such as rumor spreading and network coding. We suggest that generalizing to other objects, such as sequences, and to other distance measures, such as edit distance, may lead to a theory of reconciling objects over graphs. Such a theory may have practical consequences for modern cloud-based deployments.
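To make the two-party starting point concrete, the following is a minimal sketch of set difference with an Invertible Bloom Filter. It is an illustration under stated assumptions, not the construction analyzed in the cited work: the cell layout (count, XOR of keys, XOR of checksums), the SHA-256-based hashing, the parameters `K` and `M`, and all function names are hypothetical choices made for this example. Each party summarizes its set in a table whose size depends on the expected difference rather than on the set size; subtracting the two tables cancels the common keys, and the remaining keys are recovered by repeatedly peeling cells that hold exactly one key.

```python
import hashlib

K = 3    # hash functions per key (assumed parameter)
M = 120  # number of cells, a multiple of K (assumed parameter)


def _h(key: int, salt: int) -> int:
    """64-bit hash of a non-negative integer key, derived from SHA-256."""
    data = f"{salt}:{key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def _cells(key: int, m: int) -> list:
    """One cell in each of K equal-size subtables, so the K cells are distinct."""
    size = m // K
    return [i * size + _h(key, i) % size for i in range(K)]


def _check(key: int) -> int:
    """Checksum used to recognize cells that hold exactly one key."""
    return _h(key, 99)


def build(keys, m: int = M) -> list:
    """Summarize a set of integer keys as m cells of [count, key_xor, check_xor]."""
    table = [[0, 0, 0] for _ in range(m)]
    for key in keys:
        for c in _cells(key, m):
            table[c][0] += 1
            table[c][1] ^= key
            table[c][2] ^= _check(key)
    return table


def subtract(a: list, b: list) -> list:
    """Cell-wise difference; keys present in both sets cancel out."""
    return [[x[0] - y[0], x[1] ^ y[1], x[2] ^ y[2]] for x, y in zip(a, b)]


def decode(diff: list):
    """Peel pure cells. Returns (keys only in A, keys only in B, success flag)."""
    m = len(diff)
    only_a, only_b = set(), set()
    progress = True
    while progress:
        progress = False
        for cell in diff:
            count, key, chk = cell
            # A cell is treated as pure if its count is +/-1 and the checksum matches.
            if count in (1, -1) and _check(key) == chk:
                (only_a if count == 1 else only_b).add(key)
                for c in _cells(key, m):
                    diff[c][0] -= count
                    diff[c][1] ^= key
                    diff[c][2] ^= chk
                progress = True
    ok = all(c == [0, 0, 0] for c in diff)
    return only_a, only_b, ok


if __name__ == "__main__":
    a = set(range(2000)) | {5001, 5002}
    b = set(range(2000)) | {7001}
    only_a, only_b, ok = decode(subtract(build(a), build(b)))
    print(ok, sorted(only_a), sorted(only_b))
```

In this sketch each party would transmit only its `M`-cell table, regardless of how many keys the sets share. Decoding is probabilistic: if the difference is too large for the table, peeling stalls and the success flag is false, so a practical protocol would first need an estimate of the difference size.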
