High-bandwidth data dissemination for large-scale distributed systems

This article focuses on the multireceiver data dissemination problem. Initially, IP multicast formed the basis for efficiently supporting such distribution. More recently, overlay networks have emerged to support point-to-multipoint communication. Both techniques focus on constructing trees rooted at the source to distribute content among all interested receivers. We argue, however, that trees have two fundamental limitations for data dissemination. First, since each participant receives all of its data from a single parent, participants must often probe continuously in search of a parent with an acceptable level of bandwidth. Second, due to packet losses and failures, available bandwidth decreases monotonically down the tree. To address these limitations, we present Bullet, a data dissemination mesh that takes advantage of the computational and storage capabilities of end hosts to create a distribution structure in which a node receives data in parallel from multiple peers. For the mesh to deliver improved bandwidth and reliability, we need to solve several key problems: (i) disseminating disjoint data over the mesh, (ii) locating missing content, (iii) deciding which nodes to peer with (peering strategy), (iv) retrieving data at the right rate from all peers (flow control), and (v) recovering from failures and adapting to dynamically changing network conditions. Additionally, the system should be self-adjusting and should expose few user-adjustable parameters. We describe our approach to addressing all of these problems in a working system deployed across the Internet. Bullet outperforms state-of-the-art systems, including BitTorrent, by 25-70% and exhibits strong performance and reliability in a range of deployment settings. In addition, we find that, relative to tree-based solutions, Bullet reduces the need to perform expensive bandwidth probing.
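
The contrast between the two structures can be illustrated with a small idealized sketch. It is not Bullet's algorithm: the function names, the rates, and the simplifying assumptions (no packet loss, peers hold fully disjoint data, a receiver cannot exceed the source rate) are all hypothetical and chosen only to show why bandwidth cannot increase down a tree while a mesh node can aggregate several partial streams.

```python
def tree_bandwidth(source_rate, link_capacities):
    """Rate seen at each hop along one root-to-leaf path of a tree.

    Each node can forward at most what it received from its single
    parent, so the rate is non-increasing with depth.
    """
    rates = []
    rate = source_rate
    for capacity in link_capacities:
        rate = min(rate, capacity)  # a child never exceeds its parent
        rates.append(rate)
    return rates


def mesh_bandwidth(source_rate, peer_rates):
    """Idealized upper bound for a node pulling disjoint data in parallel.

    Assumes the peers hold non-overlapping data, so their rates add up,
    capped by the rate at which the source produces data.
    """
    return min(source_rate, sum(peer_rates))


# Hypothetical numbers in Kbps, for illustration only.
path = tree_bandwidth(1000, [800, 900, 500, 700])
print(path)  # [800, 800, 500, 500]

mesh = mesh_bandwidth(1000, [500, 300, 400])
print(mesh)  # 1000
```

In the tree case, the 500 Kbps bottleneck at the third hop limits every node below it, even though the last link could carry 700 Kbps. In the mesh case, three peers that are each slower than the source together supply the full source rate, which is why the problems of disjoint dissemination and locating missing content listed above matter.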
