Algorithms for High Performance, Wide-Area Distributed File Downloads

As peer-to-peer and wide-area storage systems come into vogue, delivering content that is cached, partitioned and replicated in the wide area, with high performance, becomes increasingly important. This paper explores three algorithms for such downloads. The storage model is based on the Network Storage Stack, which allows writable storage to be flexibly shared and used as a network resource. The algorithms assume that data is replicated on various storage depots in the wide area and must be delivered to the client either as a downloaded file or as a stream consumed by an application such as a media player. The algorithms are threaded and adaptive: they try to get good performance from nearby replicas while still making use of faraway ones. After defining the algorithms, we explore their performance when downloading a 50 MB file, replicated on six storage depots in the U.S., Europe and Asia, to two clients in different parts of the U.S. One algorithm, called progress-driven redundancy, exhibits excellent performance for both file and streaming downloads.
