Scheduling independent tasks sharing large data distributed with BitTorrent

Data-centric applications remain a challenge for large-scale distributed computing systems. The emergence of new protocols and software for collaborative content distribution over the Internet offers a new opportunity for fast and efficient delivery of high volumes of data. In a previous paper, we investigated BitTorrent as a protocol for data diffusion in the context of computational desktop grids. We showed that BitTorrent is efficient for large file transfers and scales as the number of nodes increases, but suffers from a high overhead when transmitting small files. This paper investigates two approaches to overcome these limitations. First, we propose a performance model that selects the better of the FTP and BitTorrent protocols according to the size of the file to distribute and the number of receiver nodes. Next, we propose an enhancement of the BitTorrent protocol that provides more predictable communication patterns. We design a model for communication performance and evaluate BitTorrent-aware versions of the MinMin, MaxMin and Sufferage scheduling heuristics (BT-MinMin, BT-MaxMin and BT-Sufferage) against a synthetic parameter-sweep application.
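For readers unfamiliar with the baseline heuristics named above, the following is a minimal sketch of the classic MinMin heuristic only. It is not the BT-MinMin variant evaluated in the paper, whose communication model is not given in this abstract. The names `min_min`, `etc` and `ready` are illustrative assumptions, and data transfer time is assumed to be already folded into the estimated completion times.

```python
def min_min(etc, ready=None):
    """Classic MinMin scheduling of independent tasks (illustrative sketch).

    etc[t][h] is the estimated time to run task t on host h
    (assumption: any data transfer cost is already included).
    ready[h] is the time at which host h becomes free.
    Returns (schedule, makespan), where schedule is a list of
    (task, host, completion_time) tuples in assignment order.
    """
    n_hosts = len(etc[0])
    ready = list(ready) if ready is not None else [0.0] * n_hosts
    unscheduled = set(range(len(etc)))
    schedule = []

    while unscheduled:
        # For every unscheduled task, consider its completion time on each
        # host; the task with the smallest minimum completion time is the
        # global minimum over all (task, host) pairs.
        best = None  # (completion_time, task, host)
        for t in sorted(unscheduled):
            for h in range(n_hosts):
                ct = ready[h] + etc[t][h]
                if best is None or ct < best[0]:
                    best = (ct, t, h)

        ct, t, h = best
        schedule.append((t, h, ct))
        ready[h] = ct          # the chosen host is busy until ct
        unscheduled.remove(t)

    return schedule, max(ready)


if __name__ == "__main__":
    # Three tasks, two hosts; the values are made up for illustration.
    etc = [[3, 5], [2, 4], [6, 1]]
    schedule, makespan = min_min(etc)
    print(schedule, makespan)
```

MaxMin differs only in the selection step: among the per-task minimum completion times it picks the task with the largest one. Sufferage instead picks the task that would lose the most if denied its best host.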
