Scheduling Data-Intensive Bags of Tasks in P2P Grids with BitTorrent-Enabled Data Distribution

Scheduling Data-Intensive Bags of Tasks in P2P Grids requires transferring large input data files, and these transfers delay Task completion. We propose to combine several existing technologies and patterns to perform efficient data-aware scheduling: (1) the BitTorrent P2P file sharing protocol to transfer data, (2) data caching on computational Resources, (3) a data-aware Resource selection scheduling algorithm similar to Storage Affinity, and (4) a new Task selection scheduling algorithm, Temporal Tasks Grouping, which schedules Tasks that share input data files close together in time. Data replication is also discussed. The proposed approach needs neither an overlay network nor Predictive Communications Ordering, which makes our operational implementation of a P2P Grid middleware easily deployable in unstructured P2P networks. Experiments show that combining BitTorrent, caching, Storage Affinity and Temporal Tasks Grouping yields performance gains. In short, this work combines P2P Grid computing with P2P data transfer technologies.
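To make the interplay of caching, Resource selection and Task selection concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the middleware's implementation. The storage affinity of a (Task, Resource) pair is taken as the number of bytes of the Task's input files already cached on that Resource, in the spirit of Storage Affinity. The abstract does not give the exact Temporal Tasks Grouping algorithm, so `temporal_grouping` is a hypothetical simplification: a greedy ordering that places each Task next to the remaining Task sharing the most input bytes with it. All names (`Task`, `Resource`, `storage_affinity`, `temporal_grouping`, `schedule`, `max_tasks_per_resource`) are hypothetical, and BitTorrent transfers are abstracted away as a simple byte count.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    name: str
    inputs: frozenset  # names of the input data files the Task reads


@dataclass
class Resource:
    name: str
    cache: set = field(default_factory=set)  # input files already cached here


def storage_affinity(task, resource, sizes):
    """Bytes of the Task's input data already present on the Resource."""
    return sum(sizes[f] for f in task.inputs & resource.cache)


def shared_bytes(a, b, sizes):
    """Bytes of input data that two Tasks have in common."""
    return sum(sizes[f] for f in a.inputs & b.inputs)


def temporal_grouping(tasks, sizes):
    """Hypothetical Task selection (simplified, not the paper's exact
    algorithm): greedily follow each Task with the remaining Task that shares
    the most input bytes with it, so Tasks sharing files run close in time."""
    remaining = list(tasks)
    ordered = [remaining.pop(0)]
    while remaining:
        last = ordered[-1]
        # max() returns the first maximal element, so ties keep submission order
        best = max(remaining, key=lambda t: shared_bytes(t, last, sizes))
        remaining.remove(best)
        ordered.append(best)
    return ordered


def schedule(tasks, resources, sizes, max_tasks_per_resource):
    """Assign Tasks in grouped order to the Resource with the highest storage
    affinity (ties go to the least loaded), cache the transferred inputs there,
    and count the bytes that had to be transferred. Assumes that
    len(resources) * max_tasks_per_resource >= len(tasks)."""
    load = {r.name: 0 for r in resources}
    plan, transferred = [], 0
    for task in temporal_grouping(tasks, sizes):
        free = [r for r in resources if load[r.name] < max_tasks_per_resource]
        res = max(free, key=lambda r: (storage_affinity(task, r, sizes), -load[r.name]))
        missing = task.inputs - res.cache
        transferred += sum(sizes[f] for f in missing)
        res.cache |= missing  # data caching: later Tasks can reuse these files
        load[res.name] += 1
        plan.append((task.name, res.name))
    return plan, transferred
```

For example, with two 100-byte files `a` and `b`, Tasks `t1`/`t3` reading `a` and `t2`/`t4` reading `b`, and two Resources holding at most two Tasks each, the grouped order is `t1, t3, t2, t4`; each file is transferred once and then served from the cache. The per-Resource cap is a crude stand-in for load balancing: without it, a pure affinity rule would keep sending every Task to whichever Resource cached data first.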
