An Anticipative Recursively Adjusting Mechanism for parallel file transfer in data grids

Data Grids enable the sharing, selection, and connection of a wide variety of geographically distributed computational and storage resources for content needed by large-scale data-intensive applications such as high-energy physics, bioinformatics, and virtual astrophysical observatories. In Data Grids, co-allocation architectures were developed to enable parallel downloads of data sets from selected replica servers. Because the Internet is usually the underlying network of a grid, network bandwidth is the main factor affecting file transfers between clients and servers. Several challenges remain in this paradigm: reducing differences in finish times among the selected replica servers, avoiding traffic congestion caused by transferring the same blocks over different links between servers and clients, and managing network performance variations among parallel transfers. In this paper, we propose the Anticipative Recursively Adjusting Mechanism (ARAM), a scheme that adjusts the workloads on selected replica servers and handles unpredictable variations in the network performance of those servers. The algorithm uses the finish rates of previously assigned transfers to anticipate the bandwidth status for the next section, adjusts workloads accordingly, and thereby reduces file transfer times in grid environments. The approach is useful in grid environments with unstable network links. It not only reduces the idle time wasted waiting for the slowest server but also decreases file transfer completion times. Copyright © 2010 John Wiley & Sons, Ltd.
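The idea of using finish rates from the previous section to set the next section's workloads can be illustrated with a minimal sketch. This is not the ARAM formula from the paper, which the abstract does not give; it assumes a simple rule in which each server's next share is proportional to its previous assignment multiplied by its finish rate (bytes finished divided by bytes assigned). The function name `next_workloads` and its parameters are hypothetical.

```python
def next_workloads(section_bytes, assigned, finished):
    """Split the next section among replica servers (illustrative only).

    assigned[i]: bytes given to server i in the previous section.
    finished[i]: bytes server i actually delivered before the section ended.
    Assumed rule (not from the paper): weight_i = assigned_i * finish_rate_i.
    """
    # Finish rate of each server for the previous section, in [0, 1].
    rates = [f / a if a > 0 else 0.0 for a, f in zip(assigned, finished)]
    # Anticipated relative bandwidth: previous share scaled by finish rate.
    weights = [a * r for a, r in zip(assigned, rates)]
    total = sum(weights)
    n = len(assigned)
    if total <= 0:
        # No usable history: fall back to an even split.
        shares = [section_bytes // n] * n
    else:
        shares = [int(section_bytes * w / total) for w in weights]
    # Hand any rounding remainder to the server with the largest weight.
    best = max(range(n), key=lambda i: weights[i])
    shares[best] += section_bytes - sum(shares)
    return shares


# Server 0 finished its whole assignment; servers 1 and 2 finished half.
print(next_workloads(400, [100, 100, 100], [100, 50, 50]))  # [200, 100, 100]
```

Under this assumed rule, a server that fell behind in the previous section receives a smaller share of the next one, which is how the scheme would reduce the idle time spent waiting for the slowest server.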
