RACAM: design and implementation of a recursively adjusting co‐allocation method with efficient replica selection in Data Grids

Data Grids enable the sharing, selection, and connection of a wide variety of geographically distributed computational and storage resources to meet the needs of large‐scale data‐intensive scientific applications in, for instance, high‐energy physics, bioinformatics, and virtual astrophysical observatories. Data sets are replicated in Data Grids and distributed among multiple sites. Unfortunately, the data sets of interest are sometimes very large, and accessing them can incur significant overhead. A co‐allocation architecture was developed to enable parallel downloading of data sets from multiple servers. Several co‐allocation strategies have been coupled and used to exploit rate differences among various client–server links and to address dynamic rate fluctuations by dividing files into multiple blocks of equal sizes. However, one major obstacle remains: faster servers sit idle while waiting for the slowest server to deliver the final block, which makes it important to reduce differences in finishing time among replica servers. In this paper, we propose a dynamic co‐allocation method, called Recursively Adjusting Co‐Allocation Method (RACAM), to improve the performance of parallel data file transfer. Our approach reduces the idle time spent waiting for the slowest server and decreases data transfer completion time. We also provide an effective scheme for reducing the cost of reassembling data blocks. Copyright © 2010 John Wiley & Sons, Ltd.
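The idea of recursively adjusting the allocation can be illustrated with a minimal sketch. It is not the RACAM algorithm itself, whose details the abstract does not give. It assumes that each round assigns a fixed fraction of the still-unassigned data, and that the round's data is split among replica servers in proportion to their currently measured bandwidth, so that servers are expected to finish at about the same time. The names `split_by_bandwidth`, `recursive_co_allocation`, `measure`, `alpha`, and `min_round` are hypothetical and introduced only for this illustration.

```python
def split_by_bandwidth(size, bandwidths):
    """Split `size` bytes among servers in proportion to integer bandwidths."""
    total = sum(bandwidths)
    shares = [size * b // total for b in bandwidths]
    # Integer division can leave a few bytes unassigned; give them to the
    # fastest server so the shares always add up to `size`.
    shares[bandwidths.index(max(bandwidths))] += size - sum(shares)
    return shares


def recursive_co_allocation(file_size, measure, alpha=0.5, min_round=1_000_000):
    """Yield one per-server allocation per round until the file is assigned.

    measure   -- hypothetical callable returning current bandwidth estimates
                 (one integer per replica server, e.g. in KB/s)
    alpha     -- assumed fraction of the remaining data assigned each round
    min_round -- once the remainder is this small, assign all of it at once
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    remaining = file_size
    while remaining > 0:
        if remaining <= min_round:
            round_size = remaining
        else:
            # max(1, ...) guarantees progress even for a very small alpha.
            round_size = max(1, int(remaining * alpha))
        # Bandwidth is re-measured every round, so later rounds adapt to
        # rate fluctuations observed during earlier rounds.
        yield split_by_bandwidth(round_size, measure())
        remaining -= round_size


if __name__ == "__main__":
    # Two servers with a fixed 1:3 bandwidth ratio (made-up numbers).
    for allocation in recursive_co_allocation(1000, lambda: [1, 3], min_round=100):
        print(allocation)
```

Because each round is smaller than the one before, a misjudged bandwidth estimate costs less as the transfer nears completion, which is what limits the time faster servers spend waiting for the slowest one in this sketch.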
