RACAM: design and implementation of a recursively adjusting co-allocation method with efficient replica selection in Data Grids

Data Grids enable the sharing, selection, and connection of a wide variety of geographically distributed computational and storage resources to meet the needs of large-scale, data-intensive scientific applications in fields such as high-energy physics, bioinformatics, and virtual astrophysical observatories. Data sets are replicated in Data Grids and distributed among multiple sites. Unfortunately, the data sets of interest are sometimes very large, which can make access inefficient. A co-allocation architecture was developed to enable parallel downloading of data sets from multiple servers. Several co-allocation strategies have been coupled and used to exploit rate differences among the various client–server links and to address dynamic rate fluctuations by dividing files into multiple blocks of equal size. However, one major obstacle remains: faster servers sit idle while waiting for the slowest server to deliver the final block, so it is important to reduce the differences in finishing time among replica servers. In this paper, we propose a dynamic co-allocation method, called the Recursively Adjusting Co-Allocation Method (RACAM), to improve the performance of parallel data file transfer. Our approach reduces the idle time spent waiting for the slowest server and decreases data transfer completion time. We also provide an effective scheme for reducing the cost of reassembling data blocks. Copyright © 2010 John Wiley & Sons, Ltd.
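The idle-time problem described above can be illustrated with a small sketch. It is not the RACAM algorithm itself, whose details the abstract does not give. It only compares an equal-size split with a split proportional to each server's transfer rate, assuming the rates are known and constant. The function names (`equal_split`, `proportional_split`, `finish_times`) and the sample rates are hypothetical.

```python
def equal_split(total_mb, n_servers):
    """Assign the same amount of data to every server (illustrative baseline)."""
    base = total_mb // n_servers
    shares = [base] * n_servers
    shares[-1] += total_mb - base * n_servers  # rounding remainder goes to the last server
    return shares


def proportional_split(total_mb, rates):
    """Assign data in proportion to each server's rate (assumed known and constant)."""
    total_rate = sum(rates)
    shares = [total_mb * r // total_rate for r in rates]
    shares[-1] += total_mb - sum(shares)  # rounding remainder goes to the last server
    return shares


def finish_times(shares, rates):
    """Time each server needs to deliver its share, in seconds."""
    return [s / r for s, r in zip(shares, rates)]


if __name__ == "__main__":
    rates = [10, 20, 30]   # hypothetical transfer rates in MB/s
    total = 1200           # hypothetical file size in MB

    for name, shares in [
        ("equal", equal_split(total, len(rates))),
        ("proportional", proportional_split(total, rates)),
    ]:
        times = finish_times(shares, rates)
        idle = [max(times) - t for t in times]  # time each server waits for the slowest
        print(name, shares, times, idle)
```

With these sample numbers, the equal split finishes only when the slowest server delivers its share, while the proportional split lets all three servers finish together. In practice rates fluctuate, which is why a dynamic method would need to re-measure them and adjust the allocation during the transfer.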
