Co-allocation in Data Grids: A Global, Multi-user Perspective

Several recent studies suggest that co-allocation techniques can improve user performance for distributed data retrieval in replicated grid systems. These studies show that, by retrieving data concurrently from as many data grid replicas as possible, co-allocation can increase the bandwidth a user achieves and reduce network transfer times. However, these studies evaluate their techniques from a single user's perspective and do not examine system-wide performance when multiple users co-allocate at the same time. In our study, we provide multi-user evaluations of a co-allocation technique for replicated data in a controlled grid environment. We find that co-allocation works well under low-load conditions, when only a few users are co-allocating. Under medium- and high-load conditions, however, it works very poorly: the response time for co-allocating users grows rapidly as the number of grid users increases. This decreased performance can be directly attributed to the increased workload that the users' greedy retrieval technique places on the replicas in the grid. Overall, we determine that uninformed, blind use of greedy co-allocation techniques by multiple users is detrimental to global system performance.
