Optimal Replica Placement in Data Grid Environments with Locality Assurance

Data replication is typically used to improve access performance and data availability in Data Grid systems. To date, research on data replication in Grid systems has focused on infrastructures for replication and mechanisms for creating/deleting replicas. The important problem of choosing suitable locations to place replicas in Data Grids has not been well studied. In this paper, we address three issues concerning data replica placement in Data Grids. The first is how to ensure load balance among replicas. To achieve this, we propose a placement algorithm that finds the optimal locations for replicas so that their workload is balanced. The second issue is how to minimize the number of replicas. To solve this problem, we propose an algorithm that determines the minimum number of replicas required when the maximum workload capacity of each replica server is known. Finally, we address the issue of service quality by proposing a new model in which each request must be given a quality-of-service guarantee. We describe new algorithms that ensure both workload balance and quality of service simultaneously.
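
To make the three issues concrete, the following is a minimal sketch of how a given placement could be checked. It is not the paper's algorithm. It assumes, for illustration only, that the Grid is a tree, that each request travels from its site toward the root until it meets a replica, and that the quality-of-service guarantee is a bound on the number of hops travelled. The function name `replica_workloads` and the parameters `parent`, `requests`, `replicas`, and `max_hops` are hypothetical.

```python
from typing import Dict, Optional, Set


def replica_workloads(
    parent: Dict[str, Optional[str]],
    requests: Dict[str, int],
    replicas: Set[str],
    max_hops: Optional[int] = None,
) -> Dict[str, int]:
    """Return the request load on each replica under an assumed tree model.

    Assumption (not from the source): a request issued at a node is served
    by the first replica found on the path from that node to the root.
    `max_hops`, if given, stands in for a per-request locality bound.
    """
    loads = {r: 0 for r in replicas}
    for node, count in requests.items():
        current: Optional[str] = node
        hops = 0
        # Walk toward the root until a replica is found.
        while current is not None and current not in replicas:
            current = parent.get(current)
            hops += 1
        if current is None:
            raise ValueError(f"no replica on the path from {node!r} to the root")
        if max_hops is not None and hops > max_hops:
            raise ValueError(f"requests from {node!r} exceed the hop bound")
        loads[current] += count
    return loads


# Hypothetical example: a six-node tree with replicas at "root" and "a".
parent = {"root": None, "a": "root", "b": "root", "c": "a", "d": "a", "e": "b"}
requests = {"c": 5, "d": 3, "e": 4, "b": 2}
loads = replica_workloads(parent, requests, {"root", "a"})

capacity = 8  # assumed maximum workload per replica server
fits_capacity = max(loads.values()) <= capacity
```

Under these assumptions, load balance asks for a placement that minimizes the largest value in `loads`, the minimum-replica problem asks for the smallest replica set that keeps every load within `capacity`, and the quality-of-service variant adds the requirement that no request exceeds `max_hops`.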
