Replica placement in data grid: considering utility and risk

Grid computing emerged from the need to integrate a collection of distributed computing resources to offer performance unattainable by any single machine. Grid technology facilitates data sharing across many organizations in different geographical locations. Data replication is a widely used technique for moving and caching data close to users. Replication reduces access latency and bandwidth consumption. It also facilitates load balancing and improves reliability by creating multiple copies of the data. However, grid environments introduce significant new challenges, such as dynamic resource availability and changing network performance. Because user requests vary constantly, the system needs a dynamic replication strategy that adapts to this changing behavior. To address these issues, this paper presents six dynamic replication strategies and evaluates their performance under two different kinds of access patterns. Our replication strategies are based mainly on utility and risk. Before placing a replica, we calculate an expected utility and a risk index for each candidate site by considering the current network load and user requests. A replication site is then chosen by optimizing the expected utility or the risk index.
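
The selection step described above can be illustrated with a minimal sketch. The abstract does not give the formulas for expected utility or the risk index, so the two scoring functions below are hypothetical stand-ins chosen only to show the shape of the procedure: score every candidate site from its observed requests and current network load, then pick the site with the best score. The names `Site`, `expected_utility`, `risk_index`, and `choose_site`, as well as the sample numbers, are assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    requests: int  # recent requests for the file observed at this site
    load: float    # current network load on the site's link, in [0, 1]


def expected_utility(site: Site, total_requests: int) -> float:
    # Hypothetical formula: share of demand served locally,
    # discounted by how busy the site's network link is.
    if total_requests == 0:
        return 0.0
    return (site.requests / total_requests) * (1.0 - site.load)


def risk_index(site: Site, total_requests: int) -> float:
    # Hypothetical formula: risk is high when the link is loaded
    # and the site accounts for little of the demand.
    share = site.requests / total_requests if total_requests else 0.0
    return site.load * (1.0 - share)


def choose_site(sites: list[Site], strategy: str = "utility") -> Site:
    # Score every candidate, then maximize utility or minimize risk.
    total = sum(s.requests for s in sites)
    if strategy == "utility":
        return max(sites, key=lambda s: expected_utility(s, total))
    if strategy == "risk":
        return min(sites, key=lambda s: risk_index(s, total))
    raise ValueError(f"unknown strategy: {strategy!r}")


# Made-up sample data: A has the most demand but a congested link.
sites = [
    Site("A", requests=50, load=0.9),
    Site("B", requests=30, load=0.2),
    Site("C", requests=20, load=0.1),
]
best_by_utility = choose_site(sites, "utility")
best_by_risk = choose_site(sites, "risk")
print(best_by_utility.name, best_by_risk.name)
```

With these made-up inputs the two criteria pick different sites, which is one reason a study might compare utility-driven and risk-driven strategies separately. A dynamic strategy would rerun the selection as request counts and load measurements change.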
