Dynamic replication in a data grid using a Modified BHR Region Based Algorithm

Grid computing is emerging as a key part of the infrastructure for a wide range of disciplines in science and engineering, including astronomy, high-energy physics, molecular biology and earth sciences. These applications handle large data sets that need to be transferred and replicated among different grid sites. A data grid deals with data-intensive applications in scientific and enterprise computing, and data grid technology is developed to permit data sharing across many organizations in geographically dispersed locations. Replicating data to different sites helps researchers around the world analyse data and initiate future experiments. The general idea of replication is to store copies of data in different locations so that the data can be easily recovered if a copy at one location is lost or unavailable. In a large-scale data grid, replication provides a suitable solution for managing data files, enhancing data reliability and availability. In this paper, a Modified BHR algorithm is proposed to overcome the limitations of the standard BHR algorithm. The algorithm is simulated using OptorSim, a data grid simulator developed by the European DataGrid project. The proposed algorithm improves performance by minimizing data access time and avoiding unnecessary replication.
