A Novel Replication Strategy in Data Grid Environment with a Dynamic Threshold

Data Grid is a type of Grid Computing system designed to provide geographically distributed data resources to large computational problems that require mining and evaluating large amounts of data. Keeping this data in a centralized location increases data access time, and therefore jobs take longer to execute. Replication is used to reduce data access time: data replication is an important optimization technique that aims to improve data access time and to use network and storage resources efficiently. Because data files are very large and Grid storage is limited, managing replicas so that storage is used effectively requires more attention. In this paper, a novel data replication strategy called Dynamic Hierarchical Replication with Threshold (DHRT) is proposed. This strategy is an enhanced version of the Dynamic Hierarchical Replication (DHR) strategy. It uses a new threshold to determine the number of appropriate sites for replication, where appropriate sites are those with a higher number of accesses to that particular replica from other sites. It also minimizes access latency by selecting the best replica when several sites hold replicas: the proposed replica selection strategy chooses the best replica location for users' running jobs by considering the number of replica requests waiting in storage and the number of stored files. Simulation results with OptorSim, the European Data Grid simulator, show that DHRT performs better than the other algorithms it was compared with and prevents the unnecessary creation of replicas, which leads to efficient storage usage.
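The abstract does not give the DHRT threshold formula, so the following is only a minimal sketch of the idea of a dynamic threshold for choosing appropriate replication sites. It assumes, hypothetically, that the threshold is the mean access count for one file over the sites that requested it, recomputed each time the decision is made. The function name `appropriate_sites` and the mean-based rule are illustrative assumptions, not the paper's definition.

```python
from typing import Dict, List


def appropriate_sites(access_counts: Dict[str, int]) -> List[str]:
    """Return sites whose access count for one file exceeds a dynamic threshold.

    ASSUMPTION (hypothetical): the threshold is the mean access count over
    the sites that requested the file. The paper's exact threshold may differ.
    """
    if not access_counts:
        return []
    # Dynamic threshold: recomputed from the current access statistics.
    threshold = sum(access_counts.values()) / len(access_counts)
    # Keep only sites strictly above the threshold.
    chosen = [site for site, n in access_counts.items() if n > threshold]
    # Order candidates from most to least accessed.
    return sorted(chosen, key=lambda s: access_counts[s], reverse=True)


if __name__ == "__main__":
    counts = {"A": 10, "B": 2, "C": 6, "D": 2}  # mean = 5.0
    print(appropriate_sites(counts))  # ['A', 'C']
```

Because the threshold moves with the observed access pattern, the number of sites that receive a replica is not fixed in advance. When every site accesses the file equally often, no site passes the threshold, which is one way such a rule can avoid creating unnecessary replicas.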

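The replica selection step weighs two quantities: the number of requests waiting at each candidate's storage and the number of files it stores. The abstract does not say how these are combined. The sketch below therefore assumes, hypothetically, a weighted sum in which lower is better, with illustrative weights `w_queue` and `w_files`. `ReplicaHolder` and `select_replica` are names invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaHolder:
    site: str
    waiting_requests: int  # requests queued at this site's storage
    stored_files: int      # number of files currently held in its storage


def select_replica(holders: List[ReplicaHolder],
                   w_queue: float = 1.0,
                   w_files: float = 0.1) -> ReplicaHolder:
    """Pick the replica holder with the lowest hypothetical load score.

    ASSUMPTION (hypothetical): score = w_queue * waiting_requests
    + w_files * stored_files. The weights are illustrative, not from the paper.
    """
    if not holders:
        raise ValueError("no site holds a replica of the requested file")

    def score(h: ReplicaHolder) -> float:
        return w_queue * h.waiting_requests + w_files * h.stored_files

    # min() returns the first holder with the smallest score.
    return min(holders, key=score)


if __name__ == "__main__":
    holders = [
        ReplicaHolder("A", waiting_requests=5, stored_files=10),  # score 6.0
        ReplicaHolder("B", waiting_requests=2, stored_files=40),  # score 6.0
        ReplicaHolder("C", waiting_requests=2, stored_files=10),  # score 3.0
    ]
    print(select_replica(holders).site)  # C
```

A short request queue suggests the replica can be served soon. Fewer stored files is used here as a rough proxy for a less loaded storage element. Both interpretations are assumptions of this sketch.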