Dynamic replica placement and selection strategies in data grids - A comprehensive survey

Data replication techniques are used in data grids to reduce makespan, storage consumption, access latency and network bandwidth consumption. Replication also enhances data availability and thereby increases system reliability. Data replication involves two steps: replica placement and replica selection. Replica placement identifies the best node on which to duplicate data, based on network latency and user requests. Replica selection chooses the best replica location from which to access the data for job execution in the data grid. Various replica placement and selection algorithms are available in the literature. These algorithms measure and analyze parameters such as bandwidth consumption, access cost, scalability, execution time, storage consumption and makespan. This paper discusses various replica placement and selection strategies along with their merits and demerits, and analyses their performance with respect to the parameters mentioned above. In particular, it focuses on dynamic replica placement and selection strategies in the data grid environment.
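
The two steps can be illustrated with a minimal sketch. It is not taken from any of the surveyed strategies. The cost models (request-weighted latency for placement, latency plus size over bandwidth for selection) and all names such as `Replica`, `choose_placement_site` and `select_replica` are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    site: str              # hypothetical site name holding a copy of the file
    latency_s: float       # latency from the requesting node, in seconds
    bandwidth_mbps: float  # available bandwidth, in megabits per second


def choose_placement_site(request_counts, latency_s):
    """Replica placement (assumed rule): pick the candidate node with the
    lowest request-weighted latency.

    request_counts maps client -> number of requests for the file.
    latency_s maps candidate -> {client: latency in seconds}.
    """
    def weighted_cost(candidate):
        return sum(count * latency_s[candidate][client]
                   for client, count in request_counts.items())

    return min(latency_s, key=weighted_cost)


def estimated_transfer_time(replica, file_size_mb):
    """Assumed cost model: latency plus file size (megabytes converted
    to megabits) divided by bandwidth."""
    return replica.latency_s + (file_size_mb * 8) / replica.bandwidth_mbps


def select_replica(replicas, file_size_mb):
    """Replica selection: return the replica with the lowest estimated
    transfer time for a file of the given size."""
    if not replicas:
        raise ValueError("no replicas available")
    return min(replicas,
               key=lambda r: estimated_transfer_time(r, file_size_mb))
```

Under this model, a nearby replica on a slow link is preferred for small files, while a distant replica on a fast link is preferred for large ones. Dynamic strategies would re-evaluate both decisions as request patterns and network conditions change.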
