A File Group Data Replication Algorithm for Data Grids

In recent years, data grids have been deployed and expanded in many scientific experiments and data centers. These environments give grid users access to large volumes of distributed data. Data replication is a key issue in a data grid and should be applied intelligently, because it reduces data access time and bandwidth consumption for each grid site. The area therefore remains challenging and leaves considerable scope for improvement. In this paper, we introduce a new dynamic data replication algorithm named Popular File Group Replication (PFGR), which rests on three assumptions: first, users in a grid site (Virtual Organization) have similar interests in files; second, file accesses exhibit temporal locality; and third, all files are read-only. Based on the file access history and the first assumption, PFGR builds a connectivity graph for a group of dependent files in each grid site and replicates the most popular file group to the requesting grid site. When a user of that grid site later needs some of those files, they are available locally. The simulation results show that our algorithm improves performance by reducing mean job execution time and bandwidth consumption, and that it avoids unnecessary replication.
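The idea of grouping dependent files from access history can be illustrated with a minimal sketch. It is not the PFGR algorithm itself, since the abstract does not give its details. The sketch assumes that the access history is a list of jobs, each listing the files it read, that two files are "dependent" when they appear together in at least `min_weight` jobs, and that a group's popularity is the total access count of its files. The names `build_graph`, `file_groups`, `most_popular_group`, and `min_weight` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_graph(jobs):
    """Count how often each pair of files is accessed by the same job.

    Assumption: co-access within one job is the dependency signal.
    """
    edges = Counter()
    for files in jobs:
        for a, b in combinations(sorted(set(files)), 2):
            edges[(a, b)] += 1
    return edges


def file_groups(edges, min_weight):
    """Return connected components over edges with weight >= min_weight."""
    adj = defaultdict(set)
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            adj[a].add(b)
            adj[b].add(a)

    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups


def most_popular_group(jobs, min_weight=2):
    """Pick the group whose files have the highest total access count."""
    access = Counter(f for files in jobs for f in files)
    groups = file_groups(build_graph(jobs), min_weight)
    if not groups:
        return set()
    return max(groups, key=lambda g: sum(access[f] for f in g))


# Hypothetical access history for one grid site.
history = [["a", "b", "c"], ["a", "b"], ["d", "e"], ["a", "b", "c"], ["d", "e"]]
group = most_popular_group(history)
```

In this toy history, files `a`, `b`, and `c` form one group and `d`, `e` another; the first group has more total accesses, so it would be the candidate to replicate to the requesting site. Because all files are assumed read-only, the sketch does not need to consider replica consistency.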
