A Decentralized Periodic Replication Strategy Based on Knapsack Problem

Data grids provide services and infrastructure for data-intensive applications that need access to huge amounts of data stored at distributed locations around the world. In many applications, the size of these data can reach hundreds of petabytes. Ensuring efficient and fast access to such massive data is a challenge that must be addressed. Replication is a key technique used in data grids to improve data access efficiency. It also provides high availability, decreased bandwidth consumption, improved fault tolerance, and enhanced scalability. In this paper, we propose a new decentralized replication strategy for dynamic data grids, called DPRSKP (Decentralized Periodic Replication Strategy based on Knapsack Problem). Our goal is to select the best candidate files for replication and to place them in the best locations, assuming limited storage for replicas. The problem is formulated as a knapsack problem. Our proposed strategy incorporates the LRU and LFU strategies. Experimental results obtained with OptorSim show that our strategy outperforms other replication strategies in terms of response time and bandwidth consumption.
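
To illustrate the kind of formulation the abstract refers to, the following is a minimal sketch of a generic 0/1 knapsack selection of replica candidates under a storage limit. It is not the DPRSKP algorithm itself. The function name `select_replicas`, the use of integer file sizes, and the notion of a per-file "value" (for example, an access count) are assumptions made for illustration only.

```python
from typing import List, Tuple


def select_replicas(
    files: List[Tuple[str, int, int]], capacity: int
) -> Tuple[int, List[str]]:
    """Generic 0/1 knapsack by dynamic programming.

    files: hypothetical (name, size, value) triples; sizes are integers.
    capacity: storage available for replicas, in the same unit as size.
    Returns the best total value and the names of the selected files.
    """
    n = len(files)
    # best[i][c] = maximum value using the first i files within capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, size, value) in enumerate(files, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip file i
            if size <= c:
                candidate = best[i - 1][c - size] + value  # option 2: take it
                if candidate > best[i][c]:
                    best[i][c] = candidate

    # Walk the table backwards to recover which files were taken.
    chosen: List[str] = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            name, size, _ = files[i - 1]
            chosen.append(name)
            c -= size
    chosen.reverse()
    return best[n][capacity], chosen
```

For example, with the hypothetical candidates `[("a", 4, 40), ("b", 3, 50), ("c", 2, 30), ("d", 5, 70)]` and a capacity of 7, the call `select_replicas(...)` returns `(100, ["c", "d"])`: files `c` and `d` fill the storage exactly and yield the highest total value.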
