Data Sets Replicas Placements Strategy from Cost-Effective View in the Cloud

Replication is commonly used in cloud storage systems to improve data availability and reduce data access latency by providing users with multiple replicas of the same service. Most current approaches focus largely on improving system performance and neglect management cost when deciding how many replicas to create and where to store them. This can place a heavy financial burden on cloud users: under the pay-as-you-go paradigm, the cost of storing replicas and maintaining their consistency grows as new replicas are added. In this paper, aiming to approach the minimum data set management cost benchmark in a practical manner, we propose a cost-effective replica placement strategy under the premise that system performance meets its requirements. First, we design data set management cost models covering storage cost and transfer cost. Second, we use access frequency and average response time to decide which data sets should be replicated. Third, based on a location problem graph, we propose a method for calculating the number of replicas and their storage locations that minimizes management cost. Both theoretical analysis and simulations show that the proposed strategy achieves lower management cost with fewer replicas.
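The three steps can be illustrated with a small Python sketch. It is one possible reading of the abstract, not the authors' models. Every name and number below is a hypothetical assumption: `should_replicate` uses made-up frequency and response-time thresholds; `management_cost` charges a per-node storage cost for each replica plus each node's access frequency times the per-access transfer cost to its cheapest replica, and omits consistency maintenance cost. `min_cost_placement` solves the resulting uncapacitated facility location problem by exhaustive search. That is an exact stand-in that only works for tiny instances. It is not the paper's location-problem-graph method, which the abstract does not specify.

```python
from itertools import combinations
from typing import Sequence, Tuple


def should_replicate(access_freq: float, avg_response_ms: float,
                     freq_threshold: float, response_limit_ms: float) -> bool:
    """Hypothetical rule: replicate a data set only when it is accessed at
    least freq_threshold times per period AND its average response time
    exceeds the performance requirement response_limit_ms."""
    return access_freq >= freq_threshold and avg_response_ms > response_limit_ms


def management_cost(sites: Sequence[int],
                    storage_cost: Sequence[float],
                    freq: Sequence[float],
                    transfer: Sequence[Sequence[float]]) -> float:
    """Storage cost of every replica plus the transfer cost of serving each
    node's accesses from its cheapest replica (consistency cost omitted)."""
    if not sites:
        return float("inf")  # no replica at all: requests cannot be served
    storage = sum(storage_cost[j] for j in sites)
    transfer_total = sum(f * min(transfer[i][j] for j in sites)
                         for i, f in enumerate(freq))
    return storage + transfer_total


def min_cost_placement(storage_cost: Sequence[float],
                       freq: Sequence[float],
                       transfer: Sequence[Sequence[float]]
                       ) -> Tuple[Tuple[int, ...], float]:
    """Exact uncapacitated facility-location solution by exhaustive search.
    Exponential in the number of nodes, so only suitable for small examples."""
    n = len(storage_cost)
    best_sites: Tuple[int, ...] = ()
    best_cost = float("inf")
    for k in range(1, n + 1):                      # candidate number of replicas
        for sites in combinations(range(n), k):    # candidate store places
            cost = management_cost(sites, storage_cost, freq, transfer)
            if cost < best_cost:
                best_sites, best_cost = sites, cost
    return best_sites, best_cost


if __name__ == "__main__":
    storage_cost = [5.0, 5.0, 5.0]                 # cost of one replica at each node
    freq = [10.0, 1.0, 10.0]                       # accesses per period from each node
    transfer = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]   # per-access cost, node i -> replica j
    if should_replicate(sum(freq), 250.0, freq_threshold=15.0,
                        response_limit_ms=200.0):
        print(min_cost_placement(storage_cost, freq, transfer))  # ((0, 2), 11.0)
```

On this three-node example, replicas at nodes 0 and 2 cost 11.0. That is cheaper than a single central replica (25.0) and cheaper than a replica on every node (15.0). It shows the trade-off between storage cost and transfer cost that the strategy aims to balance.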
