A comprehensive review of the data replication techniques in the cloud environments: Major trends and future directions

Nowadays, in various scientific domains, large data sets are becoming an important part of shared resources. Such a huge mass of data is usually stored in cloud data centers. Data replication, which is generally used to manage large volumes of data in a distributed manner, speeds up data access, reduces access latency, and increases data availability. However, despite the importance of data replication techniques and mechanisms in cloud environments, there has not been a comprehensive study that systematically reviews and analyzes them. Therefore, this paper provides a comprehensive and detailed survey of the state-of-the-art techniques and mechanisms in this field. We also discuss the data replication mechanisms in cloud systems and categorize them into two main groups: static and dynamic. Static mechanisms determine the location of replication nodes during the design phase, while dynamic ones select replication nodes at run time. Furthermore, a taxonomy and comparison of the reviewed mechanisms are presented and their main features are highlighted. Finally, the related open issues and some hints for solving the challenges are mapped out. The review indicates that some dynamic approaches allow their associated replication strategies to be adjusted at run time according to changes in user behavior and network topology. They are also applicable to service-oriented environments, where the number and location of the users who intend to access data often have to be determined in a highly dynamic fashion.

Discussing the most important data replication mechanisms in cloud environments.
Categorizing the data replication mechanisms into static and dynamic mechanisms.
Presenting the comparison of the reviewed mechanisms and highlighting their features.
Mapping out the related open issues and some hints to solve the challenges.
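The distinction between static and dynamic mechanisms can be illustrated with a minimal sketch. It is not taken from any of the reviewed mechanisms: the node names, the file name, the access log, and the "most frequent requesters" rule are all assumptions chosen only to show placement fixed at design time versus placement chosen at run time.

```python
from collections import Counter

# Hypothetical placement table written at design time (illustrative names only).
STATIC_PLACEMENT = {"dataset.csv": ["node-a", "node-b"]}


def static_replicas(file_name):
    """Static mechanism: replica nodes are fixed before the system runs."""
    return STATIC_PLACEMENT[file_name]


def dynamic_replicas(access_log, k=2):
    """Dynamic mechanism (assumed rule): at run time, place replicas on the
    k nodes that requested the file most often in the observed access log."""
    counts = Counter(access_log)
    return [node for node, _ in counts.most_common(k)]


# Hypothetical run-time observations: which node requested the file.
log = ["node-c", "node-a", "node-c", "node-d", "node-c", "node-d"]

print(static_replicas("dataset.csv"))
print(dynamic_replicas(log))
```

In this sketch the static placement never changes, while the dynamic placement follows the observed access pattern, which loosely mirrors the adjustment to user behavior described above.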
