RFH: A Resilient, Fault-Tolerant and High-Efficient Replication Algorithm for Distributed Cloud Storage

To avoid failures and achieve higher availability, replication schemes are now widely used in distributed Cloud storage systems [25]. However, most of them statically replicate data a fixed number of times on randomly chosen nodes, which does not allocate resources well. Moreover, the query load of Web applications is highly irregular. This creates a dilemma: either always maintain the maximum number of replicas in case of a sudden burst in query load, or save resources with fewer replicas at the expense of performance. In this paper, we present a Resilient, Fault-tolerant and High-efficient global replication algorithm (RFH) for distributed Cloud storage systems. RFH is especially efficient when facing the 'flash crowd' problem. Each data partition is represented by a virtual node, and each virtual node decides for itself whether to replicate, migrate or suicide (remove itself) by weighing the pros and cons. The decision is based on an evaluation of the traffic load of all nodes, and the virtual node selects among the physical nodes carrying the most traffic (traffic hubs) as targets for replication or migration. It then takes blocking probability into account to achieve quicker response and better load balance. Extensive simulations have been conducted, and the results demonstrate that RFH outperforms the main existing algorithms, namely the request-oriented algorithms [16] [5], the owner-oriented algorithms [7] [11] [12] [13] and the random algorithms [4] [21] [22], in terms of replica utilization rate, query efficiency and path length, at a low cost and while maintaining high availability.
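The per-virtual-node decision described above can be illustrated with a minimal sketch. It is not the authors' implementation: the thresholds `high` and `low`, the `min_replicas` floor, the `top_k` hub count, and the use of the Erlang-B formula as the blocking-probability model are all assumptions introduced here for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass


def erlang_b(offered_load: float, servers: int) -> float:
    """Blocking probability from the Erlang-B recurrence (assumed model)."""
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return b


@dataclass
class PhysicalNode:
    name: str
    traffic: float       # query traffic passing through this node
    offered_load: float  # offered load in Erlangs (hypothetical field)
    servers: int         # number of service slots (hypothetical field)


def decide(query_rate: float, replicas: int, on_hub: bool,
           high: float = 100.0, low: float = 10.0,
           min_replicas: int = 2) -> str:
    """Return the action a virtual node takes; thresholds are hypothetical."""
    per_replica = query_rate / replicas
    if per_replica > high:
        return "replicate"   # overloaded: create another replica
    if per_replica < low:
        # Underused: remove itself unless availability would suffer.
        return "suicide" if replicas > min_replicas else "stay"
    # Moderate load: move toward a traffic hub if not already on one.
    return "stay" if on_hub else "migrate"


def choose_target(nodes: list[PhysicalNode], top_k: int = 2) -> PhysicalNode:
    """Pick, among the top_k traffic hubs, the one least likely to block."""
    hubs = sorted(nodes, key=lambda n: n.traffic, reverse=True)[:top_k]
    return min(hubs, key=lambda n: erlang_b(n.offered_load, n.servers))
```

In this sketch a virtual node first calls `decide`, and only when the result is `"replicate"` or `"migrate"` does it call `choose_target` to pick the physical node. Restricting the candidates to the highest-traffic nodes before comparing blocking probabilities mirrors the two-stage selection in the abstract.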

[1] Haitao Wu, et al. BCube: a high performance, server-centric network architecture for modular data centers, 2009, SIGCOMM '09.

[2] Antony I. T. Rowstron, et al. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility, 2001, SOSP.

[3] Dan Feng, et al. CDRM: A Cost-Effective Dynamic Replication Management Scheme for Cloud Storage Cluster, 2010, 2010 IEEE International Conference on Cluster Computing.

[4] Gade Krishna, et al. A scalable peer-to-peer lookup protocol for Internet applications, 2012.

[5] Haiying Shen, et al. An Efficient and Adaptive Decentralized File Replication Algorithm in P2P File Sharing Systems, 2008, IEEE Transactions on Parallel and Distributed Systems.

[6] David R. Karger, et al. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web, 1997, STOC '97.

[7] Eduardo Pinheiro, et al. Failure Trends in a Large Disk Drive Population, 2007, FAST.

[8] Naixue Xiong, et al. Comparative analysis of quality of service and memory usage for adaptive failure detectors in healthcare systems, 2009, IEEE Journal on Selected Areas in Communications.

[9] David R. Karger, et al. Wide-area cooperative storage with CFS, 2001, SOSP.

[10] Michael B. Jones, et al. Overlook: scalable name service on an overlay network, 2002, Proceedings 22nd International Conference on Distributed Computing Systems.

[11] Karl Aberer, et al. A self-organized, fault-tolerant and scalable replication scheme for cloud storage, 2010, SoCC '10.

[12] John S. Heidemann, et al. The Ficus Replicated File System, 1992, OPSR.

[13] Eitan Altman, et al. Forward Correction and Fountain Codes in Delay-Tolerant Networks, 2008, IEEE/ACM Transactions on Networking.

[14] Kenneth Salem, et al. Lazy database replication with snapshot isolation, 2006, VLDB.

[15] Sanjay Ghemawat, et al. The Google file system, 2003.

[16] Dan Feng, et al. Adaptive Object Placement in Object-Based Storage Systems with Minimal Blocking Probability, 2006, 20th International Conference on Advanced Information Networking and Applications - Volume 1 (AINA'06).

[17] Jing Yuan, et al. DAC: Generic and Automatic Address Configuration for Data Center Networks, 2012, IEEE/ACM Transactions on Networking.

[18] Ben Y. Zhao, et al. OceanStore: an architecture for global-scale persistent storage, 2000, SIGP.

[19] Mahadev Satyanarayanan, et al. Coda: A Highly Available File System for a Distributed Workstation Environment, 1990, IEEE Trans. Computers.

[20] Albert G. Greenberg, et al. The nature of data center traffic: measurements & analysis, 2009, IMC '09.

[21] Werner Vogels, et al. Dynamo: amazon's highly available key-value store, 2007, SOSP.

[22] Navendu Jain, et al. Understanding network failures in data centers: measurement, analysis, and implications, 2011, SIGCOMM.