Minimizing data redundancy for high reliable cloud storage systems

Cloud storage systems provide reliable service to users by deploying redundancy schemes widely. Redundancy brings high reliability to data storage, but it also introduces significant overhead in the form of storage cost and energy consumption. The core issue is the trade-off between data redundancy and data reliability. Optimizing both at once is difficult, so the common approach is to fix one as a constraint and optimize the other. In this paper, we pursue a storage allocation scheme that minimizes data redundancy while achieving a given (high) data reliability. To this end, we provide a novel model based on generating functions. Using this model, we propose a practical and efficient storage allocation scheme that is proved to minimize data redundancy. We demonstrate analytically that the proposed solution offers several advantages, in particular a reduced search space and faster computation. We also assess the savings in data redundancy experimentally using availability traces collected from the real world. The results show that the reduction in data redundancy achieved by our solution can exceed 30% compared with a heuristic method recently proposed in the research community.
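To make the generating-function idea concrete, the following is a minimal sketch and not the paper's actual model or allocation scheme. It assumes independent nodes with known availabilities, one coded fragment per node, and that any k fragments suffice to recover the data. Under those assumptions, the coefficients of the product of (1 - p_i) + p_i x over the chosen nodes give the distribution of the number of available fragments. The function names and the greedy "most available nodes first" selection are illustrative assumptions.

```python
from typing import List, Optional, Sequence


def available_count_distribution(avail: Sequence[float]) -> List[float]:
    """Expand prod_i ((1 - p_i) + p_i * x).

    The coefficient of x^j is the probability that exactly j nodes are up,
    assuming nodes fail independently (an assumption of this sketch).
    """
    coeffs = [1.0]
    for p in avail:
        nxt = [0.0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            nxt[j] += c * (1.0 - p)   # this node is down
            nxt[j + 1] += c * p       # this node is up
        coeffs = nxt
    return coeffs


def reliability(avail: Sequence[float], k: int) -> float:
    """Probability that at least k of the fragments are available."""
    return sum(available_count_distribution(avail)[k:])


def min_fragments(avail: Sequence[float], k: int, target: float) -> Optional[int]:
    """Smallest n (one fragment on each of the n most available nodes)
    whose reliability reaches the target, or None if no n does."""
    ranked = sorted(avail, reverse=True)
    for n in range(k, len(ranked) + 1):
        if reliability(ranked[:n], k) >= target:
            return n
    return None


if __name__ == "__main__":
    nodes = [0.9] * 5          # hypothetical node availabilities
    n = min_fragments(nodes, k=2, target=0.99)
    print(n, n / 2)            # fragment count and redundancy factor n/k
```

Fixing the reliability target and searching for the smallest n mirrors the constraint-then-minimize formulation described in the abstract. Unequal fragment sizes and other details of the paper's scheme are outside this sketch.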
