Cost Analysis of Redundancy Schemes for Distributed Storage Systems

Distributed storage infrastructures rely on data redundancy to achieve high data reliability. Unfortunately, redundancy introduces storage and communication overheads, which can either reduce the overall storage capacity of the system or increase its costs. To mitigate these overheads, different redundancy schemes have been proposed. However, due to the great variety of underlying storage infrastructures and the different application needs, optimizing these redundancy schemes for each storage infrastructure is cumbersome. The lack of rules for determining the optimal level of redundancy for each storage configuration often leads developers in industry to choose simpler redundancy schemes, which are usually not optimal. In this paper, we analyze the cost of different redundancy schemes and derive a set of rules to determine which redundancy scheme minimizes the storage and communication costs for a given system configuration. Additionally, we use simulation to show that theoretically optimal schemes may not be viable in a realistic setting where nodes can go off-line and repairs may be delayed. For these cases, we identify the trade-offs between the storage and communication overheads of the redundancy scheme and its data reliability.
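To make the two kinds of overhead concrete, the following is a minimal sketch and not the cost model derived in the paper. It assumes a generic (n, k) maximum-distance-separable erasure code, with replication treated as the special case k = 1, and a naive repair that downloads k blocks to rebuild one lost block. The function names are hypothetical and chosen only for illustration.

```python
def storage_overhead(n: int, k: int) -> float:
    """Bytes stored per byte of user data for an (n, k) scheme.

    Assumption: the object is split into k blocks and encoded into n
    blocks of equal size. Replication with r copies is (n=r, k=1).
    """
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    return n / k


def tolerated_failures(n: int, k: int) -> int:
    """Number of blocks that can be lost while the object stays recoverable."""
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    return n - k


def naive_repair_amplification(n: int, k: int) -> int:
    """Bytes transferred per byte repaired under naive repair.

    Assumption: rebuilding one lost block (1/k of the object) requires
    downloading k surviving blocks, i.e. the whole object. For
    replication (k=1) a single copy is transferred, so the factor is 1.
    """
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    return k


if __name__ == "__main__":
    # Illustrative configurations only; they are not taken from the paper.
    for label, n, k in [("3-way replication", 3, 1), ("(14, 10) erasure code", 14, 10)]:
        print(
            f"{label}: storage x{storage_overhead(n, k):.2f}, "
            f"tolerates {tolerated_failures(n, k)} losses, "
            f"repair amplification x{naive_repair_amplification(n, k)}"
        )
```

Under these assumptions, the erasure code stores far less data than replication for a higher number of tolerated losses, but moves more data per repaired byte. This is the kind of storage versus communication tension the analysis addresses.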
