On the impact of erasure coding parameters to the reliability of distributed brick storage systems

For a given amount of storage overhead, erasure coding offers a higher degree of survivability than pure replication. Consequently, erasure coding has attracted much attention in recent years in research on reliable distributed storage systems. Although numerous erasure codes have been proposed, how to choose erasure coding parameters to maximize system reliability has not yet been sufficiently investigated. Erasure coding parameters greatly affect not only system survivability under multiple concurrent failures, but also data repair speed. We propose a method that quantitatively evaluates these effects and use it to determine suitable erasure coding parameters. In addition, we investigate the relationships among other reliability-affecting factors, such as storage overhead, repair bandwidth, and the properties of individual bricks.
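
The survivability claim at equal storage overhead can be illustrated with a small calculation. The sketch below is not the evaluation method proposed in the paper, which also accounts for repair speed. It is a simplified static model under stated assumptions: each brick fails independently with probability p, and the code is an (n, k) maximum-distance-separable code, so any k of the n fragments are enough to recover the data. The function name `loss_probability` and the value p = 0.01 are hypothetical and chosen only for illustration.

```python
from math import comb


def loss_probability(n: int, k: int, p: float) -> float:
    """Probability that data is lost under an (n, k) MDS code.

    Assumption: each of the n bricks holding a fragment fails
    independently with probability p. Data is lost when more than
    n - k fragments are gone, i.e. fewer than k survive.
    """
    return sum(
        comb(n, f) * p**f * (1 - p) ** (n - f)
        for f in range(n - k + 1, n + 1)
    )


# Illustrative per-brick failure probability (an assumption, not a measured value).
p = 0.01

# Both schemes use a storage overhead of n / k = 3.
replication = loss_probability(3, 1, p)  # 3-way replication as a (3, 1) code
erasure = loss_probability(6, 2, p)      # (6, 2) erasure code

print(f"3-way replication: {replication:.3e}")
print(f"(6, 2) erasure code: {erasure:.3e}")
```

With the same overhead of 3, the replicated object survives at most two brick failures, while the (6, 2) code survives up to four, which is why its loss probability is lower in this model. The model ignores repair entirely, so it says nothing about how parameter choices affect repair speed.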
