Erasure Code with Shingled Local Parity Groups for Efficient Recovery from Multiple Disk Failures

The ever-growing importance and volume of digital content generated by ICT services has driven demand for highly durable and space-efficient storage technology. Erasure coding can meet these requirements, but existing erasure-coding schemes do not efficiently handle multiple simultaneous disk failures. We propose Shingled Erasure Code (SHEC), an erasure code whose local parity groups overlap (are shingled with) one another. SHEC provides efficient recovery from multiple disk failures while letting users adjust the conflicting properties of space efficiency and durability to suit their requirements. We confirmed that SHEC meets these design goals through a numerical study of the relationships among these conflicting properties and a performance evaluation of a SHEC implementation on Ceph, an open-source scalable object storage system.
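The idea of shingled local parity groups can be made concrete with a small model. The sketch below is a hypothetical illustration, not the authors' construction. It assumes k data chunks and m parity chunks, with each parity chunk covering a window of ceil(k*c/m) consecutive data chunks (wrapping around), so that neighbouring windows overlap and each data chunk is covered by roughly c groups. As a simplification, every parity is a plain XOR over its window, whereas a practical code would use Reed-Solomon-style coefficients. The sketch then checks over GF(2) which loss patterns remain recoverable and how many chunks a single-chunk repair reads. The names `k`, `m`, `c` and all function names are assumptions chosen for this sketch.

```python
from itertools import combinations


def shingled_groups(k, m, c):
    """Hypothetical layout: m overlapping ("shingled") windows over k data chunks."""
    width = (k * c + m - 1) // m  # ceil(k*c/m) data chunks per local parity group
    groups = []
    for i in range(m):
        start = (i * k) // m  # spread window start offsets evenly
        groups.append([(start + j) % k for j in range(width)])  # wrap around
    return groups


def chunk_vectors(k, groups):
    """Each chunk as a bitmask over the k data chunks (bit i = data chunk i)."""
    data = [1 << i for i in range(k)]
    parity = []
    for g in groups:
        mask = 0
        for i in g:
            mask ^= 1 << i  # simplified XOR parity over the window
        parity.append(mask)
    return data + parity  # indices 0..k-1 are data, k..k+m-1 are parity


def gf2_rank(vectors):
    """Rank over GF(2) via Gaussian elimination keyed by each vector's top bit."""
    pivots = {}
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]  # clears the top bit, so the loop terminates
    return len(pivots)


def recoverable(k, vectors, lost):
    """All data is recoverable iff the surviving chunks still span all k data chunks."""
    survivors = [v for idx, v in enumerate(vectors) if idx not in lost]
    return gf2_rank(survivors) == k


def single_failure_reads(groups, i):
    """Chunks read to rebuild data chunk i from its smallest covering group:
    the group's other data chunks plus its parity chunk (assumes i is covered)."""
    return min(len(g) for g in groups if i in g)


def guaranteed_tolerance(k, m, vectors):
    """Largest t such that every pattern of t lost chunks is recoverable."""
    n = k + m
    t = 0
    while t < n and all(
        recoverable(k, vectors, set(lost)) for lost in combinations(range(n), t + 1)
    ):
        t += 1
    return t


if __name__ == "__main__":
    k, m, c = 4, 3, 2
    groups = shingled_groups(k, m, c)
    vecs = chunk_vectors(k, groups)
    print("groups:", groups)
    print("reads per repair:", [single_failure_reads(groups, i) for i in range(k)])
    print("failures always tolerated:", guaranteed_tolerance(k, m, vecs))
    print("space efficiency:", k / (k + m))
```

With k=4, m=3, c=2 the windows are [0, 1, 2], [1, 2, 3] and [2, 3, 0], so rebuilding one lost data chunk reads 3 chunks rather than the k=4 that a conventional (k, m) Reed-Solomon code reads. In this model every pattern of up to two lost chunks is recoverable, while some three-chunk patterns (for example data chunks 0, 1 and 3) are not. Widening the windows adds overlapping coverage at the cost of more reads per repair, which illustrates the kind of adjustable trade-off between recovery cost, durability and space efficiency that the abstract describes.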
