EH-Code: An Extended MDS Code to Improve Single Write Performance of Disk Arrays for Correcting Triple Disk Failures

In the information explosion era, with the sharply increasing demand for storage devices, multiple concurrent disk failures are not rare. In large data centers, erasure codes are one of the most efficient ways to protect user data at low monetary cost. One class of erasure codes, Maximum Distance Separable (MDS) codes, aims to offer data protection with minimal storage overhead. However, existing Triple Disk Failure Tolerant arrays (3DFTs) based on MDS codes suffer from low single write performance, because the corresponding codes have high computational cost and low encoding performance. To address this problem, in this paper we propose a novel MDS coding scheme called EH-Code, which is an extension of H-Code. It uses three different parities (horizontal, diagonal and anti-diagonal), which together tolerate concurrent failures of any three disks. Our mathematical analysis shows that EH-Code offers optimal storage efficiency and optimal encoding computational complexity. Specifically, compared to STAR code, Triple-Star code and Cauchy-RS codes, EH-Code improves single write performance by up to 16.13%, 14.53% and 26.27%, respectively.
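The abstract does not spell out EH-Code's stripe layout, so the following is only a minimal Python sketch of the general idea behind the three parity types and of what a single write costs with XOR parity. It assumes a hypothetical toy layout in which data element (i, j) belongs to horizontal parity i, diagonal parity (i + j) mod p, and anti-diagonal parity (i - j) mod p, with p assumed prime as in STAR-like codes. This is not EH-Code's actual placement of parities, and it does not implement decoding or triple-failure recovery. The names `encode` and `single_write` are illustrative.

```python
from functools import reduce
from operator import xor


def encode(data, p):
    """Compute toy horizontal, diagonal and anti-diagonal XOR parities.

    Hypothetical layout (NOT EH-Code's): element (i, j) belongs to
    row parity i, diagonal parity (i + j) % p and anti-diagonal
    parity (i - j) % p.
    """
    horizontal = [reduce(xor, row, 0) for row in data]
    diagonal = [0] * p
    anti_diagonal = [0] * p
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            diagonal[(i + j) % p] ^= value
            anti_diagonal[(i - j) % p] ^= value  # Python's % is non-negative
    return horizontal, diagonal, anti_diagonal


def single_write(data, parities, i, j, new_value, p):
    """Overwrite one data element and patch its parities in place.

    With XOR parity, the delta (old ^ new) is folded into exactly one
    parity element of each type, so in this toy layout a single write
    touches three parity elements (read-modify-write).
    """
    horizontal, diagonal, anti_diagonal = parities
    delta = data[i][j] ^ new_value
    data[i][j] = new_value
    horizontal[i] ^= delta
    diagonal[(i + j) % p] ^= delta
    anti_diagonal[(i - j) % p] ^= delta


# Usage: a (p - 1) x p grid of byte-sized data elements.
p = 5
data = [[(7 * i + 3 * j) & 0xFF for j in range(p)] for i in range(p - 1)]
parities = encode(data, p)
single_write(data, parities, 2, 3, 0xAB, p)
assert parities == encode(data, p)  # incremental update matches full re-encode
```

The incremental path matters because the abstract attributes poor single write performance to encoding cost: when each data element feeds exactly one parity of each type, a small write needs only three XOR patches instead of a full stripe re-encode.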
