A New Non-MDS RAID-6 Code to Support Fast Reconstruction and Balanced I/Os

RAID-6 is widely used to tolerate two concurrent disk failures in both disk arrays and storage clusters. Among the numerous erasure codes developed to implement RAID-6, Maximum Distance Separable (MDS) codes are highly popular. Owing to the limitations of the parity generation schemes used in MDS codes, RAID-6-based storage systems suffer from unbalanced I/O and low reconstruction performance. To achieve both high performance and reliability, we propose a new class of XOR-based RAID-6 code, V²-Code, which improves on both the load balancing and the reconstruction performance of MDS RAID-6 codes. V²-Code, a simple yet flexible non-MDS vertical code, can be easily implemented and deployed in storage systems. Its distinguishing features include lowest density, a steady parity chain length and well-balanced computation. We analyze the coding scheme theoretically and evaluate it empirically by running a wide range of workloads under various configurations. Experimental results show that V²-Code outperforms four popular codes (EVENODD, RDP, X-Code and Code-M) in terms of load balancing and reconstruction time. In the single-disk-failure and double-disk-failure cases, V²-Code speeds up reconstruction relative to X-Code by factors of up to 3.31 and 1.79, respectively.
