Single Failure Recovery Method for Erasure Coded Storage System with Heterogeneous Devices

As the demand for data reliability grows, most of today’s storage systems adopt erasure codes to ensure that data can be reconstructed after physical device failures. To recover lost data quickly after a single failure, recovery optimization methods have attracted considerable attention in recent years. However, most existing optimization methods assume homogeneous devices, ignoring the fact that storage devices are usually heterogeneous. In this paper, we propose a new recovery optimization method named HSR (Heterogeneous Storage Recovery), which uses both the loads and the speed rates of the physical devices as the optimization target in order to further improve recovery performance on heterogeneous devices. The experimental results show that, compared with existing popular recovery optimization methods, HSR achieves much higher recovery speed on heterogeneous storage devices.

Key words: storage system, erasure code, heterogeneous devices, single failure recovery
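To illustrate why device speed matters alongside load, the following is a minimal sketch and not the HSR algorithm itself, whose details are not given here. It assumes that surviving devices are read in parallel, so that recovery time is dominated by the device with the largest load-to-speed ratio. The names `recovery_time` and `pick_scheme`, the device speeds, and the candidate read plans are all hypothetical.

```python
from typing import Dict, List

# Hypothetical model: each candidate recovery scheme maps a surviving device
# to the number of symbols it must read; each device has a read speed
# (symbols per second). These values are illustrative assumptions.
ReadPlan = Dict[str, int]


def recovery_time(reads: ReadPlan, speed: Dict[str, float]) -> float:
    """Estimated time when devices are read in parallel.

    Assumption: the slowest device (largest load / speed) is the bottleneck.
    """
    return max((n / speed[d] for d, n in reads.items() if n > 0), default=0.0)


def pick_scheme(candidates: List[ReadPlan], speed: Dict[str, float]) -> ReadPlan:
    """Choose the candidate whose bottleneck device finishes soonest."""
    return min(candidates, key=lambda plan: recovery_time(plan, speed))


# Two fast devices and one slow device (assumed speeds).
speed = {"d1": 100.0, "d2": 100.0, "d3": 25.0}

candidates = [
    {"d1": 4, "d2": 4, "d3": 4},  # balanced loads, fewest total reads (12)
    {"d1": 6, "d2": 6, "d3": 1},  # more total reads (13), light on slow device
]

best = pick_scheme(candidates, speed)
fewest_reads = min(candidates, key=lambda plan: sum(plan.values()))
```

Under these assumed numbers, a criterion based only on total or balanced load would pick the first plan, while the speed-aware criterion picks the second, because the slow device `d3` would otherwise become the bottleneck.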
