Paper Information - Optimization of the Recovery Time of Pyramid Code in Distributed Storage System

In large-scale distributed storage systems, erasure coding is a basic technology that provides high reliability at low cost. Compared with traditional redundancy technology, erasure coding has low redundancy and high flexibility. Pyramid codes, which are flexible and suit a variety of application scenarios, are therefore a good choice for distributed systems, but their data recovery time is still long. To address this problem, we propose an Active Fault-Tolerant Pyramid (AFTP) code. AFTP uses a decision-tree-based hard disk failure prediction model to dynamically adjust the group lengths in the Pyramid code and the association between original data blocks and redundant blocks, so that the group containing data blocks on a potentially faulty hard disk becomes shorter. The scheme can also handle multiple hard disk failures. All read and recovery operations are performed within the group, which reduces recovery time without adding storage overhead. To verify the validity of the AFTP code, we conducted extensive experiments on a Ceph-based distributed storage system. The results show that AFTP reduces recovery time by 8%-64% compared with Basic-Pyramid (BP), and by 11%-52% compared with commonly used classic block codes.
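
The grouping idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single XOR parity per local group (real Pyramid codes derive their parities from Reed-Solomon-style codes), and the `regroup` helper and its `small` parameter are hypothetical names introduced only for illustration. It shows that recovering a block inside a shorter group reads fewer blocks, while the number of parity blocks stays the same.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def local_parities(data, groups):
    """One XOR parity per group; each group is a list of data-block indices."""
    return [xor_blocks([data[i] for i in g]) for g in groups]

def recover(lost, data, groups, parities):
    """Rebuild data[lost] from its own group only; return (block, blocks_read)."""
    gid = next(k for k, g in enumerate(groups) if lost in g)
    survivors = [data[i] for i in groups[gid] if i != lost]
    block = xor_blocks(survivors + [parities[gid]])
    return block, len(survivors) + 1

def regroup(n_blocks, at_risk, small=2):
    """Hypothetical regrouping: the at-risk block goes into a short group,
    all remaining blocks share one long group (still two parities in total)."""
    others = [i for i in range(n_blocks) if i != at_risk]
    return [[at_risk] + others[:small - 1], others[small - 1:]]

data = [bytes([i] * 4) for i in range(1, 9)]   # eight 4-byte data blocks
even = [[0, 1, 2, 3], [4, 5, 6, 7]]            # static, equal-length groups
adapt = regroup(8, at_risk=5)                  # block 5 is on a disk predicted to fail

for groups in (even, adapt):
    block, reads = recover(5, data, groups, local_parities(data, groups))
    assert block == data[5]
    print(groups, "blocks read:", reads)
```

In this sketch, recovering block 5 reads four blocks under the equal grouping and two under the adjusted grouping. The other group becomes longer, so the benefit depends on the prediction being right about which disk fails.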

Liu Longxiang | Dan Tang | Rui He | Geng Wei | Hang Zhang
