Optimization of the Recovery Time of Pyramid Code in Distributed Storage System

In large-scale distributed storage systems, erasure coding is a fundamental technique for providing high reliability at low cost. Compared with traditional redundancy schemes, erasure coding has lower redundancy overhead and greater flexibility. Pyramid codes, which are flexible and suit a variety of application scenarios, are therefore a good choice for distributed storage systems, but their drawback is that data recovery still takes a long time. To address this problem, we propose Active Fault-Tolerant Pyramid (AFTP) codes. AFTP uses a decision-tree-based hard disk failure prediction model to dynamically adjust the group lengths of a Pyramid code and the association between original data blocks and redundant blocks, shortening the groups that contain data blocks stored on disks predicted to fail. The scheme also handles multiple hard disk failures. Because all read and recovery operations are performed within a group, AFTP reduces recovery time without adding storage overhead. To validate AFTP, we conduct extensive experiments on a Ceph-based distributed storage system. The results show that AFTP reduces recovery time by 8% to 64% compared with Basic-Pyramid (BP) codes, and by 11% to 52% compared with commonly used classic block codes.
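The core intuition, that a lost block is rebuilt from its own group and that a shorter group means fewer reads, can be illustrated with a minimal sketch. It assumes XOR local parities and a simple two-size grouping policy. The names `build_groups`, `local_parities` and `recover`, the `small`/`large` group sizes, and the `at_risk` set (standing in for the decision-tree predictor's output) are all hypothetical. This is not the AFTP construction itself: the global parities that Pyramid codes keep for multi-failure tolerance, and any storage-overhead accounting, are omitted.

```python
from functools import reduce
from typing import Dict, List, Set, Tuple


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks (the simplest local-parity encoder)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_groups(data_ids: List[int], at_risk: Set[int],
                 small: int, large: int) -> List[List[int]]:
    """Hypothetical policy: blocks on disks predicted to fail go into short
    groups of size `small`; all other blocks go into groups of size `large`."""
    risky = [d for d in data_ids if d in at_risk]
    healthy = [d for d in data_ids if d not in at_risk]
    groups = [risky[i:i + small] for i in range(0, len(risky), small)]
    groups += [healthy[i:i + large] for i in range(0, len(healthy), large)]
    return groups


def local_parities(data: Dict[int, bytes], groups: List[List[int]]) -> List[bytes]:
    """One XOR parity block per group."""
    return [xor_blocks([data[i] for i in g]) for g in groups]


def recover(lost: int, data: Dict[int, bytes], groups: List[List[int]],
            parities: List[bytes]) -> Tuple[bytes, int]:
    """Rebuild one lost data block using only its own group.
    Returns the rebuilt block and the number of blocks read."""
    for group, parity in zip(groups, parities):
        if lost in group:
            # Read the surviving data blocks of the group plus its local parity.
            survivors = [data[i] for i in group if i != lost] + [parity]
            return xor_blocks(survivors), len(survivors)
    raise KeyError(f"block {lost} is not in any group")


if __name__ == "__main__":
    data = {i: bytes([17 * i] * 4) for i in range(8)}  # 8 toy data blocks
    at_risk = {0, 1}  # stand-in for the failure predictor's output
    groups = build_groups(list(data), at_risk, small=2, large=3)
    parities = local_parities(data, groups)
    for lost in (0, 5):
        block, reads = recover(lost, data, groups, parities)
        assert block == data[lost]
        print(f"block {lost}: rebuilt from {reads} blocks")
```

Running it prints `block 0: rebuilt from 2 blocks` and `block 5: rebuilt from 3 blocks`: the blocks on the at-risk disks sit in the shorter group `[0, 1]`, so their repair reads fewer blocks. For comparison, conventional single-block repair in a Reed-Solomon code with k = 8 data blocks reads 8 blocks.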