Three-Dimensional Redundancy Codes for Archival Storage

Fault-tolerant disk arrays rely on replication or erasure coding to reconstruct lost data after a disk failure. As disk capacities increase, so does the risk of encountering an irrecoverable read error during reconstruction, which would prevent full recovery of the lost data. We propose a three-dimensional erasure-coding technique that reduces this risk by guaranteeing full recovery from all triple and nearly all quadruple disk failures. Our solution performs better than existing alternatives, such as sets of disk arrays that each use Reed-Solomon codes to tolerate triple failures. Given its very high reliability, it is especially well suited to very large data sets that must be preserved over long periods of time.
