JOR: A Journal-guided Reconstruction Optimization for RAID-Structured Storage Systems

This paper proposes a simple and practical RAID reconstruction optimization scheme called JOurnal-guided Reconstruction (JOR). JOR exploits the fact that a significant portion of the data blocks in a typical disk array is unused. It monitors storage space utilization at the block level and uses that information to guide the reconstruction process, so that only the failed disk's data on used stripes is recovered to the spare disk. JOR ensures data consistency by requiring that all blocks in the disk array be initialized to zero during synchronization and that all blocks on the spare disk likewise be zeroed in the background. Because JOR is independent of and orthogonal to the underlying reconstruction approach, it can be easily incorporated into any existing approach to optimize it. Experimental results from our JOR prototype implementation demonstrate that JOR reduces the reconstruction times of two state-of-the-art reconstruction schemes by an amount approximately proportional to the percentage of unused storage space, while ensuring data consistency.
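The idea described above can be illustrated with a minimal sketch. It is not the authors' prototype: it assumes a RAID-5-style array with XOR parity, and the names `make_array`, `write_stripe`, `reconstruct`, and the in-memory `used` set (standing in for the block-level utilization record) are hypothetical. Because every block starts as zero, the parity of an unused stripe is also zero, so a pre-zeroed spare already holds the correct contents for those stripes and only the used stripes need rebuilding.

```python
from functools import reduce

BLOCK_SIZE = 4  # bytes per block; deliberately tiny for illustration


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def make_array(num_disks, num_stripes):
    """Zero-initialized array: disks[d][s] is the block of disk d in stripe s."""
    return [[bytes(BLOCK_SIZE) for _ in range(num_stripes)]
            for _ in range(num_disks)]


def write_stripe(disks, used, stripe, data_blocks):
    """Write a full stripe with rotating parity and record the stripe as used."""
    n = len(disks)
    parity_disk = stripe % n  # assumed parity placement, for illustration only
    data_iter = iter(data_blocks)
    for d in range(n):
        if d != parity_disk:
            disks[d][stripe] = next(data_iter)
    disks[parity_disk][stripe] = xor_blocks(data_blocks)
    used.add(stripe)


def reconstruct(disks, failed, used, num_stripes):
    """Rebuild the failed disk onto a zeroed spare, visiting only used stripes."""
    spare = [bytes(BLOCK_SIZE) for _ in range(num_stripes)]  # pre-zeroed spare
    rebuilt = 0
    for s in sorted(used):
        survivors = [disks[d][s] for d in range(len(disks)) if d != failed]
        spare[s] = xor_blocks(survivors)  # XOR of survivors yields the lost block
        rebuilt += 1
    return spare, rebuilt
```

For example, with 4 disks and 10 stripes of which only 2 have been written, `reconstruct` visits 2 stripes instead of 10, and the spare still matches the failed disk block for block. In this toy model the work saved scales with the fraction of unused stripes, which mirrors the proportionality the abstract reports.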
