PRO: A Popularity-based Multi-threaded Reconstruction Optimization for RAID-Structured Storage Systems

This paper proposes and evaluates a novel dynamic data reconstruction optimization algorithm, called popularity-based multi-threaded reconstruction optimization (PRO), which allows the reconstruction process in a RAID-structured storage system to rebuild the frequently accessed areas prior to rebuilding infrequently accessed areas to exploit access locality. This approach has the salient advantage of simultaneously decreasing reconstruction time and alleviating user and system performance degradation. It can also be easily adopted in various conventional reconstruction approaches. In particular, we optimize the disk-oriented reconstruction (DOR) approach with PRO. The PRO-powered DOR is shown to induce a much earlier onset of response-time improvement and sustain a longer time span of such improvement than the original DOR. Our benchmark studies on read-only web workloads have shown that the PRO-powered DOR algorithm consistently outperforms the original DOR algorithm in the failurerecovery process in terms of user response time, with a 3.6%-23.9% performance improvement and up to 44.7% reconstruction time improvement simultaneously.

[1]  Gianfranco Ciardo,et al.  Characterizing temporal locality and its impact on web server performance , 2000, Proceedings Ninth International Conference on Computer Communications and Networks (Cat.No.00EX440).

[2]  Andrea C. Arpaci-Dusseau,et al.  Association Proceedings of the Third USENIX Conference on File and Storage Technologies San Francisco , CA , USA March 31 – April 2 , 2004 , 2004 .

[3]  K. Kavi Cache Memories Cache Memories in Uniprocessors. Reading versus Writing. Improving Performance , 2022 .

[4]  Garth A. Gibson Redundant disk arrays: Reliable, parallel secondary storage. Ph.D. Thesis , 1990 .

[5]  John C. S. Lui,et al.  Performance Analysis of Disk Arrays under Failure , 1990, VLDB.

[6]  John C. S. Lui,et al.  Automatic Recovery from Disk Failure in Continuous-Media Servers , 2002, IEEE Trans. Parallel Distributed Syst..

[7]  Jim Zelenka,et al.  RAIDframe: rapid prototyping for disk arrays , 1996, SIGMETRICS '96.

[8]  Scott A. Brandt,et al.  Reliability mechanisms for very large storage systems , 2003, 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies, 2003. (MSST 2003). Proceedings..

[9]  Daniel P. Siewiorek,et al.  Fast, on-line failure recovery in redundant disk arrays , 1993, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.

[10]  Ethan L. Miller,et al.  Evaluation of distributed recovery in large-scale storage systems , 2004, Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, 2004..

[12]  Randy H. Katz,et al.  A case for redundant arrays of inexpensive disks (RAID) , 1988, SIGMOD '88.

[13]  Fabrizio Lombardi,et al.  Analysis of repair algorithms for mirrored-disk systems , 1997 .

[14]  Daniel P. Siewiorek,et al.  Architectures and algorithms for on-line failure recovery in redundant disk arrays , 1994, Distributed and Parallel Databases.

[15]  Ludmila Cherkasova,et al.  Analysis of enterprise media server workloads: access patterns, locality, content evolution, and rates of change , 2004, IEEE/ACM Transactions on Networking.

[16]  Robert Y. Hou,et al.  Balancing I/O response time and disk rebuild time in a RAID5 disk array , 1993, [1993] Proceedings of the Twenty-sixth Hawaii International Conference on System Sciences.

[17]  Prashant J. Shenoy,et al.  Efficient failure recovery in multi-disk multimedia servers , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[18]  Martin F. Arlitt,et al.  Web server workload characterization: the search for invariants , 1996, SIGMETRICS '96.

[19]  Mark Holland,et al.  On-Line Data Reconstruction in Redundant Disk Arrays (CMU-CS-94-164) , 1994 .

[20]  Thomas E. Anderson,et al.  A Comparison of File System Workloads , 2000, USENIX Annual Technical Conference, General Track.