An Energy-Efficient Reliability Model for Parallel Disk Systems

In the last decade, parallel disk systems have become increasingly popular for data-intensive applications running on high-performance computing platforms. Energy conservation in parallel disk systems strongly affects the cost of cooling equipment and backup power generation, because parallel disks consume a significant amount of energy in high-performance computing centers. Although a wide range of energy conservation techniques have been developed for disk systems, most energy-saving schemes adversely affect the reliability of parallel disk systems. To address this deficiency, we focus on reliability analysis for energy-efficient parallel disk systems. In this paper, we use a Markov process to develop a quantitative reliability model for energy-efficient parallel disk systems that use data mirroring. With the new model in place, we develop a reliability analysis tool that efficiently evaluates the reliability of fault-tolerant parallel disk systems with two power modes.
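To illustrate what a Markov reliability model for a mirrored disk pair can look like, the following is a minimal sketch of the textbook three-state chain (both disks up, one disk failed, data lost). It is not the model developed in the paper and it does not represent the two power modes. The function name `mttdl_mirrored_pair`, the parameters `failure_rate` and `repair_rate`, and the numeric rates in the usage lines are all hypothetical choices made for illustration.

```python
def mttdl_mirrored_pair(failure_rate: float, repair_rate: float) -> float:
    """Mean time to data loss for a textbook three-state mirrored-pair chain.

    States: 2 (both disks up), 1 (one disk failed, rebuild in progress),
    and an absorbing data-loss state. Rates are per hour (an assumption).
    """
    if failure_rate <= 0 or repair_rate < 0:
        raise ValueError("failure_rate must be > 0 and repair_rate >= 0")
    lam, mu = failure_rate, repair_rate
    # Generator restricted to the transient states (2, 1):
    #   state 2 -> state 1 at rate 2*lam (either disk may fail)
    #   state 1 -> state 2 at rate mu, state 1 -> data loss at rate lam
    q = [[-2.0 * lam, 2.0 * lam],
         [mu, -(lam + mu)]]
    # Expected absorption times t solve q @ t = [-1, -1]; apply Cramer's rule
    # for the first unknown, the time to data loss starting with both disks up.
    det = q[0][0] * q[1][1] - q[0][1] * q[1][0]
    t_both_up = (-1.0 * q[1][1] - q[0][1] * -1.0) / det
    return t_both_up


if __name__ == "__main__":
    # Illustrative rates only: one failure per 100,000 hours, 10-hour rebuild.
    hours = mttdl_mirrored_pair(failure_rate=1e-5, repair_rate=0.1)
    print(f"MTTDL is roughly {hours:.3e} hours")
```

For this simple chain the result agrees with the closed form (3λ + μ) / (2λ²), where λ is the per-disk failure rate and μ the repair rate. A model that accounts for power modes would need additional states or mode-dependent rates, which this sketch leaves out.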
