Availability and Reliability Modeling for Computer Systems

Publisher Summary: Dependability is a measure of the capability of a product to deliver its intended level of service to the user, especially in light of failures or other incidents that impinge on its performance. It combines several underlying ideas, such as reliability, maintainability, availability, and user demand patterns, into a basic overall measure of quality, which customers use along with cost and performance to evaluate products. This chapter describes dependability analysis of computer systems and its types, the different classes of dependability measures, and the Markov and Markov reward models commonly used for dependability analysis, together with their solution methods. The three classes of dependability measures are system availability measures, system reliability measures, and task completion measures. The chapter also describes four types of dependability analysis: evaluation, sensitivity analysis, specification determination, and tradeoff analysis. A model-based evaluation, or sometimes a hybrid approach based on a judicious combination of models and measurements, is used for cost-effective dependability analysis. The chapter discusses the determination of model parameters, such as failure rates, coverage probabilities, repair rates, and reward rates, as well as model verification and validation. Finally, the chapter demonstrates the use of these methods through a detailed dependability analysis of a full-system example representative of existing computer systems.
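As a rough illustration of the kind of Markov reward model the summary refers to, the sketch below solves the simplest possible case: a single repairable component with one "up" state and one "down" state. It is not taken from the chapter. The function names, the failure and repair rates, and the reward assignment (1 for up, 0 for down) are all assumptions chosen for illustration.

```python
# Minimal sketch of a two-state Markov availability model.
# All names and numeric values are hypothetical, not from the chapter.

def steady_state_two_state(failure_rate, repair_rate):
    """Return (pi_up, pi_down) for an up/down continuous-time Markov chain.

    The balance equation pi_up * failure_rate = pi_down * repair_rate,
    together with pi_up + pi_down = 1, gives the closed form below.
    """
    total = failure_rate + repair_rate
    return repair_rate / total, failure_rate / total


def expected_reward_rate(probabilities, reward_rates):
    """Expected steady-state reward rate: sum of pi_i * r_i over states."""
    return sum(p * r for p, r in zip(probabilities, reward_rates))


failure_rate = 0.001  # failures per hour (assumed value)
repair_rate = 0.5     # repairs per hour (assumed value)

pi_up, pi_down = steady_state_two_state(failure_rate, repair_rate)

# With reward 1 in the up state and 0 in the down state, the expected
# reward rate is exactly the steady-state availability.
availability = expected_reward_rate((pi_up, pi_down), (1.0, 0.0))

# Expected downtime per year implied by that availability.
downtime_minutes_per_year = (1.0 - availability) * 365 * 24 * 60

print(f"steady-state availability: {availability:.6f}")
print(f"expected downtime: {downtime_minutes_per_year:.1f} minutes/year")
```

Replacing the reward vector with, say, the throughput delivered in each state would turn the same calculation into a simple performability measure, which is the motivation for attaching reward rates to the states of the model.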
