Performability Analysis of Computer Systems: From Model Specification to Solution

Abstract: Reliability/availability modeling of computer systems deals with representing changes in the structure of the system being modeled, which are generally due to faults, and with how such changes affect the availability of the system. Performance modeling, on the other hand, involves representing the probabilistic nature of user demands and predicting the system's capacity to perform useful work, under the assumption that the system structure remains constant. With the advent of degradable systems, a system may be restructured in response to faults and may continue to perform useful work, even though it operates at lower capacity. Performability modeling considers structural changes together with their impact on the overall performance of the system. The complexity of current computer systems and the variety of problems to be analyzed, including the simultaneous evaluation of performance and availability, demonstrate the need for sophisticated tools that allow the specification of general classes of problems while incorporating powerful analytic and/or simulation techniques. Concerning model specification, a recently proposed object-oriented modeling paradigm that accommodates a wide variety of applications is discussed and compared with other approaches. With respect to solution methods, a brief overview of past work on the performability evaluation of Markov models is presented. It is then shown that many performability-related measures can be calculated with the uniformization (or randomization) technique by coloring distinguished states and/or transitions of the Markov model of the system being studied. Finally, the state-space explosion problem is addressed, and several techniques for dealing with it are discussed.
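As a rough illustration of the uniformization idea mentioned in the abstract, the sketch below computes transient state probabilities of a small continuous-time Markov chain and sums them over a set of "up" states to obtain point availability. It is a minimal sketch and not the paper's algorithm: the function names, the two-state failure/repair model, the rates, and the truncation rule are all assumptions chosen for illustration, and the paper's coloring of states and transitions is not reproduced here.

```python
import math


def uniformization(Q, pi0, t, eps=1e-10, max_terms=100_000):
    """Transient distribution pi(t) of a CTMC with generator Q (list of rows).

    Uses pi(t) = sum_k Poisson(k; rate*t) * pi0 * P^k with P = I + Q/rate,
    truncated once the accumulated Poisson weight reaches 1 - eps.
    Assumption: rate*t is moderate, so exp(-rate*t) does not underflow.
    """
    n = len(Q)
    rate = max(-Q[i][i] for i in range(n))  # uniformization rate
    if rate == 0.0 or t == 0.0:
        return list(pi0)

    # Transition matrix of the uniformized discrete-time chain.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]

    weight = math.exp(-rate * t)      # Poisson probability of k = 0 jumps
    v = list(pi0)                     # pi0 * P^k, starting at k = 0
    result = [weight * x for x in v]
    accumulated = weight

    k = 0
    while accumulated < 1.0 - eps and k < max_terms:
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k        # Poisson probability of k jumps
        result = [r + weight * x for r, x in zip(result, v)]
        accumulated += weight
    return result


def point_availability(Q, pi0, t, up_states):
    """Probability that the system is in one of the 'up' states at time t."""
    pi_t = uniformization(Q, pi0, t)
    return sum(pi_t[s] for s in up_states)


# Hypothetical two-state model: state 0 = up, state 1 = down.
FAILURE_RATE = 0.01   # assumed value
REPAIR_RATE = 1.0     # assumed value
Q_EXAMPLE = [[-FAILURE_RATE, FAILURE_RATE],
             [REPAIR_RATE, -REPAIR_RATE]]

if __name__ == "__main__":
    print(point_availability(Q_EXAMPLE, [1.0, 0.0], 2.0, up_states=[0]))
```

For this two-state model the result can be checked against the closed form mu/(lambda+mu) + lambda/(lambda+mu) * exp(-(lambda+mu) t). Attaching a reward to each state instead of a 0/1 "up" indicator would give an expected reward rate at time t in the same way.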
