A Unified Framework for Simulating Markovian Models of Highly Dependable Systems

The authors present a unified framework for simulating Markovian models of highly dependable systems. They show that importance sampling, a variance reduction technique, can speed up the simulation by many orders of magnitude over standard simulation. This technique can be combined very effectively with regenerative simulation to estimate measures such as steady-state availability and mean time to failure. Moreover, it can be combined with conditional Monte Carlo methods to quickly estimate transient measures such as reliability, expected interval availability, and the distribution of interval availability. The authors demonstrate the effectiveness of these methods by using them to simulate large dependability models. They also discuss how these methods can be implemented in a software package to compute both transient and steady-state measures simultaneously from the same sample run.
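To make the role of importance sampling concrete, the following is a minimal sketch on a toy model, not the authors' framework or their models. It assumes a hypothetical birth-death chain in which each transition is a failure with small probability `p` and a repair otherwise, and it estimates the probability that a cycle started with one failed component reaches `K` failures (system failure) before returning to the all-up state. The function names, the biased failure probability `p_bias`, and the parameter values are illustrative assumptions. Failures are sampled more often than they truly occur, and each path is reweighted by its likelihood ratio so that the estimate remains unbiased.

```python
import random


def exact_hit_probability(p, K):
    """Closed-form (gambler's ruin) probability of reaching K before 0 from state 1."""
    r = (1.0 - p) / p  # ratio of repair to failure probability
    return (1.0 - r) / (1.0 - r ** K)


def estimate_hit_probability(p, K, p_bias, n_cycles, seed=0):
    """Importance-sampling estimate of the same probability.

    Transitions are sampled with failure probability p_bias instead of p.
    Each sampled transition multiplies the path weight by the ratio of the
    true transition probability to the sampling probability.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_cycles):
        state, weight = 1, 1.0
        while 0 < state < K:
            if rng.random() < p_bias:
                # Failure sampled under the biased measure.
                weight *= p / p_bias
                state += 1
            else:
                # Repair sampled under the biased measure.
                weight *= (1.0 - p) / (1.0 - p_bias)
                state -= 1
        if state == K:
            # Only cycles that reach system failure contribute.
            total += weight
    return total / n_cycles
```

For example, `estimate_hit_probability(0.01, 3, 0.5, 20000)` can be compared with `exact_hit_probability(0.01, 3)`. Under standard simulation (`p_bias` equal to `p`) only about one cycle in ten thousand would reach failure in this toy setting, whereas with `p_bias = 0.5` roughly a third of the cycles do, which is the intuition behind the speedups described above.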
