Fast simulation of steady-state availability in non-Markovian highly dependable systems

Efficient simulation techniques for estimating steady-state quantities in models of highly dependable computing systems with general component failure and repair time distributions are considered. Earlier approaches to steady-state estimation in this setting rely on the regenerative method of simulation, which can be used when the failure times are exponentially distributed. When the failure times are generally distributed, however, the regenerative structure is lost and a new approach is needed. The authors exploit a ratio representation for steady-state quantities in terms of cycles that are no longer independent and identically distributed. A splitting technique is used: importance sampling speeds up the simulation of rare system failure events during a cycle, while standard simulation estimates the expected cycle length. Experimental results show that the method is effective in practice.
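
The ratio representation and the splitting estimator can be sketched as follows. This is only an illustration under assumed notation, since the abstract does not give formulas: U denotes steady-state unavailability, D the down time accumulated in a cycle, T the cycle length, and L the likelihood ratio of the original measure with respect to the importance sampling measure. The expectations are understood to be taken with the cycle started from the appropriate steady-state distribution, because the cycles are not independent and identically distributed.

```latex
% Ratio representation (notation assumed for illustration, not taken from the paper)
U \;=\; \frac{\mathbb{E}[D]}{\mathbb{E}[T]}

% Splitting estimator: the numerator is estimated from n cycles simulated
% under importance sampling (down time D_i weighted by likelihood ratio L_i);
% the denominator is estimated from m cycles simulated under the original dynamics.
\hat{U} \;=\;
  \frac{\dfrac{1}{n}\sum_{i=1}^{n} D_i L_i}
       {\dfrac{1}{m}\sum_{j=1}^{m} T_j}
```

Treating the two expectations separately reflects the fact that system failure within a cycle is rare, so the numerator benefits from importance sampling, whereas the cycle length is dominated by typical behavior and can be estimated well by standard simulation.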
