Techniques for fast simulation of models of highly dependable systems

With the ever-increasing complexity and requirements of highly dependable systems, their evaluation during design and operation is becoming more crucial. Realistic models of such systems are often not amenable to analysis using conventional analytic or numerical methods, so analysts and designers turn to simulation to evaluate them. However, accurately estimating the dependability measures of these models requires that the simulation frequently observe system failures, which are rare events in highly dependable systems. This renders ordinary simulation impractical for evaluating such systems. To overcome this problem, simulation techniques based on importance sampling have been developed, and they are very effective in certain settings. When importance sampling works well, simulation run lengths can be reduced by several orders of magnitude when estimating transient as well as steady-state dependability measures. This paper reviews some of the importance-sampling techniques that have been developed in recent years to estimate dependability measures efficiently in Markov and non-Markov models of highly dependable systems.
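
To make the idea concrete, the following is a minimal sketch, not a method taken from any particular reference, of importance sampling by simple failure biasing. It assumes a hypothetical Markov model: n identical components, each failing at rate lam while up, a single repairer working at rate mu, and system failure once K components are down. A regenerative cycle starts in the all-up state 0. Because state 0 always jumps to state 1, the sketch estimates gamma, the probability of reaching K before returning to 0, which is the quantity in the denominator of regenerative ratio estimators of the mean time to failure. Plain Monte Carlo samples the embedded jump chain with its original probabilities. Failure biasing instead draws failures with an inflated probability theta in states 1..K-1 and reweights each cycle by the likelihood ratio. All function and parameter names (failure_prob, exact_gamma, simulate_cycle, estimate_gamma, theta) are hypothetical and chosen for illustration.

```python
import math
import random


def failure_prob(k, n, lam, mu):
    """Probability that the next jump out of state k (k components down,
    1 <= k < K, repairer busy) is a failure rather than a repair."""
    fail_rate = (n - k) * lam
    return fail_rate / (fail_rate + mu)


def exact_gamma(n, K, lam, mu):
    """Exact P(reach K down before returning to 0), starting from state 0.

    State 0 always jumps to 1, so this equals the gambler's-ruin
    probability h(1) = 1 / sum_{j=0}^{K-1} rho_j of the birth-death
    jump chain, where rho_0 = 1 and rho_j = prod_{i=1}^{j} q_i / p_i."""
    total, rho = 1.0, 1.0
    for i in range(1, K):
        p = failure_prob(i, n, lam, mu)
        rho *= (1.0 - p) / p
        total += rho
    return 1.0 / total


def simulate_cycle(n, K, lam, mu, theta, rng):
    """Simulate one regenerative cycle; return 1{failure} * likelihood ratio.

    theta=None: use the original probabilities (plain Monte Carlo).
    Otherwise: simple failure biasing, i.e. choose a failure with
    probability theta in every state 1..K-1 and multiply the likelihood
    ratio by (original probability) / (biased probability) at each jump."""
    k, lr = 1, 1.0  # the jump 0 -> 1 has probability 1 under both measures
    while 0 < k < K:
        p = failure_prob(k, n, lam, mu)
        q = p if theta is None else theta
        if rng.random() < q:
            lr *= p / q
            k += 1
        else:
            lr *= (1.0 - p) / (1.0 - q)
            k -= 1
    return lr if k == K else 0.0


def estimate_gamma(n, K, lam, mu, theta, cycles, seed=1):
    """Return (point estimate, relative 95% half-width) over `cycles` cycles."""
    rng = random.Random(seed)
    samples = [simulate_cycle(n, K, lam, mu, theta, rng) for _ in range(cycles)]
    mean = sum(samples) / cycles
    var = sum((x - mean) ** 2 for x in samples) / (cycles - 1)
    if mean == 0.0:
        return mean, math.inf  # no failure observed: no useful estimate
    return mean, 1.96 * math.sqrt(var / cycles) / mean


if __name__ == "__main__":
    n, K, lam, mu = 3, 3, 1e-4, 1.0
    print("exact         ", exact_gamma(n, K, lam, mu))
    print("plain MC      ", estimate_gamma(n, K, lam, mu, None, 100_000))
    print("failure biased", estimate_gamma(n, K, lam, mu, 0.5, 100_000))
```

With these illustrative parameters gamma is about 2 × 10^-8, so plain Monte Carlo over 10^5 cycles will almost always observe no failure at all. The failure-biased estimator, by contrast, should reach a relative half-width on the order of one percent with the same number of cycles. The choice theta = 0.5 is arbitrary here. Simple failure biasing is only one of the schemes such reviews cover, and it is not guaranteed to work well for every model.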
