An adaptive approach to accelerated evaluation of highly available services

We motivate and describe improved fast simulation techniques for the accelerated performance evaluation of highly available services. In systems that provide such services, service unavailability events are rare because of a low component failure rate or high resource capacity, so evaluating them with traditional Monte Carlo simulation requires very long runtimes. Importance sampling (IS) has been applied to certain instances of such systems, focusing on single-class and/or homogeneous resource demands. In this article, we formulate highly available services as multiresource loss-type systems, and we present two IS methods for fast simulation that extend to multiple classes and nonhomogeneous resource demands. First, for cases in which component failure rates are small, we prove that static IS using the Standard Clock (S-ISSC) method exhibits the bounded relative error (BRE) property. Second, for estimating failure probabilities due to large capacity or fast service in systems that have nonrare component failure rates, we propose adaptive ISSC (A-ISSC), which estimates the relative probability of reaching each possible state of system failure in every step of the simulation. Using A-ISSC, IS methods that are proven to be efficient can be extended to multidimensional cases while still retaining very favorable performance, as supported by our validation experiments.
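To illustrate why importance sampling helps with rare events, the following minimal sketch estimates the probability that a one-dimensional birth-death chain reaches a failure level K before returning to 0. It is not the S-ISSC or A-ISSC method of the article: the toy chain, the function names, and the biasing rule (swapping the up and down probabilities) are assumptions chosen only because the exact answer is known in closed form for comparison.

```python
import random


def exact_hit_probability(p, K):
    """Gambler's-ruin probability of reaching K before 0, starting from 1.

    p is the up-step probability (assumed p != 0.5).
    """
    r = (1.0 - p) / p
    return (1.0 - r) / (1.0 - r ** K)


def is_hit_probability(p, K, n_runs, seed=0):
    """Importance-sampling estimate of the same probability.

    Assumed biasing: simulate with the up and down probabilities swapped,
    and correct each run by the accumulated likelihood ratio.
    """
    rng = random.Random(seed)
    q = 1.0 - p
    p_is, q_is = q, p  # biased (simulation) probabilities
    total = 0.0
    for _ in range(n_runs):
        state, weight = 1, 1.0
        while 0 < state < K:
            if rng.random() < p_is:
                state += 1
                weight *= p / p_is  # likelihood ratio of an up step
            else:
                state -= 1
                weight *= q / q_is  # likelihood ratio of a down step
        if state == K:
            total += weight  # only runs that hit K contribute
    return total / n_runs
```

With p = 0.2 and K = 10, the target probability is on the order of 10^-6, so plain Monte Carlo would need millions of runs to observe even a few hits, whereas under the swapped probabilities most runs reach K and the likelihood ratio rescales them to an unbiased estimate.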
