Meeting SLA Availability Guarantees through Engineering Margin

This research examines availability guarantees in Service Level Agreements (SLAs) between information technology service providers, such as carriers, and the corporations that use their services. Combinations of reliability/maintainability scenarios and time intervals are used to generate availability distributions in order to assess the efficacy of availability guarantees. Markov and semi-Markov models provide a comparative analysis of the availability distributions and the probability of service level violation. In the semi-Markov model, long-tailed and short-tailed lognormal distributions are used for the repair time. A Monte Carlo simulation program is developed, and the generated distributions show a significant chance of SLA violation across a wide range of reliability/maintainability levels used to achieve typical goals, such as 0.99999 availability, over a variety of time intervals. The results are fairly sensitive to the tail of the repair distribution, so understanding the repair distribution is essential to assessing the chance of SLA violation. The results indicate that the way to dramatically decrease the probability of SLA violation is through an availability engineering margin: delivering availability beyond 0.99999 through redundancy and responding more rapidly to failures greatly diminishes the chance of violating a 0.99999 SLA.
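
The kind of Monte Carlo experiment described above can be illustrated with a minimal sketch. It is not the authors' program. It assumes a single repairable unit with exponentially distributed times to failure and lognormally distributed repair times, and every name and parameter value in it (`mtbf_h`, `mttr_h`, `sigma`, the one-year interval, the tenfold margin) is a hypothetical choice made for illustration only. Each run yields one sample of interval availability, and the fraction of runs that fall below the target estimates the probability of SLA violation.

```python
import math
import random


def simulate_interval_availability(mtbf_h, mttr_h, sigma, interval_h, rng):
    """Return one sample of availability over an interval of interval_h hours.

    Assumptions (illustrative, not taken from the paper): exponential times
    to failure with mean mtbf_h, lognormal repair times with mean mttr_h and
    shape parameter sigma (a larger sigma gives a longer tail).
    """
    # Choose mu so that the lognormal mean equals mttr_h.
    mu = math.log(mttr_h) - 0.5 * sigma ** 2
    t = 0.0
    downtime = 0.0
    while True:
        t += rng.expovariate(1.0 / mtbf_h)  # time until the next failure
        if t >= interval_h:
            break
        repair = rng.lognormvariate(mu, sigma)
        repair = min(repair, interval_h - t)  # count only downtime inside the interval
        downtime += repair
        t += repair
    return 1.0 - downtime / interval_h


def violation_probability(mtbf_h, mttr_h, sigma, interval_h, target, runs, seed=0):
    """Estimate the probability that interval availability falls below target."""
    rng = random.Random(seed)
    violations = sum(
        simulate_interval_availability(mtbf_h, mttr_h, sigma, interval_h, rng) < target
        for _ in range(runs)
    )
    return violations / runs


if __name__ == "__main__":
    target = 0.99999
    mttr_h = 4.0  # hypothetical mean repair time in hours
    # MTBF chosen so that steady-state availability MTBF / (MTBF + MTTR) equals the target.
    mtbf_h = mttr_h * target / (1.0 - target)
    year_h = 8760.0
    for label, factor in (("no margin", 1.0), ("10x MTBF margin", 10.0)):
        p = violation_probability(mtbf_h * factor, mttr_h, 0.5, year_h, target, 20000)
        print(label, p)
```

Under these assumptions, a single repair lasting more than about five minutes in a year already breaks a 0.99999 guarantee, so the estimate is driven mainly by how often failures occur in the interval and by the tail of the repair distribution. Varying `sigma` and the margin factor is a simple way to explore the sensitivity the abstract describes.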
