Guaranteeing service availability in SLAs; a study of the risk associated with contract period and failure process

Service Level Agreements (SLAs) are a common means of defining the obligations of network/service providers and users in business relationships. The terms that define the guaranteed availability for a given period are an important element of these contracts. Selecting appropriate values is difficult because of the large number of variables involved, the complexity of the network and of service provision, and the computational challenge posed by the transient solution that is needed, as opposed to a steady-state one. A common policy is to use the steady-state availability as a reference. This simplification may nevertheless put contract fulfillment at risk, because the stochastic variation of the measured availability is significant over a typical contract period. This paper analyzes the relevance of interval availability analysis to SLAs and provides suggestions to network providers on the selection of adequate availability guarantees. The interval availability of unprotected and shared protected connections is studied under exponential and Weibull failure and repair distributions. It is observed that, for a single-path scenario, a small reduction of the guaranteed availability below the steady-state value considerably improves the probability of meeting the requirements. The same holds for connections with shared backup protection. However, performing this analysis in the transient domain is quite demanding. Hence, to simplify it, it is proposed to obtain the steady-state results and introduce a safeguard factor to ensure that the availability guarantee is met. For Weibull-distributed times between failures with a shape factor less than one (as observed in operational networks), the probability of meeting a guaranteed availability over a finite contract period decreases more sharply than for the commonly assumed Poisson failure process. This increases the importance of a transient analysis.
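The gap between steady-state and interval availability can be illustrated with a small Monte Carlo sketch. It is not the analytical method used in the paper. It assumes a single two-state (up/down) connection that starts in the up state, Weibull-distributed times to failure scaled to a given mean (a shape of 1 reduces to the exponential case), and exponentially distributed repair times. All function names and parameter values (MTTF, MTTR, contract period, safeguard margin) are hypothetical and chosen only for illustration.

```python
import math
import random


def steady_state_availability(mttf, mttr):
    """Asymptotic availability of a two-state element: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


def sample_interval_availability(mttf, mttr, period, rng, shape=1.0):
    """Draw one sample of the fraction of [0, period] spent in the up state.

    Assumptions (illustrative, not taken from the paper): the element starts
    up, times to failure are Weibull with mean `mttf` and the given shape,
    and repair times are exponential with mean `mttr`.
    """
    # Choose the Weibull scale so that the mean time to failure equals mttf.
    scale = mttf / math.gamma(1.0 + 1.0 / shape)
    t = 0.0
    up_time = 0.0
    while t < period:
        time_to_failure = rng.weibullvariate(scale, shape)
        # Only the part of the up interval inside the contract period counts.
        up_time += min(time_to_failure, period - t)
        t += time_to_failure
        if t >= period:
            break
        t += rng.expovariate(1.0 / mttr)  # repair (down) interval
    return up_time / period


def sample_availabilities(n, mttf, mttr, period, rng, shape=1.0):
    """Draw n independent samples of the interval availability."""
    return [
        sample_interval_availability(mttf, mttr, period, rng, shape)
        for _ in range(n)
    ]


def fraction_meeting(samples, guarantee):
    """Estimated probability that the measured availability meets the guarantee."""
    return sum(1 for a in samples if a >= guarantee) / len(samples)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the run is repeatable
    mttf, mttr, period = 1000.0, 4.0, 720.0  # hours; hypothetical values
    a_ss = steady_state_availability(mttf, mttr)
    for shape in (1.0, 0.7):  # exponential case, then Weibull with shape < 1
        samples = sample_availabilities(5000, mttf, mttr, period, rng, shape)
        for margin in (0.0, 0.002, 0.005):  # hypothetical safeguard margins
            guarantee = a_ss - margin
            p = fraction_meeting(samples, guarantee)
            print(f"shape={shape} guarantee={guarantee:.4f} P(met)~{p:.3f}")
```

Reusing the same set of samples for every guarantee level makes the comparison between the steady-state guarantee and the slightly reduced guarantees direct: the estimated probability of meeting the guarantee can only stay the same or increase as the margin grows.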
