Computing the Number of Calls Dropped Due to Failures

Defects per million (DPM), defined as the number of calls out of a million dropped due to failures, is an important service (un)reliability measure for telecommunication systems. Most previous research derives the DPM from a steady-state system availability model. In this paper, we develop a novel method for DPM computation that takes into consideration not only system availability, but also the impact of the service application and the transient behavior of failure recovery. We illustrate this approach using a real system, the IBM SIP SLEE cluster. Our method takes into account software and hardware failures, different stages of recovery, different phases of call flow, retry attempts, and the interactions between call flow and failure/recovery behavior.
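For orientation, the conventional steady-state baseline mentioned above can be sketched as follows. This is not the method developed in the paper: it is a minimal illustration that assumes a two-state (up/down) model with constant failure and repair rates, and further assumes that every call arriving while the system is down is dropped. The function names and the example rates are hypothetical.

```python
def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Availability of a two-state up/down model (assumed, not from the paper).

    failure_rate and repair_rate are in the same units (e.g. per hour).
    """
    return repair_rate / (failure_rate + repair_rate)


def dpm_from_availability(availability: float) -> float:
    """Baseline DPM, assuming every call that arrives during downtime is dropped."""
    return (1.0 - availability) * 1_000_000


if __name__ == "__main__":
    # Hypothetical rates: one failure per 1000 hours, repair in 0.1 hours on average.
    a = steady_state_availability(failure_rate=1 / 1000, repair_rate=1 / 0.1)
    print(f"availability = {a:.6f}, baseline DPM = {dpm_from_availability(a):.1f}")
```

A baseline of this kind ignores calls already in progress when a failure occurs, retry attempts, and the stages of recovery, which are the effects the method described here is meant to capture.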
