Measuring the impact of data center failures on a cloud‐based emergency medical call system

Emergency call services are expected to be highly available in order to minimize the loss of urgent calls and, as a consequence, the loss of life due to a lack of timely medical response. The availability of such a service depends heavily on the cloud data center that hosts it. However, availability figures alone do not give a sufficient understanding of how failures affect the service and its users' perception of it. In this paper, we evaluate the impact of failures on an emergency call system using service‐level metrics such as the number of calls affected per failure and the time the emergency service takes to recover from a failure. We analyze a real data set from an emergency call center serving a large Brazilian city. Using stochastic models that represent a cloud data center, we evaluate different data center architectures to observe the impact of failures on the emergency call service. Results show that changing the data center's architecture to improve availability from two to three nines does not decrease the average number of calls affected per failure. On the other hand, it can decrease the probability of affecting a considerable number of calls at the same time.
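
To make the "two to three nines" comparison concrete, the short sketch below converts a number of nines into steady-state availability and expected downtime per year. It uses only the standard definition of nines (availability = 1 - 10^-n) and assumes a 365-day year; the function names are illustrative and do not come from the paper.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def availability_from_nines(nines: int) -> float:
    """Steady-state availability for a given number of nines (2 -> 0.99)."""
    return 1.0 - 10.0 ** (-nines)


def annual_downtime_hours(nines: int) -> float:
    """Expected hours of downtime per year at that availability level."""
    return HOURS_PER_YEAR * 10.0 ** (-nines)


if __name__ == "__main__":
    for n in (2, 3):
        print(
            f"{n} nines: availability = {availability_from_nines(n):.3f}, "
            f"downtime = {annual_downtime_hours(n):.2f} h/year"
        )
```

Under these assumptions, moving from two to three nines cuts expected yearly downtime by a factor of ten (from about 87.6 hours to about 8.76 hours). That figure is an aggregate over the year and says nothing about how many calls a single failure affects, which is why the per-failure metrics above are evaluated separately.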
