Impact of a Fault Management Architecture on the Performance of a Component-Based System

Application providers increasingly rely on a separate fault management system that offers out-of-the-box monitoring and alarm support for application instances. Such a system is usually distributed and consists of a set of management components that both detect faults and trigger actions, for example the automatic restart of monitored components. This distributed structure supports scalability and helps ensure that an application meets its quality requirements. However, successful recovery of the application then depends on the fault management architecture and on the status of the management components themselves. This paper presents a simulation model that accounts for the effect of management-architecture-based coverage on the mean throughput of an application. Such a model can help application providers choose the right fault management architecture for their applications.
