Classification and Impact Analysis of Faults in Automated System Management

The reliability of automated system management solutions will become increasingly important as the use of cloud computing and data centres expands. As part of a study to improve that reliability, this paper classifies the faults that can occur in automated system management and proposes a method for determining their severity. A baseline deployment is compared with a proposed alternative configuration to determine the difference in reliability, and the results show a significant improvement over the baseline. Although still in development, the method can determine and compare the reliability of deployment configurations from early in the design process.
