Reliability prediction for fault-tolerant software architectures

Software fault tolerance mechanisms aim to improve the reliability of software systems. Their effectiveness (i.e., their impact on reliability) is highly application-specific and depends on the overall system architecture and usage profile. When multiple architecture configurations must be examined, as in software product lines, incorporating fault tolerance mechanisms effectively is a complex and error-prone task. Existing approaches for reliability analysis of software architectures either do not support modelling fault tolerance mechanisms or are not designed for efficient evaluation of multiple architecture variants. We present a novel approach to analyse the effect of software fault tolerance mechanisms in varying architecture configurations. We have validated the approach in multiple case studies, including a large-scale industrial system, demonstrating its ability to support architecture design and its robustness against imprecise input data.
