Improving Reliability and Safety by Trading off Software Failure Criticalities

Many voters have been proposed for software systems designed with n-version programming diversity. Their decisions, however, do not incorporate knowledge about the criticality of different software failures. Moreover, failure classes conflict with one another in their fault tolerance requirements; as a result, current voters either treat all failures equally or mask only certain types of failures. Voters therefore need to weigh the system's criticality to each failure class according to a trade-off among these fault tolerance requirements. We propose an approach for trading off system criticalities to different failures. The approach introduces two implementation parameters: the voter constraint hardness and the number of participants in the voting process. We use the failure criticality trade-off to determine the optimal values of these two parameters. This trade-off enhances a voter's ability to account for different failure criticalities and also decreases the rate of performance failures. We analyze the relationships between the implementation parameters and the failure occurrence rate of each failure class. We derive system reliability and safety under our approach and show gains in both. The proposed approach can be used to build fault-tolerant systems based on n-version programming with any generic or hybrid voter.
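
To make the two implementation parameters concrete, the following is a minimal sketch of a threshold voter, not the voter defined in the paper. It assumes that "constraint hardness" can be read as the minimum number of agreeing version outputs required before a result is released, and that "number of participants" is how many version outputs are consulted. The function name `threshold_vote` and both parameter names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def threshold_vote(
    outputs: Sequence[Hashable],
    participants: int,
    hardness: int,
) -> Optional[Hashable]:
    """Vote over the first `participants` version outputs.

    Assumption (not from the paper): `hardness` is the minimum number of
    identical outputs needed to release a result. Returns the agreed value,
    or None when no value reaches that threshold (the voter withholds output).
    """
    if not 1 <= hardness <= participants <= len(outputs):
        raise ValueError("require 1 <= hardness <= participants <= len(outputs)")

    considered = outputs[:participants]
    # most_common(1) yields the most frequent value; on a tie it returns
    # the value that appeared first among the considered outputs.
    value, count = Counter(considered).most_common(1)[0]
    return value if count >= hardness else None


if __name__ == "__main__":
    results = [7, 7, 8, 7, 9]  # outputs of five hypothetical versions
    print(threshold_vote(results, participants=5, hardness=3))
    print(threshold_vote(results, participants=5, hardness=4))
```

Under this reading, a higher hardness makes the voter withhold a result more often instead of releasing a possibly wrong one, while a lower hardness does the reverse. Choosing values for the two parameters is then a trade-off between failure classes, which is the kind of tension the abstract describes.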
