Use of importance sampling and related techniques to measure very high reliability software

Computer-based control systems have grown more complex over the past two decades, so the software aspects of system reliability are an increasingly important concern. Current methods of software and system reliability prediction, whether measurement-based or incorporating reliability growth models, cannot accurately predict failure rates lower than 10^-6 per mission hour. This paper describes a new methodology for more accurately predicting the failure rates of very high reliability systems. The methodology enhances conventional measurement-based reliability assessment with importance sampling, a method that incorporates the results of stress testing. By using importance sampling in conjunction with a system model, acceleration factors can be associated with stress testing, much as is currently done with elevated-temperature life testing of hardware components.
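The general idea of importance sampling can be illustrated with a minimal sketch. It is not the paper's methodology or system model. It assumes a toy system with two input conditions, and the profiles, per-condition failure probabilities, and function names are all hypothetical. Trials are drawn from a stress profile that oversamples the rare condition, and each observed failure is weighted by the ratio of its operational probability to its stress probability.

```python
import random

# Hypothetical numbers, chosen only for illustration (not from the paper).
OPERATIONAL_PROFILE = {"nominal": 0.999, "rare": 0.001}  # how often each condition occurs in service
STRESS_PROFILE = {"nominal": 0.5, "rare": 0.5}           # how often it is exercised under stress testing
FAILURE_PROB = {"nominal": 0.0, "rare": 0.01}            # assumed chance that a trial fails in each condition


def run_trial(condition, rng):
    """Stand-in for executing one test case; returns True if the trial fails."""
    return rng.random() < FAILURE_PROB[condition]


def estimate_failure_probabilities(n_trials, seed=0):
    """Return (operational estimate, raw stress failure fraction) per trial."""
    rng = random.Random(seed)
    conditions = list(STRESS_PROFILE)
    weights = [STRESS_PROFILE[c] for c in conditions]
    weighted_failures = 0.0
    raw_failures = 0
    # Sample conditions from the stress profile, not the operational one.
    for condition in rng.choices(conditions, weights=weights, k=n_trials):
        if run_trial(condition, rng):
            raw_failures += 1
            # Likelihood ratio corrects for oversampling the rare condition.
            weighted_failures += OPERATIONAL_PROFILE[condition] / STRESS_PROFILE[condition]
    return weighted_failures / n_trials, raw_failures / n_trials


if __name__ == "__main__":
    operational, stress = estimate_failure_probabilities(200_000)
    print("estimated operational failure probability:", operational)
    print("observed failure fraction under stress:", stress)
    print("implied acceleration factor:", stress / operational)
```

In this toy setting the ratio of the stress failure fraction to the weighted estimate plays the role of an acceleration factor: it indicates how much more often failures are observed per trial under stress than would be expected under the operational profile.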
