Error injection aimed at fault removal in fault tolerance mechanisms: criteria for error selection using field data on software faults

Fault injection allows a detailed study of complex interactions between faults and fault-handling mechanisms. It can be a useful complement to analytical modeling and formal verification techniques in the testing of fault-tolerant systems. However, work on fault injection has not yet matured enough to give industry cost-effective alternatives for validating fault-tolerant systems. This study analyzes 408 customer-discovered faults (defects) in a release of a large operating system product. We discuss methods for selecting the error types for an error injection experiment in the system test environment, aimed at fault removal. Using the four severity levels and the 24 error types recorded in the customer defect records, we analyze the faults in terms of the fault types and system test triggers defined in Orthogonal Defect Classification (ODC). Our work gives examples of criteria, based on information from field-reported defects, that can be used to select errors for injection.
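
As a rough illustration only, the sketch below shows one kind of field-data criterion for choosing error types to inject: weight each field defect by its severity, then keep the fewest error types that account for a chosen share of the weighted total, and look up which ODC system test triggers surfaced each one. This is not the paper's method. The record schema, the error-type labels, the weights (assuming severity 1 is the most severe and counts 4, severity 4 counts 1), the 80% coverage threshold, and the sample records are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DefectRecord:
    """One field-reported defect. Hypothetical schema for illustration."""
    error_type: str  # hypothetical error-type label from the defect record
    severity: int    # assumed scale: 1 (most severe) .. 4 (least severe)
    trigger: str     # ODC system test trigger, e.g. "Workload/Stress"


def severity_weight(severity: int) -> int:
    """Hypothetical weighting: severity 1 counts 4, severity 4 counts 1."""
    if not 1 <= severity <= 4:
        raise ValueError(f"severity must be in 1..4, got {severity}")
    return 5 - severity


def weighted_scores(records) -> Counter:
    """Sum the severity weights of all defects per error type."""
    scores = Counter()
    for record in records:
        scores[record.error_type] += severity_weight(record.severity)
    return scores


def select_error_types(records, coverage: float = 0.8) -> list:
    """Pick the fewest error types, highest score first (ties broken by
    name), whose combined weighted score reaches `coverage` of the total."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    scores = weighted_scores(records)
    target = coverage * sum(scores.values())
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    selected, covered = [], 0
    for error_type, score in ranked:
        if covered >= target:
            break
        selected.append(error_type)
        covered += score
    return selected


def trigger_profile(records, error_type: str) -> Counter:
    """Count the ODC triggers recorded against one error type."""
    return Counter(r.trigger for r in records if r.error_type == error_type)


if __name__ == "__main__":
    # Made-up records; not data from the study.
    sample = [
        DefectRecord("pointer management", 1, "Workload/Stress"),
        DefectRecord("pointer management", 2, "Recovery/Exception"),
        DefectRecord("memory leak", 2, "Workload/Stress"),
        DefectRecord("deadlock", 1, "Startup/Restart"),
        DefectRecord("wrong message", 4, "Software Configuration"),
    ]
    print(select_error_types(sample))
    # → ['pointer management', 'deadlock', 'memory leak']
    print(dict(trigger_profile(sample, "pointer management")))
    # → {'Workload/Stress': 1, 'Recovery/Exception': 1}
```

A severity-weighted count is only one possible criterion; ranking by raw frequency, or restricting the records to a single trigger before ranking, can be expressed with the same records and functions.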