Predeployment validation of fault-tolerant systems through software-implemented fault insertion

The fault injection-based automated testing (FIAT) environment is described. FIAT can be used to experimentally characterize and evaluate distributed real-time systems under both fault-free and faulted conditions. A survey of validation methodologies is presented, and the need for validation methodologies based on fault insertion is demonstrated. The origins and models of faults, and the motivation for the FIAT concept, are reviewed. FIAT employs a validation methodology that builds confidence in the system by first providing a baseline of fault-free performance data and then characterizing the behavior of the system with faults present. Fault insertion is accomplished through software and allows faults, or the manifestations of faults, to be inserted either by seeding faults into memory or by triggering error detection mechanisms. FIAT can emulate a variety of fault-tolerant strategies and architectures, can monitor system activity, and can automatically orchestrate experiments involving fault insertion. A common system interface makes FIAT easy to use, reducing both experiment development time and run time. Fault models chosen for experiments on FIAT have generated system responses that parallel those observed in real systems under faulty conditions. These capabilities are demonstrated by two example experiments, each using a different fault-tolerance strategy.
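To make the two-phase methodology (fault-free baseline first, then faulted runs) and memory seeding more concrete, the following is a minimal, hypothetical sketch, not FIAT's implementation or interface. It assumes a toy system of three replicas, each holding a copy of a memory image, whose outputs are combined by a majority voter (one possible fault-tolerance strategy). The fault model is assumed to be a single transient bit flip seeded into one replica's memory, and a voter disagreement flag stands in for an error detection mechanism. All names (`compute`, `majority_vote`, `seed_bit_flip`, `run_experiment`) are illustrative.

```python
"""Hypothetical sketch of software-implemented fault insertion (not FIAT's code).

Assumes three replicas whose "memory" is a bytearray and whose outputs are
combined by a majority voter (triple modular redundancy).
"""
from __future__ import annotations

import random
from collections import Counter


def compute(memory: bytearray) -> int:
    # Toy workload: the replica's result depends on every byte of its memory.
    return sum(memory) % 256


def majority_vote(results: list[int]) -> tuple[int | None, bool]:
    # Returns (voted value, disagreement flag). The flag acts as a simple
    # error detection mechanism: it is set whenever any replica disagrees.
    value, count = Counter(results).most_common(1)[0]
    voted = value if count > len(results) // 2 else None
    return voted, count != len(results)


def seed_bit_flip(memory: bytearray, rng: random.Random) -> tuple[int, int]:
    # Assumed fault model: one transient bit flip at a random memory location.
    addr = rng.randrange(len(memory))
    bit = rng.randrange(8)
    memory[addr] ^= 1 << bit
    return addr, bit


def run_experiment(inject: bool, seed: int = 0) -> dict:
    rng = random.Random(seed)
    # Same seed -> same initial memory image for baseline and faulted runs.
    image = bytearray(rng.randrange(256) for _ in range(64))
    replicas = [bytearray(image) for _ in range(3)]
    if inject:
        seed_bit_flip(replicas[0], rng)  # fault confined to replica 0
    results = [compute(m) for m in replicas]
    voted, detected = majority_vote(results)
    return {"results": results, "voted": voted, "detected": detected}


# Phase 1: fault-free baseline. Phase 2: the same workload with a seeded fault.
baseline = run_experiment(inject=False)
faulted = run_experiment(inject=True)
print(baseline["detected"], faulted["detected"], faulted["voted"] == baseline["voted"])
# prints: False True True
```

Because the seeded fault is confined to one replica, the voter masks it while the disagreement flag records that an error occurred. Comparing the faulted run against the fault-free baseline is what reveals both the detection and the masking.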
