Framework for testing the fault-tolerance of systems including OS and network aspects

This paper presents an extensible framework for testing the behavior of networked machines running the Linux operating system in the presence of faults. The framework allows the injection of a variety of faults, such as faults in the computing core or peripheral devices of a machine, or faults in the network connecting the machines. The system under test, as well as the fault load and workload run on it, is configurable. At the core of the framework is User Mode Linux, which runs as a single process on top of a real Linux machine and simulates a single machine. A second process, paired with each virtual machine, performs the fault injection. The framework will be supported by utility programs that automate testing and evaluate test results.
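
To make the idea of a configurable fault load and workload concrete, the following is a minimal toy sketch, not the framework's actual mechanism. It assumes a transient bit-flip fault model and a trivial checksum workload running over a small in-memory byte array; the names `BitFlip` and `run_with_faults` are hypothetical and do not come from the paper. A fault-free "golden" run is compared with a run under a one-fault load, which is a common way to evaluate fault injection results.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BitFlip:
    """Hypothetical fault descriptor: flip one bit of one byte at a given step."""
    at_step: int   # workload step before which the fault is applied
    address: int   # index into the simulated memory
    bit: int       # bit position (0 = least significant)


def run_with_faults(memory: bytearray, steps: int, fault_load: list[BitFlip]) -> int:
    """Run a toy workload and apply the scheduled faults.

    The workload simply sums the memory contents at every step.
    Returns the checksum observed at the final step.
    """
    checksum = 0
    for step in range(steps):
        # Injector side: apply every fault scheduled for this step.
        for fault in fault_load:
            if fault.at_step == step:
                memory[fault.address] ^= 1 << fault.bit
        # Workload side: compute a checksum over the (possibly corrupted) state.
        checksum = sum(memory) & 0xFFFF
    return checksum


# Golden run: same workload, empty fault load.
golden = run_with_faults(bytearray([1, 2, 3, 4]), steps=3, fault_load=[])

# Faulty run: one bit flip in byte 2 before step 1.
faulty = run_with_faults(
    bytearray([1, 2, 3, 4]),
    steps=3,
    fault_load=[BitFlip(at_step=1, address=2, bit=3)],
)

print("golden:", golden, "faulty:", faulty, "fault visible:", golden != faulty)
```

In the framework described above, the injector is a separate process acting on a whole virtual machine rather than a loop over a byte array; the sketch only illustrates how a fault load can be kept as data, separate from the workload it perturbs.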
