Experimental evaluation of a COTS system for space applications

This paper evaluates the impact of transient errors in the operating system of a COTS-based system (a CETIA board with two PowerPC 750 processors running LynxOS) and quantifies their effects at both the OS and application levels. The study was conducted using a Software-Implemented Fault Injection tool (Xception), with both realistic programs and synthetic workloads, the latter chosen to focus on specific OS features. The results provide a comprehensive picture of the impact of faults on key LynxOS features (process scheduling and the most frequent system calls), data integrity, error propagation, application termination, and correctness of application results.
