Fault-Tolerance Projects at Stanford CRC

This paper describes the fault-tolerant computing research currently under way at Stanford University’s Center for Reliable Computing. One focus is on tolerating hardware faults by means of software (software-implemented hardware fault tolerance). This work mainly targets faults caused by radiation-induced upsets. An experiment evaluating the techniques that we have developed is currently running on the ARGOS satellite. Another focus is on fault-tolerance techniques for adaptive computing systems implemented with field-programmable gate arrays (FPGAs).
