Demonstration of the remote exploration and experimentation (REE) fault-tolerant parallel-processing supercomputer for spacecraft onboard scientific data processing

This paper describes a demonstration of the REE Project's work to date. The demonstration simulates an REE system that might exist on a Mars rover, consisting of multiple commercial off-the-shelf (COTS) processors, a COTS network, a COTS node-level operating system, REE middleware, and an REE application. The application performs texture processing of images; it was chosen because it is a building block of the automated geological processing that will eventually be used for both navigation and data processing. Because the COTS hardware is not radiation hardened, soft errors induced by single-event upsets will occur. The demonstration simulates these errors with a software-implemented fault injector. To hold the viewer's interest, faults are injected at a rate much higher than is realistic. Both the application and the middleware contain mechanisms for detecting and recovering from these faults, and the very high fault rate exercises those mechanisms. If the REE system can tolerate this fault rate while continuing to process data, it should easily handle the true fault rate.
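The inject, detect, and recover cycle described above can be illustrated with a small sketch. This is not the REE fault injector or middleware: the function names, the CRC-32 check, and the retry-from-source recovery are assumptions chosen for brevity, standing in for the application-level and middleware-level mechanisms the abstract mentions.

```python
import random
import zlib


def inject_bit_flip(buffer: bytearray, rng: random.Random) -> int:
    """Flip one randomly chosen bit in place (a simulated single-event upset).

    Returns the index of the flipped bit. Hypothetical helper, not REE code.
    """
    bit_index = rng.randrange(len(buffer) * 8)
    buffer[bit_index // 8] ^= 1 << (bit_index % 8)
    return bit_index


def process_with_recovery(source: bytes, fault_probability: float,
                          rng: random.Random, max_retries: int = 10):
    """Copy `source` into a working buffer while faults may be injected.

    Detection: compare a CRC-32 of the working copy against the expected value.
    Recovery: discard the corrupted copy and retry from the source.
    Returns the clean copy and the number of faults detected along the way.
    """
    expected = zlib.crc32(source)
    detected = 0
    for _ in range(max_retries):
        working = bytearray(source)
        # Inject at an exaggerated rate, as in the demonstration.
        if rng.random() < fault_probability:
            inject_bit_flip(working, rng)
        if zlib.crc32(working) == expected:
            return bytes(working), detected
        detected += 1
    raise RuntimeError("no clean result within the retry budget")


if __name__ == "__main__":
    tile = bytes(range(64))  # stand-in for an image tile
    result, faults = process_with_recovery(tile, 0.5, random.Random(0))
    print(result == tile, faults)
```

The sketch mirrors the argument of the abstract: if the loop still returns correct data when `fault_probability` is set far above any realistic value, the same detection and recovery path has ample margin at the true rate.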