System Management Services for High-Performance In-situ Aerospace Computing

As today’s space exploration, space science, and defense missions demand ever-higher bandwidth and processing capacity, the ability to efficiently apply commercial off-the-shelf technology to on-board computing has become a critical need. In response, NASA’s New Millennium Program office has commissioned the development of the Dependable Multiprocessor for use in payload and robotic missions. The Dependable Multiprocessor system provides power-efficient, high-performance, fault-tolerant cluster computing resources in a cost-effective and scalable manner. As a major step toward the flight system to be launched in 2009, Honeywell and the University of Florida have investigated and developed a management system and associated middleware components that make processing science-mission data as easy in space as it is on ground-based clusters. This paper provides a detailed description of the Dependable Multiprocessor’s middleware technology, along with experimental results that validate the concept and demonstrate the system’s scalability even in the presence of faults.
