Hybrid, adaptive, and reconfigurable fault tolerance

The main design challenge in developing space computers featuring hybrid system-on-chip (SoC) devices is determining the optimal combination of size, weight, power, cost, performance, and reliability for the target mission, while managing the complexity of combining fixed and reconfigurable logic. This paper focuses on fault-tolerant computing with adaptive hardware redundancy in fixed and reconfigurable logic, with the goal of providing and evaluating tradeoffs among system reliability, performance, and resource utilization. Our research targets the hybrid Xilinx Zynq SoC as the primary computational device on a flight computer. Typically, flight software on a Zynq runs on the ARM cores, which by default operate in symmetric multiprocessing (SMP) mode. However, radiation tests have shown that this mode can leave the system prone to upsets. To address this limitation, we present a new framework, HARFT (hybrid adaptive reconfigurable fault tolerance), that enables switching among three operating modes: (1) ARM cores running together in SMP mode; (2) ARM cores running independently in asymmetric multiprocessing (AMP) mode; and (3) an FPGA-enhanced mode for fault tolerance. While SMP is the default mode, AMP mode may be used for fault-tolerant and real-time extensions. In addition, the FPGA-enhanced mode uses partially reconfigurable regions to vary the level of redundancy and to incorporate application- and environment-specific techniques for fault mitigation and application acceleration.
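As a rough illustration of switching among the three operating modes, the sketch below selects a mode from an observed upset rate. It is not the HARFT implementation: the selection policy, the `select_mode` function, and both threshold values are hypothetical and chosen only to make the idea concrete.

```python
from enum import Enum


class Mode(Enum):
    """The three operating modes named in the abstract."""
    SMP = "smp"              # ARM cores running together (default mode)
    AMP = "amp"              # ARM cores running independently
    FPGA_ENHANCED = "fpga"   # redundancy in partially reconfigurable regions


# Hypothetical thresholds in upsets per hour; not taken from the paper.
AMP_THRESHOLD = 1.0
FPGA_THRESHOLD = 10.0


def select_mode(upset_rate: float) -> Mode:
    """Pick an operating mode from an observed upset rate (assumed policy)."""
    if upset_rate < 0:
        raise ValueError("upset rate must be non-negative")
    if upset_rate >= FPGA_THRESHOLD:
        return Mode.FPGA_ENHANCED
    if upset_rate >= AMP_THRESHOLD:
        return Mode.AMP
    return Mode.SMP


if __name__ == "__main__":
    for rate in (0.1, 2.5, 40.0):
        print(f"{rate:5.1f} upsets/h -> {select_mode(rate).name}")
```

A real system would likely also weigh performance and resource utilization, the other tradeoffs the abstract lists; a single scalar threshold is used here only to keep the example short.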
