Dynamic reliability management: Reconfiguring reliability-levels of hardware designs at runtime

The reliability of FPGA-based hardware designs is becoming a challenge with future device technologies, particularly in avionic and space applications, where FPGAs may be exposed to high radiation levels. Redundancy-based techniques are typically used to achieve fault-tolerant operation, but hardware redundancy incurs overhead in area, latency, and power consumption. Based on the observation that reliability requirements vary over time, we propose the concept of Dynamic Reliability Management (DRM), which allows the trade-off between reliability and these performance factors to be optimized at runtime. In this paper, we present the DRM concept and a DRM tool flow comprising a design-time part and a runtime part. At design time, we leverage and extend the BYU-LANL tool to automatically generate several implementations of a single hardware design at different reliability levels and, consequently, with different performance factors. At runtime, we rely on the ReconOS architecture and multithreaded programming model to switch among the different reliability configurations. Finally, we provide a case study that analyzes the trade-offs for varying reliability configurations.
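
The runtime selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the ReconOS or BYU-LANL APIs. The `Implementation` type, the ordinal reliability scale, the implementation names, and the relative area values are all hypothetical placeholders chosen for illustration. The sketch assumes that the design-time flow has produced several implementations of one design, and that the runtime picks the cheapest one that still meets the current reliability requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    """One pre-generated variant of the same hardware design (hypothetical model)."""
    name: str
    reliability_level: int  # hypothetical ordinal scale: higher means more redundancy
    area_cost: float        # relative area, placeholder value (not measured data)


def select_implementation(candidates, required_level):
    """Return the lowest-area implementation meeting the required reliability level."""
    feasible = [c for c in candidates if c.reliability_level >= required_level]
    if not feasible:
        raise ValueError(f"no implementation reaches reliability level {required_level}")
    return min(feasible, key=lambda c: c.area_cost)


# Illustrative candidates only; names and costs are assumptions, not results from the paper.
CANDIDATES = [
    Implementation("no_redundancy", reliability_level=0, area_cost=1.0),
    Implementation("partial_redundancy", reliability_level=1, area_cost=2.0),
    Implementation("full_redundancy", reliability_level=2, area_cost=3.5),
]

if __name__ == "__main__":
    # The requirement changes over time; the selected configuration follows it.
    for required in (0, 2, 1):
        chosen = select_implementation(CANDIDATES, required)
        print(f"required level {required}: load {chosen.name}")
```

In a real system the selection result would trigger a reconfiguration of the hardware, and the cost function would likely weigh latency and power alongside area. The sketch uses a single cost value to keep the trade-off visible.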
