Analysis and optimization of fault-tolerant embedded systems with hardened processors

In this paper we propose an approach to the design optimization of fault-tolerant hard real-time embedded systems that combines hardware and software fault-tolerance techniques. We trade off selective hardening in hardware against process re-execution in software to provide the required level of fault tolerance against transient faults at the lowest possible system cost. We propose a system failure probability (SFP) analysis that connects the hardening level with the maximum number of re-executions in software. We present design optimization heuristics that select the fault-tolerant architecture and decide the process mapping such that the system cost is minimized, deadlines are satisfied, and the reliability requirements are fulfilled.
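The abstract does not spell out the SFP formula, so the following is only a minimal sketch under stated assumptions. Assume transient faults strike process executions independently, each execution of process i on a node hardened to a given level fails with a known probability p_i, and every failed execution is simply re-executed. A scenario in which process i fails m_i times and then succeeds has probability p_i^m_i (1 - p_i). Summing over all (m_1, ..., m_n) with m_1 + ... + m_n = f, the node recovers from exactly f faults with probability ∏(1 - p_i) times the sum, over all multisets of f processes, of the product of their p_i. If only k re-executions fit in the schedule, the node fails with probability 1 minus the sum of that quantity for f = 0..k. The Python below computes this and, for a single node, picks the cheapest hardening level for which some k meets both a deadline and a failure-probability target. The `HardeningLevel` fields, the cost values, and the worst-case schedule length `sum(WCET) + k * max(WCET)` (which ignores recovery overheads) are hypothetical modelling choices, not the paper's exact model or heuristics.

```python
from dataclasses import dataclass
from itertools import combinations_with_replacement
from math import prod


def prob_exactly_f_faults(p, f):
    """Probability that exactly f transient faults occur on one node and every
    process still completes, assuming independent faults and that each failed
    execution is re-executed.

    A scenario where process i fails m_i times and then succeeds has
    probability p[i]**m_i * (1 - p[i]). Summing over all (m_1, ..., m_n) with
    m_1 + ... + m_n = f equals prod(1 - p[i]) times the sum, over all
    multisets of size f drawn from the processes, of the product of their p[i].
    """
    no_fault = prod(1.0 - pi for pi in p)
    # combinations_with_replacement works on positions, so equal probabilities
    # of different processes are still counted as distinct processes.
    multisets = sum(prod(c) for c in combinations_with_replacement(p, f))
    return no_fault * multisets


def node_failure_prob(p, k):
    """Probability that more than k faults occur, i.e. k re-executions are not enough."""
    return 1.0 - sum(prob_exactly_f_faults(p, f) for f in range(k + 1))


@dataclass
class HardeningLevel:          # hypothetical per-level characterization
    cost: float                # cost of a processor at this hardening level
    wcet: list[float]          # WCET of each process at this level
    p_fail: list[float]        # per-execution failure probability of each process


def cheapest_design(levels, deadline, max_sfp, max_k=10):
    """Return (cost, level index, k) for the cheapest feasible level, or None."""
    best = None
    for h, lvl in enumerate(levels):
        for k in range(max_k + 1):
            # Toy worst case: all k faults hit the longest process; no overheads.
            length = sum(lvl.wcet) + k * max(lvl.wcet)
            if length > deadline:
                break  # a larger k only makes the schedule longer
            if node_failure_prob(lvl.p_fail, k) <= max_sfp:
                if best is None or lvl.cost < best[0]:
                    best = (lvl.cost, h, k)
                break  # smallest sufficient k for this level
    return best


if __name__ == "__main__":
    levels = [
        HardeningLevel(cost=10, wcet=[2.0, 3.0], p_fail=[0.01, 0.02]),
        HardeningLevel(cost=20, wcet=[2.4, 3.6], p_fail=[0.001, 0.002]),
    ]
    print(cheapest_design(levels, deadline=10.0, max_sfp=1e-4))  # (20, 1, 1)
```

In this toy example the unhardened level is rejected: with one re-execution its failure probability (about 6.9e-4) is still above the target, and the two re-executions it would need no longer fit before the deadline. The hardened level costs more but meets the target with a single re-execution. The exhaustive loop is only practical for one node and a handful of levels; choosing architectures and mappings across many nodes is where heuristics become necessary.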
