A SAFE approach towards early design space exploration of fault-tolerant multimedia MPSoCs

As feature sizes shrink, transient errors play an increasingly important role in modern embedded systems. It is therefore important to make fault tolerance a first-class citizen in embedded system design. Fault-tolerance patterns are techniques for making an application fault-tolerant. These patterns affect the quality metrics of the embedded system (such as performance, energy and cost), and there are also many ways of applying them. In this paper, we present the SAFE simulation framework, which supports early exploration of the different ways to apply fault-tolerance patterns to MPSoC-based embedded multimedia systems. The SAFE model incorporates fault injection, detection and correction. As a result, a Pareto front can be obtained that shows not only the trade-off between metrics such as performance, energy and cost, but also captures reliability metrics such as frame drops due to soft errors and the number of unresolvable faults.
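To illustrate what such a Pareto front contains, the following is a minimal sketch of Pareto filtering over candidate design points. It is not the SAFE framework's implementation: the `DesignPoint` type, its field names, the design names and all numeric values are hypothetical, and every metric is assumed to be minimized.

```python
from typing import List, NamedTuple


class DesignPoint(NamedTuple):
    """Hypothetical design point; all metrics are assumed to be minimized."""
    name: str
    exec_time: float          # performance metric (lower is better)
    energy: float
    cost: float
    frame_drops: float        # frames dropped due to soft errors
    unresolved_faults: float  # faults that could not be corrected


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere."""
    metrics_a, metrics_b = a[1:], b[1:]  # skip the name field
    no_worse = all(x <= y for x, y in zip(metrics_a, metrics_b))
    strictly_better = any(x < y for x, y in zip(metrics_a, metrics_b))
    return no_worse and strictly_better


def pareto_front(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative values only; they are not taken from the paper.
points = [
    DesignPoint("no_ft", 10, 5, 3, 8, 4),
    DesignPoint("dup_all", 14, 9, 6, 1, 0),
    DesignPoint("dup_some", 12, 7, 4, 3, 1),
    DesignPoint("bad", 15, 10, 7, 3, 1),
]

front = pareto_front(points)
print([p.name for p in front])
```

In this sketch the unprotected design stays on the front because it is fastest and cheapest, while the fully protected one stays because it has the fewest frame drops and unresolved faults; only the point that is worse or equal on every metric is discarded.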
