Adaptive Reliability for Fault Tolerant Multicore Systems

In an era of continuously shrinking technology and escalating power density, Multiprocessor Systems-on-Chip (MPSoCs) suffer from increasingly prevalent device defects and a growing number of dependability-related issues. This paper tackles the dependability challenge by proposing an adaptive reliability enhancement strategy for multicore systems. We dynamically adapt the reliability enhancement to the actual requirements of the tasks as well as to the runtime operating conditions of the cores. Because reliability improvement may adversely affect other parameters of embedded systems, we also propose a runtime recovery method. Specifically, we implement a 3-mode mapping technique that limits redundancy overheads through judicious task migration and dropping. Our experiments show promising results in terms of error mitigation with controllable power and thermal overheads.
