Energy-Aware Fault-Tolerant CGRAs Addressing Application with Different Reliability Needs

In this paper, we propose a polymorphic fault-tolerant architecture that can be tailored at run-time to efficiently support the reliability needs of multiple applications. Today, coarse-grained reconfigurable architectures (CGRAs) host multiple applications with potentially different reliability needs. Providing platform-wide worst-case (maximum) protection to all applications is neither optimal nor desirable. To reduce the fault-tolerance overhead, adaptive fault-tolerance strategies have been proposed. These techniques assess the reliability requirements of each application and adjust the fault-tolerance intensity (and hence the overhead) accordingly. However, existing flexible reliability schemes only allow shifting between different levels of modular redundancy (duplication, triplication, etc.) and deal with only a single class of faults (e.g. soft errors). To complement these strategies, we propose energy-aware fault tolerance that, in addition to modular redundancy, can also provide low-cost, sub-modular (e.g. residue mod 3) redundancy to cater to both permanent and temporary faults. Our solution relies on an agent-based control layer and a configurable fault-tolerance data path. The control layer identifies the application class and configures the data path to provide the needed reliability. Simulation results using a few selected algorithms (FFT, matrix multiplication, and FIR filter) show that the proposed method provides flexible protection with energy overhead ranging from 3.125% to 107% for different reliability levels. Synthesis results confirm that the proposed architecture significantly reduces the area overhead for the self-checking (59.1%) and fault-tolerant (7.1%) versions, compared to state-of-the-art adaptive reliability techniques.
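To illustrate the sub-modular (residue mod 3) redundancy mentioned above, the following is a minimal sketch of a residue-checked addition. It is not the paper's data path: the function names `residue3` and `checked_add` and the `inject_fault` bit mask are hypothetical and exist only for illustration. The sketch also assumes unbounded integers, so it ignores the wrap-around correction that a fixed-width hardware adder would need.

```python
def residue3(x: int) -> int:
    """Return x mod 3, the 2-bit check symbol carried alongside the data."""
    return x % 3


def checked_add(a: int, b: int, inject_fault: int = 0) -> tuple[int, bool]:
    """Add a and b on the main path and verify the sum with a mod-3 check.

    inject_fault is a hypothetical bit mask XORed into the sum to emulate
    a fault in the main data path (0 means no fault).
    """
    # Main data path: full-width addition, optionally corrupted.
    result = (a + b) ^ inject_fault

    # Check path: works only on the residues of the operands.
    expected_residue = (residue3(a) + residue3(b)) % 3

    # The result is accepted only if its residue matches the prediction.
    return result, residue3(result) == expected_residue


if __name__ == "__main__":
    print(checked_add(25, 17))                       # fault-free addition
    print(checked_add(25, 17, inject_fault=1 << 3))  # one flipped bit
```

Flipping a single bit changes the result by a power of two, and no power of two is divisible by 3, so this check detects every single-bit error in the sum. It only detects the error; correcting it still requires a recovery action or a higher redundancy level, such as the modular redundancy the abstract also mentions.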
