A Gate Level Analysis of Transient Faults Effects on Dual-Core Chip-Multi Processors

With continued scaling of CMOS technology, the number of transistors on a single chip keeps growing, which makes modern processors increasingly prone to transient faults. This work investigates the effects of transient faults in MIPS-based Chip-Multi Processors (CMPs) in two phases. In the first phase, low-level fault injection is performed and the sensitive components are identified. In the second phase, to improve the reliability of CMPs, two simple, low-overhead fault-tolerant techniques are applied to the most vulnerable components of the MIPS-based dual-core processor. A Hsiao code, which is an optimal minimum odd-weight-column single-error-correcting, double-error-detecting (SEC-DED) code, is used to protect the MPI and the program counters. Triple Modular Redundancy (TMR) is used to improve the reliability of the Arbiter. Fault injection results show a 12.8% improvement in error recovery and a 16.6% reduction in failure rate, with negligible performance overhead.
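
The two protection techniques named above can be illustrated with a small software sketch. It is not the gate-level design evaluated in this work: the abstract does not give word widths or the parity-check matrix, so the toy (8,4) odd-weight-column code, the column values, and all function names below are assumptions chosen only to show the principle. Every column of the parity-check matrix has odd weight, so a single-bit error yields an odd-weight syndrome equal to one column (correctable), while a double-bit error yields a nonzero even-weight syndrome (detectable only). The TMR voter is the usual bitwise majority of three copies.

```python
# Illustrative sketch only: a toy (8,4) odd-weight-column SEC-DED code
# and a TMR majority voter. Sizes and names are hypothetical.

# Parity-check columns, one per codeword bit (4 data bits, then 4 check bits).
# Data columns have weight 3, check columns have weight 1: all odd.
DATA_COLS = [0b0111, 0b1011, 0b1101, 0b1110]
CHECK_COLS = [0b0001, 0b0010, 0b0100, 0b1000]
COLS = DATA_COLS + CHECK_COLS


def syndrome(word_bits):
    """XOR together the columns of every bit that is set."""
    s = 0
    for bit, col in zip(word_bits, COLS):
        if bit:
            s ^= col
    return s


def encode(data_bits):
    """Append 4 check bits so that the codeword syndrome is zero."""
    s = 0
    for bit, col in zip(data_bits, DATA_COLS):
        if bit:
            s ^= col
    check_bits = [(s >> k) & 1 for k in range(4)]
    return list(data_bits) + check_bits


def decode(word_bits):
    """Return (data_bits, status); data_bits is None if uncorrectable."""
    word = list(word_bits)
    s = syndrome(word)
    if s == 0:
        return word[:4], "ok"
    # Odd-weight syndrome matching a column: single-bit error, flip it back.
    if bin(s).count("1") % 2 == 1 and s in COLS:
        word[COLS.index(s)] ^= 1
        return word[:4], "corrected"
    # Nonzero even-weight syndrome: double-bit error, detect only.
    return None, "detected"


def tmr_vote(a, b, c):
    """Bitwise majority of three redundant copies of a value."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    codeword = encode([1, 0, 1, 1])
    faulty = list(codeword)
    faulty[2] ^= 1  # inject a single bit flip
    print(decode(faulty))
    print(bin(tmr_vote(0b1010, 0b1010, 0b0110)))
```

In this sketch a flipped bit in any one of the eight codeword positions is corrected and any two flipped bits are flagged, which mirrors the SEC-DED behavior described for the protected registers. The voter masks a fault in any single copy, which is the property TMR relies on for the Arbiter.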
