Transient Fault Tolerant QDI Interconnects Using Redundant Check Code

Asynchronous logic is a promising technology for building the chip-level interconnect of multi-core systems. However, asynchronous circuits are vulnerable to transient faults. This paper presents a novel scheme to improve the robustness of asynchronous systems. Our first contribution is DIRC, a fault-tolerant delay-insensitive redundant check coding scheme. When DIRC is used in 4-phase 1-of-n quasi-delay-insensitive (QDI) interconnects, all 1-bit and some multi-bit transient faults can be tolerated. DIRC stages and basic 4-phase 1-of-n pipeline stages are interchangeable, so any basic stage can be replaced by a DIRC stage to strengthen the fault tolerance of long wires. Our second contribution, RPA, is a redundancy technique that protects the acknowledge wires from transient faults, an issue the community has long disregarded. DIRC pipelines (using both DIRC and RPA) were simulated with the UMC 0.13 μm standard cell library and compared with basic pipelines. Detailed experimental results show that a 128-bit DIRC 1-of-4 pipeline is only 13% slower than the basic one but improves fault tolerance a hundredfold when multi-bit transient faults are considered.
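The abstract does not specify how the DIRC code is constructed, so the sketch below illustrates only the baseline it improves on: a plain 1-of-4 (one-hot) QDI encoding with an all-zero spacer between tokens, as used in 4-phase return-to-zero handshaking. Under that assumption, it shows why a single transient bit flip is not always detectable in the basic code. A flip on the hot wire of a data token looks like the spacer, and a flip during the spacer phase looks like a valid token. It also shows that a 128-bit 1-of-4 link needs 64 groups, or 256 data rails, before counting acknowledge wires. All function names are hypothetical and chosen for illustration; none come from the paper.

```python
from typing import List, Tuple

# Hypothetical model of a basic 1-of-4 QDI channel (NOT the DIRC code itself).
SPACER: Tuple[int, ...] = (0, 0, 0, 0)  # return-to-zero state between tokens


def encode_1of4(symbol: int) -> Tuple[int, ...]:
    """Encode a 2-bit symbol (0..3) as a one-hot 4-wire codeword."""
    if not 0 <= symbol <= 3:
        raise ValueError("a 1-of-4 group carries exactly 2 bits (symbols 0..3)")
    return tuple(1 if i == symbol else 0 for i in range(4))


def classify(wires: Tuple[int, ...]) -> str:
    """Classify a wire state as seen by a plain 1-of-4 completion check."""
    hot = sum(wires)
    if hot == 0:
        return "spacer"   # indistinguishable from the return-to-zero phase
    if hot == 1:
        return "valid"    # accepted as a data token
    return "invalid"      # multi-hot: detectable as a code violation


def single_fault_outcomes(wires: Tuple[int, ...]) -> List[str]:
    """Flip each wire once (a 1-bit transient fault) and classify the result."""
    outcomes = []
    for i in range(len(wires)):
        faulty = tuple(w ^ 1 if j == i else w for j, w in enumerate(wires))
        outcomes.append(classify(faulty))
    return outcomes


def rails_for_width(data_bits: int) -> int:
    """Number of data rails for a 1-of-4 link (2 bits per 4-wire group)."""
    if data_bits % 2:
        raise ValueError("1-of-4 groups carry 2 bits each; width must be even")
    return (data_bits // 2) * 4


if __name__ == "__main__":
    token = encode_1of4(2)
    print(token, single_fault_outcomes(token))
    print(SPACER, single_fault_outcomes(SPACER))
    print("128-bit link data rails:", rails_for_width(128))
```

Only the multi-hot outcomes are caught by a code check alone; the "spacer" and "valid" outcomes are silent errors in the basic code. According to the abstract, DIRC tolerates all such 1-bit faults, but how it does so is not modeled here, and neither are faults on the acknowledge wire, which RPA addresses.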
