On-line detection of the deadlocks caused by permanently faulty links in quasi-delay insensitive networks on chip

Asynchronous networks on chip (NoCs) are promising candidates for supporting the enormous communication demands of future many-core systems because of their low energy consumption and high speed. Like synchronous NoCs, asynchronous NoCs are vulnerable to faults, but their fault tolerance has not been studied adequately, especially for quasi-delay insensitive (QDI) NoCs. One key issue neglected by most designers is that permanent faults in QDI NoCs cause deadlocks, and these deadlocks cripple traditional fault-tolerant techniques based on redundant codes. A novel detection method is proposed that locates the faulty link in a QDI NoC by exploiting a common pattern shared by all fault-related deadlocks. The method is shown to introduce low hardware overhead and to report permanently faulty links with a short delay and guaranteed accuracy.
