Packet logging mechanism for adaptive online fault detection on Network-on-Chip

Shrinking transistor sizes and smaller interconnect elements increase the probability of on-chip faults. To keep a system functional in the presence of faults, fault tolerance has become one of the key features of Network-on-Chip (NoC) design methodology. Existing end-to-end (E2E) error detection and correction (EDC) performs well at low error rates, whereas switch-to-switch (S2S) EDC performs better at high error rates. However, as the fault occurrence probability changes, the system must choose between the two techniques. This paper proposes an adaptive online fault detection scheme based on a packet logging mechanism. In the proposed mechanism, each router logs transmitted packets and NACK packets and continuously monitors its fault level. The router then determines whether to use E2E or S2S EDC based on the error probability. Experimental results show that the proposed adaptive method, which switches between E2E and S2S according to the error probability, performs better than using only E2E or only S2S.
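
The per-router decision described above can be illustrated with a minimal sketch. This is not the paper's implementation: the class name RouterLog, the sliding-window size, and the switching threshold are all hypothetical, and the error probability is simply estimated as the fraction of logged transmissions that drew a NACK.

```python
from collections import deque
from enum import Enum


class Mode(Enum):
    E2E = "end-to-end"
    S2S = "switch-to-switch"


class RouterLog:
    """Hypothetical per-router log of recent transmissions and NACKs."""

    def __init__(self, window=100, threshold=0.05):
        # window and threshold are illustrative assumptions, not values
        # taken from the paper.
        self.events = deque(maxlen=window)  # True = NACK, False = clean send
        self.threshold = threshold
        self.mode = Mode.E2E  # start in the low-error-rate scheme

    def error_rate(self):
        """Estimate the error probability over the logged window."""
        if not self.events:
            return 0.0
        return sum(self.events) / len(self.events)

    def record(self, nacked):
        """Log one transmitted packet and re-evaluate the EDC mode."""
        self.events.append(bool(nacked))
        if self.error_rate() > self.threshold:
            self.mode = Mode.S2S  # high error rate: check at every hop
        else:
            self.mode = Mode.E2E  # low error rate: check at the destination
        return self.mode


router = RouterLog(window=10, threshold=0.2)
for _ in range(10):
    router.record(nacked=False)   # fault-free traffic keeps the router in E2E
for _ in range(3):
    router.record(nacked=True)    # a burst of NACKs raises the estimate
print(router.mode)
```

A single threshold makes the router flip modes whenever the estimate crosses it; a real design would likely add hysteresis or a minimum dwell time to avoid oscillation, which this sketch omits.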
