Fault-tolerant schemes for NoC with a network monitor

Global buses in deep-submicron (DSM) system-on-chip designs consume significant power, have large propagation delays, and are prone to transmission errors caused by DSM noise. This paper proposes a comprehensive fault-tolerant mechanism that covers both transient and permanent failures. Building on a network-on-chip (NoC) architecture equipped with a network monitor, a flit-level point-to-point error detection scheme is added to the routers to handle transient failures on the data links, and a dynamic routing mechanism is introduced to deal with permanent link failures. In addition, built-in testing of the monitor is included to increase the reliability of the architecture. Experimental results demonstrate the advantage of the mechanism in terms of throughput and latency, while the area and power overheads remain acceptable.
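As a rough illustration of flit-level point-to-point error detection, the sketch below attaches a check field to each flit and has the receiving side of a link verify it, requesting a retransmission on a mismatch. The abstract does not state which error-detecting code or retry policy the authors use, so the CRC-32 check field, the `max_retries` limit, and the helper names (`make_flit`, `check_flit`, `send_over_link`, `flaky_link_factory`) are all assumptions made for this example.

```python
import zlib

def make_flit(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 check field (an assumed code, not the paper's)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check_flit(flit: bytes) -> bool:
    """Return True if the check field matches the payload."""
    payload, check = flit[:-4], flit[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == check

def send_over_link(flit: bytes, link, max_retries: int = 3):
    """Hop-by-hop transfer: the receiver checks every flit and asks the
    upstream router to resend it when the check fails."""
    for attempt in range(1, max_retries + 1):
        received = link(flit)
        if check_flit(received):
            return received[:-4], attempt
    # Repeated failures suggest a permanent fault; a real design would
    # report the link so that routing can avoid it.
    return None, max_retries

def flaky_link_factory(bad_transfers: int):
    """Hypothetical link model that flips one bit in the first N transfers."""
    state = {"remaining": bad_transfers}
    def link(flit: bytes) -> bytes:
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return bytes([flit[0] ^ 0x01]) + flit[1:]
        return flit
    return link

payload = b"\x12\x34"
data, attempts = send_over_link(make_flit(payload), flaky_link_factory(1))
```

In this model a transient error costs one extra transfer on the affected link only, while a link that keeps failing exhausts the retry budget and returns `None`, which is the point at which a dynamic routing mechanism would have to take over.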
