A fault-tolerant deadlock-free adaptive routing for on chip interconnects

Future applications will require processors with many cores communicating through a regular interconnection network. Meanwhile, the Deep submicron technology foreshadows highly defective chips era. In this context, not only fault-tolerant designs become compulsory, but their performance under failures gains importance. In this paper, we present a deadlock-free fault-tolerant adaptive routing algorithm featuring Explicit Path Routing in order to limit the latency degradation under failures. This is particularly interesting for streaming applications, which transfer huge amount of data between the same source-destination pairs. The proposed routing algorithm is able to route messages in the presence of any set of multiple nodes and links failures, as long as a path exists, and does not use any routing table. It is scalable and can be applied to multicore chips with a 2D mesh core interconnect of any size. The algorithm is deadlock-free and avoids infinite looping in fault-free and faulty 2D meshes. We simulated the proposed algorithm using the worst case scenario, with different failure rates. Experimentation results confirmed that the algorithm tolerates multiple failures even in the most extreme failure patterns. Additionally, we monitored the interconnect traffic and average latency for faulty cases. For 20×20 meshes, the proposed algorithm reduces the average latency by up to 50%.

[1]  Nacer-Eddine Zergainoh,et al.  Fault-Tolerant Deadlock-Free Adaptive Routing for Any Set of Link and Node Failures in Multi-cores Systems , 2010, 2010 Ninth IEEE International Symposium on Network Computing and Applications.

[2]  C.M. Cunningham,et al.  Fault-tolerant adaptive routing for two-dimensional meshes , 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.

[3]  Natalie D. Enright Jerger,et al.  Outstanding Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspectives , 2009, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[4]  Yong-Bin Kim,et al.  Fault Tolerant Source Routing for Network-on-chip , 2007, 22nd IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems (DFT 2007).

[5]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[6]  T. Dumitras,et al.  Towards on-chip fault-tolerant communication , 2003, Proceedings of the ASP-DAC Asia and South Pacific Design Automation Conference, 2003..

[7]  David Blaauw,et al.  A highly resilient routing algorithm for fault-tolerant NoCs , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.