d2-LBDR: Distance-driven routing to handle permanent failures in 2D mesh NoCs

With the advent of deep sub-micron technology, fault-tolerant solutions are needed to keep many-core chips operative. In NoCs, Logic Based Distributed Routing (LBDR) proved to be a flexible routing framework for 2D meshes with link and router faults. However, to provide full coverage, LBDR requires a module named FORKS which replicates some messages. This imposes the use of virtual cut-through switching and a complex router arbiter, increasing excessively the router cost, mainly in buffer area. Also, some failure combinations require the use of a non-trivial dynamic reconfiguration strategy to avoid deadlocks. We propose d2-LBDR which adds, on every router, a distance register to the closest failure. This enables the support of more failure combinations without an excessive implementation cost. Indeed, we restore the use of wormhole switching, keeping router architecture simple, while achieving the same fault coverage as the best LBDR version, without requiring complex switching strategies nor any dynamic reconfiguration strategy. Results show that a small area overhead (3%) is enough for the implementation of a fully flexible routing method without any limiting support case when compared with LBDR. d2-LBDR reduces area overhead over the best LBDR approach (300% overhead against 3%) while preserving fault coverage. Results show d2-LBDR performance equal to LBDR.

[1]  José Duato,et al.  An Efficient Implementation of Distributed Routing Algorithms for NoCs , 2008 .

[2]  Axel Jantsch,et al.  Addressing Transient and Permanent Faults in NoC With Efficient Fault-Tolerant Deflection Router , 2013, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[3]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[4]  Huawei Li,et al.  ZoneDefense: A Fault-Tolerant Routing for 2-D Meshes Without Virtual Channels , 2014, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[5]  S. Medardoni,et al.  Flexible DOR routing for virtualization of multicore chips , 2009, 2009 International Symposium on System-on-Chip.

[6]  Michele Favalli,et al.  A complete self-testing and self-configuring NoC infrastructure for cost-effective MPSoCs , 2013, TECS.

[7]  Jie Wu,et al.  A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model , 2003, IEEE Trans. Computers.

[8]  Alessandro Strano,et al.  OSR-Lite: Fast and deadlock-free NoC reconfiguration framework , 2012, 2012 International Conference on Embedded Computer Systems (SAMOS).

[9]  Federico Silla,et al.  Cost-Efficient On-Chip Routing Implementations for CMP and MPSoC Systems , 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[10]  José Duato,et al.  On the Potentials of Segment-Based Routing for NoCs , 2008, 2008 37th International Conference on Parallel Processing.

[11]  Lionel M. Ni,et al.  Fault-tolerant wormhole routing in meshes , 1993, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.

[12]  Vijay Laxmi,et al.  A Brief Comment on “A Complete Self-Testing and Self-Configuring NoC Infrastructure for Cost-Effective MPSoCs” [ACM Transactions on Embedded Computing Systems 12 (2013) Article 106] , 2015, ACM Trans. Embed. Comput. Syst..

[13]  Li-Shiuan Peh,et al.  ARIADNE: Agnostic Reconfiguration in a Disconnected Network Environment , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.

[14]  David Blaauw,et al.  A highly resilient routing algorithm for fault-tolerant NoCs , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[15]  Yong-Bin Kim,et al.  Fault Tolerant Source Routing for Network-on-chip , 2007, 22nd IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems (DFT 2007).