A Lightweight Fault-Tolerant Mechanism for Network-on-Chip

Survival capability is becoming a crucial factor in the design of multicore processors built with on-chip packet networks, or networks on chip (NoCs). In this paper, we propose a lightweight fault-tolerant mechanism for NoCs based on default backup paths (DBPs). The mechanism is designed to maintain, in the presence of failures, network connectivity both among non-faulty routers and for healthy processor cores that may be connected to faulty routers. It provides default backup paths between certain router ports; these serve as alternative datapaths that circumvent failed components within a faulty router. Together with a minimal subset of normal network channels, the default backup paths internal to faulty routers form, in the worst case, a unidirectional ring topology that provides network-wide connectivity to all processor cores. Routing with the DBP mechanism is proved to be deadlock-free with only two virtual channels, even for fault scenarios in which regular networks degrade to irregular (arbitrary) topologies. Evaluation results show that, for a 2-D mesh wormhole NoC, only 12.6% additional hardware resources are needed to implement the proposed DBP mechanism, which provides graceful performance degradation without chip-wide failure as the number of faults increases to the maximum needed to form a ring.
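To make the worst-case scenario concrete, the following is a minimal sketch, not the DBP design itself. It assumes a 2-D mesh with an even number of rows, embeds one possible unidirectional ring in it, and routes on that ring with two virtual channels using the textbook dateline scheme: a packet starts on VC 0 and moves to VC 1 once it crosses the wrap-around link. All function names (`ring_order`, `ring_route`, `channel_dependencies`, `is_acyclic`) are hypothetical, and the dateline rule is a stand-in for illustration; the abstract does not say how the paper's own routing function or proof is constructed.

```python
def ring_order(rows, cols):
    """Visit every node of a rows x cols mesh once, ending next to the start.

    Assumption: rows is even and cols >= 2, which this simple construction needs.
    Row 0 is walked left to right, rows 1.. snake through columns 1..cols-1,
    and column 0 is kept free as the return path to (0, 0).
    """
    if rows < 2 or rows % 2 or cols < 2:
        raise ValueError("needs an even number of rows and at least 2 columns")
    order = [(0, c) for c in range(cols)]
    for r in range(1, rows):
        cs = range(cols - 1, 0, -1) if r % 2 else range(1, cols)
        order += [(r, c) for c in cs]
    order += [(r, 0) for r in range(rows - 1, 0, -1)]
    return order


def ring_route(order, src, dst, dateline=True):
    """Return the hops (from_node, to_node, vc) from src to dst along the ring.

    With dateline=True a packet switches from VC 0 to VC 1 when it takes the
    wrap-around link (last node -> first node). With dateline=False every hop
    uses VC 0, which models a ring with a single virtual channel.
    """
    n = len(order)
    pos = {node: i for i, node in enumerate(order)}
    i, vc, hops = pos[src], 0, []
    while order[i] != dst:
        j = (i + 1) % n
        if dateline and j == 0:
            vc = 1  # crossing the dateline
        hops.append((order[i], order[j], vc))
        i = j
    return hops


def channel_dependencies(order, dateline=True):
    """Collect (channel, next_channel) pairs over all source/destination pairs."""
    deps = set()
    for s in order:
        for d in order:
            if s != d:
                hops = ring_route(order, s, d, dateline)
                deps.update(zip(hops, hops[1:]))
    return deps


def is_acyclic(deps):
    """Kahn's algorithm: True if the channel dependency graph has no cycle."""
    succ, indeg = {}, {}
    for a, b in deps:
        succ.setdefault(a, []).append(b)
        indeg[b] = indeg.get(b, 0) + 1
        indeg.setdefault(a, 0)
    ready = [ch for ch, k in indeg.items() if k == 0]
    removed = 0
    while ready:
        ch = ready.pop()
        removed += 1
        for nxt in succ.get(ch, []):
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    return removed == len(indeg)


if __name__ == "__main__":
    ring = ring_order(4, 4)
    print("two VCs acyclic:", is_acyclic(channel_dependencies(ring, dateline=True)))
    print("one VC acyclic: ", is_acyclic(channel_dependencies(ring, dateline=False)))
```

Under these assumptions, the channel dependency graph is acyclic with two virtual channels and cyclic with one, which is the usual argument for why a unidirectional wormhole ring needs a second virtual channel. The sketch covers only the ring case; it says nothing about the irregular topologies the abstract also mentions.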
