A fault tolerant NoC architecture using quad-spare mesh topology and dynamic reconfiguration

Network-on-Chip (NoC) is widely used as a communication scheme in modern many-core systems. To guarantee the reliability of communication, effective fault tolerant techniques are critical for an NoC. In this paper, a novel fault tolerant architecture employing redundant routers is proposed to maintain the functionality of a network in the presence of failures. This architecture consists of a mesh of 2x2 router blocks with a spare router placed in the center of each block. This spare router provides a viable alternative when a router fails in a block. The proposed fault-tolerant architecture is therefore referred to as a quad-spare mesh. The quad-spare mesh can be dynamically reconfigured by changing control signals without altering the underlying topology. This dynamic reconfiguration and its corresponding routing algorithm are demonstrated in detail. Since the topology after reconfiguration is consistent with the original error-free 2D mesh, the proposed design is transparent to operating systems and application software. Experimental results show that the proposed design achieves significant improvements on reliability compared with those reported in the literature. Comparing the error-free system with a single router failure case, the throughput only decreases by 5.19% and latency increases by 2.40%, with about 45.9% hardware redundancy.

[1]  Partha Pratim Pande,et al.  BIST for network-on-chip interconnect infrastructures , 2006, 24th IEEE VLSI Test Symposium.

[2]  Qiang Xu,et al.  On Topology Reconfiguration for Defect-Tolerant NoC-Based Homogeneous Manycore Systems , 2009, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[3]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[4]  Partha Pratim Pande,et al.  Design of a switch for network on chip applications , 2003, Proceedings of the 2003 International Symposium on Circuits and Systems, 2003. ISCAS '03..

[5]  Mahmut T. Kandemir,et al.  Fault tolerant algorithms for network-on-chip interconnect , 2004, IEEE Computer Society Annual Symposium on VLSI.

[6]  Luca Benini,et al.  Networks on Chips : A New SoC Paradigm , 2022 .

[7]  Cristian Constantinescu,et al.  Trends and Challenges in VLSI Circuit Reliability , 2003, IEEE Micro.

[8]  Hao Chen,et al.  On the Reliability of Computational Structures Using Majority Logic , 2011, IEEE Transactions on Nanotechnology.

[9]  Ching-Te Chiu,et al.  On the design and analysis of fault tolerant NoC architecture using spare routers , 2011, 16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011).

[10]  David Blaauw,et al.  A highly resilient routing algorithm for fault-tolerant NoCs , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[11]  Jens Sparsø,et al.  ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology , 2008 .

[12]  Youngsoo Kim,et al.  Designing real-time H.264 decoders with dataflow architectures , 2005, 2005 Third IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'05).

[13]  Tobias Bjerregaard,et al.  A survey of research and practices of Network-on-chip , 2006, CSUR.

[14]  Doug Burger,et al.  Exploiting microarchitectural redundancy for defect tolerance , 2003, Proceedings 21st International Conference on Computer Design.

[15]  Alexandre M. Amory,et al.  A scalable test strategy for network-on-chip routers , 2005, IEEE International Conference on Test, 2005..

[16]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[17]  T. Dumitras,et al.  Towards on-chip fault-tolerant communication , 2003, Proceedings of the ASP-DAC Asia and South Pacific Design Automation Conference, 2003..

[18]  Jens Sparsø,et al.  ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology , 2008, Second ACM/IEEE International Symposium on Networks-on-Chip (nocs 2008).

[19]  Karam S. Chatha,et al.  Quality-of-service and error control techniques for network-on-chip architectures , 2004, GLSVLSI '04.

[20]  David Blaauw,et al.  Vicis: A reliable network for unreliable silicon , 2009, 2009 46th ACM/IEEE Design Automation Conference.

[21]  Jari Nurmi,et al.  Fault tolerant XGFT network on chip for multi processor system on chip circuits , 2005, International Conference on Field Programmable Logic and Applications, 2005..

[22]  T. N. Vijaykumar,et al.  Rescue: a microarchitecture for testability and defect tolerance , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[23]  Axel Jantsch,et al.  Networks on chip , 2003 .

[24]  Alain Greiner,et al.  A reconfigurable routing algorithm for a fault-tolerant 2D-Mesh Network-on-Chip , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[25]  Wang,et al.  System-on-Chip Test Architectures: Nanometer Design for Testability , 2007 .