Fault-resilient routing unit in NoCs

With aggressive technology scaling in the deep-submicron era, the growing number of transistors makes chips more susceptible to failures. Process variation is becoming a crucial challenge in IC design, and aging introduces further faults that shorten circuit lifetime. Networks-on-chip (NoCs) are also affected by variation and aging, which lead to degraded performance and erroneous behavior. Faults may occur at numerous locations in an on-chip network, and when they occur in the control path, more severe effects such as deadlock and livelock can be expected. In this paper, we present a fine-grained mechanism that tolerates faults in the routing computation units without disabling the faulty routers. The mechanism separates routing from packet-receiving services. A faulty routing computation unit is replaced by a lightweight redundant circuit that provides static but reliable routing services, while the other components of the router remain functional, retaining on-chip performance. Experimental results indicate that an on-chip network with the proposed mechanism remains fault-tolerant when 14% of all routing computation modules are faulty. The area and power consumption overheads of the proposed method are around 7.29% and 6.20%, respectively, over the baseline approach.
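The fallback idea can be illustrated with a minimal behavioral sketch. It assumes a 2-D mesh, a toy congestion-aware adaptive function as the primary routing computation, and dimension-order (XY) routing as the static backup; none of these choices, nor the names `Router`, `rc_faulty`, `adaptive_route`, and `static_xy_route`, come from the abstract, which only states that the backup is static and lightweight.

```python
from dataclasses import dataclass

# Port labels are illustrative, not taken from the paper.
LOCAL, EAST, WEST, NORTH, SOUTH = "L", "E", "W", "N", "S"


def productive_ports(cur, dst):
    """Ports that move a packet closer to dst on a 2-D mesh (assumed topology)."""
    (cx, cy), (dx, dy) = cur, dst
    ports = []
    if dx > cx:
        ports.append(EAST)
    elif dx < cx:
        ports.append(WEST)
    if dy > cy:
        ports.append(NORTH)
    elif dy < cy:
        ports.append(SOUTH)
    return ports


def static_xy_route(cur, dst):
    """Assumed stand-in for the static backup: resolve X first, then Y."""
    ports = productive_ports(cur, dst)
    return ports[0] if ports else LOCAL


def adaptive_route(cur, dst, congestion):
    """Toy primary routing: pick the least congested productive port."""
    ports = productive_ports(cur, dst)
    if not ports:
        return LOCAL
    return min(ports, key=lambda p: congestion.get(p, 0))


@dataclass
class Router:
    pos: tuple
    rc_faulty: bool = False  # hypothetical flag set by some fault detector

    def route(self, dst, congestion=None):
        # The router keeps forwarding packets either way; only the routing
        # decision switches to the static backup when the RC unit is faulty.
        if self.rc_faulty:
            return static_xy_route(self.pos, dst)
        return adaptive_route(self.pos, dst, congestion or {})


healthy = Router(pos=(1, 1))
faulty = Router(pos=(1, 1), rc_faulty=True)
load = {EAST: 5, NORTH: 1}
print(healthy.route((3, 3), load))  # adaptive choice avoids the congested port
print(faulty.route((3, 3), load))   # static choice ignores congestion
```

In this sketch the faulty router loses adaptivity but still accepts and forwards traffic, which is the property the abstract describes as retaining on-chip performance.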
