Holistic Approach for Fault-Tolerant Network-on-Chip based Many-Core Systems

In this paper we describe a holistic approach to fault tolerance in Network-on-Chip (NoC) based many-core systems. The approach incorporates a System Health Monitoring Unit (SHMU), which collects fault information from across the system, classifies it, and provides a different solution for each fault class. A Mapper/Scheduler Unit (MSU) generates mapping and scheduling solutions online, based on the current fault configuration of the system. To detect faults, we use concurrent online checkers, which capture faults with low detection latency and report the fault information to the SHMU, where it can later be used in the recovery process. The experimental setup is implemented in an open-source tool that performs the mapping, scheduling, and simulation of the system.
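The interaction between the checkers, the SHMU, and the MSU can be illustrated with a minimal sketch. It is not the implementation used in the paper: the fault classes ("transient" and "permanent"), the counter-based classification rule, the threshold of three events, and the round-robin mapper are all assumptions chosen only to make the control flow concrete, and every name in the code is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SHMU:
    """Toy health monitor: counts checker events per node and classifies them."""
    permanent_threshold: int = 3  # assumed value, not taken from the paper
    counts: Counter = field(default_factory=Counter)
    broken: set = field(default_factory=set)

    def report(self, node):
        """Record one checker event for `node` and return its assumed fault class."""
        self.counts[node] += 1
        if self.counts[node] >= self.permanent_threshold:
            self.broken.add(node)
            return "permanent"
        return "transient"


def remap(tasks, nodes, broken):
    """Stand-in for the MSU: round-robin mapping of tasks onto healthy nodes."""
    healthy = [n for n in nodes if n not in broken]
    if not healthy:
        raise RuntimeError("no healthy nodes left")
    return {task: healthy[i % len(healthy)] for i, task in enumerate(tasks)}


nodes = [0, 1, 2, 3]
tasks = ["t0", "t1", "t2", "t3"]

shmu = SHMU()
mapping = remap(tasks, nodes, shmu.broken)  # initial fault-free mapping

# Checkers on node 1 fire three times; the third event crosses the threshold.
classes = [shmu.report(1) for _ in range(3)]
if classes[-1] == "permanent":
    mapping = remap(tasks, nodes, shmu.broken)  # regenerate mapping online

print(classes)
print(mapping)
```

In this sketch, events classified as transient leave the mapping untouched, while a node classified as permanently faulty is excluded and the mapping is regenerated, mirroring the idea of a different response per fault class.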
