Smart reconfiguration approach for fault-tolerant NoC based MPSoCs

Recent integrated-circuit fabrication technologies allow billions of transistors to be placed on a single chip, making it possible to implement complex parallel systems that require a highly scalable, parallel communication architecture such as a Network-on-Chip (NoC). These technologies operate close to physical limits, which increases the incidence of faults both during manufacturing and at runtime. It is therefore essential to provide a fault recovery mechanism that keeps the NoC operating in the presence of faults. Preprocessing the most probable fault scenarios makes it possible to compute deadlock-free routings in advance, which reduces the time the system must be interrupted when a fault occurs, while flit retransmission allows affected links to remain in operation. This work proposes a smart decision mechanism for errors on NoC links, composed of a hardware part implemented in the links and routers and a software part implemented in the operating system kernel of each processor. The mechanism defines thresholds that determine when it is better to reconfigure the NoC and when it is better to retransmit flits with errors. Experimental results, obtained with several NoC sizes and some error models, suggest when it is better to reconfigure the NoC and when it is better to keep some links operating with occasional faults.
