Adaptive fault-tolerant architecture and routing algorithm for reliable many-core 3D-NoC systems

In recent decades, Three-dimensional Networks-on-Chip (3D-NoCs) have shown clear advantages over 2D-NoC architectures, thanks to the reduced average interconnect length and lower interconnect power consumption inherited from Three-dimensional Integrated Circuits (3D-ICs). On the other hand, questions about their reliability are starting to arise. The issue stems mainly from their complexity: a single faulty transistor may cause intolerable performance degradation or even the collapse of the entire system. To function correctly, 3D-NoC systems must tolerate both short-term malfunctions and permanent physical damage, delivering messages on time while keeping performance degradation to a minimum.

In this paper, we present a fault-tolerant 3D-NoC architecture called 3D-Fault-Tolerant-OASIS (3D-FTO). With the aid of a lightweight routing algorithm, 3D-FTO avoids system failure in the presence of a large number of transient, intermittent, and permanent faults. Moreover, the proposed architecture leverages reconfigurable components to handle faults in the links, input buffers, and crossbar, where faults occur most often. The proposed 3D-FTO system works around different kinds of faults, ensuring graceful performance degradation while minimizing the additional hardware complexity and remaining power-efficient.

This project is partially supported by Competitive research funding, Ref. P1-5, Fukushima, Japan.

Adaptive fault-tolerant 3D-Network-on-Chip system architecture.
RAB mechanism for deadlock recovery and fault tolerance in input buffers.
Traffic-Prediction-Unit technique for congestion relief.
Bypass-Link-on-Demand to tackle fault occurrence in the crossbar.
Fault tolerance and graceful performance degradation obtained at high fault rates.
