Fault-tolerant communication in invasive networks on chip

Dependability and fault tolerance will play an ever-increasing role in future technology nodes. This paper presents a fault-tolerance strategy for invasive networks on chip (i-NoC). The strategy focuses on permanent faults resulting from either process fluctuations or aging effects, and briefly outlines countermeasures against transient faults. We propose a scalable scheme for detecting and localizing defects in NoCs; the localization result serves as the basis for disabling faulty routers. We also propose a transparent bypass scheme that circumvents faulty routers and regions. It extends the architecture with an additional lightweight network layer, the fault-tolerance layer, which can be configured at run time according to the current fault map of the architecture. The presented evaluations analyze the fault coverage of the proposed detection and localization strategy. We also investigate the implementation cost and performance impact of the fault-tolerance network layer.
