Design of Fault Tolerant Network Interfaces for NoCs

Networks-on-Chip (NoCs) emerged as a strategy for meeting the communication requirements of complex IP-based Systems-on-Chip. As design complexity increases and technology scales down into the deep-submicron domain, the probability of malfunctions and failures in NoC components increases. This paper studies and evaluates techniques for increasing the reliability and resilience of Network Interfaces (NIs). Because NIs act as the interfaces between IP cores and the communication infrastructure, faulty behavior in an NI can affect the overall system. In this work, we propose a functional fault model for the NI components, and we present a two-level fault-tolerant solution that mitigates the effects of both soft errors (single-event upsets) and hard errors on the NI. Experiments show that, with limited overhead, the reliability of the NI improves significantly, while saving up to 83% in area with respect to a standard Triple Modular Redundancy implementation and achieving a significant energy reduction.
