Power efficiency of switch architecture extensions for fault tolerant NoC design

The increasingly parallel landscape of embedded computing platforms is bringing the reliability concern for the on-chip interconnection network (NoC) to the forefront. While very few works in the open literature bring their error recovery mechanisms down to microarchitectural and physical implementation, this paper documents the effort of optimizing a baseline NoC switch architecture for different fault-tolerant strategies against single-event upsets. As key contributions achieved, we not only come up with a new efficient fault-tolerant flow control protocol, but also we contrast correction vs. retransmission oriented switch microarchitectures, each implementing both data and control path protection, with physical implementation awareness. The accuracy of the analysis methodology enables us to report counterintuitive power-reliability trade-offs between the design points, serving as guidelines for implementing fault-tolerant communication in a power-constrained environment.

[1]  Luigi Carro,et al.  Dependable Network-on-Chip Router Able to Simultaneously Tolerate Soft Errors and Crosstalk , 2006, 2006 IEEE International Test Conference.

[2]  Federico Angiolini,et al.  /spl times/pipes Lite: a synthesis oriented design library for networks on chips , 2005, Design, Automation and Test in Europe.

[3]  Cecilia Metra,et al.  Configurable Error Control Scheme for NoC Signal Integrity , 2007, 13th IEEE International On-Line Testing Symposium (IOLTS 2007).

[4]  Luca Benini,et al.  Xpipes: a latency insensitive parameterized network-on-chip architecture for multiprocessor SoCs , 2003, Proceedings 21st International Conference on Computer Design.

[5]  Norman P. Jouppi,et al.  Motivating Commodity Multi-Core Processor Design for System-level Error Protection , 2022 .

[6]  Chita R. Das,et al.  Exploring Fault-Tolerant Network-on-Chip Architectures , 2006, International Conference on Dependable Systems and Networks (DSN'06).

[7]  Scott A. Mahlke,et al.  BulletProof: a defect-tolerant CMP switch architecture , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[8]  Cecilia Metra,et al.  Exploiting ECC redundancy to minimize crosstalk impact , 2005, IEEE Design & Test of Computers.

[9]  Federico Silla,et al.  Addressing Manufacturing Challenges with Cost-Efficient Fault Tolerant Routing , 2010, 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip.

[10]  Luigi Carro,et al.  Fault tolerant mechanism to improve yield in NoCs using a reconfigurable router , 2009, SBCCI.

[11]  Jeffrey T. Draper,et al.  Fault-Tolerant Flow Control in On-chip Networks , 2010, 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip.

[12]  William J. Dally,et al.  Architecture and implementation of the reliable router , 1994, Symposium Record Hot Interconnects II.

[13]  David Blaauw,et al.  Vicis: A reliable network for unreliable silicon , 2009, 2009 46th ACM/IEEE Design Automation Conference.

[14]  Martin Radetzki,et al.  Fault Tolerant Network on Chip Switching With Graceful Performance Degradation , 2010, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[15]  Luca Benini,et al.  Fault Tolerance Overhead in Network-on-Chip Flow Control Schemes , 2005, 2005 18th Symposium on Integrated Circuits and Systems Design.

[16]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[17]  Luca Benini,et al.  Analysis of error recovery schemes for networks on chips , 2005, IEEE Design & Test of Computers.

[18]  Luca Benini,et al.  Xpipes: A latency insensitive parameterized network-on-chip architecture for multi-processor SoCs , 2003, 2012 IEEE 30th International Conference on Computer Design (ICCD).

[19]  Luca Benini,et al.  Error control schemes for on-chip communication links: the energy-reliability tradeoff , 2005, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[20]  Paul Ampadu,et al.  Adaptive Error Control for NoC Switch-to-Switch Links in a Variable Noise Environment , 2008, 2008 IEEE International Symposium on Defect and Fault Tolerance of VLSI Systems.

[21]  Michele Favalli,et al.  A complete self-testing and self-configuring NoC infrastructure for cost-effective MPSoCs , 2013, TECS.

[22]  M. Nicolaidis,et al.  Design for soft error mitigation , 2005, IEEE Transactions on Device and Materials Reliability.