Energy-Aware Fault-Tolerant Network-on-Chips for Addressing Multiple Traffic Classes

This paper presents an energy efficient architecture to provide on-demand fault tolerance to multiple traffic classes, running simultaneously on single network on chip (NoC) platform. Today, NoCs host multiple traffic classes with potentially different reliability needs. Providing platform-wide worst-case (maximum) protection to all the classes is neither optimal nor desirable. To reduce the overheads incurred by fault tolerance, various adaptive strategies have been proposed. The proposed techniques rely on individual packet fields and operating conditions to adjust the intensity and hence the overhead of fault tolerance. Presence of multiple traffic classes undermines the effectiveness of these methods. To complement the existing adaptive strategies, we propose on-demand fault tolerance, capable of providing required reliability, while significantly reducing the energy overhead. Our solution relies on a hierarchical agent based control layer and a reconfigurable fault tolerance data path. The control layer identifies the traffic class and directs the packet to the path providing the needed reliability. Simulation results using representative applications (matrix multiplication, FFT, wavefront, and HiperLAN) showed up to 95% decrease in energy consumption compared to traditional worst case methods. Synthesis results have confirmed a negligible additional overhead, for providing on-demand protection (up to 5.3% area), compared to the overall fault tolerance circuitry.

[1]  Axel Jantsch,et al.  A High Level Power Model for the Nostrum NoC , 2006, 9th EUROMICRO Conference on Digital System Design (DSD'06).

[2]  Ulrich Egert,et al.  Network on Chips , 2006 .

[3]  Luca Benini,et al.  ReliNoC: A reliable network for priority-based on-chip communication , 2011, 2011 Design, Automation & Test in Europe.

[4]  Axel Jantsch,et al.  Self-adaptive Noc Power Management with Dual-level Agents - Architecture and Implementation , 2012, PECCS.

[5]  Tobias Bjerregaard,et al.  A survey of research and practices of Network-on-chip , 2006, CSUR.

[6]  Jean-Michel Chabloz,et al.  Distributed DVFS using rationally-related frequencies and discrete voltage levels , 2010, 2010 ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED).

[7]  Natalie D. Enright Jerger,et al.  SCARAB: A single cycle adaptive routing and bufferless network , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[8]  Shuming Chen,et al.  Run-Time Partitioning of Hybrid Distributed Shared Memory on Multi-core Network-on-Chips , 2010, 2010 3rd International Symposium on Parallel Architectures, Algorithms and Programming.

[9]  Alexandre M. Amory,et al.  A High-Fault-Coverage Approach for the Test of Data, Control and Handshake Interconnects in Mesh Networks-on-Chip , 2008, IEEE Transactions on Computers.

[10]  Shekhar Y. Borkar,et al.  Microarchitecture and Design Challenges for Gigascale Integration , 2004, MICRO.

[11]  Jean-Michel Chabloz,et al.  Lowering the latency of interfaces for rationally-related frequencies , 2010, 2010 IEEE International Conference on Computer Design.

[12]  Axel Jantsch,et al.  NNSE: Nostrum Network-on-Chip Simulation Environment , 2005 .

[13]  Teijo Lehtonen On Fault Tolerance Methods for Networks-on-Chip , 2009 .

[14]  Giovanni De Micheli,et al.  A robust self-calibrating transmission scheme for on-chip networks , 2005, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[15]  Giovanni De Micheli,et al.  An adaptive low-power transmission scheme for on-chip networks , 2002, 15th International Symposium on System Synthesis, 2002..

[16]  Mahmut T. Kandemir,et al.  Fault tolerant algorithms for network-on-chip interconnect , 2004, IEEE Computer Society Annual Symposium on VLSI.

[17]  Paul Ampadu,et al.  Adaptive error control for nanometer scale network-on-chip links , 2009, IET Comput. Digit. Tech..

[18]  Paul Ampadu,et al.  Transient and Permanent Error Co-management Method for Reliable Networks-on-Chip , 2010, 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip.

[19]  M. J. Irwin,et al.  Adaptive error protection for energy efficiency , 2003, ICCAD-2003. International Conference on Computer Aided Design (IEEE Cat. No.03CH37486).

[20]  Hannu Tenhunen,et al.  Compact generic intermediate representation (CGIR) to enable late binding in coarse grained reconfigurable architectures , 2011, 2011 International Conference on Field-Programmable Technology.

[21]  Edward J. McCluskey,et al.  Fault tolerance in adaptive real-time computing systems , 2001 .

[22]  Shuming Chen,et al.  Supporting Distributed Shared Memory on multi-core Network-on-Chips using a dual microcoded controller , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).

[23]  Paul Ampadu,et al.  Self-Adaptive System for Addressing Permanent Errors in On-Chip Interconnects , 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[24]  Uriel Feige,et al.  Exact analysis of hot-potato routing , 1992, Proceedings., 33rd Annual Symposium on Foundations of Computer Science.

[25]  Frank Vahid,et al.  A configurable logic architecture for dynamic hardware/software partitioning , 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.

[26]  Luca Benini,et al.  Error control schemes for on-chip communication links: the energy-reliability tradeoff , 2005, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[27]  Axel Jantsch,et al.  A fault model notation and error-control scheme for switch-to-switch buses in a network-on-chip , 2003, First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721).

[28]  Jürgen Teich,et al.  DyNoC: A dynamic infrastructure for communication in dynamically reconfugurable devices , 2005, International Conference on Field Programmable Logic and Applications, 2005..

[29]  Pasi Liljeberg,et al.  Online Reconfigurable Self-Timed Links for Fault Tolerant NoC , 2007, VLSI Design.

[30]  Mary Jane Irwin,et al.  Adapative Error Protection for Energy Efficiency , 2003, ICCAD 2003.

[31]  Onur Mutlu,et al.  A case for bufferless routing in on-chip networks , 2009, ISCA '09.

[32]  George Michelogiannakis,et al.  An analysis of on-chip interconnection networks for large-scale chip multiprocessors , 2010, TACO.

[33]  Ahmed Louri,et al.  iDEAL: Inter-router Dual-Function Energy and Area-Efficient Links for Network-on-Chip (NoC) Architectures , 2008, 2008 International Symposium on Computer Architecture.

[34]  John Kim,et al.  Low-cost router microarchitecture for on-chip networks , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[35]  Cecilia Metra,et al.  Configurable Error Control Scheme for NoC Signal Integrity , 2007, 13th IEEE International On-Line Testing Symposium (IOLTS 2007).