Optimizing power and performance for reliable on-chip networks

We propose novel techniques to minimize the power and performance penalties in protecting the NoC against soft errors, while giving desired reliability guarantees. Some applications have inherent error tolerance which can be exploited to save power, by turning off the error correction mechanisms for a fraction of the total time without trading off reliability. To further increase the power savings, we bound the vulnerability of a router by throttling the traffic into the router. In order to minimize the throughput loss due to throttling, we propose dividing the die into domains and using multiple vulnerability bounds across these domains. We explore both static and dynamic selection of vulnerability bounds. We find that for applications with an error tolerance of 10% of the raw error rate, the dynamic multiple vulnerability bound scheme can save up to 44% of power expended for error correction at a marginal network throughput loss of 3%.

[1]  Axel Jantsch,et al.  Low-power and error protection coding for network-on-chip traffic , 2008, IET Comput. Digit. Tech..

[2]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[3]  Shubhendu S. Mukherjee,et al.  Measuring Architectural Vulnerability Factors , 2003, IEEE Micro.

[4]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[5]  Luca Benini,et al.  Low power error resilient encoding for on-chip data buses , 2002, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition.

[6]  Tryggve Fossum,et al.  Cache scrubbing in microprocessors: myth or necessity? , 2004, 10th IEEE Pacific Rim International Symposium on Dependable Computing, 2004. Proceedings..

[7]  Jaume Segura,et al.  CMOS Electronics: How It Works, How It Fails , 2004 .

[8]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[9]  S. Borkar,et al.  An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS , 2008, IEEE Journal of Solid-State Circuits.

[10]  Ahmed Louri,et al.  iDEAL: Inter-router Dual-Function Energy and Area-Efficient Links for Network-on-Chip (NoC) Architectures , 2008, 2008 International Symposium on Computer Architecture.

[11]  Priyadarsan Patra,et al.  Impact of Process and Temperature Variations on Network-on-Chip Design Exploration , 2008, Second ACM/IEEE International Symposium on Networks-on-Chip (nocs 2008).

[12]  XII. References , 1990 .

[13]  Sharad Malik,et al.  Power-driven Design of Router Microarchitectures in On-chip Networks , 2003, MICRO.

[14]  Hideharu Amano,et al.  A Lightweight Fault-Tolerant Mechanism for Network-on-Chip , 2008, Second ACM/IEEE International Symposium on Networks-on-Chip (nocs 2008).

[15]  Chita R. Das,et al.  Exploring Fault-Tolerant Network-on-Chip Architectures , 2006, International Conference on Dependable Systems and Networks (DSN'06).

[16]  Mary Jane Irwin,et al.  Adapative Error Protection for Energy Efficiency , 2003, ICCAD 2003.

[17]  Stephen W. Keckler,et al.  Regional congestion awareness for load balance in networks-on-chip , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[18]  Luca Benini,et al.  Analysis of error recovery schemes for networks on chips , 2005, IEEE Design & Test of Computers.