Ultra Fine-Grained Run-Time Power Gating of On-chip Routers for CMPs

This paper proposes an ultra fine-grained run-time power gating scheme for on-chip routers, in which the power supply to each router component (e.g., VC queue, crossbar MUX, and output latch) is controlled individually in response to the applied workload. Because only the router components that are currently transferring a packet are activated, the leakage power of the on-chip network can be reduced to a near-optimal level. However, waking a sleeping component incurs a wakeup latency, which degrades application performance. In this paper, we estimate the wakeup latency of each component from circuit simulations in a 65nm process, and then propose four early wakeup methods to mitigate this latency. The proposed router with the early wakeup methods is evaluated in terms of application performance, area, and leakage power. Assuming 1GHz operation, it reduces leakage power by 78.9% at the cost of a 4.3% area overhead and a 4.0% performance degradation.
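The trade-off described above (sleep when idle, pay a wakeup latency on the next packet, hide it by waking early) can be illustrated with a small cycle-level model of a single gated component. This is a hypothetical sketch, not the paper's router or any of its four early wakeup methods: the function `simulate`, the idle-threshold sleep policy, the one-cycle occupancy per flit, and the generic `lead` parameter (cycles of advance notice from some early wakeup signal) are all assumptions made for illustration, and the latency values used are arbitrary rather than the 65nm estimates.

```python
from typing import Sequence, Tuple


def simulate(arrivals: Sequence[int], horizon: int, wakeup_latency: int,
             idle_threshold: int, lead: int = 0) -> Tuple[int, int]:
    """Hypothetical model of one power-gated router component (e.g. a VC queue).

    arrivals:       sorted cycles at which a flit reaches the component; each
                    flit occupies the powered-on component for one cycle.
    wakeup_latency: cycles needed to restore power after a wakeup request
                    (assumed value, not a measured one).
    idle_threshold: idle cycles after which the component is put to sleep.
    lead:           cycles of advance notice from an assumed early wakeup signal.

    Returns (wakeup stall cycles, cycles spent asleep up to `horizon`).
    """
    stall = 0
    asleep = 0
    free_at = 0       # first cycle at which the component is idle
    sleep_start = 0   # the component starts out asleep at cycle 0

    for a in arrivals:
        start = max(a, free_at)   # wait behind any earlier flit (not counted as stall)
        wake_req = a - lead       # cycle at which the wakeup request is raised
        if wake_req >= sleep_start:
            # The component fell asleep before the request: it sleeps until the
            # request, then needs `wakeup_latency` cycles before it can be used.
            asleep += wake_req - sleep_start
            ready = wake_req + wakeup_latency
            begin = max(start, ready)
            stall += begin - start
        else:
            # The request arrived before the idle threshold expired, so the
            # component never powered down and the flit proceeds immediately.
            begin = start
        free_at = begin + 1
        sleep_start = free_at + idle_threshold

    asleep += max(0, horizon - sleep_start)
    return stall, asleep


if __name__ == "__main__":
    # Two flits, 3-cycle wakeup, sleep after 2 idle cycles, 100-cycle window.
    for lead in (0, 1, 3):
        print(lead, simulate([10, 50], 100, 3, 2, lead=lead))
```

With no advance notice each flit stalls for the full wakeup latency; a lead equal to the wakeup latency hides the stall entirely while the component spends the same number of cycles asleep, which is the intuition behind waking components early.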
