Power punch: Towards non-blocking power-gating of NoC routers

As chip designs penetrate further into the dark silicon era, innovative techniques are much needed to power off idle or under-utilized system components while having minimal impact on performance. On-chip network routers are potentially good targets for power-gating, but packets in the network can be significantly delayed as their paths may be blocked by powered-off routers. In this paper, we propose Power Punch, a novel performance-aware, power reduction scheme that aims to achieve non-blocking power-gating of on-chip network routers. Two mechanisms are proposed that not only allow power control signals to utilize existing slack at source nodes to wake up powered-off routers along the first few hops before packets are injected, but also allow these signals to utilize hop count slack by staying ahead of packets to "punch through " any blocked routers along the imminent path of packets, preventing packets from having to suffer router wakeup latency or packet detour latency. Full system evaluation on PARSEC benchmarks shows Power Punch saves more than 83% of router static energy while having an execution time penalty of less than 0.4%, effectively achieving near non-blocking power-gating of on-chip network routers.

[1]  Pradip Bose,et al.  A case for guarded power gating for multi-core processors , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[2]  José Duato,et al.  A theory for deadlock-free dynamic network reconfiguration. Part I , 2005, IEEE Transactions on Parallel and Distributed Systems.

[3]  José Duato,et al.  Towards an Efficient NoC Topology through Multiple Injection Ports , 2011, 2011 14th Euromicro Conference on Digital System Design.

[4]  Sudhir K. Satpathy,et al.  Catnap: energy proportional multiple network-on-chip , 2013, ISCA.

[5]  Reetuparna Das,et al.  Power-aware NoCs through routing and topology reconfiguration , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).

[6]  Chita R. Das,et al.  Aérgia: exploiting packet latency slack in on-chip networks , 2010, ISCA.

[7]  Karthikeyan Sankaralingam,et al.  On-Chip Interconnection Networks of the TRIPS Chip , 2007, IEEE Micro.

[8]  Hiroshi Nakamura,et al.  Ultra Fine-Grained Run-Time Power Gating of On-chip Routers for CMPs , 2010, 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip.

[9]  Hideharu Amano,et al.  Adding Slow-Silent Virtual Channels for Low-Power On-Chip Networks , 2008 .

[10]  Niraj K. Jha,et al.  GARNET: A detailed on-chip network model inside a full-system simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[11]  Chen Sun,et al.  DSENT - A Tool Connecting Emerging Photonics with Electronics for Opto-Electronic Networks-on-Chip Modeling , 2012, 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip.

[12]  Lizhong Chen,et al.  NoRD: Node-Router Decoupling for Effective Power-gating of On-Chip Routers , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[13]  William J. Dally,et al.  Flattened Butterfly Topology for On-Chip Networks , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[14]  Hideharu Amano,et al.  Run-time power gating of on-chip routers using look-ahead routing , 2008, 2008 Asia and South Pacific Design Automation Conference.

[15]  Timothy Mattson,et al.  A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[16]  Pradip Bose,et al.  Dynamic power gating with quality guarantees , 2009, ISLPED.

[17]  Natalie D. Enright Jerger,et al.  SCARAB: A single cycle adaptive routing and bufferless network , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[18]  Niraj K. Jha,et al.  Express virtual channels: towards the ideal interconnection fabric , 2007, ISCA '07.

[19]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[20]  Massoud Pedram,et al.  Smart Butterfly: Reducing static power dissipation of network-on-chip with core-state-awareness , 2014, 2014 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED).

[21]  Onur Mutlu,et al.  Express Cube Topologies for on-Chip Interconnects , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[22]  Sriram R. Vangal,et al.  A 5-GHz Mesh Interconnect for a Teraflops Processor , 2007, IEEE Micro.

[23]  Christian Bienia,et al.  PARSEC 2.0: A New Benchmark Suite for Chip-Multiprocessors , 2009 .

[24]  José Duato,et al.  A methodology for developing deadlock-free dynamic network reconfiguration processes. Part II , 2005, IEEE Transactions on Parallel and Distributed Systems.

[25]  Chris Fallin,et al.  CHIPPER: A low-complexity bufferless deflection router , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[26]  Pradip Bose,et al.  Microarchitectural techniques for power gating of execution units , 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).

[27]  Ren Wang,et al.  Energy-efficient interconnect via Router Parking , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[28]  Lizhong Chen,et al.  MP3: Minimizing performance penalty for power-gating of Clos network-on-chip , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[29]  Anantha Chandrakasan,et al.  SCORPIO: A 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).