Catnap: energy proportional multiple network-on-chip

Multiple networks have been used in several processor implementations to scale bandwidth and ensure protocol-level deadlock freedom for different message classes. In this paper, we observe that a multiple-network design is also attractive from a power perspective and can be leveraged to achieve energy proportionality by effective power gating. Unlike a single-network design, a multiple-network design is more amenable to power gating, as its subnetworks (subnets) can be power gated without compromising the connectivity of the network. To exploit this opportunity, we propose the Catnap architecture which consists of synergistic subnet selection and power-gating policies. Catnap maximizes the number of consecutive idle cycles in a router, while avoiding performance loss due to overloading a subnet. We evaluate a 256-core processor with a concentrated mesh topology using synthetic traffic and 35 applications. We show that the average network power of a power-gating optimized multiple-network design with four subnets could be 44% lower than a bandwidth equivalent single-network design for an average performance cost of about 5%.

[1]  Luiz André Barroso,et al.  The Case for Energy-Proportional Computing , 2007, Computer.

[2]  Natalie D. Enright Jerger,et al.  On-Chip Networks , 2009, On-Chip Networks.

[3]  Jaehyuk Huh,et al.  Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture , 2003, IEEE Micro.

[4]  Pedro López,et al.  A family of mechanisms for congestion control in wormhole networks , 2005, IEEE Transactions on Parallel and Distributed Systems.

[5]  Henry Hoffmann,et al.  On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.

[6]  Sharad Malik,et al.  A Power Model for Routers: Modeling Alpha 21364 and InfiniBand Routers , 2003, IEEE Micro.

[7]  Rajiv Kapoor,et al.  Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[8]  Stephen W. Keckler,et al.  Regional congestion awareness for load balance in networks-on-chip , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[9]  Hong Liu,et al.  Energy proportional datacenter networks , 2010, ISCA.

[10]  William J. Dally,et al.  Design tradeoffs for tiled CMP on-chip networks , 2006, ICS '06.

[11]  Jason Howard A 48-core IA-32 processor with on-die message-passing and DVFS in 45nm CMOS , 2010, 2010 IEEE Asian Solid-State Circuits Conference.

[12]  Thomas F. Wenisch,et al.  PowerNap: eliminating server idle power , 2009, ASPLOS.

[13]  Pradip Bose,et al.  Microarchitectural techniques for power gating of execution units , 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).

[14]  Ren Wang,et al.  Energy-efficient interconnect via Router Parking , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[15]  Mithuna Thottethodi,et al.  Self-tuned congestion control for multiprocessor networks , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[16]  Sharad Malik,et al.  Power-driven design of router microarchitectures in on-chip networks , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[17]  Shekhar Y. Borkar,et al.  Design challenges of technology scaling , 1999, IEEE Micro.

[18]  Shekhar Y. Borkar,et al.  Thousand Core ChipsA Technology Perspective , 2007, 2007 44th ACM/IEEE Design Automation Conference.

[19]  Shekhar Borkar Thousand Core ChipsA Technology Perspective , 2007, DAC 2007.

[20]  Hideharu Amano,et al.  Run-time power gating of on-chip routers using look-ahead routing , 2008, 2008 Asia and South Pacific Design Automation Conference.

[21]  Giovanni De Micheli,et al.  CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers , 2012, 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip.

[22]  Henry Hoffmann,et al.  The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs , 2002, IEEE Micro.

[23]  Hiroshi Nakamura,et al.  Performance, Area, and Power Evaluations of Ultrafine-Grained Run-Time Power-Gating Routers for CMPs , 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[24]  Mitchell Hayenga,et al.  Pitfalls of ORION-Based Simulation , 2012 .

[25]  Timothy Mattson,et al.  A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[26]  Jesús Camacho Villanueva,et al.  HPC-Mesh: A Homogeneous Parallel Concentrated Mesh for Fault-Tolerance and Energy Savings , 2011, 2011 ACM/IEEE Seventh Symposium on Architectures for Networking and Communications Systems.

[27]  Wolf-Dietrich Weber,et al.  Power provisioning for a warehouse-sized computer , 2007, ISCA '07.

[28]  Coniferous softwood GENERAL TERMS , 2003 .

[29]  Andrew B. Kahng,et al.  ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[30]  Lizhong Chen,et al.  NoRD: Node-Router Decoupling for Effective Power-gating of On-Chip Routers , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[31]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[32]  William J. Dally,et al.  Flattened Butterfly Topology for On-Chip Networks , 2007, IEEE Comput. Archit. Lett..

[33]  Hiroshi Nakamura,et al.  Ultra Fine-Grained Run-Time Power Gating of On-chip Routers for CMPs , 2010, 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip.

[34]  Chita R. Das,et al.  Application-aware prioritization mechanisms for on-chip networks , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[35]  Sharad Malik,et al.  A power model for routers: modeling Alpha 21364 and InfiniBand routers , 2002, Proceedings 10th Symposium on High Performance Interconnects.