Energy-efficient interconnect via Router Parking

The increase in on-chip core counts in chip multiprocessors (CMPs) has led to the adoption of interconnects such as mesh and torus, which consume a growing fraction of chip power. Moreover, as technology and voltage continue to scale down, static power accounts for a larger fraction of total power, so reducing it is increasingly important for energy-proportional computing. Currently, processor designers strive to send under-utilized cores into deep sleep states in order to reduce idle power and improve overall energy efficiency. However, even in state-of-the-art CMP designs, when a core goes to sleep the router attached to it remains active in order to continue forwarding packets. In this paper, we propose Router Parking: selectively power-gating routers attached to parked cores. Router Parking ensures that network connectivity is maintained, and it limits the increase in average interconnect latency caused by packets detouring around parked routers. We present two Router Parking algorithms: an aggressive approach that parks as many routers as possible, and a conservative approach that parks a limited set of routers in order to keep the latency increase minimal. Further, we propose an adaptive policy that chooses between the two algorithms at run-time. We evaluate our algorithms using both synthetic traffic and real workloads taken from the SPEC CPU2006 and PARSEC 2.1 benchmark suites. Our evaluation results show that Router Parking can achieve significant savings in total interconnect energy (on average 32%, 40%, and 41% for the synthetic, SPEC CPU2006, and PARSEC 2.1 workloads, respectively).
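
The connectivity constraint described above can be illustrated with a minimal sketch. It is not the algorithm from the paper: it assumes a 2D mesh, a simple greedy order, and a breadth-first connectivity check, and the names `park_routers`, `is_connected`, and `max_parked` are hypothetical. The optional `max_parked` cap is only a stand-in for a more conservative setting; the paper's conservative approach is defined in terms of latency impact, which this sketch does not model.

```python
from collections import deque


def neighbors(node, width, height):
    """Yield the mesh neighbors of a router at (x, y)."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)


def is_connected(active, width, height):
    """Return True if every active router can reach every other one."""
    if not active:
        return True
    start = next(iter(active))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in neighbors(node, width, height):
            if nb in active and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(active)


def park_routers(width, height, sleeping_cores, max_parked=None):
    """Greedily park routers of sleeping cores without disconnecting the mesh.

    Assumption: one router per core, identified by mesh coordinates.
    max_parked is a hypothetical cap, not the paper's conservative criterion.
    """
    active = {(x, y) for x in range(width) for y in range(height)}
    parked = set()
    for node in sorted(sleeping_cores):
        if max_parked is not None and len(parked) >= max_parked:
            break
        candidate = active - {node}
        # Park this router only if the remaining routers stay connected.
        if is_connected(candidate, width, height):
            active = candidate
            parked.add(node)
    return parked


if __name__ == "__main__":
    # Middle column of a 3x3 mesh is asleep; parking all three routers
    # would split the mesh, so one of them must stay on.
    print(park_routers(3, 3, {(1, 0), (1, 1), (1, 2)}))
```

In this example the third router in the column is left active because removing it would separate the left and right columns of the mesh, which is the situation the connectivity guarantee is meant to rule out.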
