Run-time energy management of manycore systems through reconfigurable interconnects

The active on-chip network channel width directly affects cache and memory access latency in manycore processors, so a well-chosen channel width improves both application performance and energy efficiency. In manycore systems, where workload patterns change significantly over time, setting the network channel width statically for the average or worst-case traffic yields sub-optimal energy efficiency. This paper proposes a novel, low-cost method that reconfigures the network channel width at run time to maximize the energy efficiency of applications. We analyze the effect of channel width choices for two commonly used cache hierarchies, private and distributed L2 caches, on manycore systems with a bus or crossbar architecture running parallel workloads. The proposed reconfiguration policy predicts the energy-delay product (EDP) of the currently running application at various channel widths and selects the width with the lowest predicted EDP. The experimental results show that in systems with private and distributed L2 caches our policy reduces EDP by 49.3% and 23.9%, and 65.5% and 20.6% on average with bus and crossbar, respectively, compared with a statically set channel width.
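
The selection step of such a policy can be illustrated with a minimal sketch. The abstract does not describe how energy and delay are predicted, so the predictor below (`toy_predict`), the candidate widths, and all function names are hypothetical placeholders rather than the paper's model; only the rule "compute EDP as energy times delay for each candidate width and keep the minimum" is taken from the text.

```python
from typing import Callable, Iterable, Tuple

# A predictor maps a channel width (in bits) to (energy, delay).
# Its form is an assumption; the abstract does not specify the model.
Predictor = Callable[[int], Tuple[float, float]]


def edp(energy: float, delay: float) -> float:
    """Energy-delay product: lower is better."""
    return energy * delay


def choose_channel_width(widths: Iterable[int],
                         predict: Predictor) -> Tuple[int, float]:
    """Return (width, predicted EDP) for the width with the lowest EDP."""
    best_width, best_edp = None, float("inf")
    for width in widths:
        energy, delay = predict(width)
        value = edp(energy, delay)
        if value < best_edp:
            best_width, best_edp = width, value
    if best_width is None:
        raise ValueError("no candidate widths given")
    return best_width, best_edp


def toy_predict(width_bits: int) -> Tuple[float, float]:
    """Hypothetical model: wider channels cut delay but cost more energy."""
    delay = 1.0 + 64.0 / width_bits
    energy = 1.0 + width_bits / 64.0
    return energy, delay


if __name__ == "__main__":
    width, value = choose_channel_width([32, 64, 128, 256], toy_predict)
    print(f"selected width: {width} bits, predicted EDP: {value}")
```

In a run-time setting, a routine like this would be re-evaluated whenever the workload's traffic behavior changes, with `toy_predict` replaced by whatever predictor the system actually provides.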
