Realizing complexity-effective on-chip power delivery for many-core platforms by exploiting optimized mapping

In recent years, many-core platforms have emerged to boost performance while meeting tight power constraints. Per-core Dynamic Voltage and Frequency Scaling (DVFS) maximizes energy savings while meeting the performance requirements of a given workload. Given the limited number of I/O pins and the need for finer per-core control of voltage and frequency settings, off-chip voltage regulators carry a substantial cost. Consequently, on-chip voltage regulators (OCVRs) have attracted increasing attention in many-core systems. However, integrating OCVRs comes at the cost of reduced power conversion efficiency (PCE) and increased complexity in both the power delivery network and the management of the OCVRs. This paper investigates the effect of PCE on the thread-to-core mapping algorithm and highlights the importance of a PCE-aware mapping scheme for optimizing energy efficiency. The results show up to 38% more energy savings than PCE-agnostic algorithms. The paper also explores the impact of core clustering granularity and process variation on the total efficiency of the system. When the energy constraint is relaxed by just 10%, an effective mapping reduces the complexity of the power delivery system by allowing significantly fewer voltage regulators than a per-core OCVR design. These results indicate an important opportunity for system and circuit co-design to implement energy-efficient and complexity-effective platforms for a target workload.
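To make the distinction between PCE-agnostic and PCE-aware mapping concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical four-core chip split into two clusters that each share one regulator, a toy per-core power model of P = V^2, and an invented efficiency table; every name and number here is an illustrative assumption. A PCE-agnostic mapper minimizes the power delivered to the cores, while a PCE-aware mapper minimizes the power drawn from the supply after dividing by regulator efficiency.

```python
from itertools import permutations

# All numbers below are made up for illustration; none come from the paper.
PCE_AT_VOLTAGE = {0.6: 0.65, 0.8: 0.78, 1.0: 0.88}  # hypothetical regulator efficiency
THREAD_MIN_VOLTAGE = [1.0, 0.6, 0.8, 0.6]           # lowest voltage that meets each thread's needs
CLUSTERS = [(0, 1), (2, 3)]                         # core indices that share one regulator


def cluster_voltages(mapping):
    """mapping[core] = thread; a cluster runs at its most demanding thread's voltage."""
    return [max(THREAD_MIN_VOLTAGE[mapping[c]] for c in cores) for cores in CLUSTERS]


def delivered_power(mapping):
    """Power consumed by the cores, using a toy P = V^2 model per core."""
    return sum(len(cores) * v ** 2
               for cores, v in zip(CLUSTERS, cluster_voltages(mapping)))


def input_power(mapping):
    """Power drawn from the supply: each cluster's load divided by its regulator's PCE."""
    return sum(len(cores) * v ** 2 / PCE_AT_VOLTAGE[v]
               for cores, v in zip(CLUSTERS, cluster_voltages(mapping)))


# Exhaustive search is only feasible for this tiny example.
mappings = list(permutations(range(len(THREAD_MIN_VOLTAGE))))
pce_agnostic = min(mappings, key=delivered_power)  # ignores regulator losses
pce_aware = min(mappings, key=input_power)         # accounts for regulator losses

print("PCE-agnostic mapping:", pce_agnostic, "input power:", input_power(pce_agnostic))
print("PCE-aware mapping:   ", pce_aware, "input power:", input_power(pce_aware))
```

By construction, the PCE-aware choice never draws more supply power than the PCE-agnostic one. Whether the two objectives actually select different mappings depends on the shape of the efficiency curve and on the clustering granularity, which is the interaction the abstract describes.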
