Heterogeneous microarchitectures trump voltage scaling for low-power cores

Heterogeneous architectures offer many potential avenues for improving energy efficiency in today's low-power cores. Two common approaches are dynamic voltage/frequency scaling (DVFS) and heterogeneous microarchitectures (HMs). Traditionally both approaches have incurred large switching overheads, which limit their applicability to coarse-grain program phases. However, recent research has demonstrated low-overhead mechanisms that enable switching at granularities as low as 1K instructions. The question remains, in this fine-grained switching regime, which form of heterogeneity offers better energy efficiency for a given level of performance? The effectiveness of these techniques depend critically on both efficient architectural implementation and accurate scheduling to maximize energy efficiency for a given level of performance. Therefore, we develop PaTH, an offline analysis tool, to compute (near-)optimal schedules, allowing us to determine Pareto-optimal energy savings for a given architecture. We leverage PaTH to study the potential energy efficiency of fine-grained DVFS and HMs, as well as a hybrid approach. We show that HMs achieve higher energy savings than DVFS for a given level of performance. While at a coarse granularity the combination of DVFS and HMs still proves beneficial, for fine-grained scheduling their combination makes little sense as HMs alone provide the bulk of the energy efficiency.

[1]  Xiang Pan,et al.  Booster: Reactive core acceleration for mitigating the effects of process variation and application imbalance in low-voltage chips , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[2]  Mark Horowitz,et al.  Energy-performance tradeoffs in processor architecture and circuit design: a marginal cost analysis , 2010, ISCA.

[3]  R. Balasubramonian,et al.  Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[4]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[5]  Lieven Eeckhout,et al.  Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[6]  John Crossley,et al.  A sub-ns response fully integrated battery-connected switched-capacitor voltage regulator delivering 0.19W/mm2 at 73% efficiency , 2013, 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers.

[7]  Margaret Martonosi,et al.  An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[8]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[9]  John Paul Shen,et al.  Best of both latency and throughput , 2004, IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings..

[10]  Margaret Martonosi,et al.  A dynamic compilation framework for controlling microprocessor energy and performance , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[11]  Eric Rotenberg,et al.  Core-Selectability in Chip Multiprocessors , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.

[12]  裕幸 飯田,et al.  International Technology Roadmap for Semiconductors 2003の要求清浄度について - シリコンウエハ表面と雰囲気環境に要求される清浄度, 分析方法の現状について - , 2004 .

[13]  Eric Rotenberg,et al.  Architectural Contesting , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[14]  Gu-Yeon Wei,et al.  A Fully-Integrated 3-Level DC-DC Converter for Nanosecond-Scale DVFS , 2012, IEEE Journal of Solid-State Circuits.

[15]  Stacey Jeffery,et al.  HASS: a scheduler for heterogeneous multicore systems , 2009, OPSR.

[16]  Niket Kumar Choudhary,et al.  Efficiently exploiting memory level parallelism on asymmetric coupled cores in the dark silicon era , 2012, TACO.

[17]  Sharad Malik,et al.  Bounds on power savings using runtime dynamic voltage scaling: an exact algorithm and a linear-time heuristic approximation , 2005, ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005..

[18]  Dheeraj Reddy,et al.  Bias scheduling in heterogeneous multi-core architectures , 2010, EuroSys '10.

[19]  Gu-Yeon Wei,et al.  Thread motion: fine-grained power management for multi-core systems , 2009, ISCA '09.

[20]  Nam Sung Kim,et al.  Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.

[21]  Vanish Talwar,et al.  Using Asymmetric Single-ISA CMPs to Save Energy on Operating Systems , 2008, IEEE Micro.

[22]  Srilatha Manne,et al.  Power and energy reduction via pipeline balancing , 2001, ISCA 2001.

[23]  Hiroto Yasuura,et al.  Voltage scheduling problem for dynamically variable voltage processors , 1998, Proceedings. 1998 International Symposium on Low Power Electronics and Design (IEEE Cat. No.98TH8379).

[24]  Stijn Eyerman,et al.  Fine-grained DVFS using on-chip regulators , 2011, TACO.

[25]  Norman P. Jouppi,et al.  Core architecture optimization for heterogeneous chip multiprocessors , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[26]  Sharad Malik,et al.  Compile-time dynamic voltage scaling settings: opportunities and limits , 2003, PLDI '03.

[27]  Michael F. P. O'Boyle,et al.  A Predictive Model for Dynamic Microarchitectural Adaptivity Control , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[28]  Meeta Sharma Gupta,et al.  System level analysis of fast, per-core DVFS using on-chip switching regulators , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[29]  Philippe Flatresse Planar fully depleted silicon technology to design competitive SOC at 28 nm and beyond , 2012 .

[30]  Ronald Dreslinski Near Threshold Computing: From Single Core to Many-Core Energy Efficient Architectures , 2011 .

[31]  Mark Horowitz,et al.  Scaling, Power and the Future of CMOS , 2007, 20th International Conference on VLSI Design held jointly with 6th International Conference on Embedded Systems (VLSID'07).

[32]  Scott A. Mahlke,et al.  Composite Cores: Pushing Heterogeneity Into a Core , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[33]  Michael C. Huang,et al.  Dynamically Tuning Processor Resources with Adaptive Processing , 2003, Computer.

[34]  Lieven Eeckhout,et al.  Scheduling heterogeneous multi-cores through performance impact estimation (PIE) , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[35]  Michel Dubois,et al.  Dynamic MIPS rate stabilization in out-of-order processors , 2009, ISCA '09.

[36]  Hai Li,et al.  VSV: L2-miss-driven variable supply-voltage scaling for low power , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[37]  K. Steinhubl Design of Ion-Implanted MOSFET'S with Very Small Physical Dimensions , 1974 .

[38]  Norman P. Jouppi,et al.  Single-ISA heterogeneous multi-core architectures: the potential for processor power reduction , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[39]  References , 1971 .

[40]  Sharad Malik,et al.  Efficient behavior-driven runtime dynamic voltage scaling policies , 2005, 2005 Third IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'05).