SaC: Exploiting Execution-Time Slack to Save Energy in Heterogeneous Multicore Systems

Reducing the energy to carry out computational tasks is key to almost any computing application. We focus in this paper on iterative applications that have explicit computational deadlines per iteration. Our objective is to meet the computational deadlines while minimizing energy. We leverage the vast configuration space offered by heterogeneous multicore platforms which typically expose three dimensions for energy saving configurability: Voltage/frequency levels, thread count and core type (e.g. ARM big/LITTLE). We note that when choosing the most energy-efficient configuration that meets the computational deadline, an iteration will typically finish before the deadline and execution-time slack will build up across iterations. Our proposed slack management policy - SaC (Slack as a Currency) - proactively explores the configuration space to select configurations that can save substantial amounts of energy. To avoid the overheads of an exhaustive search of the configuration space, our proposal also comprises a low-overhead, on-line method by which one can assess each point in the configuration space by linearly interpolating between the endpoints in each configuration-space dimension. Overall, we show that our proposed slack management policy and linear-interpolation configuration assessment method can yield 62% energy savings on top of race-to-idle without missing any deadlines.

[1]  Scott A. Mahlke,et al.  Composite Cores: Pushing Heterogeneity Into a Core , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[2]  Houman Homayoun,et al.  Power conversion efficiency-aware mapping of multithreaded applications on heterogeneous architectures: A comprehensive parameter tuning , 2018, 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC).

[3]  Marios C. Papaefthymiou,et al.  Utilizing Dark Silicon to Save Energy with Computational Sprinting , 2013, IEEE Micro.

[4]  Christina Delimitrou,et al.  QoS-Aware scheduling in heterogeneous datacenters with paragon , 2013, TOCS.

[5]  Kevin Skadron,et al.  Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[6]  Stephen L. Olivier,et al.  Power Measurement and Concurrency Throttling for Energy Reduction in OpenMP Programs , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.

[7]  Michel Dubois,et al.  Dynamic MIPS Rate Stabilization for Complex Processors , 2015, TACO.

[8]  Bronis R. de Supinski,et al.  Prediction models for multi-dimensional power-performance optimization on many cores , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[9]  Li Shen,et al.  PPEP: Online Performance, Power, and Energy Prediction Framework and DVFS Space Exploration , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[10]  Naehyuck Chang,et al.  Accurate Modeling of the Delay and Energy Overhead of Dynamic Voltage and Frequency Scaling in Modern Microprocessors , 2013, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[11]  Susanne Albers,et al.  Race to idle: New algorithms for speed scaling with a sleep state , 2012, TALG.

[12]  Jian Li,et al.  Dynamic power-performance adaptation of parallel computation on chip multiprocessors , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[13]  Nikil D. Dutt,et al.  SPARTA: Runtime task allocation for energy efficient heterogeneous manycores , 2016, 2016 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[14]  Rami G. Melhem,et al.  Energy-Efficient Thread Assignment Optimization for Heterogeneous Multicore Systems , 2015, ACM Trans. Embed. Comput. Syst..

[15]  Yale N. Patt,et al.  Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs , 2008, ASPLOS.

[16]  Sally A. McKee,et al.  An LTE Uplink Receiver PHY benchmark and subframe-based power management , 2012, 2012 IEEE International Symposium on Performance Analysis of Systems & Software.

[17]  Stijn Eyerman,et al.  Fine-grained DVFS using on-chip regulators , 2011, TACO.

[18]  Nikola Markovic Hardware thread scheduling algorithms for single-ISA asymmetric CMPs , 2015 .

[19]  Daniel Mossé,et al.  Octopus-Man: QoS-driven task management for heterogeneous multicores in warehouse-scale computers , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[20]  Hongsuk Chung Heterogeneous MultiProcessing Solution of Exynos 5 Octa with ARM ® big , 2013 .

[21]  Marco Danelutto,et al.  A Reconfiguration Algorithm for Power-Aware Parallel Applications , 2016, ACM Trans. Archit. Code Optim..

[22]  Lieven Eeckhout,et al.  Scheduling heterogeneous multi-cores through performance impact estimation (PIE) , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[23]  Ivan Lin,et al.  ARM platform for performance and power efficiency — Hardware and software perspectives , 2016, 2016 International Symposium on VLSI Design, Automation and Test (VLSI-DAT).

[24]  Christopher J. Hughes,et al.  Saving energy with architectural and frequency adaptations for multimedia applications , 2001, MICRO.

[25]  Stefanos Kaxiras,et al.  Green governors: A framework for Continuously Adaptive DVFS , 2011, 2011 International Green Computing Conference and Workshops.

[26]  Li Zhao,et al.  QuickIA: Exploring heterogeneous architectures on real prototypes , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[27]  David E. Culler,et al.  Power Optimization - a Reality Check , 2009 .