Impact of Cache Voltage Scaling on Energy-Time Pareto Frontier in Multicore Systems

Abstract: High-performance computing centers must keep up with growing workloads of varying computational characteristics. Because of their high computation rates, these systems consume vast amounts of energy and incur increasing electricity costs. To balance computational demand against energy consumption, state-of-the-art dynamic voltage and frequency scaling (DVFS) methodologies are used to improve the energy efficiency of computing systems. However, studies of these methodologies often do not examine how close their solutions come to the theoretically optimal limits. This work formulates optimal boundaries for energy-time performance with a Linear Programming (LP) approach. The formulation uses per-core energy consumption and execution times obtained during a profiling phase to optimize the voltage and frequency (V/F) level assignments at runtime. For each of the four benchmarks considered in this work, the optimized V/F assignments are used to bound Pareto frontiers, which trade off energy consumption against execution time. In particular, this work studies how scaling the voltage and frequency of the cache subsystem in a multicore system affects the energy-time Pareto frontier. An unexpected result of our study is that when the cache frequency is not scaled with the core frequency (i.e., it is fixed at 2.0 GHz), the proposed LP-based technique improves the overall energy-delay product (EDP) by as much as 35% compared to the traditional no-DVFS Pareto frontier. Furthermore, this work compares three heuristic-based energy-efficient DVFS algorithms against the LP-based optimal Pareto frontier to show how far the heuristics' performance falls from the optimum.
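To make the notions of an energy-time Pareto frontier and EDP concrete, the following is a minimal sketch and not the LP formulation described above. It replaces the LP with brute-force enumeration over a tiny set of V/F assignments. The profiling numbers, the names `PROFILE`, `evaluate`, `pareto_frontier`, and `edp`, and the assumptions that total energy is the sum of per-core energies and execution time is the slowest core's time are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical profiling data (not from the paper): for each core, a list of
# (energy in joules, execution time in seconds), one entry per V/F level.
PROFILE = {
    "core0": [(10.0, 4.0), (14.0, 3.0), (20.0, 2.0)],
    "core1": [(8.0, 5.0), (12.0, 3.5), (18.0, 2.5)],
}


def evaluate(assignment):
    """Return (total energy, execution time) for one V/F level per core.

    Assumption: energy adds up across cores and the slowest core
    determines the overall execution time.
    """
    points = [PROFILE[core][level] for core, level in assignment.items()]
    energy = sum(e for e, _ in points)
    time = max(t for _, t in points)
    return energy, time


def pareto_frontier(points):
    """Keep only points that no other point beats in both energy and time."""
    frontier = []
    best_time = float("inf")
    # Sorted by energy, a point is non-dominated only if it is strictly
    # faster than every cheaper point seen so far.
    for energy, time in sorted(set(points)):
        if time < best_time:
            frontier.append((energy, time))
            best_time = time
    return frontier


def edp(point):
    """Energy-delay product of an (energy, time) point."""
    energy, time = point
    return energy * time


cores = sorted(PROFILE)
# Enumerate every combination of V/F levels (brute force, not an LP).
candidates = [
    evaluate(dict(zip(cores, levels)))
    for levels in product(*(range(len(PROFILE[c])) for c in cores))
]
frontier = pareto_frontier(candidates)
best = min(frontier, key=edp)

print("Pareto frontier (energy, time):", frontier)
print("Minimum-EDP point:", best, "EDP =", edp(best))
```

Enumeration grows exponentially with the number of cores and V/F levels, which is one reason an optimization formulation such as LP is attractive for bounding the frontier at realistic scale.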

[1]  Scott Shenker,et al.  Scheduling for reduced CPU energy , 1994, OSDI '94.

[2]  Anthony A. Maciejewski,et al.  Energy and Makespan Tradeoffs in Heterogeneous Computing Systems using Efficient Linear Programming Techniques , 2016, IEEE Transactions on Parallel and Distributed Systems.

[3]  Li Shang,et al.  Thermal vs Energy Optimization for DVFS-Enabled Processors in Embedded Systems , 2007, 8th International Symposium on Quality Electronic Design (ISQED'07).

[4]  Massoud Pedram,et al.  Prediction and control of bursty cloud workloads: A fractal framework , 2014, 2014 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[5]  Sharad Malik,et al.  Compile-time dynamic voltage scaling settings: opportunities and limits , 2003, PLDI '03.

[6]  Partha Pratim Pande,et al.  Dual-Level DVFS-Enabled Millimeter-Wave Wireless NoC Architectures , 2014, JETC.

[7]  Luca Benini,et al.  Event-driven power management of portable systems , 1999, Proceedings 12th International Symposium on System Synthesis.

[8]  Alan Burns,et al.  A survey of hard real-time scheduling for multiprocessor systems , 2011, CSUR.

[9]  Miodrag Potkonjak,et al.  Optimizing power using transformations , 1995, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[10]  Fabien Clermidy,et al.  A run-time distributed cooperative approach to optimize power consumption in MPSoCs , 2010, 23rd IEEE International SOC Conference.

[11]  Stefanos Kaxiras,et al.  Green governors: A framework for Continuously Adaptive DVFS , 2011, 2011 International Green Computing Conference and Workshops.

[12]  Paul Bogdan,et al.  Mathematical Modeling and Control of Multifractal Workloads for Data-Center-on-a-Chip Optimization , 2015, NOCS.

[13]  Xiaorui Wang,et al.  DPPC: Dynamic Power Partitioning and Control for Improved Chip Multiprocessor Performance , 2014, IEEE Transactions on Computers.

[14]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[15]  Albert Y. Zomaya,et al.  Some observations on optimal frequency selection in DVFS-based energy consumption minimization , 2011, J. Parallel Distributed Comput..

[16]  Partha Pratim Pande,et al.  A dynamic, compiler guided DVFS mechanism to achieve energy-efficiency in multi-core processors , 2016, Sustain. Comput. Informatics Syst..

[17]  Michael S. Hsiao,et al.  Compiler-directed dynamic voltage/frequency scheduling for energy reduction in microprocessors , 2001, ISLPED '01.

[18]  Hung-Cheng Shih,et al.  An adaptive hybrid dynamic power management algorithm for mobile devices , 2012, Comput. Networks.

[19]  Giorgio C. Buttazzo,et al.  Energy-Aware Scheduling for Real-Time Systems , 2016, ACM Trans. Embed. Comput. Syst..

[20]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[21]  Wann-Yun Shieh,et al.  Energy and transition-aware runtime task scheduling for multicore processors , 2013, J. Parallel Distributed Comput..

[22]  Stefanos Kaxiras,et al.  Introducing DVFS-Management in a Full-System Simulator , 2013, 2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems.

[23]  Ulrich Kremer,et al.  Dynamic Voltage and Frequency Scaling for Scientific Applications , 2001, LCPC.

[24]  Fabien Clermidy,et al.  Implementation Analysis of a Dynamic Energy Management Approach Inspired by Game-Theory , 2010, 2010 IEEE Computer Society Annual Symposium on VLSI.

[25]  Scott A. Mahlke,et al.  Composite Cores: Pushing Heterogeneity Into a Core , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[26]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[27]  Nam Sung Kim,et al.  Cost-effective power delivery to support per-core voltage domains for power-constrained processors , 2012, DAC Design Automation Conference 2012.

[28]  Radu Marculescu,et al.  Dynamic power management for multidomain system-on-chip platforms , 2013, ACM Trans. Design Autom. Electr. Syst..

[29]  Massoud Pedram,et al.  Supervised Learning Based Power Management for Multicore Processors , 2010, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[30]  Siddharth Garg,et al.  Learning the optimal operating point for many-core systems with extended range voltage/frequency scaling , 2013, 2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[31]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[32]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[33]  Radu Marculescu,et al.  Wireless NoC and Dynamic VFI Codesign: Energy Efficiency Without Performance Penalty , 2016, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[34]  Mahmut T. Kandemir,et al.  Influence of compiler optimizations on system power , 2000, Proceedings 37th Design Automation Conference.

[35]  Alberto Leva,et al.  Event-Based Power/Performance-Aware Thermal Management for High-Density Microprocessors , 2018, IEEE Transactions on Control Systems Technology.

[36]  Massoud Pedram,et al.  Resource allocation and consolidation in a multi-core server cluster using a Markov decision process model , 2013, International Symposium on Quality Electronic Design (ISQED).