Accurate Energy and Performance Prediction for Frequency-Scaled GPU Kernels

Energy optimization is an increasingly important aspect of today’s high-performance computing applications. In particular, dynamic voltage and frequency scaling (DVFS) has become a widely adopted way to balance performance and energy consumption, and hardware vendors provide management libraries that let the programmer set both memory and core frequencies manually. This article focuses on modeling the energy consumption and speedup of GPU applications under different frequency configurations. The task is not straightforward, both because of the large set of possible and uniformly distributed configurations and because the problem is multi-objective: energy consumption is to be minimized while performance is maximized. This article proposes a machine learning-based method to predict the best core and memory frequency configurations on GPUs for an input OpenCL kernel. The method is based on two models, one predicting speedup and one predicting normalized energy, both relative to the default frequency configuration. The two models are then combined into a multi-objective approach that predicts a Pareto set of frequency configurations. Results show that our approach is very accurate at predicting the extrema and the Pareto set, and that it finds frequency configurations that dominate the default configuration in either energy or performance.
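
The final combination step can be illustrated with a minimal sketch. It assumes that the two models have already produced a predicted speedup and a predicted normalized energy for each candidate configuration, and it keeps only the non-dominated configurations. The function name `pareto_set`, the frequency values, and the predicted numbers are all hypothetical placeholders chosen for illustration; they are not taken from the article, and the article's own Pareto-set prediction may differ from this brute-force filter.

```python
from typing import Dict, List, Tuple

# (core_mhz, mem_mhz); the values used below are made-up placeholders.
Config = Tuple[int, int]


def pareto_set(predictions: Dict[Config, Tuple[float, float]]) -> List[Config]:
    """Return the configurations not dominated by any other configuration.

    Each value is (speedup, normalized_energy), both relative to the
    default configuration. Speedup is maximized, energy is minimized.
    """
    front = []
    for cfg, (s, e) in predictions.items():
        dominated = any(
            # The other point is at least as good in both objectives
            # and strictly better in at least one of them.
            (s2 >= s and e2 <= e) and (s2 > s or e2 < e)
            for other, (s2, e2) in predictions.items()
            if other != cfg
        )
        if not dominated:
            front.append(cfg)
    return sorted(front)


# Hypothetical model outputs; the default configuration is (1.0, 1.0).
predictions = {
    (1001, 3505): (1.00, 1.00),  # default configuration
    (1202, 3505): (1.12, 1.08),  # faster, but uses more energy
    (885, 3505): (0.93, 0.85),   # slower, but uses less energy
    (1001, 810): (0.60, 0.95),   # dominated by (885, 3505)
    (709, 3505): (0.78, 0.88),   # dominated by (885, 3505)
}

front = pareto_set(predictions)
print(front)
```

The pairwise check is quadratic in the number of configurations, which is acceptable for a sketch; a sort-based sweep would be the usual choice for a large configuration space.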
