A Data-Driven Frequency Scaling Approach for Deadline-aware Energy Efficient Scheduling on Graphics Processing Units (GPUs)

Modern computing paradigms, such as cloud computing, are increasingly adopting GPUs to boost their computing capabilities, primarily driven by heterogeneous AI/ML/deep-learning workloads. However, the energy consumption of GPUs is a critical problem. Dynamic Voltage and Frequency Scaling (DVFS) is a widely used technique for reducing the dynamic power of GPUs, yet configuring the optimal clock frequency for given performance requirements is a non-trivial task due to the complex, nonlinear relationship between an application's runtime performance characteristics, energy, and execution time. The task becomes even more challenging when different applications behave differently under similar clock settings. Simple analytical solutions and standard GPU frequency-scaling heuristics fail to capture these intricacies and scale frequencies appropriately. We therefore propose a data-driven frequency-scaling technique that predicts the power and execution time of a given application across different clock settings. We collect data through application profiling and train models to predict these outcomes accurately. The proposed solution is generic and can easily be extended to different kinds of workloads and GPU architectures. Furthermore, building on these prediction models, we present a deadline-aware application scheduling algorithm that reduces energy consumption while meeting application deadlines. We conduct extensive experiments on real NVIDIA GPUs using several benchmark applications. The results show that our prediction models achieve high accuracy, with average RMSE values of 0.38 and 0.05 for energy and time prediction, respectively, and that the scheduling algorithm consumes 15.07% less energy than the baseline policies.
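To make the approach concrete, the minimal sketch below illustrates the core idea rather than the authors' implementation: all feature names, the model choice, and the profiling data points are assumptions. It trains two regressors on profiled samples to predict power and execution time per clock setting, then selects the clock setting with the lowest predicted energy (power × time) among those whose predicted runtime meets the deadline.

```python
"""Illustrative sketch of prediction-guided frequency selection (not the
authors' code). Features, model choice, and data are assumptions."""
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical profiling samples: one row per clock setting.
# Columns: [core_freq_MHz, mem_freq_MHz, achieved_occupancy, dram_util]
X = np.array([
    [ 705, 2600, 0.61, 0.42],
    [ 875, 2600, 0.63, 0.47],
    [1060, 2600, 0.64, 0.55],
    [1245, 2600, 0.64, 0.63],
])
power_w   = np.array([ 98.0, 118.0, 141.0, 172.0])  # measured board power (W)
runtime_s = np.array([  7.9,   6.6,   5.6,   4.9])  # measured runtime (s)

# Two separate predictors, mirroring the paper's power and time models.
power_model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, power_w)
time_model  = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, runtime_s)

def pick_frequency(candidates, deadline_s):
    """Return the candidate clock setting with minimum predicted energy
    among those whose predicted runtime meets the deadline."""
    p = power_model.predict(candidates)
    t = time_model.predict(candidates)
    energy = p * t
    feasible = t <= deadline_s
    if not feasible.any():
        # No setting meets the deadline: fall back to the fastest one.
        return candidates[np.argmin(t)]
    energy[~feasible] = np.inf  # exclude deadline-violating settings
    return candidates[np.argmin(energy)]

best = pick_frequency(X, deadline_s=6.0)
print("chosen setting (core MHz, mem MHz, ...):", best)
```

Under these assumptions, the selector picks a mid-range core clock whenever slowing down saves more power than it costs in runtime, which is exactly the trade-off the deadline-aware scheduling algorithm exploits.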
