Energy-aware Task Scheduling with Deadline Constraint in DVFS-enabled Heterogeneous Clusters

Energy conservation in large data centers that run high-performance computing workloads, such as deep learning with big data, is of critical significance: trimming even a few percent of electricity consumption translates into million-dollar savings. This work studies energy conservation on emerging CPU-GPU hybrid clusters through dynamic voltage and frequency scaling (DVFS). We aim to minimize the total energy consumed in processing a batch of offline tasks or a sequence of real-time tasks under deadline constraints. We derive a fast and accurate analytical model that computes the appropriate voltage/frequency setting for each task, and we assign multiple tasks to the cluster with heuristic scheduling algorithms. In particular, our model captures the nonlinear relationship between task execution time and processor speed in GPU-accelerated applications, which more accurately reflects real-world GPU energy consumption. In a performance evaluation driven by real-world power measurement traces, our scheduling algorithm achieves energy savings close to the theoretical upper bound: within a GPU frequency-scaling range where analysis shows that at most 36% of the energy can be saved, we record savings of 33-35%. Our results are applicable to energy management on modern heterogeneous clusters.
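As a rough illustration of the per-task frequency selection described above, the Python sketch below enumerates a hypothetical set of DVFS levels, keeps only those that meet a task's deadline under an assumed nonlinear time model, and picks the level with the lowest estimated energy. The time model (a frequency-dependent compute part plus a frequency-insensitive memory part), the cubic power model, and all constants are illustrative assumptions, not the analytical model derived in this work.

```python
# Illustrative sketch only: per-task DVFS selection under a deadline.
# The functional forms t(f) = work/f + memory_time and P(f) = p_static + c*f^3,
# and all numeric constants, are assumptions for demonstration purposes.

from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class Task:
    work_cycles: float   # frequency-sensitive portion of the workload
    memory_time: float   # frequency-insensitive portion (e.g. memory-bound time), seconds
    deadline: float      # seconds

def exec_time(task: Task, freq_ghz: float) -> float:
    # Nonlinear in frequency: only part of the runtime shrinks as the core
    # frequency grows, which is typical for memory-bound GPU kernels.
    return task.work_cycles / freq_ghz + task.memory_time

def energy(task: Task, freq_ghz: float, p_static: float = 50.0, c: float = 10.0) -> float:
    # Assumed constant static power plus a cubic dynamic-power term (watts).
    power = p_static + c * freq_ghz ** 3
    return power * exec_time(task, freq_ghz)

def pick_frequency(task: Task, freqs_ghz: Iterable[float]) -> Optional[float]:
    # Among the DVFS settings that meet the deadline, choose the one with the
    # lowest estimated energy; return None if no setting is feasible.
    feasible = [f for f in freqs_ghz if exec_time(task, f) <= task.deadline]
    if not feasible:
        return None
    return min(feasible, key=lambda f: energy(task, f))

if __name__ == "__main__":
    dvfs_levels = [0.6, 0.8, 1.0, 1.2, 1.4]  # GHz, hypothetical GPU core levels
    t = Task(work_cycles=1.2, memory_time=0.5, deadline=2.0)
    f = pick_frequency(t, dvfs_levels)
    print(f, exec_time(t, f), energy(t, f))
```

With these assumed constants the minimum-energy feasible setting is not the lowest feasible frequency: static power makes running too slowly costly, which is why a model of the time-frequency relationship, rather than a fixed "slowest feasible speed" rule, is needed.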
