Co-Run Scheduling with Power Cap on Integrated CPU-GPU Systems

This paper presents the first systematic study on co-scheduling independent jobs on integrated CPU-GPU systems with power caps considered. It reveals the performance degradations caused by the co-run contentions at the levels of both memory and power. It then examines the problem of using job co-scheduling to alleviate the degradations in this less understood scenario. It offers several algorithms and a lightweight co-run performance and power predictive model for computing the performance bounds of the optimal co-schedules and finding appropriate schedules. Results show that the method can efficiently find co-schedules that significantly improve the system throughput (9-46% on average over the default schedules).

[1]  Jie Chen,et al.  The Complexity of Optimal Job Co-Scheduling on Chip Multiprocessors and Heuristics-Based Solutions , 2011, IEEE Transactions on Parallel and Distributed Systems.

[2]  Dick James,et al.  Intel Ivy Bridge unveiled — The first commercial tri-gate, high-k, metal-gate CPU , 2012, Proceedings of the IEEE 2012 Custom Integrated Circuits Conference.

[3]  Maurice Steinman,et al.  AMD Fusion APU: Llano , 2012, IEEE Micro.

[4]  Xiaohan Ma,et al.  Statistical Power Consumption Analysis and Modeling for GPU-based Computing , 2011 .

[5]  Bronis R. de Supinski,et al.  Prediction models for multi-dimensional power-performance optimization on many cores , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[6]  Wu-chun Feng,et al.  Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL , 2015, 2015 IEEE International Conference on Cluster Computing.

[7]  Hiroshi Nakamura,et al.  Power capping of CPU-GPU heterogeneous systems through coordinating DVFS and task mapping , 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).

[8]  Antonia Zhai,et al.  Managing shared last-level cache in a heterogeneous multicore processor , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.

[9]  Henry Hoffmann,et al.  Maximizing Performance Under a Power Cap: A Comparison of Hardware, Software, and Hybrid Techniques , 2016, ASPLOS.

[10]  Xipeng Shen,et al.  Combining Locality Analysis with Online Proactive Job Co-scheduling in Chip Multiprocessors , 2010, HiPEAC.

[11]  Alexandra Fedorova,et al.  Addressing shared resource contention in multicore processors via scheduling , 2010, ASPLOS XV.

[12]  Gagan Agrawal,et al.  Accelerating MapReduce on a coupled CPU-GPU architecture , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[13]  Kevin Skadron,et al.  Dynamic Heterogeneous Scheduling Decisions Using Historical Runtime Data , 2011 .

[14]  Wenguang Chen,et al.  To Co-run, or Not to Co-run: A Performance Study on Integrated Architectures , 2015, 2015 IEEE 23rd International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems.

[15]  Sandhya Dwarkadas,et al.  Compatible phase co-scheduling on a CMP of multi-threaded processors , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[16]  Xipeng Shen,et al.  A step towards transparent integration of input-consciousness into dynamic program optimizations , 2011, OOPSLA '11.

[17]  Bingsheng He,et al.  In-Cache Query Co-Processing on Coupled CPU-GPU Architectures , 2014, Proc. VLDB Endow..

[18]  Michael F. P. O'Boyle,et al.  Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms , 2014, 2014 21st International Conference on High Performance Computing (HiPC).

[19]  Xipeng Shen,et al.  A study on optimally co-scheduling jobs of different lengths on chip multiprocessors , 2009, CF '09.

[20]  Keshav Pingali,et al.  Adaptive heterogeneous scheduling for integrated GPUs , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).

[21]  Lingjia Tang,et al.  SMiTe: Precise QoS Prediction on Real-System SMT Processors to Improve Utilization in Warehouse Scale Computers , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[22]  Kavitha Ranganathan,et al.  Evolving Toward the Perfect Schedule: Co-scheduling Job Assignments and Data Replication in Wide-Area Systems Using a Genetic Algorithm , 2005, JSSPP.

[23]  Bingsheng He,et al.  Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture , 2013, Proc. VLDB Endow..

[24]  Hyesoon Kim,et al.  TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[25]  Jie Chen,et al.  Analysis and approximation of optimal co-scheduling on Chip Multiprocessors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[26]  Efraim Rotem,et al.  Power-Management Architecture of the Intel Microarchitecture Code-Named Sandy Bridge , 2012, IEEE Micro.

[27]  Li Shen,et al.  Understanding Co-run Degradations on Integrated Heterogeneous Processors , 2014, LCPC.

[28]  Quan Chen,et al.  Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers , 2016, ASPLOS.

[29]  Gary Brown,et al.  Denver: Nvidia's First 64-bit ARM Processor , 2015, IEEE Micro.

[30]  Zhiling Lan,et al.  Job Coscheduling on Coupled High-End Computing Systems , 2011, 2011 40th International Conference on Parallel Processing Workshops.

[31]  Phil Andrews,et al.  Co-scheduling with User-Settable Reservations , 2005, JSSPP.

[32]  Kevin Skadron,et al.  Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[33]  Tarek El-Ghazawi,et al.  Energy Efficient Job Co-scheduling for High-Performance Parallel Computing Clusters , 2015, 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity).