Paper information - Co-Run Scheduling with Power Cap on Integrated CPU-GPU Systems

This paper presents the first systematic study of co-scheduling independent jobs on integrated CPU-GPU systems under power caps. It reveals the performance degradation caused by co-run contention at both the memory and power levels. It then examines how job co-scheduling can alleviate this degradation in this less-understood scenario. It offers several algorithms, together with a lightweight predictive model of co-run performance and power, for computing performance bounds of the optimal co-schedules and for finding appropriate schedules. Results show that the method efficiently finds co-schedules that significantly improve system throughput (by 9-46% on average over the default schedules).
