E-OSched: a load balancing scheduler for heterogeneous multicores

Heterogeneous computing devices have become a preferred platform for executing compute-intensive applications in the contemporary multicore era. These heterogeneous devices are based on CPUs and GPUs, and OpenCL is regarded as one of the industry standards for programming them. Conventional application scheduling mechanisms allocate most applications to GPUs while leaving the CPU underutilized. This underutilization of slower devices (such as the CPU) often leads to sub-optimal performance of data-parallel applications in terms of load balance, execution time, and throughput. Moreover, scheduling multiple applications on a heterogeneous system further aggravates this performance inefficiency. This paper attempts to address these deficiencies by introducing a novel scheduling strategy named OSched, together with an enhancement named E-OSched. OSched performs resource-aware assignment of jobs to both CPUs and GPUs while ensuring a balanced load. Load balancing is achieved by considering the computational requirements of each job and the computing potential of each device. Load-balanced execution is beneficial in terms of lower execution time, higher throughput, and improved utilization. E-OSched reduces main memory contention during the concurrent job execution phase. The mathematical model of the proposed algorithms is evaluated by comparing simulation results against different state-of-the-art scheduling heuristics. The results show that the proposed E-OSched performs significantly better than the state-of-the-art scheduling heuristics, achieving up to 8.09% improvement in execution time and up to 7.07% better throughput.
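The idea of matching job requirements to device potential can be illustrated with a minimal sketch. This is not the OSched or E-OSched algorithm, whose details the abstract does not give; it is a generic greedy earliest-finish-time heuristic. The names `Device` and `balance_jobs`, the use of GFLOP as the unit of work, and the example throughput figures are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical device model: a name and an assumed sustained throughput."""
    name: str
    gflops: float                 # assumed computing potential (GFLOP/s)
    busy_until: float = 0.0       # time at which the device becomes idle (s)
    jobs: list = field(default_factory=list)


def balance_jobs(jobs, devices):
    """Greedy earliest-finish-time assignment (illustrative only).

    jobs: dict mapping a job name to its assumed work in GFLOP.
    devices: list of Device objects; they are updated in place and returned.
    """
    # Place the largest jobs first so that small jobs can fill the gaps.
    for name, work in sorted(jobs.items(), key=lambda kv: kv[1], reverse=True):
        # Pick the device on which this job would finish earliest.
        best = min(devices, key=lambda d: d.busy_until + work / d.gflops)
        best.busy_until += work / best.gflops
        best.jobs.append(name)
    return devices


# Example with made-up numbers: a slow CPU and a GPU four times as fast.
example_devices = balance_jobs(
    {"a": 400.0, "b": 400.0, "c": 100.0, "d": 100.0},
    [Device("cpu", 100.0), Device("gpu", 400.0)],
)
for dev in example_devices:
    print(dev.name, dev.jobs, dev.busy_until)
```

In this toy input the two large jobs land on the GPU and the two small jobs on the CPU, so neither device sits idle while the other works. A GPU-only policy would leave the CPU unused, which is the underutilization the abstract describes. Memory contention between concurrent jobs, which E-OSched targets, is not modeled here.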
