Workload Partitioning Algorithm Based on Performance Curve of GPU in Heterogeneous Platforms

As the general-purpose computing power of GPUs grows, hybrid systems composed of multi-core CPUs and GPUs are becoming increasingly popular for data-parallel applications. Because GPU performance depends on the size of the workload it receives, effective workload allocation is essential to the performance of such applications. Existing static load distribution methods do not exploit the way GPU performance varies with workload, which leaves the load unbalanced. Dynamic load distribution methods, in turn, tend to degrade system performance through excessive synchronization and data transfer. In this paper, we propose a new workload partitioning algorithm that characterizes how GPU performance varies with workload in an off-line analysis stage, and then uses a successive decreasing method to determine the optimal load allocation ratio between the multi-core CPU and the GPU. The effectiveness of the algorithm is verified with a median filtering application on a remote sensing data set.

Keywords—GPU; hybrid system; data parallel applications; workload partitioning
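The two stages described above (an off-line GPU performance curve, then a successive decreasing search for the CPU/GPU split) can be illustrated with a minimal sketch. It is not the paper's exact procedure: it assumes a piecewise-linear GPU performance curve built from hypothetical profiling points, a CPU throughput that does not depend on load, no data-transfer cost, and a makespan objective (the later of the CPU and GPU finish times). The profile values, the step size, the stopping rule, and all names (`GPU_PROFILE`, `CPU_THROUGHPUT`, `gpu_throughput`, `makespan`, `partition`) are illustrative assumptions.

```python
from bisect import bisect_left

# Hypothetical off-line profile of the GPU performance curve:
# (workload in pixels, measured GPU throughput in pixels/ms).
# Throughput rises with load until the device saturates.
GPU_PROFILE = [(1e5, 2.0e3), (1e6, 1.2e4), (1e7, 4.0e4), (1e8, 5.0e4)]
# Hypothetical multi-core CPU throughput in pixels/ms, assumed load-independent.
CPU_THROUGHPUT = 8.0e3


def gpu_throughput(load, profile=GPU_PROFILE):
    """Linearly interpolate the profiled GPU throughput at a given load."""
    xs = [x for x, _ in profile]
    if load <= xs[0]:
        return profile[0][1]
    if load >= xs[-1]:
        return profile[-1][1]
    i = bisect_left(xs, load)
    (x0, y0), (x1, y1) = profile[i - 1], profile[i]
    return y0 + (y1 - y0) * (load - x0) / (x1 - x0)


def makespan(total, gpu_ratio):
    """Return (finish time, GPU time, CPU time) when both devices run
    their shares concurrently; transfer costs are ignored (assumption)."""
    gpu_load = total * gpu_ratio
    cpu_load = total - gpu_load
    t_gpu = gpu_load / gpu_throughput(gpu_load) if gpu_load > 0 else 0.0
    t_cpu = cpu_load / CPU_THROUGHPUT
    return max(t_gpu, t_cpu), t_gpu, t_cpu


def partition(total, steps=100):
    """Decrease the GPU share from 100% in 1/steps increments and return the
    (ratio, makespan) pair with the smallest makespan. The search stops once
    the CPU becomes the bottleneck: beyond that point the CPU time only grows,
    so moving more work to the CPU cannot shorten the makespan."""
    best_ratio, best_time = 1.0, makespan(total, 1.0)[0]
    for k in range(steps - 1, -1, -1):
        ratio = k / steps
        t, t_gpu, t_cpu = makespan(total, ratio)
        if t < best_time:
            best_ratio, best_time = ratio, t
        if t_cpu >= t_gpu:
            break
    return best_ratio, best_time
```

For example, `partition(5e7)` returns the GPU share and the predicted makespan for a 5×10^7-pixel image under these made-up profile numbers. Because the GPU curve is nonlinear, the balanced split generally differs from the ratio a single static throughput estimate would suggest, which is the effect the off-line analysis is meant to capture.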