Dynamic Block Size Adjustment and Workload Balancing Strategy Based on CPU-GPU Heterogeneous Platform

In recent years, as processor architectures have evolved, heterogeneous processors combining CPUs and GPUs have become mainstream. To exploit the computing power of heterogeneous cores fully, the system must keep the workload balanced across those cores while an application executes; it is equally important to improve the efficiency of the application's GPU execution through better thread organization. To improve performance, we propose a block size adjustment strategy that adapts to the current application and GPU environment. Building on it, we propose a CPU-GPU workload balancing strategy that preferentially protects the cores. Together, these strategies shorten application execution time on CPU-GPU heterogeneous platforms. Finally, we evaluated the strategies by running four benchmarks in a CPU-GPU heterogeneous environment. The experimental results show that block size adjustment and workload balancing significantly improve performance, reducing application execution time by up to 26.21% and 58.01%, respectively, compared with GPU-only execution.
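The workload-balancing idea — repartitioning the CPU and GPU shares from observed execution times so that both devices finish together — can be sketched as a minimal simulation. The `rebalance` helper, the fixed device speeds, and the round structure below are illustrative assumptions for exposition, not the paper's actual algorithm:

```python
def rebalance(total_items, gpu_fraction, cpu_time, gpu_time):
    """Pick a new GPU share so that, at the throughputs observed in the
    last round, both devices would finish their partitions simultaneously."""
    cpu_rate = total_items * (1.0 - gpu_fraction) / cpu_time  # items/sec on CPU
    gpu_rate = total_items * gpu_fraction / gpu_time          # items/sec on GPU
    return gpu_rate / (cpu_rate + gpu_rate)

def simulate(total_items, cpu_speed, gpu_speed, rounds=3):
    """Run a few scheduling rounds with fixed (hypothetical) device speeds,
    returning the converged GPU share of the workload."""
    frac = 0.5  # start from an even CPU/GPU split
    for _ in range(rounds):
        cpu_time = total_items * (1.0 - frac) / cpu_speed
        gpu_time = total_items * frac / gpu_speed
        frac = rebalance(total_items, frac, cpu_time, gpu_time)
    return frac

if __name__ == "__main__":
    # A GPU four times faster than the CPU converges to an 80% GPU share.
    print(round(simulate(1000, cpu_speed=100, gpu_speed=400), 3))
```

With constant device speeds the split converges immediately; in practice, per-round measurements let the same feedback loop track applications whose throughput varies over time.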
