A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL

Heterogeneous multi-core platforms are increasingly prevalent due to their perceived superior performance over homogeneous systems. The best performance, however, can only be achieved if tasks are accurately mapped to the right processors. OpenCL programs can be partitioned to take advantage of all the available processors in a system. However, finding the best partitioning for any heterogeneous system is difficult and depends on the hardware and software implementation. We propose a portable partitioning scheme for OpenCL programs on heterogeneous CPU-GPU systems. We develop a purely static approach based on predictive modelling and program features. When evaluated over a suite of 47 benchmarks, our model achieves a speedup of 1.57 over a state-of-the-art dynamic run-time approach, a speedup of 3.02 over a purely multi-core approach and 1.55 over the performance achieved by using just the GPU.

[1]  Teresa H. Y. Meng,et al.  Merge: a programming model for heterogeneous multi-core systems , 2008, ASPLOS.

[2]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[3]  J. Xu OpenCL – The Open Standard for Parallel Programming of Heterogeneous Systems , 2009 .

[4]  Michael Wolfe,et al.  Implementing the PGI Accelerator model , 2010, GPGPU-3.

[5]  Cédric Augonnet,et al.  StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp..

[6]  Ryan M. Rifkin,et al.  In Defense of One-Vs-All Classification , 2004, J. Mach. Learn. Res..

[7]  Ashish Sharma,et al.  Dynamic mapping in a heterogeneous environment with tasks having priorities and multiple deadlines , 2003, Proceedings International Parallel and Distributed Processing Symposium.

[8]  Gregory Diamos,et al.  Harmony: an execution model and runtime for heterogeneous many core systems , 2008, HPDC '08.

[9]  Norman P. Jouppi,et al.  Heterogeneous chip multiprocessors , 2005, Computer.

[10]  Collin McCurdy,et al.  Scalable Heterogeneous Computing (SHOC) Benchmark Suite, Version 0.8 , 2009 .

[11]  Grigori Fursin,et al.  Predictive Runtime Code Scheduling for Heterogeneous Architectures , 2008, HiPEAC.

[12]  Collin McCurdy,et al.  The Scalable Heterogeneous Computing (SHOC) benchmark suite , 2010, GPGPU-3.

[13]  Viktor K. Prasanna,et al.  Heterogeneous computing: challenges and opportunities , 1993, Computer.

[14]  Wen-mei W. Hwu,et al.  Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.

[15]  Hyesoon Kim,et al.  Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[16]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[17]  Dhabaleswar K. Panda,et al.  Characterization and enhancement of dynamic mapping heuristics for heterogeneous systems , 2000, Proceedings 2000. International Workshop on Parallel Processing.

[18]  Jeff S. Brantley,et al.  Contention-Aware Scheduling of Parallel Code for Heterogeneous Systems , 2010 .

[19]  Michael Alexander,et al.  Euro-Par 2009 – Parallel Processing Workshops: HPPC, HeteroPar, PROPER, ROIA, UNICORE, VHPC, Delft, The Netherlands, August 25-28, 2009, Revised Selected Papers , 2010, Euro-Par Workshops.

[20]  Henk Sips,et al.  Euro-Par 2009 Parallel Processing, 15th International Euro-Par Conference, Delft, The Netherlands, August 25-28, 2009. Proceedings , 2009, Euro-Par.

[21]  Ladislau Bölöni,et al.  A comparison study of static mapping heuristics for a class of meta-tasks on heterogeneous computing systems , 1999, Proceedings. Eighth Heterogeneous Computing Workshop (HCW'99).

[22]  Gagan Agrawal,et al.  Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations , 2010, ICS '10.

[23]  Pat Hanrahan,et al.  Brook for GPUs: stream computing on graphics hardware , 2004, SIGGRAPH 2004.

[24]  Howard Jay Siegel,et al.  A dynamic matching and scheduling algorithm for heterogeneous computing systems , 1998, Proceedings Seventh Heterogeneous Computing Workshop (HCW'98).

[25]  Cédric Augonnet,et al.  Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures , 2009, Euro-Par Workshops.

[26]  R. Dolbeau,et al.  HMPP TM : A Hybrid Multi-core Parallel Programming Environment , 2022 .

[27]  Oscar H. Ibarra,et al.  Heuristic Algorithms for Scheduling Independent Tasks on Nonidentical Processors , 1977, JACM.

[28]  Richard W. Vuduc,et al.  Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems , 2009, ICS.