Divergence Aware Automated Partitioning of OpenCL Workloads

Heterogeneous partitioning is a key step in efficiently mapping and scheduling data-parallel applications on multi-core computing platforms that combine CPUs and GPUs. Over the last few years, several automated partitioning methodologies, both static and dynamic, have been proposed for this purpose. The present work provides an in-depth analysis of control flow divergence and its impact on the quality of such program partitions. We treat the amount of divergence in a program as an important performance feature and train Machine Learning (ML) based classifiers that statically decide how to partition an OpenCL workload on a heterogeneous platform with a single CPU and a single GPU. Compared with existing approaches for ML-based static partitioning of data-parallel workloads, our approach yields partitions with improved timing performance.
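As a rough illustration of the idea, the sketch below shows how a divergence feature could feed a static partitioning decision. It is not the classifier or feature set used in this work: the three features, the toy training rows, and the function names `predict_gpu_share` and `split_work_items` are all assumptions made for the example, and a plain 1-nearest-neighbour rule stands in for whatever ML model is actually trained.

```python
from math import dist

# Made-up training rows for illustration only: (features, GPU share of work-items).
# Assumed features, each normalized to [0, 1]:
#   (divergent_branch_ratio, compute_to_memory_ratio, transfer_cost)
TRAINING = [
    ((0.05, 0.90, 0.10), 1.0),  # little divergence: run everything on the GPU
    ((0.10, 0.70, 0.30), 0.8),
    ((0.50, 0.50, 0.50), 0.5),
    ((0.80, 0.40, 0.60), 0.2),
    ((0.95, 0.20, 0.90), 0.0),  # heavy divergence: run everything on the CPU
]


def predict_gpu_share(features, training=TRAINING):
    """Return the GPU share of the closest training kernel (1-nearest neighbour)."""
    _, share = min(training, key=lambda row: dist(row[0], features))
    return share


def split_work_items(global_size, gpu_share):
    """Split a 1-D NDRange into (cpu_items, gpu_items) for the given GPU share."""
    gpu_items = round(global_size * gpu_share)
    return global_size - gpu_items, gpu_items


if __name__ == "__main__":
    kernel_features = (0.90, 0.25, 0.85)  # hypothetical, highly divergent kernel
    share = predict_gpu_share(kernel_features)
    print(share, split_work_items(1024, share))
```

Because the decision uses only features available before execution, the split is fixed ahead of time, which is what distinguishes a static partitioner from a dynamic one that adjusts the ratio at run time.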
