Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL

In this paper, we present OMPSs, a programming model based on OpenMP and StarSs, that can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on three different architectures, SMP, Cell/B.E. and GPUs, showing the wide usefulness of the approach. The evaluation is done with four different benchmarks, Matrix Multiply, BlackScholes, Perlin Noise, and Julia Set. We compare the results obtained with the execution of the same benchmarks written in OpenCL, in the same architectures. The results show that OMPSs greatly outperforms the OpenCL environment. It is more flexible to exploit multiple accelerators. And due to the simplicity of the annotations, it increases programmer's productivity.

[1]  John E. Stone,et al.  GPU clusters for high-performance computing , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.

[2]  Alejandro Duran,et al.  A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures , 2009, IWOMP.

[3]  Michael Gschwind,et al.  Using advanced compiler technology to exploit the performance of the Cell Broadband EngineTM architecture , 2006, IBM Syst. J..

[4]  Alejandro Duran,et al.  A Proposal for Task Parallelism in OpenMP , 2007, IWOMP.

[5]  Teresa H. Y. Meng,et al.  Merge: a programming model for heterogeneous multi-core systems , 2008, ASPLOS.

[6]  Milind Girkar,et al.  EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system , 2007, PLDI '07.

[7]  William J. Dally,et al.  Compilation for explicitly managed memory hierarchies , 2007, PPOPP.

[8]  Kathryn M. O'Brien,et al.  Supporting OpenMP on cell , 2008 .

[9]  Barbara Chapman A Practical Programming Model for the Multi-Core Era, 3rd International Workshop on OpenMP, IWOMP 2007, Beijing, China, June 3-7, 2007, Proceedings , 2008, IWOMP.

[10]  Bronis R. de Supinski,et al.  Evolving OpenMP in an Age of Extreme Parallelism, 5th International Workshop on OpenMP, IWOMP 2009, Dresden, Germany, June 3-5, 2009, Proceedings , 2009, IWOMP.

[11]  Wen-mei W. Hwu,et al.  CUDA-Lite: Reducing GPU Programming Complexity , 2008, LCPC.

[12]  Jesús Labarta,et al.  CellSs: Making it easier to program the Cell Broadband Engine processor , 2007, IBM J. Res. Dev..

[13]  Andrew Richards,et al.  Offload - Automating Code Migration to Heterogeneous Multicore Systems , 2010, HiPEAC.