Exploring Programming Multi-GPUs Using OpenMP and OpenACC-Based Hybrid Model

Heterogeneous computing come with tremendous potential and is a leading candidate for scientific applications that are becoming more and more complex. Accelerators such as GPUs whose computing momentum is growing faster than ever offer application performance when compute intensive portions of an application are offloaded to them. It is quite evident that future computing architectures are moving towards hybrid systems consisting of multi-GPUs and multi-core CPUs. A variety of high-level languages and software tools can simplify programming these systems. Directive-based programming models are being embraced since they not only ease programming complex systems but also abstract low-level details from the programmer. We already know that OpenMP has been making programming CPUs easy and portable. Similarly, a directive-based programming model for accelerators is OpenACC that is gaining popularity since the directives play an important role in developing portable software for GPUs. A combination of OpenMP and OpenACC, a hybrid model, is a plausible solution to port scientific applications to heterogeneous architectures especially when there is more than one GPU on a single node to port an application to. However OpenACC meant for accelerators is yet to provide support for multi-GPUs. But using OpenMP we could conveniently exploit features such as for and section to distribute compute intensive kernels to more than one GPU. We demonstrate the effectiveness of this hybrid approach with some case studies in this paper.

[1]  Alejandro Duran,et al.  The Design of OpenMP Tasks , 2009, IEEE Transactions on Parallel and Distributed Systems.

[2]  Alan Gray,et al.  Porting and scaling OpenACC applications on massively-parallel, GPU-accelerated supercomputers , 2012 .

[3]  Francisco de Sande,et al.  Optimization strategies in different CUDA architectures using llCoMP , 2012, Microprocess. Microsystems.

[4]  Wen-mei W. Hwu,et al.  CUDA-Lite: Reducing GPU Programming Complexity , 2008, LCPC.

[5]  Kevin Skadron,et al.  Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs , 2009, ICS.

[6]  Scott Klasky,et al.  Terascale direct numerical simulations of turbulent combustion using S3D , 2008 .

[7]  Christian Terboven,et al.  OpenACC - First Experiences with Real-World Applications , 2012, Euro-Par.

[8]  Wenguang Chen,et al.  OpenUH: an optimizing, portable OpenMP compiler , 2007, Concurr. Comput. Pract. Exp..

[9]  Rudolf Eigenmann,et al.  OpenMPC: Extended OpenMP Programming and Tuning for GPUs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[10]  Barbara M. Chapman,et al.  Experiences with High-Level Programming Directives for Porting Applications to GPUs , 2011, Facing the Multicore-Challenge.

[11]  D. K. Arvind,et al.  Languages and Compilers for Parallel Computing , 2014, Lecture Notes in Computer Science.

[12]  Tarek S. Abdelrahman,et al.  hiCUDA: a high-level directive-based language for GPU programming , 2009, GPGPU-2.

[13]  Seyong Lee,et al.  Early evaluation of directive-based GPU programming models for productive exascale computing , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[14]  Francisco de Sande,et al.  accULL: An OpenACC Implementation with CUDA and OpenCL Support , 2012, Euro-Par.