Directive-based Programming Models for Scientific Applications - A Comparison

Many scientific and technical programmers regard accelerators as a viable way to speed up large scientific applications. Accelerators such as GPUs offer immense compute capacity, but programming these devices is a challenge. CUDA, OpenCL and other vendor-specific models are one option, but these are low-level models that demand excellent programming skills; moreover, code written in them is time-consuming to write and debug. To simplify GPU programming, several directive-based programming models have been proposed. In this paper, we evaluate and compare several directive-based models, such as PGI, HMPP and OpenACC, using four scientific applications. From our experimental analysis, we conclude that efficient implementations of high-level directive-based models, combined with user-guided optimizations, can reach the performance of hand-written CUDA code. For example, a computer tomography-based algorithm ported to GPUs using a directive-based approach achieved about 90% of the performance of the CUDA version of the code.
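To make the term "directive-based" concrete, the following is a minimal sketch of how a loop might be offloaded with an OpenACC directive. It is not taken from the paper: the SAXPY kernel, the function name `saxpy`, and the chosen data clauses are illustrative assumptions. A compiler without OpenACC support ignores the pragma and runs the loop sequentially on the CPU, which is part of the appeal of the approach.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel (not from the paper): y = a * x + y. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    /* The directive asks an OpenACC compiler to offload the loop:
     * copyin moves x to the device, copy moves y there and back.
     * Without OpenACC support the pragma is ignored and the loop
     * simply runs on the host. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The equivalent CUDA version would need an explicit kernel, device allocations, and memory copies; here the loop body stays unchanged and the data movement is expressed in a single annotation.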
