Swan: A tool for porting CUDA programs to OpenCL

Abstract

The use of modern, high-performance graphical processing units (GPUs) for acceleration of scientific computation has been widely reported. The majority of this work has used the CUDA programming model, which is supported exclusively by GPUs manufactured by NVIDIA. An industry standardisation effort has recently produced the OpenCL specification for GPU programming. This offers the benefits of hardware independence and reduced dependence on proprietary tool-chains. Here we describe a source-to-source translation tool, “Swan”, for facilitating the conversion of an existing CUDA code to use the OpenCL model, as a means to aid programmers experienced with CUDA in evaluating OpenCL and alternative hardware. While the performance of equivalent OpenCL and CUDA code on fixed hardware should be comparable, we find that a real-world CUDA application ported to OpenCL exhibits an overall 50% increase in runtime, a reduction in performance attributable to the immaturity of contemporary compilers. The ported application is shown to have platform independence, running on both NVIDIA and AMD GPUs without modification. We conclude that OpenCL is a viable platform for developing portable GPU applications but that the more mature CUDA tools continue to provide the best performance.

Program summary

Program title: Swan
Catalogue identifier: AEIH_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIH_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU Public License version 2
No. of lines in distributed program, including test data, etc.: 17 736
No. of bytes in distributed program, including test data, etc.: 131 177
Distribution format: tar.gz
Programming language: C
Computer: PC
Operating system: Linux
RAM: 256 Mbytes
Classification: 6.5
External routines: NVIDIA CUDA, OpenCL

Nature of problem: Graphical Processing Units (GPUs) from NVIDIA are preferentially programmed with the proprietary CUDA programming toolkit. An alternative programming model promoted as an industry standard, OpenCL, provides similar capabilities to CUDA and is also supported on non-NVIDIA hardware (including multicore x86 CPUs, AMD GPUs and IBM Cell processors). The adaptation of a program from CUDA to OpenCL is relatively straightforward but laborious. The Swan tool facilitates this conversion.

Solution method: Swan translates CUDA kernel source code into an OpenCL equivalent. It also generates the C source code for entry point functions, simplifying kernel invocation from the host program. A concise host-side API abstracts the CUDA and OpenCL APIs. A program adapted to use Swan has no dependency on the CUDA compiler for the host-side program. The converted program may be built for either CUDA or OpenCL, with the selection made at compile time.

Restrictions: No support for CUDA C++ features

Running time: Nominal
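
The kernel translation described under "Solution method" can be pictured with a toy example. The sketch below is not Swan's implementation, which the summary does not detail; it is a minimal illustration, assuming a plain textual substitution of a few CUDA keywords and built-in variables by their usual OpenCL counterparts. The names `translate_kernel` and `MAPPINGS` are hypothetical. A real translator must also handle cases this sketch ignores, such as adding the `__global` address-space qualifier to pointer arguments and avoiding matches inside longer identifiers.

```c
#include <stddef.h>
#include <string.h>

/* One CUDA-to-OpenCL token mapping (illustrative, not Swan's table). */
struct mapping {
    const char *cuda;
    const char *opencl;
};

/* A few well-known equivalences between CUDA C and OpenCL C. */
static const struct mapping MAPPINGS[] = {
    { "__global__",      "__kernel" },
    { "__shared__",      "__local" },
    { "__syncthreads()", "barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE)" },
    { "threadIdx.x",     "get_local_id(0)" },
    { "blockIdx.x",      "get_group_id(0)" },
    { "blockDim.x",      "get_local_size(0)" },
    { "gridDim.x",       "get_num_groups(0)" },
};

/*
 * Hypothetical helper: copy src into out, replacing each CUDA token
 * listed in MAPPINGS with its OpenCL counterpart.
 * Returns 0 on success, or -1 if out is too small for the result.
 */
int translate_kernel(const char *src, char *out, size_t out_size)
{
    const size_t n_map = sizeof MAPPINGS / sizeof MAPPINGS[0];
    size_t used = 0;

    if (out_size == 0)
        return -1;

    while (*src != '\0') {
        /* Default: copy a single character unchanged. */
        const char *piece = src;
        size_t piece_len = 1;
        size_t advance = 1;

        /* If a CUDA token starts here, emit its replacement instead. */
        for (size_t i = 0; i < n_map; i++) {
            size_t len = strlen(MAPPINGS[i].cuda);
            if (strncmp(src, MAPPINGS[i].cuda, len) == 0) {
                piece = MAPPINGS[i].opencl;
                piece_len = strlen(piece);
                advance = len;
                break;
            }
        }

        /* Keep one byte in reserve for the terminating NUL. */
        if (used + piece_len + 1 > out_size)
            return -1;
        memcpy(out + used, piece, piece_len);
        used += piece_len;
        src += advance;
    }

    out[used] = '\0';
    return 0;
}
```

Calling `translate_kernel` on a string holding a CUDA kernel fills the caller's buffer with the rewritten text, or returns -1 when the buffer is too small. The table-driven form keeps the mapping easy to extend, at the cost of being purely lexical.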
