COMPARING PROGRAMMER PRODUCTIVITY IN OPENACC AND CUDA: AN EMPIRICAL INVESTIGATION

OpenACC has been touted as a "high productivity" API designed to make GPGPU programming accessible to scientific programmers, but to date no study has attempted to verify this claim quantitatively. In this paper, we conduct an empirical comparison of programmer productivity in OpenACC and CUDA, examining programming time, execution time, and whether OpenACC development is independent of prior CUDA experience on high-performance problems. Our results show that, for our programs and our subject pool, the claim is true. We created two classroom assignments, Machine Problem 3 (MP3) and Machine Problem 4 (MP4), and instrumented WebCode, a website we developed, to record details of the students’ coding process. The statistical data support three hypotheses for the same parallelizable problem: (1) OpenACC programming time is at least 37% shorter than CUDA programming time; (2) CUDA code runs 9x faster than OpenACC code; (3) OpenACC development work is not significantly affected by previous CUDA experience.
