Improving Utility of GPU in Accelerating Industrial Applications With User-Centered Automatic Code Translation

Small to medium enterprises (SMEs), particularly those whose business is focused on developing innovative products, are limited by a major bottleneck in the speed of computation in many applications. Recent developments in GPUs have markedly increased their versatility across many computational areas. However, because inexperienced users lack specialist GPU programming skills, this growth in GPU power has not been fully utilized in general SME applications. Moreover, the existing automatic CPU-to-GPU code translators are designed mainly for research purposes, have poorly designed user interfaces, and are hard to use. Little attention has been paid to the applicability, usability, and learnability of these tools for ordinary users. In this paper, we present an online automated CPU-to-GPU source translation system (GPSME) that lets inexperienced users exploit GPU capability to accelerate general SME applications. The system implements a directive programming model with a new kernel generation scheme and memory management hierarchy to optimize performance. A web service interface allows inexperienced users to invoke the automatic resource translator easily and flexibly. Our experiments with nonexpert GPU users in four SMEs show that the GPSME system accelerates real-world applications by at least 4× and offers better applicability, usability, and learnability than the existing automatic CPU-to-GPU source translators.
