Effective Extensible Programming: Unleashing Julia on GPUs

GPUs and other accelerators are popular devices for accelerating compute-intensive, parallelizable applications. However, programming these devices is difficult. Writing efficient device code is challenging, and is typically done in a low-level programming language. High-level languages are rarely supported, and where they are, that support does not integrate well with the rest of the language's ecosystem. To overcome this, we propose compiler infrastructure to efficiently add support for new hardware or environments to an existing programming language. We evaluate our approach by adding support for NVIDIA GPUs to the Julia programming language. By integrating with the existing compiler, we significantly lower the cost to implement and maintain the new GPU compiler, and facilitate reuse of existing application code. Moreover, use of the high-level Julia programming language enables new and dynamic approaches to GPU programming. This greatly improves programmer productivity, while maintaining application performance similar to that of the official NVIDIA CUDA toolkit.
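
To make the programming model the abstract describes concrete, here is a minimal sketch of a GPU kernel written in plain Julia. It uses the CUDA.jl package, the successor to the CUDAnative.jl/CuArrays.jl infrastructure this paper introduced; the exact API shown follows present-day CUDA.jl and may differ from the syntax in the paper itself.

```julia
# Sketch only: assumes the CUDA.jl package (successor to the paper's
# CUDAnative.jl/CuArrays.jl stack); details may differ from the paper.
using CUDA

# A kernel written in ordinary Julia; the integrated compiler lowers it
# through LLVM to PTX, reusing the same code-generation path as host code.
function vadd!(c, a, b)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return nothing
end

a = CUDA.rand(Float32, 1024)
b = CUDA.rand(Float32, 1024)
c = similar(a)

# Launch with 256 threads per block and enough blocks to cover the array.
@cuda threads=256 blocks=cld(length(c), 256) vadd!(c, a, b)

# Because CuArray participates in Julia's generic array interface, existing
# high-level code runs on the GPU unchanged, e.g. via broadcasting:
c .= a .+ b
```

The final broadcast line illustrates the reuse of existing application code mentioned in the abstract: generic array expressions compile down to fused GPU kernels without the programmer writing device code explicitly.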
