A PTX Code Generator for LLVM

Today’s GPGPU architectures and their high-level programming languages, such as CUDA, replace the traditionally restricted GPU pipelines. Proprietary compilers translate these languages into native GPU assembly. Unfortunately, these compilers are not customizable and are restricted to static compilation. High-performance applications currently require specific manual optimizations. To avoid this cumbersome manual work, this thesis develops an open source PTX code generator; PTX is the assembly code for NVIDIA GPUs. The code generator is based on the existing open source LLVM compiler. Together, the two systems form a customizable compiler for current GPU architectures. Detailed resource analyses and run-time measurements of PTX shaders demonstrate the capacity and quality of the generated kernels. At this stage, the PTX code generator achieves performance similar to that of the nvcc compiler. The resulting compiler forms a sound basis for a variety of applications and further research topics. Support for additional features, novel optimization techniques, and applications in various fields are conceivable.
