VOBLA: a vehicle for optimized basic linear algebra

We present VOBLA, a domain-specific language designed for programming linear algebra libraries. VOBLA is compiled to PENCIL, a domain-independent intermediate language designed for efficient mapping to accelerator architectures such as GPGPUs. PENCIL is in turn compiled to efficient, platform-specific OpenCL code using techniques based on the polyhedral model. This approach addresses both the programmer productivity and the performance portability concerns associated with accelerator programming. We demonstrate our approach by using VOBLA to implement a BLAS library. We have evaluated the performance of OpenCL code generated by our compilation flow on ARM Mali, AMD Radeon, and AMD Opteron platforms. The generated code is currently on average 1.9x slower than highly hand-optimized OpenCL code, but on average 8.1x faster than straightforward OpenCL code. Given that writing VOBLA code takes significantly less effort than hand-optimizing OpenCL code, we believe our approach leads to improved productivity and performance portability.
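To give a concrete sense of the kind of code the polyhedral step operates on, here is a minimal sketch, assuming a plain C99 target, of a BLAS level-2 routine (single-precision GEMV, y = alpha*A*x + beta*y) written as a static-control loop nest with affine bounds and subscripts. It is not VOBLA source and not PENCIL output from this work. The name `sgemv_sketch` is hypothetical, and the signature deliberately omits the transpose flag, leading-dimension, and stride arguments of the reference BLAS `SGEMV` interface.

```c
/* Hypothetical sketch (not VOBLA source, not generated PENCIL code):
 * single-precision GEMV, y = alpha*A*x + beta*y, with A stored
 * row-major as an m-by-n C99 variable-length array.
 * All loop bounds and array subscripts are affine in m, n, i, j,
 * so a polyhedral compiler can analyze the dependences exactly. */
void sgemv_sketch(int m, int n, float alpha,
                  float A[m][n], float x[n],
                  float beta, float y[m])
{
    for (int i = 0; i < m; i++) {
        /* Each row's dot product is independent of every other row,
         * so iterations of the i loop can run in parallel. */
        float acc = 0.0f;
        for (int j = 0; j < n; j++)
            acc += A[i][j] * x[j];
        /* Single write to y[i] per outer iteration. */
        y[i] = alpha * acc + beta * y[i];
    }
}
```

Keeping each row's reduction in a scalar accumulator means the outer loop carries no dependence, which is exactly the property a polyhedral OpenCL generator looks for when mapping, for example, one row per work-item. Hand-optimized kernels typically go further (tiling, local memory, vectorization), which is the gap the 1.9x figure above reflects.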
