Firepile: run-time compilation for GPUs in Scala

Recent advances have made GPUs available as general-purpose parallel processors on commodity hardware at little cost. However, the ability to program these devices has not kept up with their performance. The GPU programming model imposes a number of restrictions that make these devices difficult to program. For example, software running on the GPU cannot perform dynamic memory allocation, so the programmer must pre-allocate all memory the GPU might use. To achieve good performance, GPU programmers must also be aware of how data moves between host and GPU memory and between the different levels of the GPU memory hierarchy. We describe Firepile, a library for GPU programming in Scala. The library enables a subset of Scala to be executed on the GPU. Code trees can be created from run-time function values and then analyzed and transformed to generate GPU code. A key property of this mechanism is that it is modular: unlike with other meta-programming constructs, the use of code trees need not be exposed in the library interface. Code trees are general and can be used by library writers in other application domains. Our experiments show that Firepile users can achieve performance comparable to C code targeted to the GPU, with code that is shorter, simpler, and easier to understand.
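
To make the idea of transforming a code tree into GPU code concrete, the following is a minimal sketch in Scala. It is not Firepile's representation or API: the `Expr` types, `KernelSketch`, `emit`, and `mapKernel` are hypothetical names introduced only for illustration, and the tree is built by hand here rather than obtained from a run-time function value as the abstract describes. The sketch assumes the target is an OpenCL C kernel that applies a scalar expression to each element of a float array.

```scala
// Hypothetical miniature "code tree" for scalar float expressions.
// These types are illustrative assumptions, not Firepile's own trees.
sealed trait Expr
final case class Param(name: String) extends Expr
final case class Const(value: Float) extends Expr
final case class Add(lhs: Expr, rhs: Expr) extends Expr
final case class Mul(lhs: Expr, rhs: Expr) extends Expr

object KernelSketch {
  // Render an expression tree as an OpenCL C expression string.
  def emit(e: Expr): String = e match {
    case Param(n)  => n
    case Const(v)  => s"${v}f"
    case Add(l, r) => s"(${emit(l)} + ${emit(r)})"
    case Mul(l, r) => s"(${emit(l)} * ${emit(r)})"
  }

  // Wrap the expression in an element-wise kernel: dst[i] = body(src[i]).
  // `param` names the variable that the body uses for the current element.
  def mapKernel(name: String, param: String, body: Expr): String =
    s"""__kernel void $name(__global const float* src, __global float* dst) {
       |  int i = get_global_id(0);
       |  float $param = src[i];
       |  dst[i] = ${emit(body)};
       |}""".stripMargin
}
```

For example, `KernelSketch.mapKernel("scale", "x", Mul(Param("x"), Const(2.0f)))` yields the source text of a kernel that doubles every element. A library built along these lines could keep the tree types internal and expose only ordinary function-taking methods, which is the modularity property the abstract emphasizes.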
