Evaluation of state-of-the-art polyhedral tools for automatic code generation on GPUs

Multi-core and many-core platforms now dominate the computer industry, forcing software developers to adopt new programming paradigms in order to fully exploit their computing capabilities. Graphics Processing Units (GPUs) are one of the representatives of many-core architectures, and certainly the most widespread. This paper evaluates and compares frameworks that automatically generate code for GPUs, saving programmers time and effort. These frameworks use polyhedral model techniques to exploit parallelism and to satisfy GPU-specific constraints. The paper presents the key features of some of these source-to-source compilers and analyzes the code that they generate. Finally, we discuss the importance of key aspects such as data mapping and code quality.
