An adaptive methodology for multi-GPU programming in OpenCL

Purpose: The purpose of this work is to present a methodology that harnesses the computational power of multiple graphics processing units (GPUs) while hiding the complexity of tuning GPU parameters from users.

Design/methodology/approach: A methodology for auto-tuning OpenCL configuration parameters has been developed.

Findings: The described process simplifies coding and yields a significant time saving on each method execution.

Originality/value: Most authors develop their GPU applications for specific hardware configurations. This work offers a solution that makes the developed code portable to any GPU hardware.
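
The abstract does not say how the auto-tuning is done, so the following is only a minimal sketch of one common approach: an exhaustive search over candidate configurations, repeated independently for each device. All names here (`autotune`, `tune_per_device`, `measure_seconds`, the `cost` callback) are hypothetical and do not come from the paper. A configuration could be, for example, an OpenCL local work size.

```python
import time


def measure_seconds(run, config, repeats=3):
    """Return the best wall-clock time of run(config) over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run(config)
        best = min(best, time.perf_counter() - start)
    return best


def autotune(cost, candidates):
    """Evaluate cost(config) for every candidate; return (best_config, best_cost).

    Assumption: `cost` returns a number where lower is better, such as the
    measured execution time of a kernel launched with that configuration.
    """
    best_config, best_cost = None, float("inf")
    for config in candidates:
        current = cost(config)
        if current < best_cost:
            best_config, best_cost = config, current
    if best_config is None:
        raise ValueError("no usable candidate configuration was supplied")
    return best_config, best_cost


def tune_per_device(devices, cost_for_device, candidates):
    """Tune each device separately, since the best setting may differ per GPU."""
    candidates = list(candidates)  # allow reuse across devices
    return {
        device: autotune(lambda cfg, d=device: cost_for_device(d, cfg), candidates)[0]
        for device in devices
    }
```

In practice, `cost` would wrap a real kernel launch, for instance `lambda cfg: measure_seconds(launch_kernel, cfg)` with a user-supplied `launch_kernel`. Exhaustive search is the simplest strategy and is chosen here only for clarity; larger parameter spaces usually call for a sampled or heuristic search.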
