On the effective implementation of a boundary element code on graphics processing units using an out-of-core LU algorithm

Abstract

A collocation boundary element code for solving the three-dimensional Laplace equation, publicly available from http://intetec.org, has been adapted to run on an Nvidia Tesla general-purpose graphics processing unit (GPU). Both the global matrix assembly and the LU factorization of the resulting dense matrix are performed on the GPU. Out-of-core techniques are used to solve problems larger than the available GPU memory. The code achieved a speedup of about 10 times in matrix assembly over a single CPU core and about 56 Gflop/s in the LU factorization while using only 512 MB of GPU memory. Details of the GPU implementation and comparisons with the standard sequential algorithm are included to illustrate the performance of the GPU code.
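The abstract does not spell out the out-of-core scheme. As a rough illustration only, the pure-Python sketch below shows one common strategy, a left-looking, panel-by-panel LU in which at most two column panels are "resident" at once: the panel being factored and one previously factored panel streamed in to update it. The names `HostStore`, `out_of_core_lu` and `reconstruct`, and the panel width, are hypothetical. The sketch omits partial pivoting (so it assumes, for example, a diagonally dominant matrix) and runs on the CPU; a GPU code would replace the inner loops with device kernels or BLAS calls and the copies with host-device transfers.

```python
"""Minimal sketch: left-looking out-of-core LU without pivoting (hypothetical names)."""
from typing import Dict, List

Column = List[float]
Panel = List[Column]  # a panel is a list of full-height columns (column-major)


class HostStore:
    """Hypothetical stand-in for host RAM or disk holding all column panels."""

    def __init__(self, a: List[List[float]], width: int) -> None:
        self.n = len(a)
        self.width = width
        self.panels: Dict[int, Panel] = {}
        for j, c0 in enumerate(range(0, self.n, width)):
            cols = range(c0, min(c0 + width, self.n))
            self.panels[j] = [[a[r][c] for r in range(self.n)] for c in cols]
        self.transfers = 0  # panel copies host -> "device"

    def load(self, j: int) -> Panel:
        self.transfers += 1
        return [col[:] for col in self.panels[j]]

    def store(self, j: int, panel: Panel) -> None:
        self.panels[j] = [col[:] for col in panel]


def out_of_core_lu(store: HostStore) -> None:
    """Overwrite the panels with packed L (unit lower) and U factors."""
    n, b = store.n, store.width
    for j in range(len(store.panels)):
        c0 = j * b
        p = store.load(j)  # panel being factored stays resident
        for k in range(j):  # stream in each earlier, already factored panel
            k0 = k * b
            lk = store.load(k)  # at most two panels resident at once
            kw = len(lk)
            for col in p:
                # Forward substitution with the unit-lower diagonal block L_kk.
                for i in range(kw):
                    for r in range(i + 1, kw):
                        col[k0 + r] -= lk[i][k0 + r] * col[k0 + i]
                # Update the rows below block k with the computed U entries.
                for i in range(kw):
                    x = col[k0 + i]
                    for r in range(k0 + kw, n):
                        col[r] -= lk[i][r] * x
        # Factor the tall panel in place (unblocked, no pivoting).
        for i in range(len(p)):
            d = c0 + i
            pivot = p[i][d]
            for r in range(d + 1, n):
                p[i][r] /= pivot
            for c in range(i + 1, len(p)):
                u = p[c][d]
                for r in range(d + 1, n):
                    p[c][r] -= p[i][r] * u
        store.store(j, p)  # write the factored panel back to the host


def reconstruct(store: HostStore) -> List[List[float]]:
    """Multiply the packed factors back together (for checking only)."""
    n, b = store.n, store.width
    f = [[0.0] * n for _ in range(n)]
    for j, panel in store.panels.items():
        for i, col in enumerate(panel):
            for r in range(n):
                f[r][j * b + i] = col[r]
    out = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            s = 0.0
            for m in range(min(r, c) + 1):
                lv = 1.0 if m == r else f[r][m]
                s += lv * f[m][c]
            out[r][c] = s
    return out


if __name__ == "__main__":
    n = 7
    # Diagonally dominant test matrix, so skipping pivoting is safe.
    a = [[1.0 / (1 + abs(r - c)) + (n if r == c else 0.0) for c in range(n)]
         for r in range(n)]
    store = HostStore(a, width=3)  # panels of 3, 3 and 1 columns
    out_of_core_lu(store)
    lu = reconstruct(store)
    err = max(abs(lu[r][c] - a[r][c]) for r in range(n) for c in range(n))
    print(f"max |LU - A| = {err:.1e}, panel transfers = {store.transfers}")
```

Each panel is read once to be factored and once more for every later panel it updates, so with J panels the sketch performs J + J(J-1)/2 panel transfers (6 for the 7×7, width-3 example). The left-looking order is used here because each panel is written back only once, at the cost of re-reading earlier panels; a right-looking order would instead rewrite the trailing panels repeatedly.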
