GPU-based Arnoldi factorisation for accelerating finite element eigenanalysis

We present a GPU-accelerated implementation of the k-step Arnoldi factorisation [1], which forms the basis of a number of iterative eigenvalue solvers. These solvers are important in the finite element analysis of the cutoff and dispersion characteristics of waveguide structures, as well as of cavity resonances [2]. Because they account for a significant share of the total solution time, accelerating them is of interest. The initial GPU-based implementation uses accelerated BLAS [3] routines from NVIDIA's CUDA BLAS library (CUBLAS) [4]. This allows us to exploit the computational power of the GPU at a functional level, as a proof of concept, with minimal coding effort. The implementation is then refined to use the enhanced matrix-vector multiplication routines proposed by Fujimoto [5], which further improves performance.
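The k-step Arnoldi factorisation builds an orthonormal basis V_k = [v_1, ..., v_k] of a Krylov subspace together with a k x k upper Hessenberg matrix H_k such that A V_k = V_k H_k + f_k e_k^T, where f_k is a residual vector. The following is a minimal CPU sketch in Python, not the authors' implementation, assuming a dense matrix stored as a list of rows and using modified Gram-Schmidt without reorthogonalisation or breakdown handling. The helper names (gemv, dot, axpy, nrm2, scal) are hypothetical stand-ins named after the BLAS routines that a CUBLAS-based version would call instead (for example cublasSgemv and cublasSdot), which shows how each Arnoldi step decomposes into BLAS operations.

```python
import math

# Hypothetical BLAS-style helpers on plain Python lists. On the GPU each would be
# replaced by the corresponding CUBLAS routine (GEMV, DOT, AXPY, NRM2, SCAL).

def gemv(A, x):
    """y = A x for a dense matrix A stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def axpy(alpha, x, y):
    """Return alpha * x + y as a new list."""
    return [alpha * a + b for a, b in zip(x, y)]

def nrm2(x):
    return math.sqrt(dot(x, x))

def scal(alpha, x):
    return [alpha * a for a in x]

def arnoldi(A, v0, k):
    """k-step Arnoldi factorisation A V_k = V_k H_k + f e_k^T.

    Returns (V, H, f): V is a list of k orthonormal vectors, H is the k x k
    upper Hessenberg matrix (list of rows) and f is the residual vector.
    Assumes no breakdown, i.e. the Krylov subspace has dimension >= k.
    """
    n = len(v0)
    V = [scal(1.0 / nrm2(v0), v0)]           # v_1 = v0 / ||v0||
    H = [[0.0] * k for _ in range(k)]
    f = [0.0] * n
    for j in range(k):
        w = gemv(A, V[j])                     # matrix-vector product (GEMV)
        for i in range(j + 1):                # modified Gram-Schmidt against v_1..v_{j+1}
            H[i][j] = dot(V[i], w)            # DOT
            w = axpy(-H[i][j], V[i], w)       # AXPY
        beta = nrm2(w)                        # NRM2
        if j + 1 < k:
            H[j + 1][j] = beta                # subdiagonal entry of H
            V.append(scal(1.0 / beta, w))     # SCAL: next basis vector
        else:
            f = w                             # residual of the k-step factorisation
    return V, H, f
```

In this sketch each step performs one matrix-vector product on the n x n matrix plus j + 1 dot/axpy pairs on length-n vectors, so for k much smaller than n the matrix-vector product typically dominates the cost of a dense matrix. This is why that operation is the natural target for a faster GPU kernel.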

[1] Carretera de Valencia et al. The finite element method in electromagnetics, 2000.

[2] Jack J. Dongarra et al. Towards dense linear algebra for hybrid GPU accelerated manycore systems, 2009, Parallel Computing.

[3] N. Fujimoto et al. Faster matrix-vector multiplication on GeForce 8800GTX, 2008, IEEE International Symposium on Parallel and Distributed Processing.

[4] Rajesh Bordawekar et al. Optimizing Sparse Matrix-Vector Multiplication on GPUs using Compile-time and Run-time Strategies, 2008.

[5] Jens H. Krüger et al. GPGPU: general purpose computation on graphics hardware, 2004, SIGGRAPH '04.

[6] Charles L. Lawson et al. Basic Linear Algebra Subprograms for Fortran Usage, 1979, TOMS.

[7] Jack J. Dongarra et al. A Note on Auto-tuning GEMM for GPUs, 2009, ICCS.

[8] Gene H. Golub et al. Matrix Computations (3rd ed.), 1996.

[9] Howard C. Reader et al. Understanding Microwave Heating Cavities, 2000.

[10] James Demmel et al. Benchmarking GPUs to tune dense linear algebra, 2008, HiPC 2008.

[11] Jack Dongarra et al. Numerical Linear Algebra for High-Performance Computers, 1998.

[12] Robert H. Halstead et al. Matrix Computations, 2011, Encyclopedia of Parallel Computing.

[13] Jack Dongarra et al. Templates for the Solution of Algebraic Eigenvalue Problems, 2000, Software, Environments, Tools.

[14] Michael Garland et al. Efficient Sparse Matrix-Vector Multiplication on CUDA, 2008.

[15] D. Davidson. Computational Electromagnetics for RF and Microwave Engineering: The method of moments and stratified media: theory, 2005.

[16] D. Pozar. Microwave Engineering, 1990.

[17] Yu Zhu et al. Multigrid Finite Element Methods for Electromagnetic Field Modeling, 2006.

[18] James Demmel et al. LAPACK Users' Guide, Third Edition, 1999, Software, Environments and Tools.

[19] J. Demmel et al. Using GPUs to Accelerate the Bisection Algorithm for Finding Eigenvalues of Symmetric Tridiagonal Matrices, 2007.