Exploiting Batch Processing on Streaming Architectures to Solve 2D Elliptic Finite Element Problems: A Hybridized Discontinuous Galerkin (HDG) Case Study

Numerical methods for elliptic partial differential equations (PDEs) in both the continuous Galerkin and the hybridized discontinuous Galerkin (HDG) frameworks share the same general structure: local (elemental) matrix generation, followed by assembly and solution of a global linear system. The local matrix generation stage requires no inter-element communication and is therefore easy to parallelize, and parallelization techniques for linear system solvers are well developed. Together, these properties make numerical schemes for elliptic PDEs good candidates for implementation on streaming architectures such as modern graphics processing units (GPUs). We propose an algorithmic pipeline for mapping an elliptic finite element method to the GPU and perform a case study for a particular method within the HDG framework. The study compares CPU and GPU implementations of the method and highlights several performance-critical implementation details. We chose the HDG method for the case study because of its computationally heavy local matrix generation stage and its reduced, trace-based communication pattern, which together make the method amenable to the fine-grained parallelism of GPUs. We demonstrate that the HDG method is well suited to GPU implementation, obtaining total speedups on the order of 30–35 times over a serial CPU implementation for moderately sized problems.
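The two-stage structure described above can be illustrated with a minimal sketch. It is not the HDG method or the GPU pipeline studied in the paper: as a stated assumption it uses a linear (P1) continuous Galerkin discretization of -Δu = 1 on the unit square with u = 0 on the boundary, runs on the CPU in plain Python, and substitutes a textbook conjugate gradient loop for the global solve. All function names (`make_mesh`, `local_matrices`, `assemble`, `conjugate_gradient`, `solve_poisson`) are hypothetical and chosen for illustration only.

```python
"""Two-stage finite element sketch: batched local matrices, then global solve.

Assumption (not from the paper): P1 continuous Galerkin for -Laplace(u) = 1
on the unit square with u = 0 on the boundary, in pure Python on the CPU.
"""


def make_mesh(n):
    """Structured triangulation of the unit square with n x n cells."""
    nodes = [(i / n, j / n) for j in range(n + 1) for i in range(n + 1)]
    tris = []
    for j in range(n):
        for i in range(n):
            a = j * (n + 1) + i
            b, c, d = a + 1, a + n + 1, a + n + 2
            tris += [(a, b, d), (a, d, c)]  # two counterclockwise triangles
    return nodes, tris


def local_matrices(p0, p1, p2):
    """3x3 P1 stiffness matrix and load vector (f = 1) of one triangle."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    area = 0.5 * abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))
    # Basis-function gradients are (gx[i], gy[i]) / (2 * area).
    gx = (y1 - y2, y2 - y0, y0 - y1)
    gy = (x2 - x1, x0 - x2, x1 - x0)
    K = [[(gx[r] * gx[s] + gy[r] * gy[s]) / (4.0 * area) for s in range(3)]
         for r in range(3)]
    return K, [area / 3.0] * 3


def assemble(nodes, tris, local_data):
    """Scatter local matrices into a dense global system; impose u = 0."""
    size = len(nodes)
    A = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    for tri, (K, f) in zip(tris, local_data):
        for r in range(3):
            rhs[tri[r]] += f[r]
            for s in range(3):
                A[tri[r]][tri[s]] += K[r][s]
    for k, (x, y) in enumerate(nodes):
        if x in (0.0, 1.0) or y in (0.0, 1.0):  # boundary node
            for m in range(size):
                A[k][m] = 0.0
                A[m][k] = 0.0
            A[k][k] = 1.0
            rhs[k] = 0.0
    return A, rhs


def conjugate_gradient(A, b, tol=1e-12, max_iter=1000):
    """Textbook conjugate gradient for a symmetric positive definite A."""
    size = len(b)
    x = [0.0] * size
    r = b[:]
    p = r[:]
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs <= tol * tol:
            break
        Ap = [sum(A[i][j] * p[j] for j in range(size)) for i in range(size)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(size))
        x = [x[i] + alpha * p[i] for i in range(size)]
        r = [r[i] - alpha * Ap[i] for i in range(size)]
        rs_new = sum(v * v for v in r)
        p = [r[i] + (rs_new / rs) * p[i] for i in range(size)]
        rs = rs_new
    return x


def solve_poisson(n):
    nodes, tris = make_mesh(n)
    # Stage 1: each element is processed independently of all others, so on a
    # streaming architecture this loop could be issued as one batched kernel.
    local_data = [local_matrices(*(nodes[v] for v in tri)) for tri in tris]
    # Stage 2: global assembly and linear solve.
    A, rhs = assemble(nodes, tris, local_data)
    return nodes, conjugate_gradient(A, rhs)
```

Calling `solve_poisson(4)` returns the node coordinates and the nodal solution on a 4 x 4 cell mesh. The point of the sketch is the separation of concerns: the list comprehension in stage 1 touches no shared state, which is the property that makes the local stage a natural fit for batch processing, while stage 2 is where communication between elements first appears.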
