OpenCL‐based implementation of an unstructured edge‐based finite element convection‐diffusion solver on graphics hardware

SUMMARY The solution of problems in computational fluid dynamics (CFD) is a classical field for the application of advanced numerical methods. Many different approaches have been developed over the years to address CFD applications. Established examples include finite volumes, finite differences (FD), and finite elements (FE); newer ones include the lattice Boltzmann (LB) method, smoothed particle hydrodynamics, and the particle finite element method. FD and LB methods on regular grids are known to be superior in terms of raw computing speed, but such regular discretizations are an important limitation when dealing with complex geometries. Here, we concentrate on unstructured approaches, which are less common in the GPU world. We employ a nonstandard FE approach that leverages an optimized edge-based data structure, allowing a highly parallel implementation. This technique is applied to the ‘convection-diffusion’ problem, which is often considered a first step towards CFD because of its similarities to the nonconservative form of the Navier–Stokes equations. To this end, an existing highly optimized parallel OpenMP solver is ported to graphics hardware using the OpenCL platform. The optimizations performed are discussed in detail. A number of benchmarks show that the GPU-accelerated OpenCL code consistently outperforms the OpenMP version. Copyright © 2011 John Wiley & Sons, Ltd.
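To illustrate why an edge-based data structure lends itself to a highly parallel implementation, the following is a minimal sketch in C, not the solver described in the summary. It assumes a CSR-style edge list in which each node stores its neighbours and one precomputed scalar coefficient per edge. The names `EdgeCSR`, `row_ptr`, `col_idx`, `lap`, and `diffusion_term` are hypothetical, as is the choice of a pure diffusion term.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CSR-style edge storage (an assumption for illustration):
 * the edges of node i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1,
 * col_idx[e] is the neighbouring node j of edge e, and lap[e] is an
 * assumed precomputed scalar coefficient for that edge. */
typedef struct {
    size_t n_nodes;
    const size_t *row_ptr;
    const size_t *col_idx;
    const double *lap;
} EdgeCSR;

/* Gather form of an edge-based diffusion term:
 *     r[i] = k * sum over edges (i, j) of lap_ij * (phi[j] - phi[i]).
 * Each outer iteration reads phi and writes only r[i], so the iterations
 * are independent of one another and could be assigned to one OpenMP
 * thread or one OpenCL work-item per node. */
void diffusion_term(const EdgeCSR *g, double k, const double *phi, double *r)
{
    assert(g != NULL && phi != NULL && r != NULL);
    for (size_t i = 0; i < g->n_nodes; ++i) {
        double acc = 0.0;
        for (size_t e = g->row_ptr[i]; e < g->row_ptr[i + 1]; ++e) {
            size_t j = g->col_idx[e];
            acc += g->lap[e] * (phi[j] - phi[i]);
        }
        r[i] = k * acc;
    }
}
```

Because every node writes only its own entry of `r`, no atomic operations or mesh colouring are needed in this gather formulation. The price is that each edge is visited twice, once from each of its end nodes.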
