Vectorized OpenCL implementation of numerical integration for higher order finite elements

In our work we analyze computational aspects of the problem of numerical integration in finite element calculations and consider an OpenCL implementation of the related algorithms for processors with wide vector registers. As a platform for testing the implementation we choose the PowerXCell processor, an example of the Cell Broadband Engine (CellBE) architecture. Although the processor is old by today's standards (its design dates back to 2001), we investigate its performance because it shares two features with the recent Xeon Phi family of coprocessors: wide vector units and a relatively slow connection between the computing cores and the main (global) memory. The analysis of parallelization options performed here can also be used to design numerical integration algorithms for other processors with vector registers, such as contemporary x86 microprocessors. We consider higher order finite element approximations and implement the standard algorithm of numerical integration for prismatic elements. Original contributions of the paper include an analysis of the data movement and vector operations performed during code execution. Several versions of the implementation are developed and tested in practice.
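The standard numerical integration algorithm for an element loops over integration points and, at each point, accumulates contributions from all pairs of shape functions. The Python sketch below illustrates only that loop structure, under simplifying assumptions that do not come from the paper: the Laplace operator, linear (6-node) prism shape functions instead of higher order ones, and the reference prism itself as the element (so the Jacobian determinant is 1). The quadrature (a 3-point triangle rule times a 2-point Gauss-Legendre rule) and the names `shape_gradients` and `element_stiffness` are hypothetical choices made for illustration.

```python
import math

# Hypothetical sketch (not the paper's code): element stiffness matrix
# A_ij = integral of grad(N_i) . grad(N_j) over the REFERENCE prism, with
# linear 6-node shape functions and identity geometry (det(J) = 1).

# Reference prism: triangle {xi, eta >= 0, xi + eta <= 1} times zeta in [-1, 1].
# Tensor-product rule: 3-point triangle rule (exact for degree 2) times
# 2-point Gauss-Legendre rule in zeta (exact for degree 3).
TRI_POINTS = [(1 / 6, 1 / 6), (2 / 3, 1 / 6), (1 / 6, 2 / 3)]
TRI_WEIGHT = 1 / 6  # per point; the three weights sum to the triangle area 1/2
GAUSS_POINTS = [-1 / math.sqrt(3), 1 / math.sqrt(3)]
GAUSS_WEIGHT = 1.0


def shape_gradients(xi, eta, zeta):
    """Gradients (d/dxi, d/deta, d/dzeta) of the 6 linear prism shape functions.

    N = L_a(xi, eta) * h_b(zeta), with L = (1 - xi - eta, xi, eta) and
    h = ((1 - zeta) / 2, (1 + zeta) / 2); nodes 0-2 bottom, 3-5 top.
    """
    L = (1.0 - xi - eta, xi, eta)
    dL_dxi = (-1.0, 1.0, 0.0)
    dL_deta = (-1.0, 0.0, 1.0)
    h = ((1.0 - zeta) / 2, (1.0 + zeta) / 2)
    dh = (-0.5, 0.5)
    grads = []
    for b in range(2):      # bottom face first, then top face
        for a in range(3):  # triangle vertex
            grads.append((dL_dxi[a] * h[b], dL_deta[a] * h[b], L[a] * dh[b]))
    return grads


def element_stiffness():
    """Accumulate the 6x6 matrix integration point by integration point."""
    n = 6
    A = [[0.0] * n for _ in range(n)]
    for xi, eta in TRI_POINTS:
        for zeta in GAUSS_POINTS:
            w = TRI_WEIGHT * GAUSS_WEIGHT  # times det(J) = 1 on the reference prism
            g = shape_gradients(xi, eta, zeta)
            # Double loop over pairs of shape functions: independent entries,
            # hence a natural candidate for vectorization.
            for i in range(n):
                for j in range(n):
                    A[i][j] += w * sum(g[i][k] * g[j][k] for k in range(3))
    return A


if __name__ == "__main__":
    A = element_stiffness()
    print(round(A[0][0], 6))  # prints 0.708333 (exact value 17/24)
```

Because the chosen rule integrates these polynomial products exactly, the first diagonal entry equals 17/24, and every row sums to zero (the shape functions sum to one). For higher order elements the loop structure stays the same, but the numbers of shape functions and integration points grow, which is what makes data movement and vector utilization the central concerns.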

Filip Krużel | Krzysztof Banaś
