Finite element numerical integration on Xeon Phi coprocessor

In this article we describe the implementation of a finite element numerical integration algorithm for the Xeon Phi coprocessor. The coprocessor extends the idea of a specialized many-core unit for calculations and is intended to be competitive in performance with current families of GPUs. Its main advantages are the built-in set of 512-bit vector registers and the ease of transferring existing codes from standard x86 architectures. However, the differences between standard x86 architectures and the Xeon Phi mean that performance portability is not guaranteed. We therefore choose an alternative approach: instead of porting standard multithreaded code, we adapt previously developed OpenCL algorithms for finite element numerical integration to the Xeon Phi. The algorithm is tested for standard FEM approximations of selected problems. The timing results obtained allow us to compare the performance of the OpenCL kernels executed on the Xeon Phi and on contemporary GPUs.
