Parallel Assembly of ACA BEM Matrices on Xeon Phi Clusters

This paper presents a parallelization of the boundary element method (BEM) for distributed-memory clusters whose compute nodes are based on many-core processors. We describe a method for efficiently distributing boundary element matrices among MPI processes based on cyclic graph decompositions. In addition, we focus on intra-node optimization of the code, which is necessary to fully utilize many-core processors with wide SIMD registers. We present numerical experiments carried out on a cluster of Intel Xeon Phi processors of the Knights Landing generation.
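As a rough illustration of the idea behind distributing matrix blocks by a cyclic graph decomposition, the following Python sketch assigns the blocks of a P x P block matrix to P processes. It is not the authors' implementation. It assumes that a generator set with distinct pairwise differences covering every nonzero residue modulo P is supplied by the caller; the function name `cyclic_block_distribution` and the example generator {0, 1, 3} for P = 7 are chosen here for illustration only.

```python
from itertools import permutations


def cyclic_block_distribution(num_procs, generator):
    """Map each block (i, j) of a num_procs x num_procs block matrix to a process.

    Assumption (not from the paper): `generator` is a set of residues whose
    ordered pairwise differences hit every nonzero residue modulo num_procs
    exactly once. Process p then works with the block indices
    {(g + p) % num_procs for g in generator}.
    """
    # Look up which generator pair (ga, gb) produces a given difference.
    diff_to_pair = {}
    for ga, gb in permutations(generator, 2):
        d = (ga - gb) % num_procs
        if d in diff_to_pair:
            raise ValueError("a difference occurs twice; not a valid generator")
        diff_to_pair[d] = (ga, gb)
    if len(diff_to_pair) != num_procs - 1:
        raise ValueError("generator does not cover all nonzero differences")

    g0 = generator[0]
    owner = {}
    for i in range(num_procs):
        # Diagonal block: give it to the process whose index set contains i.
        owner[(i, i)] = (i - g0) % num_procs
        for j in range(num_procs):
            if i != j:
                # Shift the generator pair so that it lands on (i, j).
                ga, _ = diff_to_pair[(i - j) % num_procs]
                owner[(i, j)] = (i - ga) % num_procs
    return owner


if __name__ == "__main__":
    owner = cyclic_block_distribution(7, [0, 1, 3])
    for p in range(7):
        blocks = sorted(b for b, q in owner.items() if q == p)
        print(p, blocks)
```

In this sketch every process receives the same number of blocks, and all blocks owned by a process have row and column indices drawn from a set of only three block indices, which is what keeps the amount of vector data each process needs small. Generator sets of this kind do not exist for every process count, so other values of P would need a different construction.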
