Parallel Assembly of ACA BEM Matrices on Xeon Phi Clusters

The paper presents the parallelization of the boundary element method (BEM) in the distributed memory of a cluster whose compute nodes are based on many-core processors. It describes a method for efficiently distributing boundary element matrices among MPI processes using cyclic graph decompositions. In addition, we focus on the intra-node optimization of the code, which is necessary to fully utilize many-core processors with wide SIMD registers. We present numerical experiments carried out on a cluster of Intel Xeon Phi processors of the Knights Landing generation.
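The idea of distributing matrix blocks through a cyclic graph decomposition can be illustrated with a small sketch. It rests on assumptions that may not match the paper's construction. The mesh is split into n submeshes, the BEM matrix into n × n blocks, and the matrix is distributed among n MPI processes. The decomposition is generated from a perfect difference set D modulo n, for example D = {0, 1, 3} for n = 7, which contains 0. Process k owns every off-diagonal block (a, b) with a ≠ b and a, b in D + k (mod n), plus the diagonal block (k, k). Every process then owns exactly n blocks but only needs data for |D| submeshes. The function name `cyclic_block_owners` is hypothetical.

```python
def cyclic_block_owners(n, base):
    """Map each block (row, col) of an n-by-n block matrix to an owning process.

    Hypothetical sketch: assumes n processes and that `base` is a perfect
    difference set modulo n containing 0, so every nonzero difference
    (b - a) mod n occurs exactly once among pairs of distinct elements.
    """
    owner = {}
    for k in range(n):
        # Submesh indices handled by process k: the base set shifted cyclically by k.
        shifted = [(d + k) % n for d in base]
        # Off-diagonal blocks: all ordered pairs of distinct indices in the shifted set.
        for a in shifted:
            for b in shifted:
                if a != b:
                    if (a, b) in owner:
                        raise ValueError(
                            f"block {(a, b)} assigned twice; base is not a perfect difference set"
                        )
                    owner[(a, b)] = k
        # Diagonal block: process k takes block (k, k).
        owner[(k, k)] = k
    # Every one of the n * n blocks must have exactly one owner.
    if len(owner) != n * n:
        raise ValueError("some blocks are unassigned; base is not a perfect difference set")
    return owner


if __name__ == "__main__":
    owners = cyclic_block_owners(7, [0, 1, 3])
    mine = sorted(block for block, k in owners.items() if k == 0)
    print(mine)
```

For n = 7, process 0 owns the blocks (0, 0), (0, 1), (0, 3), (1, 0), (1, 3), (3, 0) and (3, 1). That is 7 of the 49 blocks, and they touch only submeshes 0, 1 and 3. This locality is the motivation for a cyclic decomposition over a naive row-wise split, in which a process owning a full block row would need all 7 submeshes.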
