Paper information - Fast Matrix-Free Discontinuous Galerkin Kernels on Modern Computer Architectures

This study compares the performance of high-order discontinuous Galerkin finite elements on modern hardware. The main computational kernel is the matrix-free evaluation of differential operators by sum factorization, exemplified on the symmetric interior penalty discretization of the Laplacian, which serves as a proxy for a complex application code in fluid dynamics. State-of-the-art implementations of these kernels stress both arithmetic throughput and memory transfer. The implementations of SIMD vectorization and shared-memory parallelization are detailed. Computational results are presented for a dual-socket Intel Haswell system with 28 cores, a 64-core Intel Knights Landing, and a 16-core IBM Power8 processor. Up to polynomial degree six, Knights Landing is approximately twice as fast as Haswell. Power8 performs similarly to Haswell, trading a higher frequency for narrower SIMD units. The performance comparison shows that simple ways of expressing parallelism through for loops perform better at medium and high core counts than a more elaborate task-based parallelization with dynamic scheduling according to dependency graphs, even though the latter algorithm transfers less memory.
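To make the term "sum factorization" concrete, the following is a minimal sketch in plain Python, not the paper's vectorized kernels. It assumes a 2D element with the same 1D matrix `S` in both directions (for example, shape function values at quadrature points); the function names, the matrix entries, and the data layout are illustrative assumptions. With n points per direction, applying the full tensor-product operator costs on the order of n^4 operations in 2D, whereas two successive 1D sweeps cost on the order of n^3.

```python
# Illustrative sketch only: names and data are hypothetical, not from the paper.

def matvec_naive(S, u):
    """Apply the full operator (S kron S) to u entry by entry: O(n^4) in 2D."""
    n = len(S)
    out = [0.0] * (n * n)
    for q1 in range(n):
        for q0 in range(n):
            acc = 0.0
            for i1 in range(n):
                for i0 in range(n):
                    acc += S[q1][i1] * S[q0][i0] * u[i1 * n + i0]
            out[q1 * n + q0] = acc
    return out


def matvec_sum_factorized(S, u):
    """Compute the same result with two 1D sweeps: O(n^3) in 2D."""
    n = len(S)
    # Sweep 1: contract along direction 0 (the fastest-running index).
    tmp = [0.0] * (n * n)
    for i1 in range(n):
        for q0 in range(n):
            tmp[i1 * n + q0] = sum(S[q0][i0] * u[i1 * n + i0] for i0 in range(n))
    # Sweep 2: contract along direction 1, reusing the intermediate result.
    out = [0.0] * (n * n)
    for q1 in range(n):
        for q0 in range(n):
            out[q1 * n + q0] = sum(S[q1][i1] * tmp[i1 * n + q0] for i1 in range(n))
    return out


# Small made-up example: a 3x3 1D matrix and a 9-entry element vector.
S = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 0.5],
     [0.25, 0.0, 1.0]]
u = [float(k) for k in range(9)]

reference = matvec_naive(S, u)
factorized = matvec_sum_factorized(S, u)
max_diff = max(abs(a - b) for a, b in zip(reference, factorized))
```

Both functions produce the same vector up to rounding; the factorized variant simply reorders the sums so that each 1D matrix is applied once per direction. In 3D the same idea uses three sweeps, which is where the savings become substantial at higher polynomial degrees.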

Katharina Kormann | Martin Kronbichler | Momme Allalen | Igor Pasichnyk
