Acceleration of CFD Engineering Software on GPU and MIC

CartSolver is a widely used three-dimensional Euler solver for Cartesian grids. In this paper, we accelerate it on recent many-core accelerators such as the NVIDIA Fermi C2050, the NVIDIA Kepler K20, and Intel MIC, and achieve the expected speedup over the serial solver. On the GPU platform, two versions of the accelerated CartSolver are implemented and optimized. For MIC, we apply various optimization methods, guided by an open-source performance analysis tool, to achieve the best performance. The differences in architecture and programming model between GPU and MIC are also discussed. In the experiments, the correctness and accuracy of the solvers are validated, and the effectiveness of the optimization methods is demonstrated. Finally, a new criterion for measuring the workload is proposed, and several recommendations for selecting suitable accelerators for CFD engineering software are given on the basis of a comparison under this criterion.
