Experiences of Performance Optimization for Large Eddy Simulation on Intel MIC Platforms

Large Eddy Simulation (LES) is a mathematical model for turbulence used in Computational Fluid Dynamics (CFD). We have implemented LES on multi-core CPUs and General-Purpose Graphics Processing Units (GPGPUs). In this work, we port and optimize LES on Intel Many Integrated Core (MIC) platforms. On the first-generation Intel MIC coprocessor, Knights Corner (KNC), we implement LES in the three main execution modes: native, offload, and symmetric. The newly emerging second-generation Intel MIC processor, Knights Landing (KNL), acts as an independent multi-core computing node, which makes porting the application more convenient. On both MIC platforms, we implement and evaluate several important performance optimization techniques, including parallelization with OpenMP threads and MPI processes, single-instruction-multiple-data (SIMD) vectorization, memory access optimization, and thread scheduling. The experimental results demonstrate that these optimization techniques are very important when porting applications to MIC platforms.
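As an illustration only, the C sketch below shows how several of the listed techniques (OpenMP threading, SIMD vectorization of a unit-stride inner loop, aligned memory, and static thread scheduling) typically combine in a stencil kernel of the kind found in CFD codes. It is not the authors' LES implementation. The explicit 7-point diffusion update and the names `idx3`, `alloc_field`, and `diffuse_step` are hypothetical. The 64-byte alignment is an assumption chosen to match one 512-bit vector register and one cache line on KNC and KNL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: linear index into an nx*ny*nz grid stored with i
   (x) as the fastest-varying, unit-stride dimension. */
static inline size_t idx3(int i, int j, int k, int nx, int ny) {
    return ((size_t)k * ny + j) * nx + i;
}

/* Allocate n doubles on a 64-byte boundary (assumed: one 512-bit vector
   and one cache line). C11 aligned_alloc requires the size to be a
   multiple of the alignment, so round it up. */
double *alloc_field(size_t n) {
    size_t bytes = n * sizeof(double);
    bytes = (bytes + 63) / 64 * 64;
    return aligned_alloc(64, bytes);
}

/* One explicit diffusion step, out = in + coef * lap(in), with a 7-point
   Laplacian on interior points; boundary points of out are untouched.
   The update is out-of-place so that no thread reads a neighbor value
   that another thread has already overwritten. */
void diffuse_step(const double *restrict in, double *restrict out,
                  int nx, int ny, int nz, double coef) {
    assert(in != out);
    const size_t sy = (size_t)nx;          /* stride between j rows   */
    const size_t sz = (size_t)nx * ny;     /* stride between k planes */

    /* Threads: (k, j) rows are split evenly across threads. */
#pragma omp parallel for collapse(2) schedule(static)
    for (int k = 1; k < nz - 1; ++k)
        for (int j = 1; j < ny - 1; ++j) {
            const size_t row = idx3(0, j, k, nx, ny);
            /* SIMD: unit-stride inner loop, vectorized along i. */
#pragma omp simd
            for (int i = 1; i < nx - 1; ++i) {
                const size_t c = row + (size_t)i;
                const double lap = in[c - 1] + in[c + 1]
                                 + in[c - sy] + in[c + sy]
                                 + in[c - sz] + in[c + sz]
                                 - 6.0 * in[c];
                out[c] = in[c] + coef * lap;
            }
        }
}
```

With GCC or Clang this builds with `-O3 -fopenmp`. Without `-fopenmp` the pragmas are ignored and the kernel runs serially, which gives a convenient correctness reference. `schedule(static)` fits here because every row costs the same amount of work. Thread placement on the many-core chip can be tuned with the standard `OMP_PROC_BIND` and `OMP_PLACES` environment variables.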
