OpenMP 4.0 based heterogeneous parallel optimization of an engine combustion simulation application

LESAP is a combustion simulation application that models the chemical reactions and supersonic flows in scramjet engines. It is used to solve practical engineering problems and is computationally intensive. In this paper, we port and optimize LESAP with the OpenMP 4.0 accelerator model, targeting a heterogeneous many-core platform composed of general-purpose CPUs and Intel Many Integrated Core (MIC) coprocessors. Based on the characteristics of the application, we propose a series of techniques, including OpenMP 4.0 based task offloading, data movement optimization, grid-partition based load balancing, and SIMD optimization. We evaluate performance on a real combustion simulation configuration with 5,320,896 grid cells, running on one node of the Tianhe-2 supercomputer. The results show that the heterogeneous code significantly outperforms the original CPU-only code. When the heterogeneous code runs on two Intel Xeon E5-2692 CPUs and three Intel Xeon Phi 31S1P coprocessors, it achieves a maximum speedup of 3.63x over the original code running on the two Intel Xeon E5-2692 CPUs alone.
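
The following is a minimal sketch, not the LESAP implementation, of how two of the techniques named above might look in C with OpenMP 4.0 directives. The function names `partition_cells` and `scale_cells`, the per-part throughput weights, and the trivial scaling kernel are all assumptions made for illustration. The first function splits a count of grid cells among the CPU and the coprocessors in proportion to assumed relative throughputs; the second offloads a loop with a `target` region, an explicit `map` clause to control data movement, and a combined `parallel for simd` construct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split n_cells among n_parts processing units
 * (e.g. the CPUs plus each coprocessor) in proportion to weights[],
 * which stand for assumed relative throughputs. The last part takes
 * whatever remains, so the counts always sum to n_cells. */
void partition_cells(size_t n_cells, const double *weights,
                     int n_parts, size_t *counts)
{
    assert(n_parts > 0);

    double total = 0.0;
    for (int p = 0; p < n_parts; ++p) {
        assert(weights[p] >= 0.0);
        total += weights[p];
    }
    assert(total > 0.0);

    size_t assigned = 0;
    for (int p = 0; p < n_parts - 1; ++p) {
        size_t share = (size_t)((double)n_cells * weights[p] / total);
        size_t remaining = n_cells - assigned;
        if (share > remaining)      /* guard against rounding overshoot */
            share = remaining;
        counts[p] = share;
        assigned += share;
    }
    counts[n_parts - 1] = n_cells - assigned;
}

/* Hypothetical kernel: scale every cell value by factor. The target
 * region offloads the loop to the default device; map(tofrom:) copies
 * the array to the device before the loop and back afterwards. If no
 * device is available, the region runs on the host. */
void scale_cells(double *u, size_t n, double factor)
{
    #pragma omp target map(tofrom: u[0:n])
    #pragma omp parallel for simd
    for (size_t i = 0; i < n; ++i)
        u[i] *= factor;
}
```

In a sketch like this, a caller would compute the counts once, then hand each contiguous block of cells either to the host loop or to an offloaded region. In a time-stepping code, keeping the arrays resident on the device with an enclosing `target data` region, rather than mapping them on every call, is the usual way to reduce data movement.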
