The Sunway TaihuLight supercomputer, built on Chinese-designed many-core processors, is the world's fastest system, with a peak performance of 125.4 PFlops. OpenFOAM (Open Source Field Operation and Manipulation) is one of the most popular open-source computational fluid dynamics (CFD) packages; it is written in C++ and is not fully compatible with the compilers for the heterogeneous many-core processor SW26010. This paper ports OpenFOAM to the SW26010's MPE (management processing element)/CPE (computing processing element) cluster architecture. To overcome the compilation incompatibility, we adopt a mixed-language application design. We also apply several SW26010-specific optimizations to the hotspot of OpenFOAM to deliver high performance, including register communication, vectorization, and double buffering. Experiments on SW26010 using real data sets show that the single-CG (core group) code runs 8.03x faster than the well-tuned version on the MPE, and that single-CG performance is 1.18x higher than the serial implementation on an Intel(R) Xeon(R) CPU E5-2695 v3. We also optimize the parallel implementation of OpenFOAM, obtaining a speedup of 184.9x on 256 CGs. The porting methods and optimizations presented can also serve as a reference for other complex C++ programs seeking high performance on SW26010.