Accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) model on Intel Xeon Phi processors

The GNAQPMS model is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), a multi-scale chemical transport model used for air quality forecasting and atmospheric environmental research. In this study, we present our work on porting and optimizing the GNAQPMS model for the second-generation Intel Xeon Phi processor, codenamed "Knights Landing" (KNL). Compared with the first-generation Xeon Phi coprocessor, KNL introduces many new hardware features, such as a bootable processor, high-performance in-package memory, and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimizations we applied to the key modules of GNAQPMS: CBM-Z gas-phase chemistry, advection, convection, and wet deposition. They are: 1) changing the pure Message Passing Interface (MPI) parallel mode to a hybrid MPI and OpenMP parallel mode in the emission, advection, convection, and chemistry modules; 2) fully employing the 512-bit-wide vector processing units (VPUs) on the KNL platform; 3) reducing unnecessary memory accesses to improve cache efficiency; 4) reducing thread-local storage (TLS) in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and 5) replacing global communication through writing and reading interface files with MPI functions to improve performance and parallel scalability. These optimizations greatly improve GNAQPMS performance, and they work well on both the KNL 7250 processor and the Intel Xeon E5-2697 v4 (Broadwell) processor. Compared with the baseline version of GNAQPMS, the optimized version is 3.34x faster on KNL and 2.39x faster on the CPU. Furthermore, the optimized version on KNL runs at 26% lower average power than on the CPU. Combining the performance and energy improvements, the KNL platform is 47% more efficient than the CPU platform. The optimizations also enable much better parallel scalability on both the CPU cluster and the KNL cluster: the model scales to 40 CPU nodes and 30 KNL nodes, with parallel efficiencies of 70.4% and 42.2%, respectively.
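
The 47% figure combines runtime and power. As a rough reading only, assume that "efficiency" means simulated work per unit of energy and that energy is average power multiplied by wall-clock time; neither definition is stated in the text above. The symbols P (average power), T (wall-clock time), and η (efficiency) are introduced here purely for illustration. Under those assumptions the reported numbers relate as follows:

```latex
\begin{align*}
\frac{\eta_{\mathrm{KNL}}}{\eta_{\mathrm{CPU}}}
  &= \frac{P_{\mathrm{CPU}}\,T_{\mathrm{CPU}}}{P_{\mathrm{KNL}}\,T_{\mathrm{KNL}}} = 1.47,
  \qquad P_{\mathrm{KNL}} = (1 - 0.26)\,P_{\mathrm{CPU}} = 0.74\,P_{\mathrm{CPU}} \\
\Rightarrow\quad \frac{T_{\mathrm{CPU}}}{T_{\mathrm{KNL}}}
  &= 1.47 \times 0.74 \approx 1.09
\end{align*}
```

If this reading is right, the optimized model would run roughly 9% faster on KNL than on the CPU while drawing 26% less average power. The runtime ratio is inferred here and is not a figure reported in the text.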

[1]  Z. Wang,et al.  Wet deposition of acidifying substances in different regions of China and the rest of East Asia: modeling with updated NAQPMS. , 2014, Environmental pollution.

[2]  Larry Meadows,et al.  Experiments with WRF on Intel® Many Integrated Core (Intel MIC) Architecture , 2012, IWOMP.

[3]  Jie Li,et al.  A nonnegativity preserved efficient algorithm for atmospheric chemical kinetic equations , 2015, Appl. Math. Comput..

[4]  Bormin Huang,et al.  Optimizing the updated Goddard shortwave radiation Weather Research and Forecasting (WRF) scheme for Intel Many Integrated Core (MIC) architecture , 2015, Commercial + Scientific Sensing and Imaging.

[5]  Wang Xi,et al.  Development and Application of Nested Air Quality Prediction Modeling System , 2006 .

[6]  Wei Wang,et al.  A numerical study of contributions to air pollution in Beijing during CAREBeijing-2006 , 2011 .

[7]  Adrian Sandu,et al.  Multi-core acceleration of chemical kinetics for simulation and prediction , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[8]  George Chrysos,et al.  Intel® Xeon Phi coprocessor (codename Knights Corner) , 2012, 2012 IEEE Hot Chips 24 Symposium (HCS).

[9]  M. Wesely Parameterization of surface resistances to gaseous dry deposition in regional-scale numerical models , 1989 .

[10]  Alexander Khain,et al.  Microphysics, Radiation and Surface Processes in the Goddard Cumulus Ensemble (GCE) Model , 2003 .

[11]  Avinash Sodani,et al.  Knights landing (KNL): 2nd Generation Intel® Xeon Phi processor , 2015, 2015 IEEE Hot Chips 27 Symposium (HCS).

[12]  Itsushi Uno,et al.  Neutralization of soil aerosol and its impact on the distribution of acid rain over east Asia: Observations and model results , 2002 .

[13]  Bormin Huang,et al.  Optimizing Weather and Research Forecast (WRF) Thompson cloud microphysics on Intel Many Integrated Core (MIC) , 2014, Sensing Technologies + Applications.

[14]  Bormin Huang,et al.  Revisiting Intel Xeon Phi optimization of Thompson cloud microphysics scheme in Weather Research and Forecasting (WRF) model , 2015, SPIE Remote Sensing.

[15]  G. Carmichael,et al.  GNAQPMS-Hg v1.0, a global nested atmospheric mercury transport model: model description, evaluation and application to trans-boundary transport of Chinese anthropogenic emissions , 2014 .