Optimizing Fusion PIC Code Performance at Scale on Cori Phase Two
暂无分享,去创建一个
In this paper we present the results of optimizing the performance of the gyrokinetic full-f fusion PIC code XGC1 on the Cori Phase Two Knights Landing system. The code has undergone substantial development to enable the use of vector instructions in its most expensive kernels within the NERSC Exascale Science Applications Program. We study the single-node performance of the code on an absolute scale using the roofline methodology to guide optimization efforts. We have obtained 2\({\times }\) speedups in single node performance due to enabling vectorization and performing memory layout optimizations. On multiple nodes, the code is shown to scale well up to 4000 nodes, near half the size of the machine. We discuss some communication bottlenecks that were identified and resolved during the work.
[1] Samuel Williams,et al. Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor , 2016, ISC Workshops.
[2] Choong-Seock Chang,et al. Full-f gyrokinetic particle simulation of centrally heated global ITG turbulence from magnetic axis to edge pedestal top in a realistic tokamak geometry , 2009 .
[3] S. Ethier,et al. Gyrokinetic particle-in-cell simulations of plasma microturbulence on advanced computing platforms , 2005 .