Accelerating Lattice QCD on Sunway Many-Core Processor

Lattice quantum chromodynamics (lattice QCD) is a mainstream non-perturbative method for theoretical calculations. It studies quantum chromodynamics (QCD) within lattice quantum field theory by defining field variables at discrete space-time points and carrying out large-scale Monte Carlo numerical simulations. Its results can be compared directly with experimental results, but conventional computing platforms struggle to meet the demands of large-scale, high-precision lattice QCD simulation. Sunway TaihuLight, the first supercomputer in the world with a peak performance above 100 PFlops, provides a new platform for lattice QCD calculations. However, implementing efficient large-scale parallel lattice QCD computation on it poses many difficulties. To compute lattice QCD efficiently on the Sunway many-core processor, this paper designs a parallel acceleration method for lattice QCD on the Sunway architecture. We propose a new parallel computing method and improve and optimize its data segmentation, data transmission, and parallel computation. Finally, we run both the optimized parallel method and the original serial method on test data. Experiments show that the optimized parallel method achieves a 63-fold performance improvement over the original serial method.
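As a rough illustration of what data segmentation over a lattice can look like, the sketch below maps 4D site coordinates to a linear index and splits the sites into contiguous, near-equal ranges, one per worker. It is not the paper's scheme: the function names, the lattice size, and the worker count of 64 (chosen to mirror the 64 compute cores in one SW26010 core group) are all assumptions made for the example.

```python
def site_index(coords, dims):
    """Map 4D coordinates (x, y, z, t) to a linear index, x varying fastest.

    Coordinates wrap periodically, as is usual for lattice boundaries.
    """
    idx, stride = 0, 1
    for c, d in zip(coords, dims):
        idx += (c % d) * stride
        stride *= d
    return idx


def split_sites(n_sites, n_workers):
    """Split [0, n_sites) into n_workers contiguous, near-equal ranges."""
    base, extra = divmod(n_sites, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers take one additional site each.
        size = base + (1 if w < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Hypothetical lattice: 4 x 4 x 4 spatial sites, 8 time slices.
dims = (4, 4, 4, 8)
n_sites = dims[0] * dims[1] * dims[2] * dims[3]

# Assumed worker count; the abstract does not state one.
chunks = split_sites(n_sites, 64)
```

Each worker would then process only the sites in its own half-open range `chunks[w]`; neighbor lookups that cross a range boundary are where data transmission between workers would come in.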
