Design of MILC Lattice QCD Application for GPU Clusters

We present an implementation of the improved staggered quark action lattice QCD computation designed for execution on a GPU cluster. The parallelization strategy divides the four-dimensional space-time lattice along the time dimension and distributes the resulting sub-lattices among the GPU cluster nodes. We provide a mixed-precision floating-point GPU implementation of the multi-mass conjugate gradient solver. On a single GPU, our conjugate gradient solver runs 9x faster than highly optimized CPU code executed on a state-of-the-art eight-core CPU node. The overall application runs almost six times faster on a GPU-enabled cluster than on a conventional multi-core cluster. The code is currently used to run production QCD calculations with electromagnetic corrections.
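A multi-mass conjugate gradient solver handles several shifted systems (A + σ_k I) x_k = b at once, where each shift σ_k stands in for a different quark mass. Krylov spaces are shift-invariant, so one sequence of matrix-vector products with A serves every shift: each shifted residual is collinear with the base residual, r_k = ζ_k r, and scalar recurrences for ζ_k give each system its own step sizes. The Python sketch below is a minimal illustration of the standard multi-shift CG recurrences under stated assumptions: a small dense symmetric positive-definite matrix replaces the staggered Dirac operator, all shifts are nonnegative, and the helper names (`matvec`, `dot`, `multishift_cg`) are hypothetical, not MILC routines. It runs entirely in double precision and does not reproduce the mixed-precision scheme of the GPU code.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product (hypothetical stand-in for the lattice operator)."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Real inner product."""
    return sum(ui * vi for ui, vi in zip(u, v))


def multishift_cg(a: Matrix, b: Vector, shifts: Sequence[float],
                  tol: float = 1e-10, max_iter: int = 1000) -> List[Vector]:
    """Solve (A + sigma_k I) x_k = b for all sigma_k >= 0 using one Krylov space.

    Assumes A is symmetric positive definite. Only one product with A is
    computed per iteration, regardless of how many shifts are requested.
    """
    n = len(b)
    nshift = len(shifts)
    r = list(b)                                # base (unshifted) residual
    p = list(b)                                # base search direction
    xs = [[0.0] * n for _ in range(nshift)]    # one solution per shift
    ps = [list(b) for _ in range(nshift)]      # one search direction per shift
    zeta = [1.0] * nshift                      # zeta_n (collinearity factors)
    zeta_old = [1.0] * nshift                  # zeta_{n-1}
    alpha_old, beta_old = 1.0, 0.0             # alpha_{-1}, beta_{-1}
    rr = dot(r, r)
    b_norm2 = rr
    for _ in range(max_iter):
        # For sigma >= 0, |zeta_k| <= 1, so the base residual bounds all others.
        if rr <= tol * tol * b_norm2:
            break
        ap = matvec(a, p)                      # the single operator application
        alpha = rr / dot(p, ap)
        zeta_new = []
        for k, sigma in enumerate(shifts):
            denom = (zeta_old[k] * alpha_old * (1.0 + sigma * alpha)
                     + alpha * beta_old * (zeta_old[k] - zeta[k]))
            zeta_new.append(zeta[k] * zeta_old[k] * alpha_old / denom)
        for k in range(nshift):
            alpha_k = alpha * zeta_new[k] / zeta[k]   # shifted step size
            xs[k] = [xi + alpha_k * pi for xi, pi in zip(xs[k], ps[k])]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        for k in range(nshift):
            beta_k = beta * (zeta_new[k] / zeta[k]) ** 2
            # Shifted direction built from the shifted residual zeta_k * r.
            ps[k] = [zeta_new[k] * ri + beta_k * pi for ri, pi in zip(r, ps[k])]
        zeta_old, zeta = zeta, zeta_new
        alpha_old, beta_old, rr = alpha, beta, rr_new
    return xs
```

Each additional shift costs only a few vector updates per iteration, while the expensive operator application is shared. This sharing is what makes solving for many masses at once cheaper than running independent CG solves.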