Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization

The past decade has witnessed a dramatic acceleration of lattice quantum chromodynamics calculations in nuclear and particle physics. This is due both to significant progress in accelerating the iterative linear solvers using multigrid algorithms and to the throughput improvements brought by GPUs. Deploying hierarchical algorithms optimally on GPUs is non-trivial owing to the lack of parallelism on the coarse grids, and as a result these two advances have not proved multiplicative. Using the QUDA library, we demonstrate that by exposing all sources of parallelism that the underlying stencil problem possesses, and by mapping this parallelism appropriately to the GPU architecture, we can achieve high efficiency even on the coarsest grids. Results are presented for the Wilson-Clover discretization, where we demonstrate up to a 10x speedup over present state-of-the-art GPU-accelerated methods on Titan. Finally, we look to the future and consider the software implications of our findings.
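The abstract does not list the sources of parallelism it has in mind, so the following is only a minimal sketch of one plausible reading, not QUDA's implementation. It assumes a coarse-grid operator made of a dense N x N matrix per site and per stencil direction (the site itself plus 8 nearest neighbours in 4D), and it treats every (site, direction, output row) triple as an independent work item followed by a reduction over directions. The function names, the 2^4 lattice and N = 48 are hypothetical values chosen for illustration.

```python
import numpy as np

def neighbor_table(L):
    """Neighbour indices for a periodic 4D lattice of extent L (hypothetical layout).

    Column 0 is the site itself; columns 1..8 are the +/- neighbours in each dimension.
    """
    dims = (L,) * 4
    V = L ** 4
    coords = np.stack(np.unravel_index(np.arange(V), dims), axis=1)  # shape (V, 4)
    nbr = np.empty((V, 9), dtype=np.int64)
    nbr[:, 0] = np.arange(V)
    for mu in range(4):
        for k, step in enumerate((+1, -1)):
            shifted = coords.copy()
            shifted[:, mu] = (shifted[:, mu] + step) % L
            nbr[:, 1 + 2 * mu + k] = np.ravel_multi_index(tuple(shifted.T), dims)
    return nbr

def stencil_site_parallel(M, v, nbr):
    """Reference: one work item per lattice site (parallelism = V only)."""
    V, D, N, _ = M.shape
    y = np.zeros((V, N), dtype=v.dtype)
    for x in range(V):
        for d in range(D):
            y[x] += M[x, d] @ v[nbr[x, d]]
    return y

def stencil_fine_grained(M, v, nbr):
    """Every (site, direction, row) entry of `partial` is computed independently,
    then the direction axis is reduced. On a GPU each entry could be one thread."""
    gathered = v[nbr]                                   # shape (V, D, N)
    partial = np.einsum("xdij,xdj->xdi", M, gathered)   # shape (V, D, N)
    return partial.sum(axis=1)

def work_items(V, D, N):
    """Number of independent work items in the fine-grained decomposition."""
    return V * D * N

if __name__ == "__main__":
    L, N = 2, 48                      # assumed coarse lattice extent and matrix size
    nbr = neighbor_table(L)
    V, D = nbr.shape
    rng = np.random.default_rng(0)
    M = rng.standard_normal((V, D, N, N)) + 1j * rng.standard_normal((V, D, N, N))
    v = rng.standard_normal((V, N)) + 1j * rng.standard_normal((V, N))
    same = np.allclose(stencil_site_parallel(M, v, nbr), stencil_fine_grained(M, v, nbr))
    print("results agree:", same)
    print("site-only work items:", V, "fine-grained work items:", work_items(V, D, N))
```

With only 16 sites on such a coarse grid, site-level parallelism alone cannot occupy a GPU, whereas the finer decomposition yields thousands of independent work items for the same problem. This is the kind of gap the abstract refers to when it speaks of the lack of parallelism on coarse grids.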
