An efficient CELL library for lattice quantum chromodynamics

Quantum chromodynamics (QCD) is the theory of subnuclear physics that models the strong nuclear force, which is responsible for the interactions of nuclear particles. Numerical QCD studies are performed through a discrete formalism called lattice quantum chromodynamics (LQCD). Typical simulations involve very large volumes of data and numerically sensitive quantities, hence the crucial need for high-performance computing systems. We propose a set of CELL-accelerated routines for basic LQCD calculations. Our framework is provided as a unified library and is particularly optimized for iterative use. Each routine is parallelized among the SPUs, and each SPU performs its task by looping over small chunks of arrays fetched from main memory. Our SPU implementation is vectorized for double-precision data, and the cooperation with the PPU shows a good overlap between data transfers and computations. Moreover, we keep the SPU context alive permanently and use mailboxes to synchronize between consecutive calls. We validate our library by using it to derive a CELL version of an existing LQCD package (tmLQCD). Experimental results on individual routines show a significant speedup compared to a standard processor: for instance, 11 times faster than a 2.83 GHz Intel processor (without SSE). The ratio is around 9 (on a QS22 blade) in a more cooperative context such as solving a linear system of equations (usually referred to as Wilson-Dirac inversion). Our results clearly demonstrate that the CELL is a very promising platform for large-scale LQCD simulations.
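The chunk-wise processing with overlapped transfers described above can be illustrated with a minimal, portable C sketch. It is not the library's code. The names `chunked_axpy`, `fetch`, `demo_max_error` and the chunk length `CHUNK` are hypothetical, and a plain `memcpy` stands in for the asynchronous DMA transfer that a real SPU would issue, so the sketch shows only the double-buffering structure, not actual overlap or vectorization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 64 /* hypothetical chunk length, in doubles */

/* Stand-in for a DMA "get". On a real SPU this would be asynchronous;
   here it is a synchronous copy (assumption for portability). */
static void fetch(double *dst, const double *src, size_t count) {
    memcpy(dst, src, count * sizeof(double));
}

/* Number of valid elements in the chunk starting at offset off. */
static size_t chunk_len(size_t n, size_t off) {
    return (n - off < CHUNK) ? (n - off) : CHUNK;
}

/* y <- a*x + y, processed chunk by chunk with two local buffers per
   operand: while chunk c is computed, chunk c+1 is already requested. */
void chunked_axpy(double a, const double *x, double *y, size_t n) {
    double bx[2][CHUNK], by[2][CHUNK];
    size_t nchunks = (n + CHUNK - 1) / CHUNK;
    int cur = 0;

    if (nchunks == 0) return;
    assert(x != NULL && y != NULL);

    /* Prime the pipeline with the first chunk. */
    fetch(bx[cur], x, chunk_len(n, 0));
    fetch(by[cur], y, chunk_len(n, 0));

    for (size_t c = 0; c < nchunks; ++c) {
        size_t off = c * CHUNK;
        size_t len = chunk_len(n, off);
        int nxt = cur ^ 1;

        /* Request the next chunk into the other buffer pair. */
        if (c + 1 < nchunks) {
            size_t noff = off + CHUNK;
            fetch(bx[nxt], x + noff, chunk_len(n, noff));
            fetch(by[nxt], y + noff, chunk_len(n, noff));
        }

        /* Compute on the current chunk. */
        for (size_t i = 0; i < len; ++i)
            by[cur][i] += a * bx[cur][i];

        /* Stand-in for a DMA "put" of the result to main memory. */
        memcpy(y + off, by[cur], len * sizeof(double));
        cur = nxt;
    }
}

/* Runs chunked_axpy on integer-valued data and returns the largest
   absolute deviation from a straightforward loop (-1.0 if allocation
   fails). */
double demo_max_error(size_t n) {
    double *x = malloc(n * sizeof(double));
    double *y = malloc(n * sizeof(double));
    double *ref = malloc(n * sizeof(double));
    double max_err = 0.0;

    if (!x || !y || !ref) { free(x); free(y); free(ref); return -1.0; }
    for (size_t i = 0; i < n; ++i) {
        x[i] = (double)i;
        y[i] = 2.0 * (double)i;
        ref[i] = y[i] + 3.0 * x[i];
    }
    chunked_axpy(3.0, x, y, n);
    for (size_t i = 0; i < n; ++i) {
        double d = y[i] - ref[i];
        if (d < 0.0) d = -d;
        if (d > max_err) max_err = d;
    }
    free(x); free(y); free(ref);
    return max_err;
}
```

With two buffers per operand, the transfer for chunk c+1 can be in flight while chunk c is being computed, which is the kind of overlap the abstract reports. In this sketch the chunk length is arbitrary; on real hardware it would be bounded by the SPU local store and by DMA size and alignment constraints.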
