Peta-scale Lattice Quantum Chromodynamics on a Blue Gene/Q supercomputer

Lattice Quantum Chromodynamics (QCD) is one of the most challenging applications running on massively parallel supercomputers. Reproducing the physical phenomena of QCD on a supercomputer demands precise simulation, which in turn requires well-optimized and scalable code. We have previously optimized lattice QCD programs on the Blue Gene family of supercomputers and demonstrated their strength in lattice QCD simulation. Here we optimize for the third-generation Blue Gene/Q supercomputer by (i) changing the data layout, (ii) exploiting the new SIMD instruction set, and (iii) pipelining the boundary data exchange to overlap communication with calculation. The optimized lattice QCD program shows excellent weak scalability on a large-scale Blue Gene/Q system: with 16 racks we sustained 1.08 Pflop/s, 32.1% of the theoretical peak performance, including the conjugate gradient solver routines.
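The relation between the sustained figure and the theoretical peak can be checked with a short calculation. The sketch below is illustrative only: the hardware parameters (1024 nodes per rack, 16 compute cores per node, a 1.6 GHz clock, and 8 double-precision flops per core per cycle from a 4-wide fused multiply-add unit) are assumptions taken from commonly published Blue Gene/Q specifications, not from the text above, and the function names are hypothetical.

```python
# Rough check of the sustained-to-peak ratio quoted in the abstract.
# All hardware constants below are ASSUMPTIONS based on commonly
# published Blue Gene/Q specifications; they are not stated in the abstract.

NODES_PER_RACK = 1024        # assumed compute nodes per rack
CORES_PER_NODE = 16          # assumed compute cores per node
CLOCK_HZ = 1.6e9             # assumed core clock frequency
FLOPS_PER_CYCLE = 8          # assumed: 4-wide SIMD fused multiply-add


def peak_pflops(racks: int) -> float:
    """Theoretical double-precision peak of `racks` racks, in Pflop/s."""
    node_peak = CORES_PER_NODE * CLOCK_HZ * FLOPS_PER_CYCLE  # flop/s per node
    return racks * NODES_PER_RACK * node_peak / 1e15


def sustained_fraction(sustained_pflops: float, racks: int) -> float:
    """Fraction of theoretical peak achieved by a sustained rate."""
    return sustained_pflops / peak_pflops(racks)


if __name__ == "__main__":
    racks = 16
    sustained = 1.08  # Pflop/s, as reported in the abstract
    print(f"peak for {racks} racks: {peak_pflops(racks):.3f} Pflop/s")
    print(f"sustained fraction:  {sustained_fraction(sustained, racks):.1%}")
```

Under these assumptions the 16-rack peak is about 3.36 Pflop/s, so 1.08 Pflop/s corresponds to roughly 32% of peak. This is close to the reported 32.1%, and the small remaining difference is consistent with the sustained figure being rounded to three digits.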
