Multithreaded global address space communication techniques for gyrokinetic fusion applications on ultra-scale platforms

We present novel parallel language constructs for the communication-intensive part of a magnetic fusion simulation code. The focus of this work is the shift phase of charged particles in a tokamak simulation code in toroidal geometry. We introduce new hybrid PGAS/OpenMP implementations of highly optimized hybrid MPI/OpenMP-based communication kernels. The hybrid PGAS implementations extend standard hybrid programming techniques so that the heavy communication workload of the underlying kernel can be distributed among OpenMP threads. Building upon lightweight one-sided CAF (Fortran 2008) communication techniques, we also show the benefits of spreading the communication over a longer period of time, which reduces bandwidth requirements and yields a more sustained overlap of communication and computation. Experiments on up to 130,560 processors, conducted on the NERSC Hopper system, which is currently the largest HPC platform with hardware support for one-sided communication, show performance improvements of 52% at the highest concurrency.
