From MPI to MPI+OpenACC: Conversion of a legacy FORTRAN PCG solver for the spherical Laplace equation

A real-world example of adding OpenACC to a legacy MPI FORTRAN Preconditioned Conjugate Gradient (PCG) code is described, and timing results for multi-node, multi-GPU runs are shown. The code computes three-dimensional spherical solutions to the Laplace equation. It is applied to finding potential field solutions of the solar corona, a useful tool in space weather modeling. We highlight key tips, strategies, and challenges faced when adding OpenACC, including linking FORTRAN code to the cuSPARSE library, using CUDA-aware MPI, maintaining portability, and dealing with multi-node, multi-GPU run-time environments. Timing results are shown for the code running with MPI only (up to 1728 CPU cores) and with MPI+OpenACC (up to 64 NVIDIA P100 GPUs). Performance portability is also addressed, including results using MPI+OpenACC on multi-core x86 CPUs.
