Accelerating the Simulation of Thermal Convection in the Earth's Outer Core on Tianhe-2

Numerical simulation of thermal convection in the Earth's outer core requires extreme-scale computing because of the wide disparity of temporal and spatial scales, the extreme physical parameters, the rapid rotation, and the spherical geometry. In this work, we study the numerical simulation of thermal convection in the Earth's outer core on CPU-MIC heterogeneous many-core systems. First, starting from a legacy parallel code based on the PETSc software package, we develop a simulation framework for CPU-MIC heterogeneous many-core systems. Second, we present and optimize a sparse linear solver for CPU-MIC heterogeneous many-core systems that targets the two linear systems arising in the simulation. Third, we implement and optimize several computational kernels of the simulation, including sparse matrix-vector multiplication (SpMV) and a polynomial preconditioner, on distributed-memory Xeon Phi-accelerated systems. In addition, to reduce the cost of data movement, we apply methods that minimize memory accesses, PCI-E data transfers, and MPI communication. Finally, further optimizations are applied to the extended code. Experiments on the Tianhe-2 supercomputer show that, compared with the original code, our Xeon Phi-accelerated design delivers speedups of 6.93x on a single MIC device and 6.00x on 64 MIC devices.
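To make the two kernels named in the abstract concrete, the following is a minimal serial sketch of SpMV on a compressed sparse row (CSR) matrix and of a polynomial preconditioner built on it. It is not the authors' implementation: the abstract does not say which storage format or which polynomial is used, so the CSR layout, the truncated Neumann series with Jacobi scaling, and the names `csr_spmv` and `neumann_precondition` are all assumptions made for illustration.

```python
def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row occupy data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def neumann_precondition(indptr, indices, data, diag, r, degree):
    """Approximate A^{-1} r by a truncated Neumann series (an assumed choice).

    Returns z = sum_{k=0}^{degree} (I - D^{-1} A)^k D^{-1} r, where D is the
    diagonal of A, passed in as `diag`. The series converges only when the
    spectral radius of I - D^{-1} A is below 1.
    """
    term = [ri / di for ri, di in zip(r, diag)]  # k = 0 term: D^{-1} r
    z = list(term)
    for _ in range(degree):
        a_term = csr_spmv(indptr, indices, data, term)
        # Next term: (I - D^{-1} A) applied to the previous term.
        term = [t - a / d for t, a, d in zip(term, a_term, diag)]
        z = [zi + ti for zi, ti in zip(z, term)]
    return z


# Small example: A = [[4, 1], [1, 3]] in CSR form.
indptr = [0, 2, 4]
indices = [0, 1, 0, 1]
data = [4.0, 1.0, 1.0, 3.0]
diag = [4.0, 3.0]

y = csr_spmv(indptr, indices, data, [1.0, 2.0])
z = neumann_precondition(indptr, indices, data, diag, y, degree=30)
```

Each application of the preconditioner costs `degree` SpMV calls and no inner products, which is one reason polynomial preconditioners are attractive when global communication is expensive. In this example the iteration matrix has spectral radius below 1, so `z` approaches the original vector `[1.0, 2.0]` as `degree` grows.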
