Parallel-algorithm extension for tsunami and earthquake-cycle simulators for massively parallel execution on the K computer

This article presents a case study on extending the parallel algorithms of tsunami and earthquake-cycle simulators for massively parallel execution on the K computer. We use two target applications: a tsunami-simulation program, “JAGURS,” and an earthquake-cycle program, “RSGDX.” To optimize collective communication, we split the Message Passing Interface (MPI) communicator and perform multistage localized communication, which minimizes the communication frequency, the transferred data size, and network congestion. Where load imbalance is severe, we apply cyclic distribution and extend the axes used for parallelization. We evaluate the performance of each application under massively parallel execution on the K computer. The optimized code gives JAGURS a 21.8× speedup for collective communication and a 7.9× speedup for the time-step loop on 8748 nodes (69,984 cores). RSGDX attains a 4.25× speedup for collective communication and an 18.7× speedup for the time-step loop on 8192 nodes (65,536 cores).
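The two ideas named in the abstract, grouping ranks for staged communication and assigning work cyclically, can be illustrated with a small sketch. This is not the JAGURS or RSGDX code and it does not call MPI. It is a plain-Python simulation, and the function names, the group size, and the synthetic cost profile are all assumptions made for illustration.

```python
from collections import defaultdict


def split_color(rank, group_size):
    """Group index a rank would use as the 'color' when splitting a
    communicator. The rank // group_size rule is an assumed grouping."""
    return rank // group_size


def two_stage_sum(values, group_size):
    """Simulate a two-stage sum over len(values) ranks (no MPI involved).

    Stage 1: every group accumulates a local partial sum.
    Stage 2: only one partial sum per group is combined globally.
    Returns (total, number_of_groups).
    """
    partial = defaultdict(int)
    for rank, value in enumerate(values):
        partial[split_color(rank, group_size)] += value
    return sum(partial.values()), len(partial)


def block_owner(i, n_items, n_ranks):
    """Block distribution: contiguous chunks of items per rank."""
    chunk = -(-n_items // n_ranks)  # ceiling division
    return i // chunk


def cyclic_owner(i, n_ranks):
    """Cyclic distribution: items are dealt out round-robin."""
    return i % n_ranks


def max_load(costs, owner_of, n_ranks):
    """Largest total cost assigned to any single rank."""
    load = [0] * n_ranks
    for i, cost in enumerate(costs):
        load[owner_of(i)] += cost
    return max(load)


if __name__ == "__main__":
    n_ranks = 4
    # Synthetic, strongly skewed per-item cost (an assumption, not measured data).
    costs = list(range(16))

    total, n_groups = two_stage_sum(costs, group_size=4)
    print("two-stage total:", total, "groups in stage 2:", n_groups)

    block = max_load(costs, lambda i: block_owner(i, len(costs), n_ranks), n_ranks)
    cyclic = max_load(costs, lambda i: cyclic_owner(i, n_ranks), n_ranks)
    print("max load, block:", block, "cyclic:", cyclic)
```

In this toy setting the second stage involves one participant per group rather than every rank, and the cyclic assignment spreads the expensive items across ranks so the most heavily loaded rank carries less work than under the block assignment. How the actual applications choose their groups and distribution axes is not specified in the abstract.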
