Task-Based Programming on Emerging Parallel Architectures for Finite-Differences Seismic Numerical Kernel

In recent years, heterogeneous hardware has become common in almost all supercomputer nodes, requiring a profound shift in the way numerical applications are implemented. This paper illustrates the design and implementation of a seismic wave propagation simulator, based on the finite-difference numerical scheme and specifically tailored for such massively parallel hardware infrastructures. The application data flow is built on top of PaRSEC, a generic task-based runtime system. The numerical kernels, designed to maximize data reuse, can efficiently leverage the large SIMD units available in modern CPU cores. A strong scalability study on a cluster of Intel KNL processors illustrates the application's performance.
