Seismic wave propagation simulations on low-power and performance-centric manycores

The large processing requirements of seismic wave propagation simulations make High Performance Computing (HPC) architectures a natural choice for their execution. However, to keep both the current pace of performance improvements and the power consumption under a strict power budget, HPC systems must be more energy efficient than ever. As a response to this need, energy-efficient and low-power processors began to make their way into the market. In this paper we employ a novel low-power processor, the MPPA-256 manycore, to perform seismic wave propagation simulations. It has 256 cores connected by a NoC, no cache-coherence and only a limited amount of on-chip memory. We describe how its particular architectural characteristics influenced our solution for an energy-efficient implementation. As a counterpoint to the low-power MPPA-256 architecture, we employ Xeon Phi, a performance-centric manycore. Although both processors share some architectural similarities, the challenges to implement an efficient seismic wave propagation kernel on these platforms are very different. In this work we compare the performance and energy efficiency of our implementations for these processors to proven and optimized solutions for other hardware platforms such as general-purpose processors and a GPU. Our experimental results show that MPPA-256 has the best energy efficiency, consuming at least 77% less energy than the other evaluated platforms, whereas the performance of our solution for the Xeon Phi is on par with a state-of-the-art solution for GPUs.
