Seismic wave propagation simulations on low-power and performance-centric manycores

The large processing requirements of seismic wave propagation simulations make High Performance Computing (HPC) architectures a natural choice for their execution. However, to sustain the current pace of performance improvements while keeping power consumption under a strict power budget, HPC systems must be more energy efficient than ever. In response to this need, energy-efficient and low-power processors have begun to make their way into the market. In this paper we employ a novel low-power processor, the MPPA-256 manycore, to perform seismic wave propagation simulations. It has 256 cores connected by a network-on-chip (NoC), no cache coherence, and only a limited amount of on-chip memory. We describe how these particular architectural characteristics influenced our solution for an energy-efficient implementation. As a counterpoint to the low-power MPPA-256, we employ the Xeon Phi, a performance-centric manycore. Although both processors share some architectural similarities, the challenges of implementing an efficient seismic wave propagation kernel on these platforms are very different. In this work we compare the performance and energy efficiency of our implementations for these processors to proven and optimized solutions for other hardware platforms, such as general-purpose processors and a GPU. Our experimental results show that MPPA-256 has the best energy efficiency, consuming at least 77% less energy than the other evaluated platforms, whereas the performance of our solution for the Xeon Phi is on par with a state-of-the-art solution for GPUs.
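As a rough illustration of why a small on-chip memory shapes such a kernel, the sketch below performs explicit finite-difference time steps in two ways: over the whole grid, and tile by tile. In the tiled version, each tile plus a one-cell halo is copied into small local buffers before being updated, loosely mimicking a transfer into a cluster's limited local memory. This is a minimal sketch under stated assumptions, not the paper's implementation: a simplified 2D scalar acoustic wave equation stands in for the 3D elastodynamic equations, and the names `update_point`, `step_naive`, `step_tiled`, `simulate` and the tile sizes are hypothetical.

```python
"""Tiled 2D wave-equation stencil (illustrative sketch).

Assumption: a scalar 2D acoustic wave equation stands in for the 3D
elastodynamic equations; all function names here are hypothetical.
"""
from typing import List, Optional

Grid = List[List[float]]


def update_point(u: Grid, u_prev: Grid, i: int, j: int, r2: float) -> float:
    """Second-order leapfrog update at (i, j) using a 5-point Laplacian.

    r2 is the squared Courant number (c * dt / dx) ** 2.
    """
    lap = u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1] - 4.0 * u[i][j]
    return 2.0 * u[i][j] - u_prev[i][j] + r2 * lap


def step_naive(u: Grid, u_prev: Grid, r2: float) -> Grid:
    """Reference time step over the whole grid; boundary cells stay at 0."""
    n, m = len(u), len(u[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = update_point(u, u_prev, i, j, r2)
    return out


def step_tiled(u: Grid, u_prev: Grid, r2: float, tile: int) -> Grid:
    """Same time step, computed tile by tile.

    Each tile interior plus a 1-cell halo is copied into local buffers
    (standing in for a load into limited on-chip memory), updated there,
    and the interior results are written back to the output grid.
    """
    n, m = len(u), len(u[0])
    out = [[0.0] * m for _ in range(n)]
    for ti in range(1, n - 1, tile):
        for tj in range(1, m - 1, tile):
            i_end = min(ti + tile, n - 1)  # exclusive end of tile interior
            j_end = min(tj + tile, m - 1)
            # "Load" tile + halo: global rows ti-1..i_end, cols tj-1..j_end.
            loc_u = [row[tj - 1:j_end + 1] for row in u[ti - 1:i_end + 1]]
            loc_p = [row[tj - 1:j_end + 1] for row in u_prev[ti - 1:i_end + 1]]
            # Update the interior in local coordinates, then "store" back.
            for li in range(1, i_end - ti + 1):
                for lj in range(1, j_end - tj + 1):
                    out[ti + li - 1][tj + lj - 1] = update_point(loc_u, loc_p, li, lj, r2)
    return out


def simulate(n: int, steps: int, tile: Optional[int] = None, r2: float = 0.25) -> Grid:
    """Run `steps` time steps on an n x n grid from a central impulse."""
    u_prev = [[0.0] * n for _ in range(n)]
    u = [[0.0] * n for _ in range(n)]
    u[n // 2][n // 2] = 1.0  # point source
    for _ in range(steps):
        if tile is None:
            nxt = step_naive(u, u_prev, r2)
        else:
            nxt = step_tiled(u, u_prev, r2, tile)
        u_prev, u = u, nxt
    return u
```

Both versions perform identical floating-point operations for every cell, so `simulate(17, 10) == simulate(17, 10, tile=4)` holds exactly, including tile sizes that do not divide the grid evenly. On a real manycore, the tile size would be chosen so that a tile and its halo fit the available local memory, trading extra halo traffic for locality.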
