Performance Evaluation of Stencil Computations Based on Source-to-Source Transformations

Stencil computations are common in High Performance Computing (HPC) applications; they apply the same calculation pattern to every point of a data domain. The Finite-Difference Method is an example of a stencil computation, and it is used to solve real problems in diverse areas that involve Partial Differential Equations (electromagnetics, fluid dynamics, geophysics, etc.). Although a large body of literature on the optimization of this class of applications is available, evaluating and optimizing their performance on different HPC architectures remains a challenge. In this work, we implemented the 7-point Jacobian stencil in a Source-to-Source Transformation Framework (BOAST) to evaluate the performance of different HPC architectures. The results show that the same source code can be executed on current architectures with a performance improvement, which helps programmers develop applications without depending on hardware features.
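To make the stencil pattern concrete, the following is a minimal sketch of one sweep of a 7-point stencil over a 3D grid, written in plain Python rather than in BOAST. It is not the kernel used in this work: the function names (`flat_index`, `jacobi7_step`), the flat row-major layout, the coefficients `alpha` and `beta`, and the choice to leave boundary cells unchanged are all illustrative assumptions.

```python
def flat_index(i, j, k, ny, nz):
    """Map 3D coordinates to a flat row-major index (assumed layout)."""
    return (i * ny + j) * nz + k


def jacobi7_step(src, nx, ny, nz, alpha=0.25, beta=0.125):
    """Apply one 7-point stencil sweep and return a new grid.

    Each interior cell becomes alpha * center + beta * (sum of its six
    face neighbors). The coefficients are hypothetical placeholders.
    """
    dst = list(src)  # boundary cells are copied unchanged (assumption)
    plane = ny * nz  # distance between neighbors along the i axis
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                c = flat_index(i, j, k, ny, nz)
                neighbors = (
                    src[c - plane] + src[c + plane]  # i - 1, i + 1
                    + src[c - nz] + src[c + nz]      # j - 1, j + 1
                    + src[c - 1] + src[c + 1]        # k - 1, k + 1
                )
                dst[c] = alpha * src[c] + beta * neighbors
    return dst


if __name__ == "__main__":
    n = 3
    grid = [0.0] * (n * n * n)
    grid[flat_index(1, 1, 1, n, n)] = 8.0  # single hot cell in the center
    result = jacobi7_step(grid, n, n, n)
    print(result[flat_index(1, 1, 1, n, n)])
```

Reading from `src` and writing to a separate `dst` is what makes the sweep a Jacobi-style update: every output cell depends only on values from the previous iteration, so the interior loop iterations are independent of each other.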
