The Arch Project: Physics Mini-Apps for Algorithmic Exploration and Evaluating Programming Environments on HPC Architectures

The Arch project is a suite of mini-apps developed with consistent coding practices on top of a common infrastructural layer. Great emphasis has been placed on making the applications concise and easy to manipulate, while capturing the key performance characteristics of the algorithmic classes they proxy. The suite is intended for traditional exploration of performance, portability and productivity on modern HPC architectures, but it also makes it possible to focus on those characteristics of production application stacks that isolated mini-app developments do not generally expose. In this paper we discuss the implementation of each of the mini-apps, and present key findings from the development and optimisation process, alongside details of important future research directions.
