Evaluating the Arm Ecosystem for High Performance Computing

In recent years, Arm-based processors have arrived on the HPC scene, offering an alternative to the existing status quo, which was largely dominated by x86 processors. In this paper, we evaluate the Arm ecosystem, both the hardware offering and the software stack that is available to users, by benchmarking a production HPC platform that uses Marvell's ThunderX2 processors. We investigate the performance of complex scientific applications across multiple nodes, and we also assess the maturity of the software stack and the ease of use from a user's perspective. This paper finds that the performance across our benchmarking applications is generally as good as, or better than, that of well-established platforms, and we conclude from our experience that there are no major hurdles that might hinder wider adoption of this ecosystem within the HPC community.
