Runtime mechanisms to survive new HPC architectures: A use case in human respiratory simulations

Computational fluid and particle dynamics (CFPD) simulations are of paramount importance for studying and improving drug effectiveness. The computational requirements of CFPD codes demand high-performance computing (HPC) resources. For these reasons, in this article we introduce and evaluate system software techniques that improve performance and tolerate load imbalance in a state-of-the-art production CFPD code. We demonstrate the benefits of these techniques on Intel-, IBM-, and Arm-based HPC technologies ranked in the Top500 list of supercomputers, showing the importance of mechanisms applied at runtime for improving performance independently of the underlying architecture. We run a real CFPD simulation of particle tracking in the human respiratory system and obtain performance improvements of up to 2× across different architectures by applying runtime techniques while keeping the computational resources constant.
